Authors: Gupta, Abhijeet
Title: Distributional analysis of entities
Issue Date: 2022
Document Type: Dissertation
Pages: xxix, 227
Abstract: Arguably, one of the most important aspects of natural language processing is natural language understanding, which relies heavily on lexical knowledge. In computational linguistics, modelling lexical knowledge through distributional semantics has gained considerable popularity. However, the modelling is largely restricted to generic lexical categories (typically common nouns, adjectives, etc.), which are associated with coarse-grained information; for example, the category country has a boundary, rivers and gold deposits. Comparatively less attention has been paid to modelling entities, which are associated with fine-grained real-world information; for instance, the entity Germany has precise properties such as (GDP - 3.6 trillion Euros), (GDP per capita - 44.5 thousand Euros) and (Continent - Europe). The lack of focus on entities and the inherently latent nature of information in distributional representations warrant greater efforts towards modelling entity-related phenomena and increasing the understanding of the information encoded within distributional representations. This work makes two contributions in that direction: (a) We introduce a semantic relation, Instantiation, which holds between entities and their categories, and model it distributionally to investigate the hypothesis that distributional distinctions exist between modelling entities and modelling categories within a semantic space. Our results show that in a semantic space: 1) entities and categories are quite distinct with respect to their distributional behaviour, geometry and linguistic properties; 2) the Instantiation relation is recoverable by distributional models; and 3) for lexical relational modelling purposes, categories are better represented by the centroids of their entities than by distributional representations constructed directly from corpora.
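The centroid idea in result 3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the three-dimensional vectors, the entity-to-category mapping, and the helper names `centroid`, `cosine` and `predict_category` are all made up for illustration. It assumes a category vector is the component-wise mean of its entities' vectors and that Instantiation is recovered by picking the most cosine-similar category.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy entity vectors: invented values, not taken from any corpus or the thesis.
entity_vectors = {
    "Germany": [1.0, 0.0, 1.0],
    "France": [1.0, 0.2, 0.8],
    "Rhine": [0.0, 1.0, 0.2],
    "Danube": [0.2, 1.0, 0.0],
}

# Hypothetical Instantiation pairs: category -> known entities.
instances = {
    "country": ["Germany", "France"],
    "river": ["Rhine", "Danube"],
}

# Represent each category by the centroid of its entities' vectors.
category_vectors = {
    category: centroid([entity_vectors[e] for e in entities])
    for category, entities in instances.items()
}

def predict_category(entity):
    """Return the category whose centroid is most similar to the entity vector."""
    vector = entity_vectors[entity]
    return max(category_vectors, key=lambda c: cosine(vector, category_vectors[c]))

print(predict_category("Germany"))
print(predict_category("Danube"))
```

In this toy setup each query entity also contributes to its own category centroid; a real evaluation would hold the query entity out before building the centroid.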
(b) We also investigate the potential and limitations of distributional semantics for the purpose of Knowledge Base Completion, starting with the hypothesis that fine-grained knowledge is encoded in distributional representations of entities during their meaning construction. We show that: 1) fine-grained information about entities is encoded in distributional representations and can be extracted by simple data-driven supervised models as attribute-value pairs; 2) the models can predict the entire range of fine-grained attributes, as seen in a knowledge base, in one go; and 3) a crucial factor in determining success in extracting this type of information is contextual support, i.e., the extent of contextual information captured by a distributional model during meaning construction. Overall, this thesis takes a step towards increasing the understanding of entity meaning representations in a distributional setup, with respect to their modelling and the extent of knowledge inclusion during their meaning construction.
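As a rough illustration of extracting attribute-value pairs with a simple supervised model, the sketch below fits one linear map from entity vectors to several numeric attributes at once. The abstract does not say which model family the thesis uses, so linear regression trained by batch gradient descent is an assumption here, and the two-dimensional vectors, the attribute names `attr_a` and `attr_b`, and the target values are invented.

```python
def predict(x, W, b):
    """Predict one value per attribute for a single entity vector x."""
    return [b[j] + sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(b))]

def fit_linear(X, Y, lr=0.1, epochs=2000):
    """Fit Y ~ X.W + b by batch gradient descent on mean squared error.

    X: list of entity vectors; Y: list of target rows, one value per attribute.
    """
    n, d, k = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * k for _ in range(d)]
    b = [0.0] * k
    for _ in range(epochs):
        grad_W = [[0.0] * k for _ in range(d)]
        grad_b = [0.0] * k
        for x, y in zip(X, Y):
            pred = predict(x, W, b)
            for j in range(k):
                err = pred[j] - y[j]
                grad_b[j] += 2 * err / n
                for i in range(d):
                    grad_W[i][j] += 2 * err * x[i] / n
        for j in range(k):
            b[j] -= lr * grad_b[j]
            for i in range(d):
                W[i][j] -= lr * grad_W[i][j]
    return W, b

# Hypothetical training data: entity vectors and their known attribute values.
attributes = ["attr_a", "attr_b"]
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0.5, 0.0], [2.5, -1.0], [1.5, 3.0], [3.5, 2.0]]

W, b = fit_linear(X, Y)

# Predict all attributes of an unseen entity in one pass, as attribute-value pairs.
unseen_entity = [0.5, 0.5]
predicted = dict(zip(attributes, predict(unseen_entity, W, b)))
print(predicted)
```

All attributes share a single model and are predicted together, which loosely mirrors the "in one go" setting described in the abstract; real knowledge-base attributes would also include categorical values that this numeric sketch does not cover.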
Appears in Collections: 05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in This Item:
File                     Description   Size      Format
GuptaAbhijeet_PhD.pdf                  1.56 MB   Adobe PDF

Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.