Abstract:
This thesis presents a strategy for the acquisition of thematic role relations (such as AGENT, PATIENT, or INSTRUMENT) by means of statistical corpus analysis, for the purpose of semi-automatically extending lexical-semantic nets. In particular, this work focuses on resources in the style of WordNet (Fellbaum 1998) and EuroWordNet (Vossen 1999). Lexical-semantic nets represent the meanings of words via semantic relations between words and/or word concepts. Semantic (thematic) role relations are conceptual relations which hold between verbs and their nominal arguments (e.g. <eat>--AGENT--<human> or <eat>--PATIENT--<food>). Such relations capture selectional restrictions of verbs. Therefore, the task of acquiring thematic role relations is intrinsically related to the task of acquiring selectional restrictions.
Consequently, the core of a strategy for learning role relations consists in a method for learning selectional restrictions (or, more precisely, selectional preferences). For the latter task, a number of methods have been proposed which utilise syntactically analysed corpora and WordNet. To acquire the selectional preferences of a certain verb for a certain argument, the respective complement nouns of that verb are extracted from the corpus, and statistical methods are applied to generalise over these nouns; these generalisations are expressed as a set of WordNet noun concepts. One of these approaches, namely the method proposed by Abe & Li (1996), constitutes the starting point of my research. However, this approach is not directly applicable to learning role relations; it requires modifications and extensions for that task. In particular, two aspects have to be taken into account. Firstly, it is crucial that the WordNet concepts acquired to represent selectional preferences of a verb are located at an appropriate level of generalisation (e.g. <food> as PATIENT of <eat>, rather than <cake> or <physical_object>). I develop a modification of the approach which substantially improves its performance in this respect. Secondly, as the existing methods generalise over syntactic complements, they acquire selectional preferences for syntactic rather than semantic arguments. To learn selectional preferences for semantic roles, the syntactic arguments provided by the parsed corpus have to be linked to their underlying roles, so that the statistical learning method can be applied to generalise, for example, over all (semantic) Agents of the examined verb rather than over all its (syntactic) subjects. Therefore, I develop a method for linking syntactic to semantic arguments.
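The question of the appropriate level of generalisation can be illustrated with a small sketch. The Python code below is not the method developed in this thesis, nor the exact formulation of Abe & Li (1996); it is a simplified, description-length-based selection of a "cut" through a noun hierarchy, under stated assumptions. The hierarchy `CHILDREN` is a hypothetical toy tree standing in for WordNet, the counts in `COUNTS` are invented, and the cost function (half a logarithm of the sample size per class in the cut, plus the code length of the data under a uniform distribution within each class) is an assumed simplification.

```python
import math

# Hypothetical toy hypernym tree; a stand-in for the WordNet noun hierarchy.
CHILDREN = {
    "entity": ["food", "artifact"],
    "food": ["cake", "bread", "apple"],
    "artifact": ["hammer", "chair", "car"],
}

# Invented counts of nouns observed as the direct object of "eat".
COUNTS = {"cake": 4, "bread": 3, "apple": 3}
TOTAL = sum(COUNTS.values())


def leaves(node):
    """Return all leaf nouns dominated by `node` (the node itself if it is a leaf)."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(kid)]


def freq(node):
    """Observed frequency of a class: the sum of the counts of its leaf nouns."""
    return sum(COUNTS.get(leaf, 0) for leaf in leaves(node))


def description_length(cut):
    """Assumed simplified cost: model cost plus data cost, in bits."""
    # Model cost: half of log2(sample size) for each class in the cut.
    param_cost = len(cut) / 2 * math.log2(TOTAL)
    # Data cost: each observed noun is coded with probability
    # P(class) / (number of leaves in class); unseen classes cost nothing.
    data_cost = 0.0
    for node in cut:
        f = freq(node)
        if f:
            data_cost -= f * math.log2(f / (TOTAL * len(leaves(node))))
    return param_cost + data_cost


def find_cut(node):
    """Bottom-up: keep `node` as one class if that is cheaper than its children's cuts."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]
    below = [c for kid in kids for c in find_cut(kid)]
    if description_length([node]) < description_length(below):
        return [node]
    return below


cut = find_cut("entity")
preferred = [c for c in cut if freq(c) > 0]
print(cut, preferred)
```

On this toy data, the selected cut stops at <food> rather than descending to <cake> or rising to <entity>: collapsing the three food nouns into one class saves more model cost than it loses in data fit, whereas merging <food> with <artifact> would spread probability over nouns that never occur with the verb.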
A further aspect of the overall strategy I present is an appropriate method for mapping the verbs and nouns in the training data to the corresponding WordNet concepts, which is a prerequisite for applying the preference acquisition algorithm.
To evaluate the role acquisition approach developed in this thesis, I extract a gold standard from the EuroWordNet database and propose detailed evaluation criteria. Overall, the evaluation results, with accuracy rates of up to 84%, indicate that the approach is effective.