Thesis
Automatic Extraction of Lexical Functions
Author
M. en C. KOLESNIKOVA, OLGA
Institution
Abstract
A lexical function is a concept that formalizes semantic and syntactic relations between lexical
units. Relations between words are a vital part of any natural language system. The meaning of an
individual word largely depends on the various relations connecting it to other words in context.
A collocational relation is a type of institutionalized lexical relation that holds between the base
and its partner in a collocation (examples of collocations: give a lecture, make a decision, lend
support, where the bases are lecture, decision, support and the partners, termed collocates, are
give, make, lend). Collocations are opposed to free word combinations, in which both words are used
in their typical meaning (for example, give a book, make a dress, lend money). Knowledge of
collocations is important for natural language processing because collocations embody the
restrictions on how words can be used together. There are many methods for extracting collocations
automatically, but their result is a plain list of collocations. Such lists are more valuable if the
collocations are tagged with semantic and grammatical information. The formalism of lexical
functions is a means of representing such information. If collocations are annotated with lexical
functions in a machine-readable dictionary, they can be used effectively in natural
language applications, including parsers, high-quality machine translation, periphrasis systems, and
computer-aided learning of lexica. To create such applications, we need to extract lexical
functions from corpora automatically.
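As a rough illustration of what an annotated dictionary could look like, the sketch below stores the three collocations mentioned above under a lexical function label. The label "Oper1", the dictionary layout, and the helper name `collocate_for` are assumptions made for this example; they are not taken from the thesis.

```python
# Hypothetical layout: lexical function name -> {base: collocate}.
# "Oper1" is an assumed label for the support-verb pattern in the examples above.
LF_DICTIONARY = {
    "Oper1": {"lecture": "give", "decision": "make", "support": "lend"},
}


def collocate_for(lexical_function, base):
    """Return the collocate stored for `base` under `lexical_function`, or None."""
    return LF_DICTIONARY.get(lexical_function, {}).get(base)


print(collocate_for("Oper1", "decision"))
print(collocate_for("Oper1", "book"))
```

With such a structure an application can ask which verb a given noun selects, which a plain list of collocations cannot answer directly.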
Our aim is to extract from corpora Spanish verb-noun collocations that belong to a given lexical
function. To this end, the proposed approach represents the lexical meaning of a
given word by the set of all its hyperonyms and uses machine learning techniques to
predict lexical functions, treated as values of the class variable, for unseen collocations. Hyperonyms
are extracted from the Spanish WordNet. We evaluate many machine learning algorithms on the
training set and on an independent test set. The results show that machine learning is
a feasible approach to the automatic detection of lexical functions.
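The idea of using hyperonym sets as features and the lexical function as the class variable can be sketched as follows. Everything concrete here is an assumption for illustration: the toy hyperonym links stand in for the Spanish WordNet, English words are used for readability, the labels "Oper1" and "FREE" are invented class values, and a one-nearest-neighbour rule with Jaccard similarity replaces the machine learning algorithms evaluated in the thesis.

```python
# Toy hyperonym links (word -> parent); a hypothetical stand-in for the Spanish WordNet.
HYPERNYM_OF = {
    "lecture": "speech", "speech": "act", "decision": "choice", "choice": "act",
    "support": "aid", "aid": "act", "act": "abstraction",
    "book": "publication", "publication": "artifact", "dress": "garment",
    "garment": "artifact", "artifact": "object", "money": "currency",
    "currency": "object",
    "lend": "give", "give": "transfer", "make": "create",
}


def hypernym_set(word):
    """Return the word together with every ancestor reachable in HYPERNYM_OF."""
    ancestors = {word}
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        ancestors.add(word)
    return ancestors


def features(verb, noun):
    """Represent a verb-noun pair by the prefixed hyperonym sets of both words."""
    verb_part = {"v:" + h for h in hypernym_set(verb)}
    noun_part = {"n:" + h for h in hypernym_set(noun)}
    return verb_part | noun_part


def jaccard(a, b):
    """Jaccard similarity of two non-empty feature sets."""
    return len(a & b) / len(a | b)


# Assumed training data: collocations labelled "Oper1", free combinations "FREE".
TRAINING = [
    (("give", "lecture"), "Oper1"),
    (("make", "decision"), "Oper1"),
    (("give", "book"), "FREE"),
    (("make", "dress"), "FREE"),
]


def predict(verb, noun):
    """Label an unseen pair with the class of its most similar training pair."""
    target = features(verb, noun)
    best = max(TRAINING, key=lambda ex: jaccard(target, features(*ex[0])))
    return best[1]


print(predict("lend", "support"))
print(predict("lend", "money"))
```

In this toy setting the unseen pair "lend support" shares abstract noun hyperonyms with the collocation examples and is labelled "Oper1", while "lend money" shares physical-object hyperonyms with the free combinations and is labelled "FREE".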