Semantic knowledge for particular domains is increasingly important in NLP. Many applications such as word-sense disambiguation, information extraction and speech recognition all require lexicons. The coverage of hand-built lexical resources such as WordNet (Fellbaum, 1998) has increased dramatically in recent years, but several problems and challenges remain. Coverage is poor in many critical, rapidly changing domains such as current affairs, medicine and technology, where much time is still spent by human experts employed to recognise and classify new terms. Most languages remain poorly covered in comparison with English. Hand-built lexical resources which cannot be automatically updated can often be simply misleading. For example, using WordNet to recognise that the word apple refers to a fruit or a tree is a grave error in the many situations where this word refers to a computer manufacturer, a sense which WordNet does not cover. For NLP to reach a wider class of applications in practice, the ability to assemble and update appropriate semantic knowledge automatically will be vital.

This paper describes a method for arranging semantic information into a graph (Bollobás, 1998), where the nodes are words and the edges (also called links) represent relationships between words. The paper is arranged as follows. Section 2 reviews previous work on semantic similarity and lexical acquisition. Section 3 describes how the graph model was built from the POS-tagged British National Corpus. Section 4 describes a new incremental algorithm used to build categories of words step by step from the graph model. Section 5 demonstrates this algorithm in action and evaluates the results against WordNet classes, obtaining state-of-the-art results. Section 6 describes how the graph model can be used to recognise when words are polysemous and to obtain groups of words representative of the different senses.

[Figure 1: automatically generated graph showing the word apple and semantically related nouns]

So far we have presented a graph model built upon noun co-occurrence which performs much better than previously reported methods at the task of automatic lexical acquisition.[2] This is an important task, because assembling and tuning lexicons for specific NLP systems is increasingly necessary. We now take a step further and present a simple method for not only assembling words with similar meanings, but for empirically recognising when a word has several meanings.

[1] http://infomap.stanford.edu/graphs
[2] http://muchmore.dfki.de

Acknowledgements

This research was supported in part by the research collaboration between the NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation and CSLI, Stanford University, and by EC/NSF grant IST-1999-11438 for the MuchMore project. The authors would like to thank the anonymous reviewers whose comments were a great help in making this paper more focussed; any shortcomings remain entirely our own responsibility.
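
As an illustrative aside (not part of the original paper, and not the authors' implementation), the following minimal Python sketch shows the kind of structure described above: a co-occurrence graph whose nodes are words and whose edges link nouns that co-occur, together with a simple incremental procedure that grows a category outwards from a seed word. The corpus format, the cut-off min_count and the link-counting score are assumptions made for the sketch; the paper's own corpus processing and affinity measure differ.

```python
# Minimal sketch, assuming sentences are already reduced to lists of nouns
# (e.g. taken from a POS-tagged corpus). Not the authors' implementation.
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(noun_lists, min_count=2):
    """Build an undirected graph linking nouns that co-occur in at least
    `min_count` sentences. Returns an adjacency map {noun: set of nouns}."""
    pair_counts = defaultdict(int)
    for nouns in noun_lists:
        for a, b in combinations(sorted(set(nouns)), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def grow_category(graph, seed, size=5):
    """Grow a word category step by step from a seed word, at each step
    adding the neighbour with the most links into the current category
    (a simple stand-in for the paper's scoring of candidate words)."""
    category = {seed}
    while len(category) < size:
        candidates = {n for w in category for n in graph[w]} - category
        if not candidates:
            break
        best = max(candidates, key=lambda n: len(graph[n] & category))
        category.add(best)
    return category


if __name__ == "__main__":
    # Toy "corpus": apple co-occurs both with fruit and with computing nouns.
    toy = [
        ["apple", "pear", "banana"], ["apple", "pear"], ["pear", "banana"],
        ["apple", "banana"], ["apple", "computer", "keyboard"],
        ["computer", "keyboard"], ["apple", "computer"], ["apple", "keyboard"],
    ]
    g = build_cooccurrence_graph(toy, min_count=2)
    print(grow_category(g, "pear", size=3))      # fruit-like neighbourhood
    print(grow_category(g, "computer", size=3))  # computing-like neighbourhood
```

On a graph like this, the polysemy of apple shows up structurally: apple is linked to two neighbourhoods (fruit nouns and computing nouns) that are otherwise barely connected to each other, which is the kind of pattern Section 6 exploits to separate word senses.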