Automatic Acquisition of Domain Knowledge for Information Extraction

In developing an information extraction (IE) system for a new class of events or relations, one of the major tasks is identifying the many ways in which these events or relations may be expressed in text. This has generally involved the manual analysis and, in some cases, the annotation of large quantities of text involving these events. This paper presents an alternative approach, based on an automatic discovery procedure, ExDisco, which identifies a set of relevant documents and a set of event patterns from un-annotated text, starting from a small set of "seed patterns". We evaluate ExDisco by comparing the performance of discovered patterns against that of manually constructed systems on actual extraction tasks. We propose an algorithm for learning extraction patterns from a small number of examples, which greatly reduces the burden on the application developer and eases the knowledge acquisition bottleneck. The approach is motivated by the assumption that documents containing a large number of patterns already identified as relevant to a particular IE scenario are likely to contain further relevant patterns. ExDisco uses a bootstrapping mechanism to find new extraction patterns, taking un-annotated texts and some seed patterns as the initial input.
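The bootstrapping idea described above can be illustrated with a minimal sketch. It assumes each document has already been reduced to a set of candidate pattern strings, and it uses an illustrative scoring function (precision among relevant documents times the log of the support count), which is not necessarily the one ExDisco uses. The function name `bootstrap_patterns`, the `min_support` threshold, and the toy documents are all hypothetical.

```python
import math


def bootstrap_patterns(docs, seeds, iterations=3, min_support=2):
    """Grow a pattern set from seeds (simplified sketch, not the paper's exact method).

    docs: list of sets, each holding the candidate patterns found in one document.
    seeds: iterable of seed patterns assumed to be relevant to the scenario.
    """
    accepted = set(seeds)
    for _ in range(iterations):
        # A document is treated as relevant if it contains any accepted pattern.
        relevant = [d for d in docs if d & accepted]
        candidates = set().union(*docs) - accepted

        best, best_score = None, 0.0
        for pattern in sorted(candidates):  # sorted for deterministic tie-breaking
            support = sum(1 for d in relevant if pattern in d)
            total = sum(1 for d in docs if pattern in d)
            if support < min_support:
                continue
            # Assumed score: fraction of matching docs that are relevant,
            # weighted by the log of how many relevant docs match.
            score = (support / total) * math.log(support)
            if score > best_score:
                best, best_score = pattern, score

        if best is None:
            break  # no candidate is sufficiently supported; stop early
        accepted.add(best)
    return accepted


# Toy corpus for a hypothetical "management succession" scenario.
docs = [
    {"company appoint person", "person resign"},
    {"company appoint person", "person resign", "stock rise"},
    {"person resign", "person succeed person"},
    {"stock rise", "company report profit"},
    {"stock rise"},
]
result = bootstrap_patterns(docs, seeds={"company appoint person"})
```

On this toy corpus the loop adds "person resign", because it co-occurs with the seed in two documents, and then stops, since no remaining candidate appears in at least two relevant documents. Accepting only one pattern per iteration is a conservative design choice that limits drift away from the seed scenario.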