Semantic parsing of sentences is believed to be an important task toward natural language understanding, and has immediate applications in tasks such as information extraction and question answering. We study semantic role labeling (SRL). For each verb in a sentence, the goal is to identify all constituents that fill a semantic role, and to determine their roles, such as agent, patient or instrument, and their adjuncts, such as locative, temporal or manner. The PropBank project (Kingsbury and Palmer, 2002) provides a large human-annotated corpus of semantic verb-argument relations. Specifically, we use the data provided in the CoNLL-2004 shared task of semantic-role labeling (Carreras and Màrquez, 2003), which consists of a portion of the PropBank corpus, allowing us to compare the performance of our approach with other systems.
Previous approaches to the SRL task have made use of a full syntactic parse of the sentence in order to define argument boundaries and to determine the role labels (Gildea and Palmer, 2002; Chen and Rambow, 2003; Gildea and Hockenmaier, 2003; Pradhan et al., 2003; Pradhan et al., 2004; Surdeanu et al., 2003). In this work, following the CoNLL-2004 shared task definition, we assume that the SRL system takes as input only partial syntactic information, and no external lexico-semantic knowledge bases. Specifically, we assume as input resources a part-of-speech tagger, a shallow parser that can process the input to the level of base chunks and clauses (Tjong Kim Sang and Buchholz, 2000; Tjong Kim Sang and Déjean, 2001), and a named-entity recognizer (Tjong Kim Sang and De Meulder, 2003). We do not assume a full parse as input.
SRL is a difficult task, and one cannot expect high levels of performance from either purely manual classifiers or purely learned classifiers. Rather, supplemental linguistic information must be used to support and correct a learning system.
So far, machine learning approaches to SRL have incorporated linguistic information only implicitly, via the classifiers' features. The key innovation in our approach is the development of a principled method to combine machine learning techniques with linguistic and structural constraints by explicitly incorporating inference into the decision process.
In the machine learning part, the system we present here is composed of two phases. First, a set of argument candidates is produced using two learned classifiers: one to discover beginning positions and one to discover end positions of each argument type. Ideally, this phase discovers a small superset of all arguments in the sentence (for each verb). In a second learning phase, the candidate arguments from the first phase are re-scored using a classifier designed to determine argument type, given a candidate argument.
Unfortunately, it is difficult to incorporate global properties of the sentence into the learning phases. However, at the inference level it is possible to incorporate the fact that the set of possible role labelings is restricted by both structural and linguistic constraints: for example, arguments cannot structurally overlap, or, given a predicate, some argument structures are illegal. The overall decision problem must produce an outcome that is consistent with these constraints. We encode the constraints as linear inequalities, and use integer linear programming (ILP) as an inference procedure to make a final decision that is both consistent with the constraints and most likely according to the learning system. Although ILP is generally a computationally hard problem, there are efficient implementations that can run on thousands of variables and constraints.
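To make the inference step concrete, the following is a minimal sketch of choosing the best labeling of candidate arguments under constraints. It is not the system described here: exhaustive enumeration stands in for an ILP solver so that the example has no dependencies, and the function names, the `null` label, the additive scoring, the example scores, and the choice of `A0` and `A1` as labels that may appear at most once are all illustrative assumptions. In an ILP encoding, each (candidate, label) pair would become a binary variable and each check below would become a linear inequality over those variables.

```python
from itertools import product

NULL = "null"  # assumed label meaning "this candidate is not an argument"


def overlaps(a, b):
    """Return True if two inclusive (start, end) word spans share a position."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_consistent(spans, labels, unique_labels):
    """Check two illustrative constraints on one complete assignment."""
    kept = [(s, l) for s, l in zip(spans, labels) if l != NULL]
    # Structural constraint: arguments may not overlap.
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if overlaps(kept[i][0], kept[j][0]):
                return False
    # Linguistic constraint (assumed): some labels appear at most once.
    for label in unique_labels:
        if sum(1 for _, l in kept if l == label) > 1:
            return False
    return True


def infer(candidates, unique_labels=("A0", "A1")):
    """Return the highest-scoring consistent labeling and its score.

    candidates: list of (span, {label: score}) pairs; every score dict
    must include NULL, so the all-NULL labeling is always feasible.
    The search is exponential in the number of candidates and is only
    meant for tiny illustrative inputs.
    """
    spans = [span for span, _ in candidates]
    label_sets = [sorted(scores) for _, scores in candidates]
    best_labels, best_score = None, float("-inf")
    for labels in product(*label_sets):
        if not is_consistent(spans, labels, unique_labels):
            continue
        score = sum(scores[l] for (_, scores), l in zip(candidates, labels))
        if score > best_score:
            best_labels, best_score = labels, score
    return list(zip(spans, best_labels)), best_score


# Hypothetical classifier scores for three candidate spans of one verb.
example = [
    ((0, 1), {"A0": 0.6, "A1": 0.3, NULL: 0.1}),
    ((0, 2), {"A0": 0.7, "A1": 0.2, NULL: 0.1}),  # overlaps the first span
    ((4, 6), {"A0": 0.5, "A1": 0.4, NULL: 0.1}),
]
labeling, score = infer(example)
# labeling == [((0, 1), "null"), ((0, 2), "A0"), ((4, 6), "A1")]
```

Labeling each candidate independently would assign `A0` to all three spans, which violates both constraints; the constrained search instead drops one of the overlapping spans and gives the third span its second-best label.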
In our experiments, we used the commercial ILP package (Xpress-MP, 2003), and were able to process roughly twenty sentences per second. The goal of the semantic-role labeling task is to discover the verb-argument structure for a given input sentence; see the details of the definition in Kingsbury and Palmer (2002) and Carreras and Màrquez (2003). We show that linguistic information is useful for semantic role labeling, both in extracting features and in deriving constraints on the output. As more constraints are considered, we expect the overall performance to improve.