large context-free grammars extracted from tree banks achieve high coverage and accuracy, but they are difficult to parse with because of their massive ambiguity. the application of standard chart-parsing techniques often fails due to excessive memory and runtime requirements.treebank grammars are mostly used as probabilis tic grammars and users are usually only interested in the best analysis, the viterbi parse. to speed up viterbi parsing, sophisticated search strategies havebeen developed which find the most probable anal ysis without examining the whole set of possible analyses (charniak et al, 1998; klein and manning,2003a). these methods reduce the number of gener ated edges, but increase the amount of time needed for each edge. the parser described in this paper follows a contrary approach: instead of reducing the number of edges, it minimises the costs of building edges in terms of memory and runtime.the new parser, called bitpar, is based on a bit vector implementation (cf. (graham et al, 1980)) of the well-known cocke-younger-kasami (cky) algorithm (kasami, 1965; younger, 1967). it buildsa compact ?parse forest? representation of all anal yses in two steps. in the first step, a cky-style recogniser fills the chart with constituents. in the second step, the parse forest is built top-down from the chart. viterbi parses are computed in four steps. again, the first step is a cky recogniser which is followed by a top-down filtering of the chart, the bottom-up computation of the viterbi probabilities, and the top-down extraction of the best parse.the rest of the paper is organised as follows: sec tion 2 explains the transformation of the grammar to chomsky normal form. the following sectionsdescribe the recogniser algorithm (sec. 3), improvements of the recogniser by means of bit-vector op erations (sec. 4), and the generation of parse forests(sec. 5), and viterbi parses (sec. 6). section 7 discusses the advantages of the new architecture, sec tion 8 describes experimental results, and section 9 summarises the paper.(the rule a section 7 discusses the advantages of the new architecture, sec tion 8 describes experimental results, and section 9 summarises the paper. the cky algorithm requires a grammar in chom sky normal form where the right-hand side of eachrule either consists of two non-terminals or a single terminal symbol. large context-free grammars extracted from tree banks achieve high coverage and accuracy, but they are difficult to parse with because of their massive ambiguity. 5), and viterbi parses (sec. the application of standard chart-parsing techniques often fails due to excessive memory and runtime requirements.treebank grammars are mostly used as probabilis tic grammars and users are usually only interested in the best analysis, the viterbi parse. boring symbols on the right-hand sides of rules. bitpar uses a modified ver sion of the cky algorithm allowing also chain rules (rules with a single non-terminal on the right-handside). 4), and the generation of parse forests(sec. to speed up viterbi parsing, sophisticated search strategies havebeen developed which find the most probable anal ysis without examining the whole set of possible analyses (charniak et al, 1998; klein and manning,2003a). 3), improvements of the recogniser by means of bit-vector op erations (sec. these methods reduce the number of gener ated edges, but increase the amount of time needed for each edge. |