Language models (LMs) are applied in many natural language processing applications, such as speech recognition and machine translation, to encapsulate syntactic, semantic, and pragmatic information. For systems which learn from given data, we frequently observe a severe drop in performance when moving to a new genre or a new domain. In speech recognition, a number of adaptation techniques have been developed to cope with this situation. In statistical machine translation we have a similar situation: the model parameters are estimated from some data, and the system is then used to translate sentences which may not be well covered by the training data. Therefore, the potential of adaptation techniques needs to be explored for machine translation applications.

Statistical machine translation is based on the noisy channel model, where the translation hypothesis is searched over the space defined by a translation model and a target language model (Brown et al., 1993). Statistical machine translation can be formulated as follows:

t* = argmax_t p(t|s) = argmax_t p(s|t) p(t)

where t is the target sentence and s is the source sentence. p(t) is the target language model and p(s|t) is the translation model. The argmax operation is the search, which is done by the decoder. In the current study we modify the target language model p(t) to represent the test data better, and thereby improve the translation quality.

(Janiszek et al., 2001) list the following approaches to language model adaptation:

- Linear interpolation of a general and a domain-specific model (Seymore and Rosenfeld, 1997).
- Back-off of domain-specific probabilities with those of a general model (Besling and Meier, 1995).
- Retrieval of documents pertinent to the new domain and on-line training of a language model on those data (Iyer and Ostendorf, 1999; Mahajan et al., 1999).
- Maximum entropy, minimum discrimination adaptation (Chen et al., 1998).
- Adaptation by linear transformation of vectors of bigram counts in a reduced space (De Mori and Federico, 1999).
- Smoothing and adaptation in a dual space via latent semantic analysis, modeling long-term semantic dependencies and trigger combinations (Bellegarda, 2000).

Our approach can be characterized as unsupervised data augmentation: relevant documents are retrieved from large monolingual corpora, and a specific language model built from the retrieved data is interpolated with a background language model. More specifically, the following steps are carried out for language model adaptation. First, a baseline statistical machine translation system, using a large general language model, is applied to generate initial translations. These translation hypotheses are then reformulated as queries to retrieve similar sentences from a very large text collection. A small domain-specific language model is built from the retrieved sentences and linearly interpolated with the background language model (a minimal sketch of this step is given below). This new interpolated language model is applied in a second decoding run to produce the final translations.

There are a number of interesting questions pertaining to this approach:

- Which information can and should be used to generate the queries: the first-best translation only, or also translation alternatives?
- How should we construct the queries: as simple bags-of-words, or can we incorporate more structure to make them more powerful?
- How many documents should be retrieved to build the specific language models, and at what granularity should this be done, i.e. what constitutes a document in the information retrieval process?
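As a minimal illustration of the interpolation step above, the Python sketch below combines a small domain-specific language model, estimated from retrieved sentences, with a background model. The unigram estimator, the add-one smoothing, the vocabulary size, and the fixed weight lam are simplifying assumptions made here for illustration; the actual system would use higher-order n-gram models and tune the weight on held-out data.

```python
import math
from collections import Counter

def train_unigram_lm(sentences, vocab_size=50000):
    """Estimate a unigram LM with add-one smoothing from tokenized sentences.
    Illustrative stand-in for the higher-order n-gram models used in practice."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return lambda w: (counts[w] + 1) / (total + vocab_size)

def interpolate(p_specific, p_background, lam):
    """Linear interpolation: p(w) = lam * p_specific(w) + (1 - lam) * p_background(w)."""
    return lambda w: lam * p_specific(w) + (1.0 - lam) * p_background(w)

def sentence_logprob(lm, sentence):
    """Log-probability of a tokenized sentence under the given LM."""
    return sum(math.log(lm(w)) for w in sentence)

# Hypothetical usage: 'retrieved' holds sentences returned by the retrieval step,
# 'general' holds the background data; lam = 0.3 is an arbitrary example value.
retrieved = [["stock", "prices", "fell", "sharply"]]
general = [["the", "cat", "sat"], ["prices", "rose"]]
adapted = interpolate(train_unigram_lm(retrieved), train_unigram_lm(general), lam=0.3)
print(sentence_logprob(adapted, ["prices", "fell"]))
```

In practice the interpolation weight would be chosen to minimize perplexity on held-out data, and the adapted model would replace p(t) in the second decoding run.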
The paper is structured as follows: Section 2 outlines the sentence retrieval approach, and three bag-of-words query models are designed and explored; structured query models are introduced in Section 3. In Section 4, translation experiments are presented for the different query models. Finally, a summary is given in Section 5.

In this paper, we studied language model adaptation for statistical machine translation. Experiments are carried out on a standard statistical machine translation task defined in the NIST evaluation in June 2002. Our language model adaptation is an unsupervised data augmentation approach guided by query models; it might be especially useful for structured query models generated from the translation lattices, although this also means TMQ is subject to more noise. On the other hand, the oracle experiment also shows that the best improvement that can be expected is limited by the translation model and the decoding algorithm used in the current SMT system.
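To make the sentence retrieval step concrete, the following sketch scores sentences in a monolingual collection against a bag-of-words query built, for example, from a first-best translation, using TF-IDF weighting. The weighting scheme, the tokenization, and the top-k cutoff are illustrative assumptions, not the exact configuration used in the experiments.

```python
import math
from collections import Counter

def build_index(collection):
    """Document frequencies over a collection of tokenized sentences."""
    df = Counter()
    for sentence in collection:
        df.update(set(sentence))
    return df

def tfidf_score(query, sentence, df, n_docs):
    """TF-IDF overlap score between a bag-of-words query and one sentence."""
    s_counts = Counter(sentence)
    score = 0.0
    for term, q_tf in Counter(query).items():
        if term in s_counts:
            idf = math.log(n_docs / (1 + df[term]))
            score += q_tf * s_counts[term] * idf * idf
    return score

def retrieve(query, collection, df, top_k=100):
    """Return the top_k sentences most similar to the query."""
    scored = [(tfidf_score(query, s, df, len(collection)), s) for s in collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]

# Hypothetical usage: the query comes from an initial translation hypothesis.
collection = [["stock", "prices", "fell"], ["the", "cat", "sat"], ["prices", "rose", "today"]]
df = build_index(collection)
print(retrieve(["prices", "fell", "sharply"], collection, df, top_k=2))
```

The retrieved sentences would then serve as training data for the small domain-specific language model described above.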