Abstract: | A language model assigns a probability to a sequence of words. It is useful for many Natural Language Processing (NLP) tasks, such as machine translation, spelling correction, speech recognition, optical character recognition, parsing, and information retrieval. For Vietnamese, although several studies have used language models within NLP systems, no independent study has examined Vietnamese language modeling from both experimental and theoretical perspectives. In this paper, we experimentally investigate various language models (LMs) for Vietnamese based on different smoothing techniques: Laplace, Witten-Bell, Good-Turing, interpolated Kneser-Ney, and back-off Kneser-Ney. These models are evaluated experimentally on a large text corpus. To evaluate them in an application, we build a statistical machine translation system that translates from English to Vietnamese. In the experiments, we use about 255 MB of text to build the language models and more than 60,000 English-Vietnamese parallel sentence pairs to build the machine translation system. |
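To make "assigns a probability to a sequence of words" concrete, the sketch below implements the simplest of the listed techniques, Laplace (add-one) smoothing, for a bigram model. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a bigram order, pre-tokenized input (e.g. Vietnamese syllables), `<s>`/`</s>` boundary markers, and hypothetical function names. It also computes perplexity as an example intrinsic measure; the abstract does not say which measure the paper reports.

```python
import math
from collections import Counter

def train_bigram_counts(sentences):
    """Count bigram histories and bigrams over tokenized sentences.
    <s> and </s> are hypothetical sentence-boundary markers."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded[1:])  # predictable words: all tokens plus </s>, not <s>
        for prev, word in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts, vocab

def laplace_prob(prev, word, history_counts, bigram_counts, vocab_size):
    # Add-one estimate: P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + vocab_size)

def sentence_logprob(tokens, history_counts, bigram_counts, vocab_size):
    # Chain rule under the bigram assumption: sum of log P(w_i | w_{i-1})
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log(laplace_prob(p, w, history_counts, bigram_counts, vocab_size))
        for p, w in zip(padded, padded[1:])
    )

def perplexity(sentences, history_counts, bigram_counts, vocab_size):
    total_logprob = 0.0
    n_predictions = 0
    for tokens in sentences:
        total_logprob += sentence_logprob(tokens, history_counts, bigram_counts, vocab_size)
        n_predictions += len(tokens) + 1  # every token plus the final </s>
    return math.exp(-total_logprob / n_predictions)

# Toy Vietnamese training data (hypothetical), tokenized into syllables
train = [["tôi", "đi", "học"], ["tôi", "đi", "làm"]]
hist, bigr, vocab = train_bigram_counts(train)
V = len(vocab)  # 5: tôi, đi, học, làm, </s>
p = math.exp(sentence_logprob(["tôi", "đi", "học"], hist, bigr, V))
print(round(p, 5))  # 3/7 * 3/7 * 2/7 * 1/3 = 6/343
print(perplexity([["tôi", "đi", "làm"]], hist, bigr, V))
```

Add-one smoothing gives every unseen bigram a nonzero probability, but it moves a lot of probability mass to unseen events; Witten-Bell, Good-Turing, and Kneser-Ney redistribute that mass more carefully, which is why comparing them on the same corpus is informative. Words never seen in training are not mapped to an `<unk>` token here, which a real system would normally do.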