Please use this identifier to cite or link to this item: http://192.168.1.231:8080/dulieusoDIGITAL_123456789/6217
Title: Building and Evaluating Vietnamese Language Models
Authors: Cao Van Viet
Issue Date: 2020
Publisher: Đại học Quốc gia Hà Nội (Vietnam National University, Hanoi)
Abstract: A language model assigns a probability to a sequence of words. Language models are used in many Natural Language Processing (NLP) tasks, such as machine translation, spelling correction, speech recognition, optical character recognition, parsing, and information retrieval. Although several studies have used language models within Vietnamese NLP systems, there has been no independent study of language modeling for Vietnamese that covers both its experimental and theoretical aspects. In this paper, we experimentally investigate various language models (LMs) for Vietnamese based on different smoothing techniques: Laplace, Witten-Bell, Good-Turing, interpolated Kneser-Ney, and back-off Kneser-Ney. These models are evaluated experimentally on a large text corpus. To evaluate the language models in an application, we also build a statistical machine translation system that translates from English to Vietnamese. In the experiments, we use about 255 MB of text to build the language models and more than 60,000 English-Vietnamese parallel sentence pairs to build the machine translation system.
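The smoothing techniques listed in the abstract differ mainly in how they reserve probability mass for word sequences never seen in training. As an illustration only (the abstract does not describe the author's implementation), the following is a minimal Python sketch of two of them, Laplace (add-one) and interpolated Kneser-Ney, for a bigram model, each scored by perplexity on held-out text. Assumptions: a closed vocabulary (every test word also occurs in training), a single fixed discount d = 0.75, and a hypothetical toy corpus; the names `BigramCounts`, `laplace_prob`, `kneser_ney_prob`, and `perplexity` are invented for this sketch. Witten-Bell and Good-Turing are not shown.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers


def bigrams(sentences):
    """Yield (history, word) pairs from tokenized sentences, padded with BOS/EOS."""
    for sent in sentences:
        tokens = [BOS] + list(sent) + [EOS]
        yield from zip(tokens, tokens[1:])


class BigramCounts:
    """Hypothetical helper: bigram statistics shared by both estimators."""

    def __init__(self, sentences):
        self.bigram = Counter(bigrams(sentences))
        self.context = Counter()            # c(v): how often v occurs as a history
        self.followers = defaultdict(set)   # v -> distinct words seen after v
        self.preceders = defaultdict(set)   # w -> distinct histories seen before w
        for (prev, word), c in self.bigram.items():
            self.context[prev] += c
            self.followers[prev].add(word)
            self.preceders[word].add(prev)
        # Predictable vocabulary: every word seen after some history
        # (includes EOS, excludes BOS, which is never predicted).
        self.vocab = set(self.preceders)
        self.num_bigram_types = len(self.bigram)


def laplace_prob(counts, prev, word):
    """Add-one estimate: (c(v, w) + 1) / (c(v) + |V|)."""
    return (counts.bigram[(prev, word)] + 1) / (counts.context[prev] + len(counts.vocab))


def kneser_ney_prob(counts, prev, word, d=0.75):
    """Interpolated Kneser-Ney for bigrams with one absolute discount d."""
    # Continuation probability: share of bigram types that end in `word`.
    p_cont = len(counts.preceders.get(word, ())) / counts.num_bigram_types
    c_prev = counts.context[prev]
    if c_prev == 0:  # unseen history: fall back to the continuation distribution
        return p_cont
    discounted = max(counts.bigram[(prev, word)] - d, 0) / c_prev
    # Interpolation weight: the mass removed by discounting, d * N1+(v .) / c(v).
    lam = d * len(counts.followers.get(prev, ())) / c_prev
    return discounted + lam * p_cont


def perplexity(prob_fn, counts, sentences):
    """exp of the mean negative log-probability per predicted token (EOS included)."""
    log_sum, n = 0.0, 0
    for prev, word in bigrams(sentences):
        log_sum += math.log(prob_fn(counts, prev, word))
        n += 1
    return math.exp(-log_sum / n)


# Hypothetical toy corpus of syllable-tokenized Vietnamese sentences.
train = [["tôi", "đi", "học"], ["tôi", "đi", "làm"], ["bạn", "đi", "học"]]
test = [["bạn", "đi", "làm"]]

counts = BigramCounts(train)
print(f"Laplace PP: {perplexity(laplace_prob, counts, test):.3f}")        # Laplace PP: 3.969
print(f"Kneser-Ney PP: {perplexity(kneser_ney_prob, counts, test):.3f}")  # Kneser-Ney PP: 3.959
```

On data this small the two estimators give almost the same perplexity; differences between smoothing methods only become informative on a realistically sized corpus. Production toolkits such as SRILM or KenLM also estimate higher-order models and handle unknown words, which this sketch does not.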
URI: http://192.168.1.231:8080/dulieusoDIGITAL_123456789/6217
Appears in Collections: Các chuyên ngành khác (Other disciplines)

Files in This Item:
File: 1522-1-2978-1-10-20160726.pdf
Size: 233.39 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.