Building and Evaluating Vietnamese Language Models

Please use this identifier to cite or link to this item: http://192.168.1.231:8080/dulieusoDIGITAL_123456789/6217

Title:	Building and Evaluating Vietnamese Language Models
Authors:	Cao Van Viet
Issue Date:	2020
Publisher:	Vietnam National University, Hanoi (Đại học Quốc gia Hà Nội)
Abstract:	A language model assigns a probability to a sequence of words. It is useful for many Natural Language Processing (NLP) tasks such as machine translation, spelling correction, speech recognition, optical character recognition, parsing, and information retrieval. For Vietnamese, although several studies have used language models within NLP systems, there has been no independent study of language modeling for Vietnamese covering both experimental and theoretical aspects. In this paper we experimentally investigate various Language Models (LMs) for Vietnamese, which are based on different smoothing techniques, including Laplace, Witten-Bell, Good-Turing, interpolated Kneser-Ney, and back-off Kneser-Ney. These models are evaluated experimentally on a large corpus of texts. To evaluate these language models in an application, we build a statistical machine translation system that translates from English to Vietnamese. In the experiments we use about 255 MB of text for building the language models and more than 60,000 English-Vietnamese parallel sentence pairs for building the machine translation system.
URI:	http://192.168.1.231:8080/dulieusoDIGITAL_123456789/6217
Appears in Collections:	Other disciplines (Các chuyên ngành khác)
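
The abstract describes a language model as assigning a probability to a word sequence, with smoothing used to handle unseen events. As a rough illustration only, the sketch below shows a bigram model with Laplace (add-one) smoothing, the simplest of the techniques listed, together with perplexity as an evaluation measure. It is not the author's implementation: the function names, the whitespace tokenization, and the three-sentence toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Collect context and bigram counts from whitespace-tokenized sentences.

    Each sentence is padded with <s> and </s>. The returned vocabulary holds
    every token that can be predicted, so it includes </s> but not <s>.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def laplace_prob(prev, word, context_counts, bigram_counts, vocab_size):
    """Add-one estimate: P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)


def perplexity(sentences, context_counts, bigram_counts, vocab_size):
    """Perplexity = exp(-average log-probability per predicted token)."""
    total_log_prob = 0.0
    total_tokens = 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = laplace_prob(prev, word, context_counts, bigram_counts, vocab_size)
            total_log_prob += math.log(p)
            total_tokens += 1
    return math.exp(-total_log_prob / total_tokens)


# Toy corpus (made up for this sketch), tokenized by whitespace into syllables.
toy_corpus = ["tôi học tiếng việt", "tôi học toán", "tôi thích tiếng việt"]
contexts, bigrams, vocab = train_bigram_counts(toy_corpus)

seen_order = perplexity(["tôi học toán"], contexts, bigrams, len(vocab))
scrambled = perplexity(["toán học tôi"], contexts, bigrams, len(vocab))
print(f"perplexity (attested order):  {seen_order:.2f}")
print(f"perplexity (scrambled order): {scrambled:.2f}")
```

A lower perplexity means the model finds the text less surprising, which is why the attested word order scores better than the scrambled one. Laplace smoothing tends to move too much probability mass to unseen bigrams on realistic corpora; the Witten-Bell, Good-Turing, and Kneser-Ney methods named in the abstract replace the add-one estimate with more careful discounting, while the evaluation by perplexity stays the same.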

Files in This Item:

File	Description	Size	Format
1522-1-2978-1-10-20160726.pdf		233.39 kB	Adobe PDF