Please use this identifier to cite or link to this item: http://hdl.handle.net/1880/46172
Title: Source Models for Natural Language
Authors: Bell, Timothy C.; Witten, Ian H.
Keywords: Computer Science
Issue Date: 1-Oct-1988
Abstract: A model of natural language is a collection of information that approximates the statistics and structure of the language being modeled. The purpose of the model may be to give insight into the rules that govern how text is generated, or to predict properties of future samples of the language. This paper studies models of natural language from three different but related viewpoints. First, we examine the statistical regularities that are found empirically, based on the natural units of words and letters. Second, we study theoretical models of language, including simple random generative models of letters and words whose output, like genuine natural language, obeys Zipf's law. We also consider innovation in text, modeling the appearance of previously unseen words as a Poisson process. Finally, we review experiments that estimate the information content inherent in natural text.
Appears in Collections: Witten, Ian

Files in This Item:
File             Size     Format
1988-326-38.pdf  5.39 MB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.