A model of natural language is a collection of information that
approximates the statistics and structure of the language being modeled.
The purpose of the model may be to give insight into the rules that govern
how text is generated, or to predict properties of future samples of
the language. This paper studies models of natural language from
three different, but related, viewpoints. First, we examine the
statistical regularities that are found empirically, based on the
natural units of words and letters. Second, we study theoretical
models of language, including simple random generative models of
letters and words whose output, like genuine natural language, obeys
Zipf's law. Innovation in text is also considered by modeling the
appearance of previously unseen words as a Poisson process.
Finally, we review experiments that estimate the information content
inherent in natural text.
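
As an illustration of the kind of simple random generative model of letters mentioned above, the following is a minimal sketch of a "random typing" process whose rank-frequency curve can be inspected on a log-log scale. The alphabet, the space probability, and all function names (random_typing_words, rank_frequency, loglog_slope) are assumptions made for this example and are not taken from the paper; the least-squares slope is only a rough diagnostic, not a rigorous test of Zipf's law.

```python
import math
import random
from collections import Counter


def random_typing_words(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Emit n_chars random keystrokes and split the result into words.

    Assumption: each keystroke is a space with probability p_space,
    otherwise a letter drawn uniformly from alphabet.
    """
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    # split() with no argument drops empty words from repeated spaces
    return "".join(chars).split()


def rank_frequency(words):
    """Return word counts sorted from most to least frequent."""
    return sorted(Counter(words).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


words = random_typing_words(200_000)
freqs = rank_frequency(words)
slope = loglog_slope(freqs)
print(f"{len(freqs)} distinct words, log-log slope = {slope:.2f}")
```

A roughly straight, downward-sloping rank-frequency curve on log-log axes is the behaviour that Zipf's law describes; varying the alphabet size or p_space in this sketch changes the fitted slope.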