THE ZERO FREQUENCY PROBLEM: ESTIMATING THE PROBABILITIES OF NOVEL EVENTS IN ADAPTIVE TEXT COMPRESSION

Date
1989-04-01
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
The zero-frequency problem is the problem of estimating the likelihood of a novel event occurring. It is important in adaptive statistical text compression because it is almost always necessary to reserve a small part of the code space for the unexpected (say, the appearance of a new word); the alternative of allocating code space to every possible event (say, a code for each ASCII character) invariably impairs coding efficiency since not all possible events actually occur. This paper reviews approaches that have been taken to the problem in adaptive text compression. Although several methods have been used, their suitability has been based on empirical evaluation rather than a well-founded model. We propose the application of a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is also applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods, and offers a small improvement in the coding efficiency of text compression over the best method previously known.
Description
Keywords
Computer Science
Citation