GREEDY MACRO TEXT COMPRESSION
Date
1987-12-01
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Text compression schemes can be divided into two classes:
statistical, where symbols are assigned codes based on
their probabilities, and macro, where groups of consecutive
symbols (phrases) are replaced by indexes into some dictionary. A
subset of macro coding called Greedy Macro (GM) accounts for
the vast majority of macro schemes in the literature, including all
variations of the popular Ziv-Lempel method. At each coding step,
GM schemes encode as many symbols as possible with a single index
to the dictionary. Although this parsing strategy is not optimal,
no optimal macro scheme can be implemented with a bounded coding
delay.
This paper defines GM coding and establishes an algorithm which takes
any such scheme and constructs a statistical coding method that achieves
exactly the same compression rate. Thus GM schemes can never achieve better
compression than statistical ones. The conclusion is that research
aimed at increasing compression should concentrate on statistical methods,
leaving macro schemes for applications in which compression efficiency
can be sacrificed for speed and modest memory requirements.
Description
Keywords
Computer Science