Please use this identifier to cite or link to this item: http://hdl.handle.net/1880/46158
Title: GREEDY MACRO TEXT COMPRESSION
Authors: Witten, Ian H.
Bell, Timothy
Keywords: Computer Science
Issue Date: 1-Dec-1987
Abstract: Text compression schemes can be divided into two classes: statistical, where symbols are assigned codes based on their probabilities, and macro, where groups of consecutive symbols (phrases) are replaced by indexes into some dictionary. A subset of macro coding called Greedy Macro (GM) accounts for the vast majority of macro schemes in the literature, including all variations of the popular Ziv-Lempel method. At each coding step, GM schemes encode as many symbols as possible with a single index to the dictionary. Although this parsing strategy is not optimal, no optimal macro scheme can be implemented with a bounded coding delay. This paper defines GM coding and establishes an algorithm which takes any such scheme and constructs a statistical coding method that achieves exactly the same compression rate. Thus GM schemes can never achieve better compression than statistical ones. The conclusion is that research aimed at increasing compression should concentrate on statistical methods, leaving macro schemes for applications in which compression efficiency can be sacrificed for speed and modest memory requirements.
URI: http://hdl.handle.net/1880/46158
Appears in Collections: Witten, Ian
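
The greedy parsing step described in the abstract can be illustrated with a minimal sketch. It assumes a fixed, static dictionary of phrases; the function names `greedy_parse` and `decode` and the toy dictionary are hypothetical and are not taken from the paper, which also covers adaptive schemes such as Ziv-Lempel. The sketch covers only the parsing rule, not the paper's construction of an equivalent statistical coder.

```python
def greedy_parse(text, dictionary):
    """Greedy macro parsing (illustrative assumption, static dictionary):
    at each step, emit the index of the longest phrase matching at the
    current position."""
    index_of = {phrase: i for i, phrase in enumerate(dictionary)}
    max_len = max(len(phrase) for phrase in dictionary)
    indexes = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in index_of:
                indexes.append(index_of[candidate])
                pos += length
                break
        else:
            raise ValueError(f"no dictionary phrase matches at position {pos}")
    return indexes


def decode(indexes, dictionary):
    """Rebuild the text by concatenating the indexed phrases."""
    return "".join(dictionary[i] for i in indexes)


# Hypothetical toy dictionary, chosen to show that greedy parsing
# is not always optimal.
phrases = ["a", "b", "ab", "bbb"]
print(greedy_parse("abbb", phrases))  # [2, 1, 1]: "ab", "b", "b"
print(decode([0, 3], phrases))        # "abbb" using only two indexes
```

With this toy dictionary the greedy parse of "abbb" uses three indexes, while the parse "a" followed by "bbb" uses two. This is the sense in which the abstract calls the greedy strategy non-optimal.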

Files in This Item:
File              Description   Size      Format
1987-285-33.pdf                 2.56 MB   Adobe PDF

