In this brief survey, we shall discuss ten milestones in the history of source coding theory. These are as follows:

• Commencement of noiseless source coding theory (1948)

• Discovery of the Huffman algorithm (1952)

• Establishment of the Shannon-McMillan Theorem (1953)

• Discovery of the Lloyd algorithm (1957)

• Systemization of rate distortion theory (1959)

• Birth of the Kolmogorov complexity concept (1964)

• Systemization of universal source coding theory (1973)

• Commencement of multiterminal source coding theory (1973)

• First practical arithmetic coding schemes (1976)

• Discovery of Lempel-Ziv codes (1977)

This ``top ten'' list reflects the subjective opinions of the author, and there may be some who feel that other important events have been slighted. The reader is referred to [11] for a more complete survey of source coding theory.

Commencement of noiseless source coding theory. Claude Shannon initiated this subject (along with other subjects in the foundations of information theory) in the ground-breaking paper [21]. The key concept of entropy was formulated here. Shannon then gave the entropy concept operational significance by proving the first noiseless source coding theorem. Shannon's theorem ([21], Theorem 9) states that the entropy rate of a stationary ergodic finite-alphabet Markov source is the optimum rate at which the source can be encoded using ``almost noiseless'' fixed-rate block codes or using noiseless variable-rate block codes. To prove the fixed-rate half of the theorem, Shannon constructed fixed-rate block codes using an auxiliary theorem which later became known as the Shannon-McMillan Theorem (see below). In the proof of the variable-rate half of the theorem, Shannon devised a noiseless variable-rate block code (nowadays called a Shannon-Fano code) which encodes a block with probability of occurrence p into a codeword of length within one bit of -log₂ p. Today, we know that Shannon's noiseless source coding theorem holds more generally for ergodic asymptotically mean stationary source models. The concept of entropy has also proved useful in ergodic theory; we point out Donald Ornstein's celebrated result [17] that two Bernoulli shifts with the same entropy are isomorphic via stationary codes.
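
The codeword-length guarantee in the Shannon-Fano construction is easy to illustrate in code. The following minimal Python sketch (our own illustration, not Shannon's notation; exact block probabilities are assumed to be available) assigns each block a codeword of length equal to the smallest integer not less than -log₂ p, read off from the binary expansion of the cumulative probability, so every codeword length is within one bit of -log₂ p:

    from math import ceil, log2

    def shannon_code(probs):
        """Shannon-style prefix code: each symbol with probability p gets the
        first ceil(-log2 p) bits of the binary expansion of its cumulative
        probability (symbols taken in order of decreasing probability)."""
        items = sorted(probs.items(), key=lambda kv: -kv[1])
        code, cum = {}, 0.0
        for sym, p in items:
            length = ceil(-log2(p))
            bits, frac = [], cum
            for _ in range(length):
                frac *= 2
                bit, frac = int(frac), frac - int(frac)
                bits.append(str(bit))
            code[sym] = "".join(bits)
            cum += p
        return code

    print(shannon_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
    # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}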

Discovery of the Huffman algorithm. A problem left open by Shannon in his 1948 paper was the determination of a noiseless prefix code for a random variable taking finitely many values which is optimal in the sense of minimizing the expected codeword length. (The Shannon-Fano code which he introduced in [21] may fail to be optimal.) In 1952, David Huffman published the algorithm that now bears his name, which constructs such an optimal code [9]. Quite a number of papers related to the Huffman algorithm and modifications thereof have appeared since 1952; one of the most important of these is the paper by Robert Gallager on adaptive Huffman coding [6].
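
For concreteness, here is a minimal Python sketch of the algorithm (the heap-based bookkeeping is a standard implementation choice, not part of Huffman's paper): the two least probable subtrees are repeatedly merged, and codewords are read off as paths in the resulting tree.

    import heapq
    from itertools import count

    def huffman_code(probs):
        """Huffman's algorithm: repeatedly merge the two least probable
        subtrees; the bit prepended at each merge records which side of
        the merge a symbol sits on."""
        tiebreak = count()  # keeps heap entries comparable when probabilities tie
        heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p0, _, code0 = heapq.heappop(heap)
            p1, _, code1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in code0.items()}
            merged.update({s: "1" + c for s, c in code1.items()})
            heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
        return heap[0][2]

    print(huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
    # one optimal code: 'a', 'b', 'c', 'd' receive 1, 2, 3, 3 bits respectively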

Establishment of the Shannon-McMillan Theorem. Roughly speaking, the Shannon-McMillan Theorem states that for a stationary ergodic finite-alphabet source, there is a set of source strings of length n with probability close to one in which each string has about the same probability of occurrence, provided n is sufficiently large. This result was proved in 1953 by Brockway McMillan [16], after Shannon had earlier established a special case of it for a Markov source ([21], Theorem 3). Extensions and generalizations of the Shannon-McMillan Theorem were later obtained by Breiman, Perez, Moy, Orey, Barron, Algoet-Cover, and others. The Shannon-McMillan Theorem is sometimes referred to as the fundamental theorem of information theory, because of its use as a tool to prove both channel and source coding theorems. The theorem is also important in other fields, such as ergodic theory, statistics, and statistical mechanics.
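
A small simulation conveys what the theorem asserts in its simplest (i.i.d.) special case; the source below is Bernoulli with parameter 0.2, a value chosen purely for illustration:

    import random
    from math import log2

    # Shannon-McMillan (AEP) illustration: for an i.i.d. binary source,
    # -(1/n) log2 P(X_1 ... X_n) concentrates near the entropy as n grows,
    # so "most" long strings have probability roughly 2**(-n H).
    p, n, trials = 0.2, 10_000, 5
    H = -p * log2(p) - (1 - p) * log2(1 - p)
    for _ in range(trials):
        ones = sum(random.random() < p for _ in range(n))
        per_symbol = -(ones * log2(p) + (n - ones) * log2(1 - p)) / n
        print(f"-(1/n) log2 P = {per_symbol:.4f}   (entropy H = {H:.4f})")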

Discovery of the Lloyd algorithm. An N-level vector quantizer for k-dimensional blocks of real-valued source samples is designed by determining the N codevectors in k-dimensional Euclidean space into which the source blocks are to be quantized. The goal in vector quantizer design is to find a vector quantizer for which the expected squared-error quantization noise achieves a value close to the minimum. Stuart Lloyd, in an unpublished technical report written in 1957 (eventually published in 1982 [14]), proposed an algorithm for accomplishing this goal. (Although Lloyd stated his algorithm for the scalar quantization case in which k = 1, it extends readily to the k > 1 case.) The Lloyd algorithm starts with an initial quantizer and modifies it through a sequence of iterations--on each iteration, the codevectors from the preceding iteration are replaced with the centroids of the nearest neighbor regions corresponding to these codevectors. As long as successive iterations of the Lloyd algorithm continue to generate new quantizers, the quantization noise strictly decreases. (This is because of the Lloyd-Max necessary conditions for an optimal quantizer, which appear to have been first discovered not by Lloyd or Max but by the Polish mathematicians Lukaszewicz and Steinhaus [15].) In the scalar quantization case, if the density function of the source samples is log-concave and has finite variance, it is known that the N-level quantizers generated by iterates of the Lloyd algorithm have asymptotically optimal performance, independently of which N-level quantizer is chosen at the start of the iteration process. Unfortunately, the Lloyd algorithm is sensitive to the choice of initial quantizer if k > 1: for some choices, the quantization noise may not decrease to the minimum value. Recent research (cited in [11]) has focused on modifications of the Lloyd algorithm, through simulated annealing or stochastic relaxation techniques, that avoid this problem. It should be mentioned that there have been other significant contributions to the area of quantization besides the Lloyd algorithm--for example, numerous advances in high rate quantization theory and in lattice quantization. The reader is referred to the recent text [7] for a thorough discussion of quantizer theory and design.
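
The iteration itself is short; the Python sketch below (an illustrative implementation, with NumPy, a Gaussian training set, and an arbitrary initialization of our own choosing) alternates the nearest-neighbor and centroid steps just described:

    import numpy as np

    def lloyd(samples, codevectors, iterations=50):
        """Lloyd iteration: assign each sample to its nearest codevector,
        then replace each codevector by the centroid of its region."""
        samples = np.atleast_2d(samples)               # shape (num_samples, k)
        codevectors = np.array(codevectors, dtype=float)
        for _ in range(iterations):
            d = np.linalg.norm(samples[:, None, :] - codevectors[None, :, :], axis=2)
            nearest = d.argmin(axis=1)                 # nearest-neighbor regions
            for j in range(len(codevectors)):
                cell = samples[nearest == j]
                if len(cell):                          # leave empty cells unchanged
                    codevectors[j] = cell.mean(axis=0)
        return codevectors

    # Example: a 4-level scalar quantizer (k = 1) for Gaussian samples.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10_000, 1))
    print(lloyd(data, data[:4].copy()))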

Systemization of rate distortion theory. In 1948, Shannon defined the concept of the rate distortion function and sketched the proof of a lossy source coding theorem ([21], Theorem 21). However, it was the year 1959 that saw the formulation of rate distortion theory as we know it today--that year marked the appearance of Shannon's paper [22], in which he laid out the basic tenets of rate distortion theory, along with a rigorous proof of a lossy source coding theorem. Shannon's source coding theorem ([22], Theorems 1 and 2) states that the rate distortion function for a discrete memoryless source, evaluated at a distortion level D, gives the optimum rate in bits per source sample at which the source can be encoded by fixed-rate block codes, subject to the constraint that the resulting distortion in the recovery of the source output be no greater than D. Independently, R. L. Dobrushin [5] also published a lossy source coding theorem in 1959, analogous to Shannon's. In the 1960's, Gallager and Berger extended Shannon's coding theorem to the case of stationary ergodic sources. However, the optimum rate for fixed-rate block coding of a non-ergodic stationary source is not given by the rate distortion function [8].
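
A standard closed-form example gives a feel for the function the theorem is about: for a Bernoulli(p) source under Hamming (bit-error) distortion, R(D) = h(p) - h(D) for 0 <= D <= min(p, 1-p) and 0 thereafter, where h is the binary entropy function. A few lines of Python (ours, purely illustrative) evaluate it:

    from math import log2

    def h2(x):
        """Binary entropy function in bits."""
        return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

    def rate_distortion_bernoulli(p, D):
        """R(D) = h(p) - h(D) for 0 <= D <= min(p, 1-p), else 0
        (Bernoulli(p) source, Hamming distortion)."""
        return max(h2(p) - h2(D), 0.0) if D <= min(p, 1 - p) else 0.0

    # A fair coin can be reproduced with 10% bit-error rate at about 0.531 bits/sample.
    print(rate_distortion_bernoulli(0.5, 0.1))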

Birth of the Kolmogorov complexity concept. Independently, and at about the same time, Kolmogorov [12], Solomonoff [24], and Chaitin [2] proposed the notion of string complexity that has since come to be known as Kolmogorov complexity. The Kolmogorov complexity of a finite string is the length in bits of the shortest program which, when run on a universal computer, causes the string to be printed. The Kolmogorov complexity of a string is an invariant in the sense that the difference in complexity as measured by two universal computers is bounded by a constant which does not depend upon the string. If one employs a universal computer in which the set of halting programs forms a prefix set of binary strings, one can envision an unrealizable noiseless coding scheme which encodes any string into a shortest program that generates it, and therefore interpret Kolmogorov complexity as codeword length. This unrealizable scheme provides an important benchmark in theoretical data compression performance studies because, applied to any sequence of strings of increasing lengths, it operates at an asymptotic compression rate less than or equal to the compression rate afforded by any realizable noiseless coding scheme. Many papers have studied the implications of Kolmogorov complexity in information theory and in theoretical computer science; a good survey of these works is the paper [13].
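
In symbols (the notation is ours, not that of [12], [24], or [2]): if U and V are universal computers and K_U(x) denotes the length of a shortest program producing the string x on U, then

\[ K_U(x) = \min\{\, \ell(p) : U(p) = x \,\}, \qquad |K_U(x) - K_V(x)| \le c_{U,V} \ \text{for all strings } x, \]

where the constant c_{U,V} depends on U and V but not on x.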

Systemization of universal source coding theory. The first papers in universal noiseless source coding (by Lynch, Davisson, Fitingof, Shtarkov-Babkin, and Babkin) were scattered results that appeared in the period 1966-1971. It remained for someone to unify these results and systematize the field. Lee Davisson accomplished this task in 1973 [4]. In Davisson's paper, we see the formulation of the now familiar concepts of weighted universal code, weakly minimax universal code, and strongly minimax universal code. All of these concepts were generalizable to the area of universal source coding with respect to a fidelity criterion--an explosion of papers in this field appeared in the 1970's due to Ziv, Gray, Neuhoff, Shields, Pursley, Davisson, Mackenthun, Leon-Garcia, and Kieffer (among others); a detailed account of these achievements (with citations) may be found in [11].

Commencement of multiterminal source coding theory. In 1973, David Slepian and Jack Wolf published their paper [23], which marked the beginning of the field of multiterminal source coding theory. These authors were able to specify completely the admissible rate region for noiseless coding of a memoryless multiterminal source; in 1975, Thomas Cover extended their results to the case of a stationary ergodic multiterminal source by means of an elegant proof [3]. Papers on multiterminal source coding relative to a fidelity criterion started to appear in 1975; significant early papers in this field are those by Wyner-Ziv [25] and Kaspi-Berger [10]. Many difficult problems in multiterminal source coding with respect to a fidelity criterion remain open.
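
In the simplest case of two correlated memoryless sources X and Y, encoded separately at rates R_X and R_Y and decoded jointly, the admissible rate region of [23] consists of the rate pairs satisfying

\[ R_X \ge H(X \mid Y), \qquad R_Y \ge H(Y \mid X), \qquad R_X + R_Y \ge H(X, Y), \]

so that the two separate encoders can together match the total rate H(X, Y) of a single encoder observing both sources, even though neither encoder sees the other's data.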

First practical arithmetic coding schemes. Arithmetic coding is an important noiseless coding technique in present-day commercial data compression systems. It is preferable to Huffman coding because a close-to-optimal arithmetic code for long strings generated by a discrete memoryless source is easily determined once the string probabilities are specified, whereas the Huffman algorithm may not generate an optimal code for source strings of length n in time O(n). The first arithmetic code was devised by Peter Elias sometime before 1963 in unpublished work; the first published description of the Elias code appears on pages 61-62 of the text [1]. The Elias code is a non-block code which encodes an infinite source sequence into an infinite code sequence. For each finite string appearing at the beginning of the source sequence, Elias determines a subinterval of the unit interval on the real line of length equal to the probability of that string--the infinite code sequence is then the binary expansion of the unique point in the intersection of these subintervals. Although the Elias code encodes at an optimal rate, it is impractical because it requires infinite precision and therefore an infinite buffer size; also, the Elias code is catastrophic in the sense that finitely many errors in the code sequence may cause infinitely many errors in the decoded source sequence. However, the Elias code can be modified to remove these drawbacks. Although practical schemes for arithmetic coding now abound, the first practical arithmetic coding schemes did not become available until 1976, in work by Jorma Rissanen [20] and Richard Pasco [19].
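
The interval-narrowing idea is easy to sketch (a toy Python illustration for a memoryless source, with the function name and probabilities ours); note that the floating-point arithmetic below runs out of precision on long strings, which is exactly the limitation the practical schemes of [19] and [20] work around:

    def elias_intervals(probs, string):
        """Successively shrink a subinterval of [0,1): each source symbol
        scales the current interval by its probability.  Any point in the
        final interval (e.g., its binary expansion) identifies the string."""
        symbols = sorted(probs)                  # fixed symbol ordering
        low, high = 0.0, 1.0
        for s in string:
            width = high - low
            offset = sum(probs[t] for t in symbols if t < s)
            low, high = low + width * offset, low + width * (offset + probs[s])
        return low, high

    low, high = elias_intervals({"a": 0.7, "b": 0.3}, "aabab")
    print(low, high)   # interval width 0.7**3 * 0.3**2, about 0.0309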

Discovery of Lempel-Ziv codes. The well-known universal noiseless source coding technique due to Jacob Ziv and Abraham Lempel was announced in 1977 [26], although the method is based upon a notion of string complexity that had been proposed by these two authors in a paper the year before. With probability one, a stationary ergodic finite-alphabet source generates a sequence which, when encoded using the Lempel-Ziv algorithm, yields a compression rate equal to the entropy rate, asymptotically as the number of source samples goes to infinity. The Lempel-Ziv algorithm is the most important noiseless source coding technique in the entire history of source coding. A spate of papers has been devoted to the theoretical and practical aspects of Lempel-Ziv coding. On the theoretical side, perhaps the most significant of these is the recent paper by Ornstein and Weiss [18].
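
A toy Python sketch of the sliding-window parsing idea follows (the window size, the (distance, length, next-symbol) output format, and the brute-force matching are illustrative simplifications of ours, not the scheme of [26] as published):

    def lz77_parse(data, window=4096):
        """Greedy sliding-window parsing: each phrase is the longest match of
        the upcoming text within the window of already-parsed text, emitted
        as (distance, length, next_symbol)."""
        i, phrases = 0, []
        while i < len(data):
            best_len, best_dist = 0, 0
            for j in range(max(0, i - window), i):   # candidate match starts
                length = 0
                while (i + length < len(data) - 1
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
            phrases.append((best_dist, best_len, data[i + best_len]))
            i += best_len + 1
        return phrases

    print(lz77_parse("abababababa"))
    # [(0, 0, 'a'), (0, 0, 'b'), (2, 8, 'a')]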

[1] N. Abramson, Information theory and coding. New York: McGraw-Hill, 1963.

[2] G. J. Chaitin, ``On the length of programs for computing binary sequences,'' J. Assoc. Comp. Mach., vol. 13, pp. 547-569, 1966.

[3] T. Cover, ``A proof of the data compression theorem of Slepian and Wolf for ergodic sources,'' IEEE Trans. Inform. Theory, vol. 21, pp. 226-228, 1975.

[4] L. D. Davisson, ``Universal noiseless coding,'' IEEE Trans. Inform. Theory, vol. 19, pp. 783-795, 1973.

[5] R. L. Dobrushin, ``A general formulation of the fundamental Shannon theorem in information theory,'' Uspehi Mat. Nauk, vol. 14, pp. 3-104, 1959.

[6] R. G. Gallager, ``Variations on a theme by Huffman,'' IEEE Trans. Inform. Theory, vol. 24, pp. 668-674, 1978.

[7] A. Gersho and R. M. Gray, Vector quantization and signal compression. Boston: Kluwer, 1992.

[8] R. M. Gray and L. D. Davisson, ``Source coding theorems without the ergodic assumption,'' IEEE Trans. Inform. Theory, vol. 20, pp. 502-516, 1974.

[9] D. A. Huffman, ``A method for the construction of minimum redundancy codes,'' Proc. IRE, vol. 40, pp. 1098-1101, 1952.

[10] A. H. Kaspi and T. Berger, ``Rate distortion for correlated sources and partially separated encoders,'' IEEE Trans. Inform. Theory, vol. 28, pp. 828-840, 1982.

[11] J. C. Kieffer, ``A survey of the theory of source coding with a fidelity criterion,'' IEEE Trans. Inform. Theory, 1993, to appear.

[12] A. N. Kolmogorov, ``Three approaches to the quantitative definition of information,'' Problems of Information Transmission, vol. 1, pp. 4-7, 1965.

[13] M. Li and P. M. B. Vitányi, ``Kolmogorov complexity and its applications,'' in: J. van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. A (Elsevier/MIT Press, Amsterdam/London, UK, 1990), pp. 187-254.

[14] S. P. Lloyd, ``Least squares quantization in PCM,'' IEEE Trans. Inform. Theory, vol. 28, pp. 129-137, 1982.

[15] J. Lukaszewicz and H. Steinhaus, ``On measuring by comparison,'' Zastos. Mat., vol. 2, pp. 225-232, 1955.

[16] B. McMillan, ``The basic theorems of information theory,'' Ann. Math. Stat., vol. 24, pp. 196-219, 1953.

[17] D. Ornstein, ``Bernoulli shifts with the same entropy are isomorphic,'' Advances in Math., vol. 4, pp. 337-352, 1970.

[18] D. S. Ornstein and B. Weiss, ``Entropy and data compression schemes,'' IEEE Trans. Inform. Theory, vol. 39, pp. 78-83, 1993.

[19] R. Pasco, ``Source coding algorithms for fast data compression,'' Ph.D. thesis, Stanford University, 1976.

[20] J. Rissanen, ``Generalized Kraft inequality and arithmetic coding,'' IBM J. Res. Devel., vol. 20, pp. 198-203, 1976.

[21] C. E. Shannon, ``A mathematical theory of communication,'' Bell Sys. Tech. J., vol. 27, pp. 379-423 and pp. 623-656, 1948.

[22] C. E. Shannon, ``Coding theorems for a discrete source with a fidelity criterion,'' IRE Nat. Conv. Rec., part 4, pp. 142-163, 1959.

[23] D. Slepian and J. K. Wolf, ``Noiseless coding of correlated information sources,'' IEEE Trans. Inform. Theory, vol. 19, pp. 471-480, 1973.

[24] R. J. Solomonoff, ``A formal theory of inductive inference,'' Inform. and Control, vol. 7, pp. 1-22 and pp. 224-254, 1964.

[25] A. D. Wyner and J. Ziv, ``The rate distortion function for source coding with side information at the decoder,'' IEEE Trans. Inform. Theory, vol. 22, pp. 1-10, 1976.

[26] J. Ziv and A. Lempel, ``A universal algorithm for sequential data compression,'' IEEE Trans. Inform. Theory, vol. 23, pp. 337-343, 1977.