In this brief survey, we shall discuss ten milestones in the history of source coding theory. These are as follows:

Commencement of noiseless source coding theory (1948)

Discovery of the Huffman algorithm (1952)

Establishment of the Shannon-McMillan Theorem (1953)

Discovery of the Lloyd algorithm (1957)

Systemization of rate distortion theory (1959)

Birth of the Kolmogorov complexity concept (1964)

Systemization of universal source coding theory (1973)

Commencement of multiterminal source coding theory (1973)

First practical arithmetic coding schemes (1976)

Discovery of Lempel-Ziv codes (1977)

This ``top ten'' list reflects the subjective opinions of the author, and there may be some who feel that other important events have been slighted. The reader is referred to [11] for a more complete survey of source coding theory.

**Commencement of noiseless source coding theory.** Claude Shannon initiated this subject (along with other subjects in the
foundations of information theory) in the ground-breaking paper [21]. The
key concept of entropy was formulated here. Shannon then gave the entropy
concept operational significance by proving the first noiseless source coding
theorem. Shannon's theorem ([21], Theorem 9) states that the entropy rate
of a stationary ergodic finite-alphabet Markov source is the optimum rate
at which the source can be encoded using ``almost noiseless'' fixed-rate
block codes or using noiseless variable-rate block codes. To prove the fixed-rate
half of the theorem, Shannon constructed fixed-rate block codes using an
auxiliary theorem which later became known as the Shannon-McMillan Theorem
(see below). In the proof of the variable-rate half of the theorem, Shannon
devised a noiseless variable-rate block code (nowadays called a Shannon-Fano
code) which encodes a block with probability of occurrence *p* into a codeword of length within one bit of -log₂ *p*.
Today, we know that Shannon's noiseless source coding theorem holds more
generally for ergodic asymptotically mean stationary source models. The
concept of entropy has also proved useful in ergodic theory; we point out
Donald Ornstein's celebrated result [17] that two Bernoulli shifts
with the same entropy are isomorphic via stationary codes.
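
To make the variable-rate half concrete, here is a minimal Python sketch of the Shannon-Fano construction described above: order the blocks by decreasing probability and give each one the first ⌈-log₂ *p*⌉ bits of the binary expansion of its cumulative probability. The floating-point arithmetic and the four-symbol example are illustrative assumptions, not details from [21].

```python
from math import ceil, log2

def shannon_fano_code(probs):
    """Assign each symbol the first ceil(-log2 p) bits of the binary
    expansion of its cumulative probability, so every codeword length
    is within one bit of -log2 p."""
    items = sorted(probs.items(), key=lambda kv: -kv[1])  # decreasing probability
    code, cum = {}, 0.0
    for sym, p in items:
        length = ceil(-log2(p))
        bits, frac = [], cum
        for _ in range(length):          # first `length` bits of cum's expansion
            frac *= 2
            bit = int(frac)
            bits.append(str(bit))
            frac -= bit
        code[sym] = "".join(bits)
        cum += p
    return code

print(shannon_fano_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# -> {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```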

**Discovery of the Huffman algorithm.** A problem left open by Shannon in his 1948 paper was the determination
of a noiseless prefix code for a random variable taking finitely many values
which is optimal in the sense of minimizing the expected codeword length.
(The Shannon-Fano code which he introduced in [21] may fail to be optimal.)
In 1952, David Huffman published the algorithm, which bears his name, that
determines at least one optimal code [9]. Quite a number of papers have
appeared since 1952 related to the Huffman algorithm and modifications thereof;
one of the most important of these is the paper by Robert Gallager on adaptive
Huffman coding [6].
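
As a companion to the sketch above, here is the Huffman procedure itself in minimal Python form: repeatedly merge the two least probable subtrees, then read codewords off the final tree. The heap-based bookkeeping and the sample distribution are implementation choices for illustration, not details from [9].

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build one optimal prefix code by repeatedly merging the two
    least probable subtrees."""
    tiebreak = count()   # keeps the heap from comparing tree nodes directly
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    code = {}
    def walk(node, prefix):              # label left branches 0, right branches 1
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"   # degenerate one-symbol alphabet
    walk(heap[0][2], "")
    return code

print(huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
# -> {'a': '0', 'b': '10', 'd': '110', 'c': '111'} (expected length 1.9 bits)
```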

**Establishment of the Shannon-McMillan Theorem.** Roughly speaking, the Shannon-McMillan Theorem states that for a stationary
ergodic finite-alphabet source, there is a set of source strings of length *n* with probability close to one in which each string has about the same probability
of occurrence, provided *n* is sufficiently large. This result was proved in 1953 by Brockway McMillan
[16], after Shannon had earlier established a special case of this result
for a Markov source ([21], Theorem 3). Extensions and generalizations of
the Shannon-McMillan Theorem were later obtained by Breiman, Perez, Moy,
Orey, Barron, Algoet-Cover, and others. The Shannon-McMillan Theorem is
sometimes referred to as the fundamental theorem of information theory,
because of its use as a tool to prove both channel and source coding theorems.
This theorem is important in other fields, also (such as ergodic theory,
statistics, and statistical mechanics).
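
The theorem is easy to observe numerically in its simplest setting, an i.i.d. (hence stationary ergodic) source: for large *n*, the quantity -(1/*n*) log₂ *p*(*x*₁…*x*ₙ) of a typical sample path is close to the entropy. The sketch below is an illustration under assumed parameters, not a proof device from [16].

```python
import math, random

# An i.i.d. source with entropy -sum p log2 p = 1.5 bits/symbol.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
entropy = -sum(p * math.log2(p) for p in probs.values())

random.seed(0)
n = 100_000
sample = random.choices(list(probs), weights=list(probs.values()), k=n)

# Per-symbol log-probability of the observed string of length n.
per_symbol = -sum(math.log2(probs[s]) for s in sample) / n

print(f"entropy = {entropy:.4f} bits, -(1/n) log2 p(x^n) = {per_symbol:.4f} bits")
```

By the Shannon-McMillan Theorem the two printed numbers nearly coincide, which is another way of saying the observed string has probability roughly 2^(-1.5 *n*).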

**Discovery of the Lloyd algorithm.** An *N*-level vector quantizer for *k*-dimensional blocks of real-valued source samples is designed by determining
the *N* codevectors in *k*-dimensional Euclidean space into which the source blocks are to be quantized.
The goal in vector quantizer design is to find a vector quantizer for which
the expected squared-error quantization noise achieves a value close to
the minimum. Stuart Lloyd, in an unpublished technical report written in
1957 (eventually published in 1982 [14]) proposed an algorithm for accomplishing
this goal. (Although Lloyd stated his algorithm for the scalar quantization
case in which *k = 1*, it trivially extends to the *k > 1* case.) The Lloyd algorithm starts with an initial quantizer and modifies
it through a sequence of iterations--on each iteration, the codevectors
from the preceding iteration are replaced with the centroids of the nearest
neighbor regions corresponding to these codevectors. As long as successive
iterations of the Lloyd algorithm continue to generate new quantizers, the
quantization noise strictly decreases. (This is because of the Lloyd-Max
necessary conditions for an optimal quantizer, which appear to have been
first discovered not by Lloyd or Max but by the Polish mathematicians Lukaszewicz
and Steinhaus [15].) In the scalar quantization case, if the density function
of the source samples is log-concave and has finite variance, it is known
that the *N*-level quantizers generated by iterates of the Lloyd algorithm have asymptotically
optimal performance, independently of which *N*-level quantizer is chosen at the start of the iteration process. Unfortunately,
the Lloyd algorithm is sensitive to the choice of initial quantizer if *k > 1*: for some choices, the quantization noise may not decrease to the minimum
value. Recent research (cited in [11]) has focused on modifications of the
Lloyd algorithm through simulated annealing or stochastic relaxation techniques
that avoid this problem. It should be mentioned that there have been other
significant contributions to the area of quantization in addition to the
Lloyd algorithm--there have been numerous advances in high rate quantization
theory and in lattice quantization, for example. The reader is referred
to the recent text [7] for a thorough discussion of quantizer theory and
design.
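
The iteration is easy to state in code. Below is a bare-bones NumPy sketch for squared error; the fixed iteration count, the Gaussian test data, and the random initial codebook are assumptions made for illustration, not part of [14].

```python
import numpy as np

def lloyd(samples, codebook, iters=50):
    """Alternate the two Lloyd steps: partition the samples into
    nearest-neighbor regions, then replace each codevector by the
    centroid of its region."""
    samples = np.asarray(samples, dtype=float)
    codebook = np.array(codebook, dtype=float)
    for _ in range(iters):
        # Nearest-neighbor step: distance from every sample to every codevector.
        dist = np.linalg.norm(samples[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        # Centroid step.
        for i in range(len(codebook)):
            region = samples[nearest == i]
            if len(region):
                codebook[i] = region.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 2))                          # k = 2 source blocks
init = data[rng.choice(len(data), size=4, replace=False)]  # N = 4 initial codevectors
print(lloyd(data, init))
```

Run with several random initializations, this is exactly the setting in which the sensitivity noted above for *k* > 1 shows up: different initial codebooks can converge to different quantization noise levels.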

**Systemization of rate distortion theory.** In 1948, Shannon defined the concept of rate distortion function and then
sketched the proof of a lossy source coding theorem ([21], Theorem 21). However,
it was the year 1959 that saw the formulation of rate distortion theory
as we know it today--that year marked the appearance of Shannon's paper
[22] in which he laid out basic tenets of rate distortion theory, along
with a rigorous proof of a lossy source coding theorem. Shannon's source
coding theorem ([22], Theorems 1 and 2) states that the rate distortion
function for a discrete memoryless source, evaluated at a distortion level *D*, gives the optimum rate in bits per source sample at which the source can
be encoded by fixed-rate block codes, provided that the resulting distortion
in the recovery of the source output must be less than or equal to *D*. Independently, R. L. Dobrushin [5] also published a lossy source coding
theorem in 1959, which is analogous to Shannon's. In the 1960's, Gallager
and Berger extended Shannon's coding theorem to the case of stationary ergodic
sources. However, the optimum rate for fixed-rate block coding of a non-ergodic
stationary source is not given by the rate distortion function [8].
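
For a concrete instance of Shannon's function: for a memoryless Bernoulli(*p*) source under the Hamming (bit-error) distortion measure, the rate distortion function has the well-known closed form

$$
R(D) =
\begin{cases}
h(p) - h(D), & 0 \le D \le \min(p, 1-p), \\
0, & D > \min(p, 1-p),
\end{cases}
\qquad h(x) = -x \log_2 x - (1-x) \log_2 (1-x).
$$

For example, a fair-coin source may be encoded at 1 - h(0.1) ≈ 0.53 bits per sample if one-tenth of the reproduced bits are allowed to be in error.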

**Birth of the Kolmogorov complexity concept.** Independently and at nearly the same time, Kolmogorov [12], Solomonoff [24], and
Chaitin [2] proposed the notion of string complexity which has since come
to be known as Kolmogorov complexity. The Kolmogorov complexity of a finite
string is the length in bits of the shortest program which when run on a
universal computer will cause the string to be printed. The Kolmogorov complexity
of a string is an invariant in the sense that the difference in complexity
as measured by two universal computers is bounded by a constant which does
not depend upon the string. If one employs a universal computer in which
the set of halting programs forms a prefix set of binary strings, one can
envision an unrealizable noiseless coding scheme which encodes any string
into a shortest program that generates it, thereby interpreting
Kolmogorov complexity as codeword length. This unrealizable scheme provides
an important benchmark in theoretical data compression performance studies
because, applied to any sequence of strings of increasing lengths, it operates
at an asymptotic compression rate less than or equal to the compression
rate afforded by any realizable noiseless coding scheme. Many papers have
studied the implications of Kolmogorov complexity in information theory
and in theoretical computer science; a good survey of these works is the
paper [13].
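
In symbols, the invariance property says that for any two universal computers *U* and *V* there is a constant c depending only on the pair such that

$$
|K_U(x) - K_V(x)| \le c \quad \text{for every finite string } x,
$$

and the benchmark property says that for any realizable (computable) prefix code with codeword length function ℓ there is a constant c' with K(x) ≤ ℓ(x) + c' for all *x*; both statements are standard and may be found in [13].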

**Systemization of universal source coding theory.** The first papers in universal noiseless source coding (by Lynch, Davisson,
Fitingof, Shtarkov-Babkin, and Babkin) were scattered results that appeared
in the period 1966-1971. It remained for someone to unify these results
and systematize the field. Lee Davisson accomplished this task in 1973 [4].
In Davisson's paper, we see the formulation of the now familiar concepts
of weighted universal code, weakly minimax universal code, and strongly
minimax universal code. All of these concepts proved generalizable to
universal source coding with respect to a fidelity criterion--an
explosion of papers in this field appeared in the 1970's due to Ziv, Gray,
Neuhoff, Shields, Pursley, Davisson, Mackenthun, Leon-Garcia, and Kieffer
(among others); a detailed account of these achievements (with citations)
may be found in [11].

**Commencement of multiterminal source coding theory.** In 1973, David Slepian and Jack Wolf published their paper [23] which marked
the beginning of the field of multiterminal source coding theory. These
authors were able to specify completely the admissible rate region for noiseless
coding of a memoryless multiterminal source; in 1975, Thomas Cover extended
their results to the case of a stationary ergodic multiterminal source by
means of an elegant proof [3]. Papers in the field of multiterminal source
coding relative to a fidelity criterion started to appear in 1975; significant
early papers in this field are the papers by Wyner-Ziv [25] and Kaspi-Berger
[10]. There are many open (but difficult) problems in the field of multiterminal
source coding with respect to a fidelity criterion.
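
For the memoryless case settled in [23], the admissible rate region has a strikingly simple form: if the two components X and Y of the source are encoded separately at rates R_X and R_Y and decoded jointly, noiseless reproduction is possible if and only if

$$
R_X \ge H(X \mid Y), \qquad R_Y \ge H(Y \mid X), \qquad R_X + R_Y \ge H(X, Y).
$$

The remarkable point is the sum-rate constraint: H(X, Y) bits per sample suffice even though the two encoders never communicate, just as if they could cooperate.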

**First practical arithmetic coding schemes.** Arithmetic coding is an important noiseless coding technique in present
day commercial data compression systems. It is preferable to Huffman coding
since one can easily determine a close-to-optimal arithmetic code for coding
long strings generated by a discrete memoryless source once the string probabilities
are specified, whereas the Huffman algorithm, applied to source strings of length
*n*, must build a code tree whose size grows exponentially in *n*. The first arithmetic code was devised by Peter Elias sometime before
1963 in unpublished work; the first published description of the Elias code
appears on pages 61-62 of the text [1]. The Elias code is a non-block code
which encodes an infinite source sequence into an infinite code sequence.
For each finite string appearing at the beginning of the source sequence,
Elias determines a subinterval of the unit interval on the real line of
length equal to the probability of that string--the infinite code sequence
is then the binary expansion of the unique point in the intersection of
these subintervals. Although the Elias code encodes at an optimal rate,
it is impractical because it requires infinite precision and therefore an
infinite buffer size; also, the Elias code is catastrophic in the sense that
finitely many errors in the code sequence may cause infinitely many errors
in the decoded source sequence. However, the Elias code can be modified
to remove these drawbacks. Although practical schemes for arithmetic coding
now abound, the first practical arithmetic coding schemes did not become
available until 1976, in work by Jorma Rissanen [20] and Richard Pasco [19].
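
The interval-subdivision idea behind the Elias code fits in a few lines of Python. In the sketch below, floating point stands in for the infinite precision that, as noted above, makes the pure scheme impractical, and the three-symbol distribution is an assumed example; practical schemes such as [19], [20] work with finite registers instead.

```python
def elias_interval(string, probs):
    """Shrink [0, 1) to the subinterval whose length equals the
    probability of `string`, subdividing once per symbol."""
    low, width = 0.0, 1.0
    for sym in string:
        for s, p in probs.items():       # partition in proportion to probs
            if s == sym:
                width *= p
                break
            low += width * p
    return low, low + width

def codeword(low, high):
    """Shortest bit string whose dyadic interval lies inside [low, high);
    its length is within about two bits of -log2(high - low)."""
    mid = (low + high) / 2
    bits, base, size = [], 0.0, 1.0
    while base < low or base + size > high:
        size /= 2
        if mid >= base + size:
            bits.append("1")
            base += size
        else:
            bits.append("0")
    return "".join(bits)

lo, hi = elias_interval("aab", {"a": 0.5, "b": 0.25, "c": 0.25})
print(codeword(lo, hi))   # 'aab' has probability 1/16 -> the 4-bit codeword '0010'
```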

**Discovery of Lempel-Ziv codes.** The well-known universal noiseless source coding technique due to Jacob
Ziv and Abraham Lempel was announced in 1977 [26], although the method is
based upon a notion of string complexity that had been proposed by these
two authors in a paper the year before. With probability one, a stationary
ergodic finite-alphabet source generates a sequence which, when encoded
using the Lempel-Ziv algorithm, yields a compression rate equal to the entropy
rate, asymptotically as the number of source samples goes to infinity. The
Lempel-Ziv algorithm is the most important noiseless source coding technique
in the entire history of source coding. A spate of papers has been devoted
to the theoretical and practical aspects of Lempel-Ziv coding. On the theoretical
side, perhaps the most significant of these is the recent paper by Ornstein
and Weiss [18].
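
As an illustration of the sliding-window idea, here is a greedy parse in Python; the window size, the (offset, length, next-symbol) triple format, and the brute-force longest-match search are expository simplifications, not the exact scheme of [26].

```python
def lz77_parse(data, window=4096):
    """Greedy sliding-window parse: each step emits (offset, length,
    next symbol), where (offset, length) points back at the longest
    match beginning within the previous `window` symbols."""
    out, i, n = [], 0, len(data)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may extend past position i (overlap is allowed).
            while i + length < n - 1 and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

print(lz77_parse("abababbaba"))
# -> [(0, 0, 'a'), (0, 0, 'b'), (2, 4, 'b'), (7, 2, 'a')]
```

The compression claim quoted above is asymptotic: as the string and the window grow, the per-symbol cost of the pointers approaches the entropy rate for almost every sample path of a stationary ergodic source.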

[1] N. Abramson, Information theory and coding. New York: McGraw-Hill, 1963.

[2] G. J. Chaitin, ``On the length of programs for computing binary sequences,'' *J. Assoc. Comp. Mach.*, vol. 13, pp. 547-569, 1966.

[3] T. Cover, ``A proof of the data compression theorem of Slepian and Wolf
for ergodic sources,'' *IEEE Trans. Inform. Theory*, vol. 21, pp. 226-228, 1975.

[4] L. D. Davisson, ``Universal noiseless coding,'' *IEEE Trans. Inform. Theory*, vol. 19, pp. 783-795, 1973.

[5] R. L. Dobrushin, ``A general formulation of the fundamental Shannon
theorem in information theory,'' *Uspehi Mat. Nauk*, vol. 14, pp. 3-104, 1959.

[6] R. G. Gallager, ``Variations on a theme by Huffman,'' *IEEE Trans. Inform. Theory*, vol. 24, pp. 668-674, 1978.

[7] A. Gersho and R. M. Gray, Vector quantization and signal compression. Boston: Kluwer, 1992.

[8] R. M. Gray and L. D. Davisson, ``Source coding theorems without the
ergodic assumption,'' *IEEE Trans. Inform. Theory*, vol. 20, pp. 502-516, 1974.

[9] D. A. Huffman, ``A method for the construction of minimum redundancy
codes,'' *Proc. IRE*, vol. 40, pp. 1098-1101, 1952.

[10] A. H. Kaspi and T. Berger, ``Rate distortion for correlated sources
and partially separated encoders,'' *IEEE Trans. Inform. Theory*, vol. 28, pp. 828-840, 1982.

[11] J. C. Kieffer, ``A survey of the theory of source coding with a fidelity
criterion,'' *IEEE Trans. Inform. Theory*, 1993, to appear.

[12] A. N. Kolmogorov, ``Three approaches to the quantitative definition
of information,'' *Problems of Information Transmission*, vol. 1, pp. 4-7, 1965.

[13] M. Li and P. M. B. Vitányi, ``Kolmogorov complexity and its applications,'' in *Handbook of Theoretical Computer Science*, Vol. A, J. van Leeuwen, Ed. Amsterdam: Elsevier/MIT Press, 1990, pp. 187-254.

[14] S. P. Lloyd, ``Least squares quantization in PCM,'' *IEEE Trans. Inform. Theory*, vol. 28, pp. 129-137, 1982.

[15] J. Lukaszewicz and H. Steinhaus, ``On measuring by comparison,'' *Zastos. Mat.*, vol. 2, pp. 225-232, 1955.

[16] B. McMillan, ``The basic theorems of information theory,'' *Ann. Math. Stat.*, vol. 24, pp. 196-219, 1953.

[17] D. Ornstein, ``Bernoulli shifts with the same entropy are isomorphic,'' *Advances in Math.*, vol. 4, pp. 337-352, 1970.

[18] D. S. Ornstein and B. Weiss, ``Entropy and data compression schemes,'' *IEEE Trans. Inform. Theory*, vol. 39, pp. 78-83, 1993.

[19] R. Pasco, ``Source coding algorithms for fast data compression,'' Ph.D. dissertation, Stanford University, 1976.

[20] J. Rissanen, ``Generalized Kraft inequality and arithmetic coding,'' *IBM J. Res. Devel.*, vol. 20, pp. 198-203, 1976.

[21] C. E. Shannon, ``A mathematical theory of communication,'' *Bell Sys. Tech. J.*, vol. 27, pp. 379-423 and pp. 623-656, 1948.

[22] C. E. Shannon, ``Coding theorems for a discrete source with a fidelity
criterion,'' *IRE Nat. Conv. Rec.*, part 4, pp. 142-163, 1959.

[23] D. Slepian and J. K. Wolf, ``Noiseless coding of correlated information
sources,'' *IEEE Trans. Inform. Theory*, vol. 19, pp. 471-480, 1973.

[24] R. J. Solomonoff, ``A formal theory of inductive inference,'' *Inform. and Control*, vol. 7, pp. 1-22 and pp. 224-254, 1964.

[25] A. D. Wyner and J. Ziv, ``The rate distortion function for source coding
with side information at the decoder,'' *IEEE Trans. Inform. Theory*, vol. 22, pp. 1-10, 1976.

[26] J. Ziv and A. Lempel, ``A universal algorithm for sequential data compression,'' *IEEE Trans. Inform. Theory*, vol. 23, pp. 337-343, 1977.