One of the motivations to work on the prediction problem was the notion that the ability to predict the next outcome of a sequence is related to the complexity, or the ``randomness'' of the sequence. Classical information theory provides a precise measurement for the amount of information in an ``information source'', namely its entropy. For deterministic sequences we have the Solomonoff-Kolmogorov-Chaitin definition of ``Kolmogorov Complexity'' as the shortest program for a universal Turing machine that outputs the sequence. Unfortunately, this measure is undecidable, but in the same spirit we have the complexity definition of Lempel and Ziv as the shortest code of this sequence obtained by an FS encoder. There is also the stochastic complexity or minimal description length (MDL) definition of Rissanen, which is again associated with the description length or coding with respect to a model class. But how is the prediction ability related to complexity?

The predictability is related to the error probability in guessing the outcome of a variable, while the compressibility is related to its entropy, or equivocation. By a simple observation it is demonstrated that there is no one-to-one relation between the predictability and the compressibility. In this work we have considered the binary case and realized that , where is the compressibility, is the predictability and is the binary entropy function. Recently [14], we have found tight upper and lower bounds (the lower bound actually follows from Fano's inequality) on the predictability in terms of the compressibility for any finite alphabet which define the region of allowable pairs. While the predictability is not uniquely determined by the compressibility in general, at the extreme points the bounds imply the intuitively appealing result that a sequence is perfectly predictable if and only if it is totally redundant and conversely, a sequence is totally unpredictable if and only if it is incompressible.

Since the figure describing the ability to predict
is a distinct feature and not a function of the compressibility
it might be used to define a * prediction error complexity* which is different
from the earlier definitions associated with the description length.
However, a precise definition of a prediction error complexity, analogous to the definition
of the Kolmogorov complexity, is still unclear.
Intuitively, this definition should reflect the ability of a universal Turing
machine, not allowed to look into the future, to predict the next outcome of
the sequence.

Let us pose another open problem. From the work of Lempel and Ziv,
the compressibility of a sequence, or the entropy of an ergodic process, can
be easily assessed as where **c** is the number of ``commas''
in the incremental parsing algorithm. Can one come
up with a similar simple estimate of the predictability?

Tue Aug 8 11:36:42 PDT 1995