# What is Universal Prediction About?

In the early 50's, at Bell Laboratories, David Hagelbarger built a simple ``mind reading'' machine, whose purpose was to play the ``penny matching'' game. In this game, a player chooses heads or tails, while a ``mind reading'' machine tries to predict and match his choice. Surprisingly, as Robert Lucky tells in his book ``Silicon Dreams'', Hagelbarger's simple, 8-state machine was able to match the ``pennies'' of its human opponent 5,218 times over the course of 9,795 plays. Random guessing would achieve such a high success rate with probability less than one out of 10 billion! Shannon, who was interested in prediction, information, and thinking machines, closely followed Hagelbarger's machine, and eventually built his own stripped-down version, with the same states but a simpler strategy at each state. As the legend goes, in a duel between the two machines, Shannon's machine won by a slight margin! No one knows whether this was due to a superior algorithm or merely a chance happening associated with the specific sequence in that game. In any event, the success of both machines against ``untrained'' human opponents was explained by the fact that human opponents cannot produce completely random bits. Indeed, as Shannon himself noted, his machine could be beaten by the best possible player by a ratio of three-to-one. This raises several questions: can one design a better ``mind reading'' machine? What is the best one can do in predicting the next outcome of the opponent? How is this related to the ``randomness'' of the opponent's sequence? Our work tries to shed light on these questions, and, most importantly, unlike the machines of Shannon and Hagelbarger, it presents prediction procedures that provably outperform (asymptotically) any finite-state (FS) machine in the penny-matching game, for any sequence of choices made by the opponent.
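The ``one out of 10 billion'' figure can be checked directly: it is the tail of a fair binomial distribution. A quick sanity check in Python (my own computation, not from the original account):

```python
from math import comb

# Probability that pure random guessing matches at least 5,218 of 9,795
# plays: the exact binomial tail P(X >= 5218) for X ~ Binomial(9795, 1/2).
n, k = 9795, 5218
tail = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
print(tail)  # below 1e-10, i.e. less than 1 in 10 billion
```

Python's arbitrary-precision integers make the exact computation straightforward; no floating-point approximation of the binomial coefficients is needed.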

The problem of predicting the next outcome in a sequence of observations has intrigued Shannon and other information theorists ever since the early days of the field. For example, Shannon tried to assess the entropy of the English language by letting humans predict the next symbol in English texts [1]. It was felt that the complexity, or the amount of information in a source, is related to how well it can be predicted. Now, the lossless source coding theorem tells us that the information content of a source is measured by how well it can be compressed. By virtue of the vast work done in the last two decades on universal source coding, we know how to compress an unknown source and to approach the optimal rate, i.e., the entropy of the source. Furthermore, Ziv and Lempel [2] postulated and solved a meaningful universal coding problem for arbitrary individual sequences. Can the results in universal source coding be applied to the universal prediction problem?

Well, in [3], a universal gambling problem was solved by applying, quite directly, results of Rissanen, Langdon, and Lempel-Ziv on universal coding. The work on universal gambling utilized the equivalence, pointed out by Kelly [4] and later extended by Cover [5], between minimizing the codelength in lossless data compression and maximizing the capital in gambling with fair odds on the outcomes of the sequence. But what about the prediction problem, where the goal is to minimize the number of errors made in guessing the next outcome?
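To make this equivalence concrete (a sketch of mine, not code from [3]): with fair even odds on a binary sequence, a gambler who bets a fraction q of the capital on an outcome sees the capital multiplied by 2q when that outcome occurs, so the final log-capital equals n minus the ideal codelength of the sequence under the sequential probability assignment q. A minimal Python sketch, using a Laplace (add-one) estimator as an arbitrary example of such an assignment:

```python
import math
import random

def gamble(sequence, q_of_next):
    """Gamble at fair even odds on a binary sequence.

    q_of_next(past) returns the probability assigned to the next symbol
    being 1, given the past symbols. Returns the final capital and the
    ideal codelength (in bits) of the sequence under that assignment.
    """
    capital = 1.0
    codelength = 0.0
    for t, x in enumerate(sequence):
        q1 = q_of_next(sequence[:t])
        q = q1 if x == 1 else 1.0 - q1
        capital *= 2.0 * q                # even-odds payoff on the bet
        codelength += -math.log2(q)       # ideal codelength contribution
    return capital, codelength

random.seed(0)
seq = [int(random.random() < 0.8) for _ in range(200)]   # biased coin
# Laplace (add-one) estimator as the sequential probability assignment:
laplace = lambda past: (sum(past) + 1) / (len(past) + 2)
capital, L = gamble(seq, laplace)
# log2(capital) equals n - L, up to floating-point rounding:
print(math.log2(capital), len(seq) - L)
```

The same sequential probability assignment thus serves simultaneously as a compression scheme (through its codelengths) and as a gambling strategy: the better it compresses the sequence, the faster the capital grows.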