In the early 1950s, at Bell Laboratories, David Hagelbarger built a simple ``mind reading''
machine whose purpose was to play the ``penny matching'' game. In this game, a player
chooses *head* or *tail*, while the ``mind reading'' machine tries to predict
and match his choice. Surprisingly, as Robert Lucky tells in his book ``Silicon Dreams'',
Hagelbarger's simple, 8-state machine
was able to match the ``pennies'' of its human opponent 5,218 times
over the course of 9,795 plays.
Random guessing would achieve such a high success rate with probability
less than one in 10 billion! Shannon, who was interested in prediction,
information, and thinking machines, followed Hagelbarger's machine closely, and eventually
built his own stripped-down
version with the same number of states, but one that
used a simpler strategy at each state.
As the legend goes, in a duel between the two machines,
Shannon's machine won by a slight margin! No one knows if this was due to
a superior algorithm or just a chance happening associated with the
specific sequence at that game.
In any event, the success of both machines against ``untrained''
human opponents was explained by the fact that
human opponents cannot draw completely random bits.
Indeed, as Shannon himself noted, his machine could be beaten by the best
possible player by a ratio of three-to-one.
This raises the question, can one design a better ``mind reading'' machine?
What is the best one can do in predicting the next outcome of the opponent?
How is it related to the ``randomness'' of the opponent sequence?
Our work tries to shed light on these questions and, most importantly,
unlike the machines of Shannon and Hagelbarger, it presents
prediction procedures that provably outperform (asymptotically) any
finite-state (FS) machine in the penny-matching game, for any sequence of choices
made by the opponent.

The problem of predicting the next outcome in a sequence of observations has
intrigued Shannon and other
information theorists since the early days of the field. For example,
Shannon tried to assess the entropy of the English language by letting
humans predict the next symbol in English texts
[1].
It was felt that the complexity, or the amount of information
in a source, is related to how well it can be predicted.
Now, the lossless source coding theorem tells us that
the information content of a source is measured by how well it
can be compressed. By virtue of the vast work done in the last two decades
on universal source coding, we know
how to compress an unknown source and approach the optimal rate, i.e., the entropy
of the source. Furthermore, Ziv and Lempel [2] postulated and solved
a meaningful universal coding problem for *arbitrary* individual sequences.
Can the results in universal source coding be applied in the universal prediction problem?

Well, in [3], a universal *gambling* problem was solved by applying,
quite directly, results of Rissanen, Langdon, and Lempel-Ziv on universal coding.
The work on universal gambling utilized the equivalence,
pointed out by Kelly [4] and later on extended by Cover [5],
between minimizing the codelength in lossless data compression
and maximizing the capital in gambling with fair odds on the outcome of the sequence.
But what about the prediction problem where the goal is to minimize
the number of errors made in guessing the next outcome?
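
In rough outline (the notation here is ours, not taken from [4] or [5]), the equivalence works as follows: a gambler facing fair, even odds on a binary sequence bets a fraction $b(x_t \mid x^{t-1})$ of his capital on each outcome, and the stake on the outcome that occurs is doubled. His capital after $n$ rounds is therefore

```latex
S_n \;=\; \prod_{t=1}^{n} 2\, b(x_t \mid x^{t-1})
    \;=\; 2^{\,n - L(x^n)},
\qquad\text{where}\quad
L(x^n) \;=\; -\log_2 \prod_{t=1}^{n} b(x_t \mid x^{t-1}).
```

Since $L(x^n)$ is the ideal codelength assigned to the sequence by the conditional probabilities $b$, maximizing the capital $S_n$ is the same as minimizing the codelength, so a good universal code yields a good universal gambling scheme.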

The problem of universal prediction was also posed in [3], but only a year later, in the Fall of 1990, did we start to look at it seriously. The purpose of this article is to present a brief ``roadmap'' of our work on universal prediction. We shall describe the obstacles that we had to overcome, the main results we obtained and their consequences, and finally raise some afterthoughts that we have now, three years after this work was done.

While writing this article, we vividly recalled the time and atmosphere in which our work was carried out. We started our collaboration a little while after Saddam Hussein invaded Kuwait, when tension was rising in the Middle East and gas masks were being distributed in Israel. We came up with most of the results during the Fall semester that preceded the Gulf War. The paper was written during the war, and so the writing was interrupted occasionally by Scud missiles. The first version was completed on the day the ground attack in Iraq and Kuwait began. A month after the war, in April 1991, the paper was submitted for publication, and it was eventually published in July 1992.
