In the early 1950s, at Bell Laboratories, David Hagelbarger built a simple ``mind reading'' machine, whose purpose was to play the ``penny matching'' game. In this game, a player chooses heads or tails, while a ``mind reading'' machine tries to predict and match his choice. Surprisingly, as Robert Lucky tells in his book ``Silicon Dreams'', Hagelbarger's simple, 8-state machine was able to match the ``pennies'' of its human opponent 5,218 times over the course of 9,795 plays. Random guessing would achieve such a high success rate with probability less than one in 10 billion! Shannon, who was interested in prediction, information, and thinking machines, followed Hagelbarger's machine closely, and eventually built his own stripped-down version, with the same states but a simpler strategy at each state. As the legend goes, in a duel between the two machines, Shannon's machine won by a slight margin! No one knows whether this was due to a superior algorithm or merely a chance happening associated with the specific sequence of that game. In any event, the success of both machines against ``untrained'' human opponents was explained by the fact that human opponents cannot produce completely random bits. Admittedly, as Shannon himself noted, his machine could be beaten by the best possible player by a ratio of three to one. This raises the question: can one design a better ``mind reading'' machine? What is the best one can do in predicting the next outcome of the opponent? How is this related to the ``randomness'' of the opponent's sequence? Our work tries to shed light on these questions and, most importantly, unlike the machines of Shannon and Hagelbarger, it presents prediction procedures that provably outperform (asymptotically) any finite-state (FS) machine in the penny-matching game, for any sequence of choices made by the opponent.
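The ``one in 10 billion'' figure can be checked directly: it is the tail of a fair binomial, the chance that blind coin-flipping matches at least 5,218 of 9,795 plays. A short sketch of that computation (using exact integer arithmetic rather than a normal approximation):

```python
from math import comb

# Exact binomial tail: probability that pure guessing against a fair
# opponent matches at least 5,218 of 9,795 plays.
n, k = 9795, 5218
c = comb(n, k)          # C(n, k), then step up via the ratio identity
total = 0
for j in range(k, n + 1):
    total += c
    c = c * (n - j) // (j + 1)   # C(n, j+1) = C(n, j) * (n-j) / (j+1)
p = total / 2 ** n      # Python divides the huge integers exactly enough

# p comes out on the order of 1e-11 -- indeed below one in 10 billion
```

The exact tail is roughly $5\times 10^{-11}$, consistent with the claim in the text.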
The problem of predicting the next outcome in a sequence of observations has intrigued Shannon and other information theorists ever since the early days of the field. For example, Shannon tried to assess the entropy of the English language by letting humans predict the next symbol in English texts. It was felt that the complexity, or the amount of information in a source, is related to how well it can be predicted. Now, the lossless source coding theorem tells us that the information content of a source is measured by how well it can be compressed. By virtue of the vast work done in the last two decades on universal source coding, we know how to compress an unknown source and approach the optimal rate, i.e., the entropy of the source. Furthermore, Ziv and Lempel postulated and solved a meaningful universal coding problem for arbitrary individual sequences. Can the results in universal source coding be applied to the universal prediction problem?
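The heart of the Lempel-Ziv approach to individual sequences is incremental parsing: the sequence is split into the shortest phrases not seen before, and the number of phrases measures the sequence's compressibility. A minimal sketch (the function name `lz78_parse` is our own, for illustration):

```python
def lz78_parse(s):
    """Incremental (LZ78-style) parsing: each phrase is the shortest
    prefix of the remaining input that has not appeared as an earlier
    phrase.  Fewer phrases means a more compressible sequence."""
    phrases, seen, w = [], set(), ""
    for ch in s:
        w += ch
        if w not in seen:          # shortest new phrase found
            seen.add(w)
            phrases.append(w)
            w = ""
    if w:
        phrases.append(w)          # possibly incomplete final phrase
    return phrases

lz78_parse("abaababa")
# → ['a', 'b', 'aa', 'ba', 'ba']  (only the last phrase may repeat)
```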
Well, in earlier work, a universal gambling problem was solved by applying, quite directly, results of Rissanen, Langdon, and Lempel-Ziv on universal coding. The work on universal gambling utilized the equivalence, pointed out by Kelly and later extended by Cover, between minimizing the codelength in lossless data compression and maximizing the capital in gambling with fair odds on the outcomes of the sequence. But what about the prediction problem, where the goal is to minimize the number of errors made in guessing the next outcome?
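The Kelly-Cover equivalence is an exact identity: betting fractions of the capital proportional to any sequential probability assignment $P$, at fair 2-for-1 odds, leaves final capital $2^{\,n} P(x^n) = 2^{\,n - L(x^n)}$, where $L = -\log_2 P$ is the ideal codelength. A small sketch verifying this (the Laplace add-one estimator here is just an illustrative predictor, not the one from the work discussed):

```python
import math

def gamble(seq, predictor, capital=1.0):
    """Bet a fraction predictor(past) of the capital on '1' and the rest
    on '0' at fair 2-for-1 odds; also accumulate the ideal codelength
    -log2 P(x^n) of the same probability assignment."""
    codelength = 0.0
    for i, x in enumerate(seq):
        p1 = predictor(seq[:i])            # estimated P(next bit = 1 | past)
        p = p1 if x == 1 else 1.0 - p1
        capital *= 2.0 * p                 # fair odds double the winning side
        codelength += -math.log2(p)
    return capital, codelength

def laplace(past):
    """Add-one (Laplace) estimate of P(next bit = 1)."""
    return (sum(past) + 1) / (len(past) + 2)

seq = [0, 1, 0, 0, 1, 0, 0, 0]
cap, L = gamble(seq, laplace)
# identity: final capital = 2^(n - codelength), for any predictor
assert abs(cap - 2 ** (len(seq) - L)) < 1e-9
```

So a gambler grows capital exactly when the matched code compresses, which is why universal coding results transferred so directly to universal gambling.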
The problem of universal prediction was also posed in that work, but only a year later, in the Fall of 1990, did we begin to look at it seriously. The purpose of this article is to present a brief ``roadmap'' of our work on universal prediction. We shall describe the obstacles we had to overcome, the main results we obtained and their consequences, and finally raise some afterthoughts that we have now, three years after this work was done. While writing this article, we vividly recalled the time and atmosphere in which our work was carried out. We started our collaboration a little while after Saddam Hussein invaded Kuwait, when tension was rising in the Middle East and gas masks were being distributed in Israel. We came up with most of the results during the Fall semester that preceded the Gulf War. The paper was written during the war, and so the writing was occasionally interrupted by Scud missiles. The first version was completed on the day the ground attack in Iraq and Kuwait began. A month after the war, in April 1991, the paper was submitted for publication, and it was eventually published in July 1992.