Wyner: Let me begin by extending to you, on behalf of the information theory community, warmest congratulations on your upcoming 65th birthday, and the on the receipt of two very prestigious awards last year - the Marconi International Fellowship, and the IEEE Hamming Medal. I'll begin by asking when and how you first got interested in communication theory?
Ziv: I was always interested in communication. My first job after the Technion was with the Israel Ministry of Defense in a communication group. This was the time of the emergence of the transistor. My first design job was the design a transistorized telemetry system. There I was exposed to noisy signals and the problems of extracting the signals from noise.
At the Israel Ministry of Defense I found Goldman's information theory book, and with a colleague Zvi Neter, studied it carefully. Despite my reservations about the book, I found the subject fascinating.
The Technion had a very small graduate program, and at that time the Israel Ministry of Defense sent young engineers abroad to study. I almost went to the University of London to study with Colin Cherry. But Moshe Zakai advised me to apply to more than one institution, and I ended up at MIT.
Once at MIT, it was obvious that information theory was the subject to study. The leaders of information theory at that time were Bob Fano, Peter Elias, and Jack Wozencraft. Bob Gallager had just graduated.
My first PhD problem was to study the effects of quantization (both input and output quantization) on random code error-exponents. The study of error-exponents for channel coding was a central problem in those days. This was before the appearance of the Gallager bounds. I had heard about some lecture notes of Shannon's which used what we now call Chernoff bounds to obtain error-exponents. But Shannon was not around. Desperate, I persuaded his secretary to let me and Tom Kailath into his office, where we found a pile of the lecture notes. That made a big difference in my work.
Also at MIT I became interested in sequential decoding which had been invented shortly before by Jack Wozencraft. At that time there were still loose ends in the proof of the efficacy of the technique. I found an error in the proof of one of the important theorems, but couldn't fix it. In fact all this was not resolved until Fano wrote his now classic paper. But I did find another sequential decoding algorithm, which I could prove worked, but with a lower "cutoff" rate than is necessary.
In those days, information theorists were the only ones interested in computational complexity. Later John Savage, another MIT PhD, introduced information theoretic ideas to the computer science community.
After my doctoral work, I continued studying complexity in channel coding. I succeeded in using Elias' iterative coding to show the existence of coding schemes with polynomially increasing computation, and vanishing error probability. Both my inner- and outer-codes were random. This work was a precursor of Dave Forney's classic work on concatenated coding.
Wyner: Who were the people with whom you interacted at MIT in those years?
Ziv: It was an all-star cast. My advisor was Jack Wozencraft, and I worked a great deal with Bob Fano. Bob Gallager was a new Assistant Professor. My fellow students included Harry van Trees, Dave Sakrison, Fred Jelinek, Tom Kailath, Len Kleinrock, and Tom Goblick.
Wyner: Did you return to the Israel Ministry of Defense after your PhD?
Ziv: Yes. And, as a side interest, I developed an interest in the Gaussian channel. I worked on the famous (still unsolved) simplex problem. I also collaborated with Moshe Zakai. We used ideas from information theory to derive bounds on the accuracy of estimation. I also did work studying the bias of estimators. Also in those years I began my interest in universal source coding. Shmuel Winograd (who was visiting Israel from IBM) and I ran a seminar on Kolmogorov complexity. I found this concept very unsatisfying. The presence of the large constant in the complexity measure makes it impossible to calculate the Kolmogorov complexity for a specific non-trivial sequence. It was the search for a better complexity measure that began my work on universal data compression. I tried to do this with Shmuel, but with no luck.
I came to Bell Labs in 1968 for an 18-month sabbatical. It was a wonderful year, working with you (Aaron) and Jack Wolf. I worked on a wide variety of information theory problems, including source and channel coding, and the beginnings of work on universal source coding. This was the beginning of a lifelong collaboration with you, Aaron.
Wyner: And after Bell Labs?
Ziv: I returned to a full-time appointment at the Technion in 1970. And it was here that I turned seriously to the problem of characterizing the complexity of an information source, and the related problem of universal data compression. The basic idea behind the earliest formulation of the Lempel-Ziv complexity is the following. Suppose one has a sequence of c distinct letters from some information source. It is quite intuitive to speculate that it requires c*log(c) bits to encode the sequence - using log(c) bits to encode each letter. Now suppose one has a say, binary sequence which can be parsed into c distinct substrings. We can regard the substrings as members of a higher order alphabet, and again use about c*log(c) to encode the sequence. This was the beginning. In the next few years, with Abraham Lempel, I wrote a series of papers on the Lempel-Ziv algorithm. I was always keeping an eye out for the theoretical and probabilistic considerations in what we were doing, while Abraham was more interested in the practical algorithmic issues. Two particularly notable papers were in 1977 and 1978, respectively. In the first, we showed that a version of the LZ algorithm could be used to compress data, as well as compute its complexity. We also proved that the compression was optimal in a certain sense. The algorithm turned out to be fairly simple too. We won the Information Theory Society Prize paper for the 1977 paper. But this paper had two shortcomings. The first was that (so it seemed) it was not efficient to implement, and the second that the optimality conditions were somewhat artificial. The 1978 paper improved in both these directions. In particular we proved that if the source was a sample of a stationary ergodic source, then the algorithm yielded a complexity equal to the source entropy.
Wyner: What about the connection between information theory and statistics?
Ziv: In the years following 1978, I did I great deal of work relating data compression and various aspects of statistics, including classification, detection, and estimation. The pattern matching techniques, derived from the Lempel-Ziv algorithm, proved very helpful here. I was also fortunate to have attracted excellent students, like Neri Merhav.
Wyner: Does your work have a philosophy?
Ziv: Well I have always felt that a "solution" to a problem without a theoretical justification is lacking something important. At the very same time, I always liked to work on problems with a "real-world" application. I also learned at MIT to be happy looking at simple analyzable cases in order to derive intuition.