Editor's Note: This article is not yet fully formatted.

What is now commonly called the information age began with a double "big bang", and this year we are celebrating its fiftieth anniversary. For it was in 1947-48, at a few months' interval and in the same institution (Bell Labs, New Jersey, USA), that William Shockley and his team invented the transistor and Claude Shannon published his theory of communications. This prodigious coincidence saw the twin birth of the semiconductor component, which can represent binary information by its physical state, and the "bit", the mathematical unit measuring this information. Today, we can better appreciate the real importance of these two discoveries which have deeply marked this half-century technically, socially and even politically. So it is a great honor for us to have been invited in this particular year to write these reflections.

The article is divided into two distinct parts. The first part describes the progression, experimentation, observation and reflection which led to a new family of error-correcting codes: turbo codes. What will be noticed in this chronologically narrated experience is the naïveté required to invent this new coding scheme and its associated decoder. In the second part, specific (self-concatenated) turbo codes are presented and the usual turbo decoding rules are summarized.

The "Ecole Nationale Supérieure des Télécommunications de Bretagne" was created by France Télécom in 1978. After some years spent setting up the teaching program, research activities began in electronics, signal processing and computer science. The mainly young researchers were free to define their own fields of investigation, in collaboration with other research centres (CNET, CCETT1, ...) in Brittany, a region where menhirs, church steeples and telecommunications antennae share the skyline. With our colleague Patrick Adde, we decided to focus our attention on the Soft-Output Viterbi Algorithm (SOVA).

Two articles [1,2] marked the beginning of our work on error-correcting codes, not forgetting of course inescapable references like [3,4]. Our aim was to translate the SOVA algorithm described in [1] into MOS transistors, in the simplest possible way. We met Gérard Battail several times and it appeared that there was definitely much to be done in and perhaps something to contribute to this fascinating area of coding2. We eventually arrived at a suitable hardware solution [5]. This took a long time - two years. The next developments went faster, but those two years enabled us to work out a philosophy of soft-output decoding.

Following Joachim Hagenauer and Peter Hoeher [2], we observed that a soft-output decoder could be regarded as a signal-to-noise (SNR) amplifier. This observation stimulated us to consider techniques commonly used with electronic amplifiers, especially feedback. Note however that this parallel with amplifiers is meaningful only if the input and output represent the same signal - that is to say, only if the code is systematic.

Using our version of the SOVA, we were able to cascade SNR amplifiers and do the experiments described in [6], which was our initial plan. That is, we decoded a classical (i.e. serial) concatenation of two ordinary (i.e. nonsystematic, nonrecursive) convolutional codes or even more than two. Concatenation is a simple way to obtain large asymptotic gains [7], but the performance at low SNRs is degraded due to having to share the redundancy energy between the different component codes.

Having simulated the decoding of several simple concatenation schemes, typically two serial codes with rates 3/4 and 2/3 (Fig. 1), we thought that it would be interesting to see what would happen if we reinjected the result of the outer decoding into the inner decoder. As the different levels of the composite decoder do not represent the same pieces of information (the codes are not systematic), it was necessary to build an estimate of symbols x2 and y2 at the output of SOVA2. At first, it was a great surprise to observe that the bit error rate (BER) of these reconstructed symbols after decoding was lower than that of decoded information d. We were unable to find any explanation for this strange behavior in the literature.

It was then a straightforward task to (re)invent recursive systematic convolutional (RSC) codes to take advantage of this heretofore unreported property. We gave Punya Thitimajshima, one of our PhD. students, the work of analyzing the distance properties of these codes [8].

The soft-output Viterbi decoder of a systematic code provides a good estimate of the log likelihood ratio (LLR) relative to any input symbol, which can obviously be seen as the sum of two contributions. The first, intrinsic information (possibly including a priori probability, see part II), is already available before decoding; the second, extrinsic information, is provided by exploiting the dependencies (due to convolution, parity, ...) which exist between this particular symbol and the other symbols processed by the decoder. In any decoding construction, extrinsic information must not be used by the processor which produced it. In a way, this principle had already been stated by Gallager in [9], and in parallel with our work, it was also applied by Lodge et al. [10]. We came up with the scheme depicted in Fig. 2. The decoder is built in such a way that no information produced by a component decoder can feed its input, either directly or indirectly.

The digital processing of information has at least one major drawback: it does not lend itself easily to feedback techniques. An iterative procedure has to be used3 because of the delays (trellis, interleaving, ...). This procedure increases the decoder latency, but ever-continuing progress in microelectronics makes possible today what was unimaginable not so long ago.

Two hardware solutions can be contemplated, depending on the information rate. For low data rates, a single decoding processor working at a high frequency can make all the required iterations with a tolerable added delay. For high rates, a cascade of identical decoding modules can be implemented monolithically to enable high-speed pipe-line processing (in this case, typically that of broadcasting, the latency problems are generally less crucial).

The idea of so-called parallel concatenation did not germinate by analogy with product codes. More trivially, we were seeking to simplify the clock distribution in our envisaged high-data-rate integrated circuit, because classical (serial) concatenation requires different clocks to be generated. Parallel concatenation (Fig. 3) simplifies the architecture of the composite decoder as the two encoders and the two component decoders work with the same clock.

Having said that, what is the general principle of error-correcting coding? It is the distribution of the energy available between different redundant symbols, in such a way that the receiver can take advantage of a diversity effect; the greater the number of redundant symbols, the greater the diversity effect. With the same coding rate, and hence the same transmitting bandwidth, parallel concatenation yields more redundant symbols from the outer code than does serial concatenation. Therefore this coding construction was not inappropriate [13].

The results of our first simulations were both encouraging and frustrating; encouraging because we had unusually good performance at low SNRs; frustrating, because the BER had difficulty in falling below 10-4.

Unlike serial concatenation, parallel concatenation requires the decoder to take particular care over the importance to be given to the different data available. Thus, in the decoder in Fig. 3.b, the reliability of LLR1(d) increases with the number of iterations, whereas that of Y2 always remains the same. Therefore we must take this evolution into careful consideration to avoid the BER stagnating at a value imposed by the noise affecting Y2 [14]. The same is true for the relative reliabilities of X1, Y1 and Z, but this is apparently less important. That is why we now prefer to use the parallel decoder as depicted in Fig. 3.c, which eliminates the weighting problem for Y2. Finally, by replacing the SOVA by the BCJR (APP) algorithm [15], and by manipulating probabilities instead of LLRs [16], weighting problems are removed.

Another possible cause of BER flattening is insufficient minimum code distance. We examined closely the reasons for the error patterns (regular, much more often than not) that the turbo decoding of two parallel concatenated codes could not eliminate. It seemed clear that some disorder could be advantageously introduced in the interleaving permutation. Pseudo-random transposition is a concept of prime importance in cryptology where the greatest distance is sought between the original message and the encrypted message. In a way, the approach is the same with error-correcting coding; it is a quite fascinating concept, associating as it does algebra, geometry and statistics.

With serial concatenation, regular interleaving is generally sufficient to obtain large minimum distances and asymptotic gains, but the convergence threshold of iterative decoding moves away from the Shannon limit. Perhaps a composite code combining the advantageous properties of each concatenation scheme will be found before long [17].

Turbo codes are the result of an empirical, painstaking construction of a global coding/decoding scheme, using existing bricks that had never been put together in this way before. This type of construction was in the air, and other schemes are sure to come with other conceptual approaches. "It is often said that one should experiment without having any preconceived ideas. That is not possible: each person carries within himself his conception of the world that he cannot easily put aside" (H. Poincaré).

In the article originally published at ICC'93, the encoder was made from the parallel concatenation of two RSC codes and an interleaver, as shown in Fig. 3. Although this encoder is well-adapted for continuous data transmission, it poses a few problems when used for block transmission because of the unknown final state after encoding the block. This is what led us to modify its structure, in such a way that the state of the encoder at the end of each block should always be known, without having to add any termination information. This new encoder does of course have properties fairly similar to those of parallel concatenation.

Before presenting this encoder, hereafter called the self-concatenated block encoder, we shall first demonstrate that parallel concatenation has some analogies with pseudo-random coding.

Let us consider a 1/3 rate encoder built from two RSC codes, C1 and C2, with the same polynomial generators G1(D) and G2(D), having values of 15 and 17 in octal notation.

(1)

where D is the delay variable and the additions are modulo 2.

Calling X(D) and Y1(D) the D-transforms of the sequences {Xk} and {Y1,k}, k being the time index, and taking into account the recursive nature of the codes used, we have:

(2)

The parity check equation that code C1 satisfies at each time k is:

(3)

The parity check equation associated with code C2 has an expression analogous to (3) except that for a given time n, the indices of data symbols are not adjacent.

(4)

where the indices i, j, l and p depend on the instant n according to the law of interleaving used.

The two parity-check equations (3) and (4) together characterize the composite 1/3 rate code. To an observer who does not know the interleaving law, it might appear that these two checks are statistically independent. In fact, however, the information symbols in (4) are the same up to permutation as those in (3), so that the composite code can be regarded as a single time-varying code with a very long constraint length depending on the permutation law.

Structure of the self-concatenated block encoder

We shall first recall some of the properties of an RSC code. The recursive part of the encoder is defined by a polynomial G1(D) of the form:

(5)

Let us assume that the encoder is initially in any state. After introducing a certain number of data symbols dk equal to zero, the encoder, behaving like a pseudo-random generator, will return to its initial state. This number of "0"s is called the encoder period and denoted L. It can be shown that L is equal to the smallest degree of any polynomial of the form 1 + Dr, that is a multiple of G1(D). For example, since 1 + D5 is the polynomial 1 + Dr with the smallest degree r that is a multiple of , the period of the recursive encoder of polynomial G1(D) is L = 5.

Let us now assume that the initial state of the encoder is (00.....0). Hence, as shown in [18], any {dk} sequence made up of a symbol "1" followed by (pL - 1) symbols "0" (p = 1,2, ...) then finally by a symbol "1"', will enable the encoder to return to its initial state (this is only true if the initial state is (00...0)). In practice, the sequence at the input of the encoder does not, of course, satisfy the previous condition. However, by judiciously associating a memory and an RSC encoder, it is possible to bring the encoder into the zero state at the end of each block.

Let us consider the block transmission of K data symbols dk, where K is a multiple of L. Each symbol dk is stored twice in a memory of size 2K, the first time at an address 1 ( n ( K and the second time at an address n + pkL where pk is an integer such that K + 1 ( n + pkL ( 2K . In this way, the sequence feeding the encoder can be regarded as the sum of K basic sequences of length 2K of the form given in Fig. 4. If the encoder is initially in the state (00....0), then it will be in this same state again at the end of each basic sequence. Since the encoder is linear and any sequence is composed of a sum of basic sequences, the final state of the encoder (the sum of the basic final states) will therefore always be (00...0).

This block encoder (Fig. 5) thus produces codewords C made up of K data bits dk and 2K redundancy bits ck, which can if necessary be punctured in order to obtain the desired coding rate.

This self-concatenated encoder obviously has many analogies with the encoder built from the parallel concatenation of two RSC codes. For these two encoders, each bit dk produces two redundancy bits, at different instants. The optimal decoding of this block code thus poses the same problems as the code originally considered, when the length of the blocks exceeds several tens of units. However, by using iterative or turbo decoding, it is possible to approach optimal decoding performance very closely.

Since a turbo decoder is built from soft-input/soft-output (SISO) elementary decoders, let us briefly recall the principle of these decoders.

Let us consider the general case of a systematic code, recursive or not, which associates a codeword C of length N to each block of K information bits dk. Each codeword is transmitted on a binary-input channel with additive white Gaussian noise, which gives an observation composed of N samples of the form:

(6)

where the ck represent the redundancy symbols and the nk are the noise samples with variance (2.

A soft-output decoder associates a decision to each bit dk, in the form of an a posteriori probability APPi(dk) or a log likelihood ratio LLR(dk) with:

(7)

(8)

By comparing either of these two quantities with a threshold fixed at 1/2 or 0 respectively, a hard decision on dk can be obtained. The reliability of this decision is a function of the amplitude of APPi(dk) or of ½LLR(dk)½.

Let us first consider the a posteriori probability, which can also be written as follows:

(9)

where cl is a specific codeword and Ck,i represents the set of codewords for which dk = i (i = 0, 1). By using Bayes' rule, relation (9) becomes:

(10)

Each codeword containing K data bits, assumed to be mutually independent, we can write:

(11)

where is the value of the symbol dn of the codeword cl, and is its a priori probability. By taking into account the fact that the components of the observation are mutually independent given the transmitted codeword, the a posteriori probability is finally equal to:

(12)

where (i = Pr{dk = i}. As expression (12) shows, the a posteriori probability consists of the product of three terms: a term which is a function of the observation rk , an a priori probability (i and a term zk, called the extrinsic probability.

Now considering LLRs, the soft output of the decoder can be written as the sum of three terms.

(13)

where the extrinsic information is now equal to the log of a quotient:

(14)

If the trellis diagram of the encoder is of reasonable complexity, which is the case for convolutional codes of moderate constraint length and for certain block codes, then calculating the a posteriori probability can be done by the BCJR algorithm. With this forward/backward type of algorithm, the a posteriori probability is evaluated from two probabilities, and , computed recursively for each instant k and for each state Sk = m of the encoder, from the observation and the a priori probabilities .

(15)

with:

(16)

In general, the a priori probabilities are not known, but they can be estimated from extrinsic information.

The BCJR (APP, MAP) algorithm being relatively complex to implement, several sub-optimal algorithms have been developed to calculate a value close to the a posteriori probability or to the LLR. Among these algorithms can be cited the "Max-Log-MAP" proposed by Robertson et al. [19]. To calculate the two probabilities and , it is necessary to evaluate the logarithm of a sum of exponentials, so the "Max-Log-MAP" algorithm makes the following approximation:

(17)

where fc is a look-up table. With this approximation, the logarithm and exponential functions in the calculation of probabilities and disappear, which greatly simplifies the software or hardware implementation without significantly degrading the performance of the decoder (see also [20]). Other algorithms generally classified in the literature as SOVAs (see part I) also enable us to obtain the LLR of a bit to a good approximation.

To present the iterative decoder, let us consider the self-concatenated encoder of rate 1/3, shown in Fig. 5. The C codewords are transmitted by the encoder on a binary-input channel with additive white Gaussian noise. This channel then delivers at each time 1 ( k ( N a sample rk given by (6).

The samples rk are first stored in pairs in a memory of size 2K, similar to that used in the encoder. Each pair is composed of two samples, one relative to the symbol dk, and the other to the corresponding redundancy (the samples relative to the symbols dk are thus stored twice, which of course is not necessary in practical implementations).

The turbo decoder uses a single SISO decoder if the transmission rate is not too high, that is, if it is possible to operate the SISO decoder at a much higher frequency than the transmission rate of the symbols. If this is not the case, each half-iteration of the turbo decoding is done by a dedicated SISO decoder (see Part I). In that case, the delay introduced by the decoding operation is proportional to the number of half-iterations.

On the first iteration, only the observation , composed of 2K pairs of samples, is available for processing, since the probabilities are unknown (they are arbitrarily preset to 1/2). The decoder first evaluates all the probabilities from the observation by backward processing, then the probabilities for 1 ( k ( K from the observation by forward processing. An initial extrinsic piece of information denoted is then determined for each bit dk. This information is used as the a priori probability of the bit dk for the second half of the block.

Then the calculations for K + 1 ( k ( 2K can continue by using the observation as well as extrinsic information . After calculating each probabilty , the decoder again supplies for each dk another piece of extrinsic information denoted .

For the following iterations, the probabilities , K + 1 ( k ( 2K, are calculated from the observation and from the extrinsic information of the previous iteration. These probabilities allow a new set of extrinsic probabilities to be determined provided that the probabilities of the previous iteration have been stored. The computation of the probabilities , 1 ( k ( K, continues by using the extrinsic probabilities of the current iteration, which simultaneously enables the extrinsic information to be updated. The process is continued until a hard decision is made on the information bits, by comparing the a posteriori probabilities or the LLRs to thresholds at 1/2 or zero, respectively.

The performance of turbo codes today approaches within about 0.35 dB the theoretical limit introduced by Shannon, for a BER of 10-5. Figure 6 gives an example of a code offering such performance [21]. Of course, this result is obtained with a large number of iterations (15 to 20, depending on the encoder used), a BCJR decoder and a sufficiently large interleaving matrix (typically K > 4000).

Should this gap of 0.35 dB be attributed to the encoder or to the iterative decoding method? Can we hope to reduce this gap or, even more important, reduce the number of decoding iterations for a given performance [21,22]? The principle of turbo decoding relies on the diversity effect of extrinsic information. In order for this effect to fully play its part, it is necessary for the noise affecting extrinsic information (which evolves during processing) to remain uncorrelated with the noise of the channel. This is obviously not possible since the graph associated with a composite code of finite size has finite cycles [23,24]. This is a possible (partial) explanation for suboptimal performance.

Iterative processing is commonly used to find solutions to complex problems in many scientific domains: electronics, electromagnetism, mechanics, etc. This type of processing is much less widespread in digital communications. Turbo codes illustrate well the advantage to be gained from iterative processing using all the data available. The turbo principle can be exploited at various levels of the communication chain to improve its performance [25-27].

It would be most welcome if the purely mathematical aspects of the problem could now be studied, as has been done for many iterative methods. The points that deserve to be more fully explained include convergence properties, reducing the number of iterations and combatting correlation effects.

This article was written in French, translated by Janet Ormrod, and then edited by Dave Forney. We give our warm thanks to Janet and Dave for their valued collaboration. We also thank all those who, from the early days, showed interest in our work and we remain indebted to CCETT for their constant financial support.

[1] G. Battail, "Pondération des symboles décodés par l'algorithme de Viterbi" (in French), Ann. Télécommun., Fr., 42, N° 1-2, pp. 31-38, Jan. 1987.

[2] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decision outputs and its applications", Proc. of Globecom '89, Dallas, Texas, pp. 47.11-47.17, Nov. 1989.

[3] A. J. Viterbi, "Convolutional codes and their performance in communication systems", IEEE Trans. Com. Technology, vol. COM-19, N° 15, pp. 751-772, Oct. 1971.

[4] G. D. Forney, "The Viterbi algorithm", Proc. IEEE, vol. 61, N° 3, pp. 268-278, Mar. 1973.

[5] C. Berrou, P. Adde, E. Angui and S. Faudeil, "A low complexity soft-output Viterbi decoder architecture", Proc. of ICC '93, Geneva, pp. 737-740, May 1993.

[6] J. Hagenauer and P. Hoeher, "Concatenated Viterbi-Decoding", Proc. Int. Workshop on Inf. Theory, Gotland, Sweden, Aug-Sept. 1989.

[7] G. D. Forney, Concatenated Codes, Cambridge, MIT Press, 1966.

[8] P. Thitimajshima, "Les codes convolutifs récursifs systématiques et leur application à la concaténation parallèle" (in French), thèse N° 284, Université de Bretagne Occidentale, Brest, France, Déc. 1993.

[9] R. G. Gallager, Low-Density Parity-Check Codes, p. 45, Cambridge, MIT Press, 1963.

[10] J. Lodge, R. Young, P. Hoeher and J. Hagenauer, "Separable MAP 'filters' for the decoding of product and concatenated codes", Proc. of ICC '93, Geneva, pp. 1740-1745, May 1993.

[11] J. Hagenauer, "Decoding of binary codes with analog networks", Proc. of ITW '98, San Diego, pp. 13-14, Feb. 1998.

[12] H.-A. Loeliger, F. Lustenberger, M. Helfenstein and F. Tarköy, "Probability propagation in analog VLSI", preprint, Mar. 1998.

[13] S. Shamai (Shitz) and S. Verdú, "Capacity of channels with uncoded side information", European Trans. Telecommun., vol. 6, N° 5, pp. 587-600, Sept.-Oct. 1995.

[14] M. Jézéquel, C. Berrou, C. Douillard and P. Pénard, "Characteristics of a sixteen-state turbo-encoder/decoder", Proc. of Int. Symposium on Turbo Codes, Brest, France, pp. 280-283, Sept. 1997.

[15] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv, : "Optimal decoding of linear codes for minimizing symbol error rate", IEEE Trans. Inform. Theory, IT-20, pp. 248-287, Mar. 1974.

[16] P. Robertson, "Illuminating the structure of parallel concatenated recursive systematic (turbo) codes", Proc. of Globecom '94, San Francisco, pp. 1298-1303, Nov. 1994.

[17] D. Divsalar and F. Pollara, "Serial and hybrid concatenated codes with applications", Proc. of Int. Symposium on Turbo Codes, Brest, France, pp. 80-87, Sept. 1997.

[18] C. Berrou and M. Jézéquel, "Frame-oriented convolutional turbo codes", Elect. Letters, Vol. 32, N° 15, pp. 1362-1364, July 1996.

[19] P. Robertson, P. Hoeher and E. Villebrun, "Optimal and suboptimal maximum a posteriori algorithms suitable for turbo decoding", European Trans. Telecommun., vol. 8, pp. 119-125, Mar-Apr. 1997.

[20] S. S Pietrobon and A. S. Barbulescu, "A simplification of the modified Bahl decoding algorithm for systematic convolutional codes", in Proc. of ISITA '94, pp. 1073-1077, Sydney, Australia, Nov. 1994.

[21] C. Berrou, "Some clinical aspects of turbo codes", Proc. of Int. Symposium on Turbo Codes, Brest, France, pp. 26-31, Sept. 1997.

[22] S. Benedetto and G. Montorsi, "Generalized concatenated codes with interleavers", Proc. of Int. Symposium on Turbo Codes, Brest, France, pp. 32-39, Sept. 1997.

[23] J. Boutros and O. Pothier, "Convergence analysis of turbo decoding", Canadian Workshop on Inf. Theory, Toronto, pp. 25-28, June 1997.

[24] N. Wiberg, "On the performance of the iterative turbo decoding algorithm", Proc. of Int. Symposium on Turbo Codes, Brest, France, pp. 223-226, Sept. 1997.

[25] R. Pyndiah, A. Glavieux, A. Picart and S. Jacq, "Near optimum decoding of product codes", in Proc. of Globecom '94, San Francisco, pp.339-343, Nov.-Dec. 1994.

[26] C. Douillard, A. Picard, P. Didier, M. Jézéquel, C. Berrou and A. Glavieux, "Iterative correction of intersymbol interference: turbo equalization", European Trans. Telecommun., Vol. 6, N° 5, pp. 507-511, Sept.-Oct. 1995.

[27] A. Glavieux, C. Laot and J. Labat, "Turbo equalization over a frequency selective channel", Proc. of Int. Symposium on Turbo Codes, Brest, France, pp. 96-102, Sept. 1997.

Many other references can be found at web addresses http://www.ee.vt.edu/ee/valenti/references.html, thanks to Matt Valenti and http://www.ee.virginia.edu/research/CCSP/turbo_codes/tcodes-bib/tcodes.bib thanks to Eric K. Hall.

1 CNET: Centre National d'Etudes de Télécommunications, CCETT: Centre Commun d'Etudes de Télédiffusion et Télécommunications.

2 We wish Gérard an excellent retirement and much success in his chosen hobby: genetic coding. Read in particular: G. Battail, "Does information theory explain biological evolution ?", Europhysics Letters, 40 (3), pp. 343-348, Nov. 1997.

3 Perhaps analog electronics will remove this handicap one day [11,12].

1