A unified converse for noiseless and noisy channel codes
While reading the book Search Problems by Rudolf Ahlswede and Ingo Wegener (sadly, both deceased), I have come across a neat result by Ahlswede and Péter Gács, which can be thought of as a unified information-theoretic converse for noiseless and noisy channel codes. In particular, it contains as special cases the well-known Kraft’s inequality for prefix codes and the strong converse for discrete memoryless channels (DMC’s). In fact, Ahlswede and Gács establish achievability as well using a Feinstein-type maximal code construction, but I will focus on the converse.
Now, as shown recently by Yury Polyanskiy in his doctoral dissertation (see also the paper he wrote with Vince Poor and Sergio Verdú), most existing converse results for channel codes are consequences of a deceptively simple “meta-converse” involving the performance of binary hypothesis tests in the Neyman–Pearson framework. In this post, I will show that this is also the case for the Ahlswede–Gács converse.
Let and be two finite alphabets.
Definition 1 Let be an -tuple of positive integers. An -prefix code is a collection , where:
- For each , and .
- No element of the set is a prefix of any other element.
Two special cases of importance:
- If and, for every , , then we get the standard definition of a prefix code.
- If , then the sets must be disjoint (by the prefix property), and we get a length- block code with codewords and the corresponding decoding sets . In case is nonempty, we can think of it as a rejection region.
Now consider a DMC with transition matrix . We will abuse notation and write
for each and all -tuples , .
Definition 2 Given , we say that is an -prefix code for if it is an -prefix code and
For noiseless codes, we can just take and consider the one-to-one mapping , , where is the length of the th codeword; for channel block codes with , we recover the usual maximal probability of error criterion. It is also possible to adapt this framework to channel coding with noiseless feedback.
The Ahlswede–Gács converse and some consequences
Now, the theorem:
where is the capacity of and is nonnegative and equal to zero when is noiseless.
As claimed, Theorem 3 is a unified converse for noiseless and noisy channel codes. Indeed, if , is noiseless, and each , then , , and
In the limit , we obtain the usual Kraft’s inequality. On the other hand, if , then Eq. (1) says that any block code of length with codewords and maximal probability of error must satisfy
Taking logs of both sides and dividing by the block length, we get
which is, essentially, the strong converse of Wolfowitz.
The proof using Neyman–Pearson functions
Consider two probability distributions and on a finite alphabet . A (randomized) hypothesis test is a conditional probability distribution , where , and we say that the test decides in favor of if . For each , define the Neyman–Pearson function
i.e., the minimum probability of erroneously deciding in favor of over all tests that correctly choose with probability at least . An important fact that we will need later is the following: for any ,
The next lemma is very much in the spirit of Yury’s “meta-converse:”
Now consider and, for any probability distribution over , define the mutual information
Then the capacity is given by . Let be the (unique) capacity-achieving input distribution and let be the corresponding output distribution, i.e.,
We will need the following facts:
Here, (5) follows from convexity properties of the mutual information, while (6) and (7) follow from the fact that, for a fixed , is a sum of independent random variables , , whose expectations (resp., variances) are bounded by (resp., ).
Next, consider the DMC with transition probabilities for all . In other words, ignores the input and just generates a random output letter drawn from . If is an -prefix code for , then we have
for an arbitrary choice of . In particular, if we fix some and choose
Using this in our bound and summing over all , we get
It remains to show that the sum on the left-hand side is bounded from above by . This is where the prefix property comes in. Let and define the sets
Owing to the prefix property, these sets must be disjoint. Indeed, if there exist some and such that is nonempty, then (assuming, say, ) there must be an -tuple in which is a prefix of some -tuple in , a contradiction. Moreover, for every
Thus, we obtain (1).
Remark 1 The change-of-measure trick with the optimal output distribution is also used by Ahlswede and Gács, who in turn give credit to Kemperman. As recently discussed by Anand, this same trick was also used by Ahlswede in his proof of the strong converse for the MAC. In fact, now that I think of it, the Packing Lemma that appears in that work can also be proved using the Neyman–Pearson machinery.