While reading the book *Search Problems* by Rudolf Ahlswede and Ingo Wegener (sadly, both deceased), I have come across a neat result by Ahlswede and Péter Gács, which can be thought of as a unified information-theoretic converse for noiseless and noisy channel codes. In particular, it contains as special cases the well-known Kraft’s inequality for prefix codes and the strong converse for discrete memoryless channels (DMC’s). In fact, Ahlswede and Gács establish achievability as well using a Feinstein-type maximal code construction, but I will focus on the converse.

Now, as shown recently by Yury Polyanskiy in his doctoral dissertation (see also the paper he wrote with Vince Poor and Sergio Verdú), most existing converse results for channel codes are consequences of a deceptively simple “meta-converse” involving the performance of binary hypothesis tests in the Neyman–Pearson framework. In this post, I will show that this is also the case for the Ahlswede–Gács converse.

** Definitions **

Let and be two finite alphabets.

Definition 1Let be an -tuple of positive integers. An-prefix codeis a collection , where:

- For each , and .
- No element of the set is a prefix of any other element.

Two special cases of importance:

- If and, for every , , then we get the standard definition of a prefix code.
- If , then the sets must be disjoint (by the prefix property), and we get a length- block code with codewords and the corresponding decoding sets . In case is nonempty, we can think of it as a rejection region.

Now consider a DMC with transition matrix . We will abuse notation and write

for each and all -tuples , .

Definition 2Given , we say that is an -prefix code for if it is an -prefix code and

For noiseless codes, we can just take and consider the one-to-one mapping , , where is the length of the th codeword; for channel block codes with , we recover the usual maximal probability of error criterion. It is also possible to adapt this framework to channel coding with noiseless feedback.

** The Ahlswede–Gács converse and some consequences **

Now, the theorem:

Theorem 3If is an -prefix code for the DMC , then for any we have

where is the capacity of and is nonnegative and equal to zero when is noiseless.

As claimed, Theorem 3 is a unified converse for noiseless and noisy channel codes. Indeed, if , is noiseless, and each , then , , and

In the limit , we obtain the usual Kraft’s inequality. On the other hand, if , then Eq. (1) says that any block code of length with codewords and maximal probability of error must satisfy

Taking logs of both sides and dividing by the block length, we get

which is, essentially, the strong converse of Wolfowitz.

** The proof using Neyman–Pearson functions **

Consider two probability distributions and on a finite alphabet . A (randomized) *hypothesis test* is a conditional probability distribution , where , and we say that the test decides in favor of if . For each , define the *Neyman–Pearson function*

i.e., the minimum probability of erroneously deciding in favor of over all tests that correctly choose with probability at least . An important fact that we will need later is the following: for any ,

The next lemma is very much in the spirit of Yury’s “meta-converse:”

Lemma 4Let and be two transition matrices with input alphabet and output alphabet . Fix an . Then for any , , and such that ,

*Proof:* Consider a deterministic test between and that decides in favor of only when . Then (4) is immediate from the definition (2).

Now consider and, for any probability distribution over , define the mutual information

Then the capacity is given by . Let be the (unique) capacity-achieving input distribution and let be the corresponding output distribution, i.e.,

We will need the following facts:

Here, (5) follows from convexity properties of the mutual information, while (6) and (7) follow from the fact that, for a fixed , is a sum of independent random variables , , whose expectations (resp., variances) are bounded by (resp., ).

Next, consider the DMC with transition probabilities for all . In other words, ignores the input and just generates a random output letter drawn from . If is an -prefix code for , then we have

Hence, we can apply Lemma 4 and then Eq. (3) separately for each to obtain

for an arbitrary choice of . In particular, if we fix some and choose

then from Chebyshev’s inequality together with Eqs. (6) and (7) we have

Using this in our bound and summing over all , we get

It remains to show that the sum on the left-hand side is bounded from above by . This is where the prefix property comes in. Let and define the sets

Owing to the prefix property, these sets must be disjoint. Indeed, if there exist some and such that is nonempty, then (assuming, say, ) there must be an -tuple in which is a prefix of some -tuple in , a contradiction. Moreover, for every

so that

Thus, we obtain (1).

Remark 1The change-of-measure trick with the optimal output distribution is also used by Ahlswede and Gács, who in turn give credit to Kemperman. As recently discussed by Anand, this same trick was also used by Ahlswede in his proof of the strong converse for the MAC. In fact, now that I think of it, the Packing Lemma that appears in that work can also be proved using the Neyman–Pearson machinery.