A unified converse for noiseless and noisy channel codes
While reading the book Search Problems by Rudolf Ahlswede and Ingo Wegener (sadly, both deceased), I have come across a neat result by Ahlswede and Péter Gács, which can be thought of as a unified information-theoretic converse for noiseless and noisy channel codes. In particular, it contains as special cases the well-known Kraft’s inequality for prefix codes and the strong converse for discrete memoryless channels (DMC’s). In fact, Ahlswede and Gács establish achievability as well using a Feinstein-type maximal code construction, but I will focus on the converse.
Now, as shown recently by Yury Polyanskiy in his doctoral dissertation (see also the paper he wrote with Vince Poor and Sergio Verdú), most existing converse results for channel codes are consequences of a deceptively simple “meta-converse” involving the performance of binary hypothesis tests in the Neyman–Pearson framework. In this post, I will show that this is also the case for the Ahlswede–Gács converse.
Definitions
Let and
be two finite alphabets.
Definition 1 Let
be an
-tuple of positive integers. An
-prefix code is a collection
, where:
- For each
,
and
.
- No element of the set
is a prefix of any other element.
Two special cases of importance:
- If
and, for every
,
, then we get the standard definition of a prefix code.
- If
, then the sets
must be disjoint (by the prefix property), and we get a length-
block code with
codewords
and the corresponding decoding sets
. In case
is nonempty, we can think of it as a rejection region.
Now consider a DMC with transition matrix . We will abuse notation and write
for each and all
-tuples
,
.
Definition 2 Given
, we say that
is an
-prefix code for
if it is an
-prefix code and
For noiseless codes, we can just take and consider the one-to-one mapping
,
, where
is the length of the
th codeword; for channel block codes with
, we recover the usual maximal probability of error criterion. It is also possible to adapt this framework to channel coding with noiseless feedback.
The Ahlswede–Gács converse and some consequences
Now, the theorem:
Theorem 3 If
is an
-prefix code for the DMC
, then for any
we have
where
is the capacity of
and
is nonnegative and equal to zero when
is noiseless.
As claimed, Theorem 3 is a unified converse for noiseless and noisy channel codes. Indeed, if ,
is noiseless, and each
, then
,
, and
In the limit , we obtain the usual Kraft’s inequality. On the other hand, if
, then Eq. (1) says that any block code of length
with
codewords and maximal probability of error
must satisfy
Taking logs of both sides and dividing by the block length, we get
which is, essentially, the strong converse of Wolfowitz.
The proof using Neyman–Pearson functions
Consider two probability distributions and
on a finite alphabet
. A (randomized) hypothesis test is a conditional probability distribution
, where
, and we say that the test decides in favor of
if
. For each
, define the Neyman–Pearson function
over all tests that correctly choose
with probability at least
. An important fact that we will need later is the following: for any
,
Lemma 4 Let
and
be two transition matrices with input alphabet
and output alphabet
. Fix an
. Then for any
,
, and
such that
,
Proof: Consider a deterministic test between and
that decides in favor of
only when
. Then (4) is immediate from the definition (2).
Now consider and, for any probability distribution
over
, define the mutual information
Then the capacity is given by . Let
be the (unique) capacity-achieving input distribution and let
be the corresponding output distribution, i.e.,
We will need the following facts:
Here, (5) follows from convexity properties of the mutual information, while (6) and (7) follow from the fact that, for a fixed ,
is a sum of
independent random variables
,
, whose expectations (resp., variances) are bounded by
(resp.,
).
Next, consider the DMC with transition probabilities for all
. In other words,
ignores the input and just generates a random output letter drawn from
. If
is an
-prefix code for
, then we have
Hence, we can apply Lemma 4 and then Eq. (3) separately for each to obtain
for an arbitrary choice of . In particular, if we fix some
and choose
then from Chebyshev’s inequality together with Eqs. (6) and (7) we have
Using this in our bound and summing over all , we get
It remains to show that the sum on the left-hand side is bounded from above by . This is where the prefix property comes in. Let
and define the sets
Owing to the prefix property, these sets must be disjoint. Indeed, if there exist some and
such that
is nonempty, then (assuming, say,
) there must be an
-tuple in
which is a prefix of some
-tuple in
, a contradiction. Moreover, for every
so that
Thus, we obtain (1).
Remark 1 The change-of-measure trick with the optimal output distribution
is also used by Ahlswede and Gács, who in turn give credit to Kemperman. As recently discussed by Anand, this same trick was also used by Ahlswede in his proof of the strong converse for the MAC. In fact, now that I think of it, the Packing Lemma that appears in that work can also be proved using the Neyman–Pearson machinery.

leave a comment