Endorse the petition to honor Claude Elwood Shannon with a United States Postal Service stamp on the 100th anniversary of his birth.
Larry Wasserman‘s recent post about misinterpretation of p-values is a good reminder about a fundamental distinction anyone working in information theory, control or machine learning should be aware of — namely, the distinction between stochastic kernels and conditional probability distributions.
Welcome to the Princeton-Stanford Information Theory b-log! All researchers working on information theory are invited to participate by posting items to the blog. Both original material and pointers to the web are welcome.
Igal Sason and I have just posted to arXiv our tutorial paper “Concentration of Measure Inequalities in Information Theory, Communications and Coding”, which was submitted to Foundations and Trends in Communications and Information Theory. Here is the abstract:
This tutorial article is focused on some of the key modern mathematical tools that are used for the derivation of concentration inequalities, on their links to information theory, and on their various applications to communications and coding.
The first part of this article introduces some classical concentration inequalities for martingales, and it also derives some recent refinements of these inequalities. The power and versatility of the martingale approach is exemplified in the context of binary hypothesis testing, codes defined on graphs and iterative decoding algorithms, and some other aspects that are related to wireless communications and coding.
The second part of this article introduces the entropy method for deriving concentration inequalities for functions of many independent random variables, and it also exhibits its multiple connections to information theory. The basic ingredients of the entropy method are discussed first in conjunction with the closely related topic of logarithmic Sobolev inequalities. This discussion is complemented by a related viewpoint based on probability in metric spaces. This viewpoint centers around the so-called transportation-cost inequalities, whose roots are in information theory. Some representative results on concentration for dependent random variables are briefly summarized, with emphasis on their connections to the entropy method.
Finally, the tutorial addresses several applications of the entropy method and related information-theoretic tools to problems in communications and coding. These include strong converses for several source and channel coding problems, empirical distributions of good channel codes with non-vanishing error probability, and an information-theoretic converse for concentration of measure.
There are already many excellent sources on concentration of measure; what makes ours different is the emphasis on information-theoretic aspects, both in the general theory and in applications. Comments, suggestions, thoughts are very welcome.
As we have seen in Part I, the rational inattention framework of Christopher Sims aims to capture the best a rational agent can do when his capacity for processing information is limited. This rationally inattentive agent, however, has no reason to question his statistical model. In this post we will examine the robustness framework of Thomas Sargent, which deals with the issue of model uncertainty, but does not assume any capacity limitations.
Economic activity involves making decisions. In order to make decisions, agents need information. Thus, the problem of acquisition, transmission, and uses of information has been occupying the economists’ attention for some time now (there is even a whole subfield of “information economics”). It is not surprising, therefore, that information theory, the brainchild of Claude Shannon, would eventually make its way into economics. In this post and the one to follow, I will briefly describe two specific strands of information-theoretic work in economics: the rational inattention framework of Christopher Sims and the robustness ideas of Thomas Sargent. (As an interesting aside: Sims and Sargent have shared the 2011 Nobel Memorial Prize in Economics, although not directly for their information-theoretic work, but rather for their work related to causality.)
Tom’s passing is a great loss. His work has left an indelible mark not only on information theory, but also on machine learning (for instance, his result with Peter Hart on the probability of error of the nearest-neighbor rule being at most twice the Bayes rate is one of the cornerstones of the theory of pattern recognition), statistics, probability, finance, etc. (here is Sergio’s wonderful presentation of Tom’s work across all these fields). His famous textbook, Elements of Information Theory, written with Joy Thomas, was the first one to feature not only the standard topics, such as source and channel coding, but also the more advanced material, such as multiterminal information theory, Kolmogorov complexity, connections between information theory and statistics, connections between information theory and gambling, or universal source coding, in a characteristically lucid manner that made them accessible to beginners. Everything Tom did was touched by elegance, simplicity, and grace. He will be missed, but never forgotten.
Sasha Rakhlin and I will be presenting our paper “Lower bounds for passive and active learning” at this year’s NIPS, which will be taking place in Granada, Spain from December 12 to December 15. The proofs of our main results rely heavily on information-theoretic techniques, specifically the data processing inequality for -divergences and a certain type of constant-weight binary codes.