# The Information Structuralist

## Concentrate, concentrate!

Posted in Information Theory, Mathematics, Narcissism, Papers and Preprints, Probability by mraginsky on December 19, 2012

Igal Sason and I have just posted to arXiv our tutorial paper “Concentration of Measure Inequalities in Information Theory, Communications and Coding”, which was submitted to Foundations and Trends in Communications and Information Theory. Here is the abstract:

This tutorial article is focused on some of the key modern mathematical tools that are used for the derivation of concentration inequalities, on their links to information theory, and on their various applications to communications and coding.

The first part of this article introduces some classical concentration inequalities for martingales, and it also derives some recent refinements of these inequalities. The power and versatility of the martingale approach is exemplified in the context of binary hypothesis testing, codes defined on graphs and iterative decoding algorithms, and some other aspects that are related to wireless communications and coding.

The second part of this article introduces the entropy method for deriving concentration inequalities for functions of many independent random variables, and it also exhibits its multiple connections to information theory. The basic ingredients of the entropy method are discussed first in conjunction with the closely related topic of logarithmic Sobolev inequalities. This discussion is complemented by a related viewpoint based on probability in metric spaces. This viewpoint centers around the so-called transportation-cost inequalities, whose roots are in information theory. Some representative results on concentration for dependent random variables are briefly summarized, with emphasis on their connections to the entropy method.

Finally, the tutorial addresses several applications of the entropy method and related information-theoretic tools to problems in communications and coding. These include strong converses for several source and channel coding problems, empirical distributions of good channel codes with non-vanishing error probability, and an information-theoretic converse for concentration of measure.

There are already many excellent sources on concentration of measure; what makes ours different is the emphasis on information-theoretic aspects, both in the general theory and in applications. Comments, suggestions, thoughts are very welcome.

## Lower bounds for passive and active learning

Sasha Rakhlin and I will be presenting our paper “Lower bounds for passive and active learning” at this year’s NIPS, which will be taking place in Granada, Spain from December 12 to December 15. The proofs of our main results rely heavily on information-theoretic techniques, specifically the data processing inequality for ${f}$-divergences and a certain type of constant-weight binary codes.

Just a couple of short items, while I catch my breath.

1. First of all, starting January 1, 2012 I will find myself amidst the lovely cornfields of Central Illinois, where I will be an assistant professor in the Department of Electrical and Computer Engineering at UIUC. This will be a homecoming of sorts, since I have spent three years there as a Beckman Fellow. My new home will be in the Coordinated Science Laboratory, where I will continue doing (and blogging about) the same things I do (and blog about).

2. Speaking of Central Illinois, last week I was at the Allerton Conference, where I had tried my best to preach Uncle Judea‘s gospel to anyone willing to listen information theorists and their fellow travelers. The paper, entitled “Directed information and Pearl’s causal calculus,” is now up on arxiv, and here is the abstract:

Probabilistic graphical models are a fundamental tool in statistics, machine learning, signal processing, and control. When such a model is defined on a directed acyclic graph (DAG), one can assign a partial ordering to the events occurring in the corresponding stochastic system. Based on the work of Judea Pearl and others, these DAG-based “causal factorizations” of joint probability measures have been used for characterization and inference of functional dependencies (causal links). This mostly expository paper focuses on several connections between Pearl’s formalism (and in particular his notion of “intervention”) and information-theoretic notions of causality and feedback (such as causal conditioning, directed stochastic kernels, and directed information). As an application, we show how conditional directed information can be used to develop an information-theoretic version of Pearl’s “back-door” criterion for identifiability of causal effects from passive observations. This suggests that the back-door criterion can be thought of as a causal analog of statistical sufficiency.

If you had seen my posts on stochastic kernels, directed information, and causal interventions, you will, more or less, know what to expect.

Incidentally, due to my forthcoming move to UIUC, this will be my last Allerton paper!

## Autumn travels, the Canadian edition

Posted in Narcissism by mraginsky on October 27, 2010

I am posting this while using free wi-fi on board a VIA Rail train (hooray for socialism!). Yesterday I gave a talk at McGill (The Harvard of Canada!) on the fundamental limitations of adaptive dynamical systems, and now I am on my way to do the same at Queen’s University. Regular posts will resume in November.

Posted in Control, Feedback, Information Theory, Narcissism, Optimization, Papers and Preprints by mraginsky on October 13, 2010

It’s time to fire up the Shameless Self-Promotion Engine again, for I am about to announce a preprint and a paper to be published. Both deal with more or less the same problem — i.e., fundamental limits of certain sequential procedures — and both rely on the same set of techniques: metric entropy, Fano’s inequality, and bounds on the mutual information through divergence with auxiliary probability measures.

So, without further ado, I give you: (more…)

## Autumn travels

I have been on the road for the past few days. First I went to Washington DC to visit University of Maryland at College Park and to present my work on empirical processes and typical sequences at their Information and Coding Theory Seminar. A scientist’s dream — two hours in front of a blackboard, no slides!

And now I find myself amid the luscious cornfields of Central Illinois. That’s right, until Friday I’m in Urbana-Champaign for the annual Allerton conference. This year, Todd Coleman (UIUC), Giacomo Como (MIT), and I have co-organized a session on Information Divergence and Stochastic Dynamical Systems, which promises to be quite interesting — it will feature invited talks on Bayesian inference and evolutionary dynamics, reinforcement learning, optimal experimentation, opinion dynamics in social networks, signaling in decentralized control, and optimization of observation channels in control problems. If you happen to be attending Allerton this year, come on by!

## Typical, just typical

In the spirit of shameless self-promotion, I would like to announce a new preprint (preliminary version was presented in July at ISIT 2010 in Austin, TX):

Maxim Raginsky, “Empirical processes, typical sequences and coordinated actions in standard Borel spaces”, arXiv:1009.0282, submitted to IEEE Transactions on Information Theory

Abstract: This paper proposes a new notion of typical sequences on a wide class of abstract alphabets (so-called standard Borel spaces), which is based on approximations of memoryless sources by empirical distributions uniformly over a class of measurable “test functions.” In the finite-alphabet case, we can take all uniformly bounded functions and recover the usual notion of strong typicality (or typicality under the total variation distance). For a general alphabet, however, this function class turns out to be too large, and must be restricted. With this in mind, we define typicality with respect to any Glivenko–Cantelli function class (i.e., a function class that admits a Uniform Law of Large Numbers) and demonstrate its power by giving simple derivations of the fundamental limits on the achievable rates in several source coding scenarios, in which the relevant operational criteria pertain to reproducing empirical averages of a general-alphabet stationary memoryless source with respect to a suitable function class.

## Enter the Information Structuralist, hilarity ensues

Posted in Narcissism by mraginsky on August 17, 2010

Everyone has a blog, or so I’ve been told. So here I am, with a blog of my own. Ostensibly, this is an “academic” endeavor, so I can pontificate share my thoughts and musings on subjects related to my research field — information theory, particularly as it pertains to decision-making, statistical learning and inference, control, and adaptation.

Why did I decide to call my blog The Information Structuralist? Partly, I was riffing on Dave Bacon’s The Quantum Pontiff and on Kush and Lav Varshney’s Information Ashvins, as well as on the structuralist school. But, more importantly, this name gets straight to the core of what this blog is all about.

The term “information structure” is used in game theory and, through the work of Hans Witsenhausen, in decentralized control to describe the state of knowledge in a multiagent system: an information structure specifies who knows what, when they know it, and what they may expect to know in the future. Kenneth Arrow in The Limits of Organization describes it thus:

By information structure … I mean not merely the state of knowledge existing at any moment of time but the possibility of acquiring relevant information in the future. We speak of the latter in communication terminology as the possession of an information channel, and the information to be received as signals from the rest of the world.

The upshot is that those aspects of information that were intentionally dismissed by Shannon — namely, its timeliness, structure, meaning, and value — must necessarily be brought to the foreground once we begin talking about ways information is (or will be) put to use. This goes sharply against the conventional wisdom that information is measured in the “universal currency” of bits. Once we summon forth all those demons Shannon had tried to banish, we must look beyond the bit and dabble in such unspeakable and eldritch things as sigma-algebras (and partial orders thereof), statistical experiments, information utility, quality of decisions, risk, and even learnability and combinatorial complexity. Fortunately, even when we thus expand our field of inquiry, we do not have to give up the time-tested tools Uncle Claude gave us — we merely need to learn to use them in new, heretofore unimagined, ways.

And that is the raison d’etre of my blog.