I have been on the road for the past few days. First I went to Washington DC to visit the University of Maryland at College Park and to present my work on empirical processes and typical sequences at their Information and Coding Theory Seminar. A scientist’s dream: two hours in front of a blackboard, no slides!
And now I find myself amid the luscious cornfields of Central Illinois. That’s right, until Friday I’m in Urbana-Champaign for the annual Allerton conference. This year, Todd Coleman (UIUC), Giacomo Como (MIT), and I have co-organized a session on Information Divergence and Stochastic Dynamical Systems, which promises to be quite interesting — it will feature invited talks on Bayesian inference and evolutionary dynamics, reinforcement learning, optimal experimentation, opinion dynamics in social networks, signaling in decentralized control, and optimization of observation channels in control problems. If you happen to be attending Allerton this year, come on by!
As I was thinking more about Massey’s paper on directed information and about the work of Touchette and Lloyd on the information-theoretic study of control systems (which we had started looking at during the last meeting of our reading group), I realized that the directed stochastic kernels that feature so prominently in the general definition of directed information are known in the machine learning and AI communities under another name, due to Judea Pearl: interventional distributions.
These are some notes on the first paper for our Information Theory reading group, but they may be of use to anyone with an interest in the way information theorists deal with feedback.
Warning: lots of equations and disjointed muttering about stochastic kernels and conditional independence below the fold!
In the past few days I have come across two short but thought-provoking papers that seem to voice some of the same concerns, yet arrive at very different conclusions.
JW, Probability in control? (to appear in European Journal of Control)
Probability is one of the success stories of applied mathematics. It is universally used, from statistical physics to quantum mechanics, from econometrics to financial mathematics, from information theory to control, from psychology and social sciences to medicine. Unfortunately, in many applications of probability, very little attention is paid to the modeling aspect. That is, the interpretation of the probability used in the model is seldom discussed, and it is rarely explained how one comes to the numerical values of the distributions of the random variables used in the model. The aim of this communication is to put forward some remarks related to the use of probability in Systems and Control.
William Feller has a Note on Bayes’ rule in his classic probability book in which he expresses doubts about the Bayesian approach to statistics and decries it as a method of the past. We analyze in this note the motivations for Feller’s attitude, without aiming at a complete historical coverage of the reasons for this dismissal.
What are we to make of this?
In the spirit of shameless self-promotion, I would like to announce a new preprint (a preliminary version was presented in July at ISIT 2010 in Austin, TX):
Maxim Raginsky, “Empirical processes, typical sequences and coordinated actions in standard Borel spaces”, arXiv:1009.0282, submitted to IEEE Transactions on Information Theory
Abstract: This paper proposes a new notion of typical sequences on a wide class of abstract alphabets (so-called standard Borel spaces), which is based on approximations of memoryless sources by empirical distributions uniformly over a class of measurable “test functions.” In the finite-alphabet case, we can take all uniformly bounded functions and recover the usual notion of strong typicality (or typicality under the total variation distance). For a general alphabet, however, this function class turns out to be too large, and must be restricted. With this in mind, we define typicality with respect to any Glivenko–Cantelli function class (i.e., a function class that admits a Uniform Law of Large Numbers) and demonstrate its power by giving simple derivations of the fundamental limits on the achievable rates in several source coding scenarios, in which the relevant operational criteria pertain to reproducing empirical averages of a general-alphabet stationary memoryless source with respect to a suitable function class.
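For readers who have not met strong typicality before, the finite-alphabet case mentioned in the abstract can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function names, the tolerance `eps`, and the factor 1/2 in the total variation distance are all assumptions made here. A sequence counts as typical when its empirical distribution is within `eps` of the source distribution in total variation.

```python
from collections import Counter


def empirical_distribution(seq, alphabet):
    """Relative frequency of each alphabet symbol in seq (hypothetical helper)."""
    alphabet = list(alphabet)
    if not seq:
        raise ValueError("sequence must be non-empty")
    if any(symbol not in alphabet for symbol in seq):
        raise ValueError("sequence contains a symbol outside the alphabet")
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}


def total_variation(p, q):
    """Total variation distance between two pmfs on the same finite alphabet.

    Assumed convention: TV(p, q) = (1/2) * sum_a |p(a) - q(a)|.
    """
    return 0.5 * sum(abs(p[a] - q[a]) for a in p)


def is_typical(seq, source, eps):
    """True if the empirical distribution of seq is within eps of source in TV."""
    p_hat = empirical_distribution(seq, source.keys())
    return total_variation(p_hat, source) <= eps


if __name__ == "__main__":
    source = {"a": 0.5, "b": 0.5}          # a fair binary memoryless source
    print(is_typical("abab", source, 0.1))  # frequencies match the source exactly
    print(is_typical("aaaa", source, 0.1))  # all mass on one symbol
```

On a general alphabet the same recipe would replace the sum over symbols with a supremum over a restricted (Glivenko–Cantelli) class of test functions, which is the move the abstract describes.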