To shake things up a bit, I have started an information theory reading group in my department at Duke. We will meet every Monday to read and discuss papers on information theory, both recent and the classics, particularly those pertaining to estimation and inference, learning, decision-making, and control.
We had our first meeting yesterday, where we decided that the theme for the fall semester of 2010 will be Control. We have also settled on the following papers (all available for download from the reading group website):
- J. Massey, “Causality, feedback and directed information”
- H. Touchette and S. Lloyd, “Information-theoretic approach to the study of control systems”
- J. Walrand and P. Varaiya, “Optimal causal coding-decoding problems”
- N. Elia, “When Bode meets Shannon: control-oriented feedback communication schemes”
- N. Martins and M. Dahleh, “Feedback control in the presence of noisy channels: ‘Bode-like’ fundamental limitations of performance”
- V. Anantharam and V. Borkar, “An information-theoretic view of stochastic resonance”
Most likely, as the meetings continue, I will be posting various blurbs, notes and musings on these papers and any related matters. We may also shuffle things around a bit — for example, spend some time discussing the basics of stochastic control for the benefit of those not familiar with it.
This is an expository post on a particularly nice way of thinking about stochastic systems in information theory, control, statistical learning and inference, experimental design, etc. I will closely follow the exposition in Chapter 2 of Sekhar Tatikonda’s doctoral thesis, which in turn builds on ideas articulated by Hans Witsenhausen and Roland Dobrushin.
A general stochastic system consists of smaller interconnected subsystems that influence one another’s behavior. For example, in a communications setting we have sources, encoders, channels and decoders; in control we have plants, sensors, and actuators. The formalism I am about to describe will allow us to treat all these different components on an equal footing.
A great example of academic snark in the abstract of Ronald Howard’s classic paper “Information value theory” (IEEE Transactions on Systems Science and Cybernetics, vol. SSC-2, no. 1, pp. 22-26, 1966):
The information theory developed by Shannon was designed to place a quantitative measure on the amount of information involved in any communication. The early developers stressed that the information measure was dependent only on the probabilistic structure of the communication process. For example, if losing all your assets in the stock market and having whale steak for supper have the same probability, then the information associated with the occurrence of either event is the same. Attempts to apply Shannon’s information theory to problems beyond communications have, in the large, come to grief. The failure of these attempts could have been predicted because no theory that involves just the probabilities of outcomes without considering their consequences could possibly be adequate in describing the importance of uncertainty to a decision maker.
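Howard’s jab can be made concrete. Shannon’s self-information of an event, −log₂ p, is a function of the event’s probability alone, so two equiprobable events carry exactly the same amount of information no matter how different their consequences. A minimal sketch (the probability value below is invented for illustration):

```python
from math import log2

def self_information(p: float) -> float:
    """Shannon self-information of an event with probability p, in bits."""
    return -log2(p)

# Two events with identical probability but wildly different consequences
# carry exactly the same Shannon information.
p = 0.01
ruin = self_information(p)         # losing all your assets in the stock market
whale_steak = self_information(p)  # having whale steak for supper
assert ruin == whale_steak
```

This, of course, is not a bug in Shannon’s theory but a design decision — one that Howard argues makes it the wrong tool for decision problems.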
Incidentally, to this day there is no completely convincing approach to quantifying the value of information in a decision-making scenario. The obvious idea — namely, looking at the reduction in the expected loss due to acquiring additional information — goes only so far, since more or less the only property this putative measure has is the monotonicity with respect to increasing knowledge. Now compare and contrast this with the elaborate additivity properties of the Shannon-theoretic quantities! On the other hand, it seems that the work of Grünwald and Dawid that I had mentioned in a previous post may be a glimmer of hope. We will have to see.
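To spell out “the obvious idea” in symbols (my notation, not Howard’s): in a decision problem with parameter \(\theta\), action \(a\), and loss \(\ell(\theta, a)\), the value of observing \(X\) is the prior Bayes risk minus the posterior Bayes risk,

```latex
V(X) \;=\; \min_{a}\, \mathbb{E}\big[\ell(\theta, a)\big]
\;-\; \mathbb{E}\Big[\,\min_{a}\, \mathbb{E}\big[\ell(\theta, a)\,\big|\, X\big]\Big]
\;\ge\; 0 .
```

The monotonicity mentioned above then reads: if \(Y\) is a function of \(X\) (a coarser observation), then \(V(Y) \le V(X)\). But unlike mutual information, \(V\) satisfies no chain rule or additivity in general, which is precisely the complaint.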
Three cheers for open access!
While searching for a paper on the Rényi entropy, I stumbled across Kybernetika: International journal published by Institute of Information Theory and Automation. Since 1965, this journal has been publishing articles on information theory, statistical decisions, optimal control, finite automata, neural nets, mathematical economics, optimization, adaptive behavior, and other subjects that were, during the heyday of cybernetics, viewed as but individual aspects of a soon-to-be-born grand unifying science of natural and artificial adaptive systems. Even though the cyberneticians’ dream never came true (as detailed in Andrew Pickering’s fascinating account The Cybernetic Brain, which I am now reading), it gave rise to numerous offshoots in other disciplines.
Rummaging through the journal archives, I found a few interesting articles by information theorists, such as Mark Pinsker, Albert Perez and the recently deceased Igor Vajda, and even by actual cyberneticians, such as Gordon Pask.
Here are a couple of articles that would be interesting to the readers of this blog:
Albert Perez, Information-theoretic risk estimates in statistical decision, Kybernetika, vol. 3, no. 1, pp. 1-21, 1967
In this paper we give some information-theoretical estimates of average and Bayes risk change in statistical decision produced by a modification of the probability law in action and, in particular, by reducing or enlarging the sample space as well as the parameter space sigma-algebras. These estimates, expressed in terms of information growth or generalized f-entropy not necessarily of Shannon’s type, are improved versions of the estimates we obtained in previous papers.
Flemming Topsøe, Information theoretical optimization techniques, Kybernetika, vol. 15, no. 1, pp. 8-27, 1979

It is the object of this paper to show that a game theoretical viewpoint may be taken to underlie the maximum entropy principle as well as the minimum discrimination information principle, two principles of well-known significance in theoretical statistics and in statistical thermodynamics. Our setting is very simple and certainly calls for future expansion.
Oddly, the latter paper does not seem to be very well known. However, recent work by Peter Grünwald and Philip Dawid extends Topsøe’s game-theoretic viewpoint and develops generalized notions of entropy and divergence for statistical decision problems with arbitrary loss functions:
Peter Grünwald and Philip Dawid, Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory, Annals of Statistics, vol. 32, no. 4, pp. 1367-1433, 2004
We describe and develop a close relationship between two problems that have customarily been regarded as distinct: that of maximizing entropy, and that of minimizing worst-case expected loss. Using a formulation grounded in the equilibrium theory of zero-sum games between Decision Maker and Nature, these two problems are shown to be dual to each other, the solution to each providing that to the other. Although Topsøe described this connection for the Shannon entropy over 20 years ago, it does not appear to be widely known even in that important special case.
We here generalize this theory to apply to arbitrary decision problems and loss functions. We indicate how an appropriate generalized definition of entropy can be associated with such a problem, and we show that, subject to certain regularity conditions, the above-mentioned duality continues to apply in this extended context. This simultaneously provides a possible rationale for maximizing entropy and a tool for finding robust Bayes acts. We also describe the essential identity between the problem of maximizing entropy and that of minimizing a related discrepancy or divergence between distributions. This leads to an extension, to arbitrary discrepancies, of a well-known minimax theorem for the case of Kullback–Leibler divergence (the “redundancy-capacity theorem” of information theory).
For the important case of families of distributions having certain mean values specified, we develop simple sufficient conditions and methods for identifying the desired solutions. We use this theory to introduce a new concept of “generalized exponential family” linked to the specific decision problem under consideration, and we demonstrate that this shares many of the properties of standard exponential families.
Finally, we show that the existence of an equilibrium in our game can be rephrased in terms of a “Pythagorean property” of the related divergence, thus generalizing previously announced results for Kullback–Leibler and Bregman divergences.
The actual paper is quite lengthy (over 60 pages of generalized entropy goodness!), but well worth the time.
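For the mean-value case discussed in the paper, the maximum-entropy distribution is a member of an exponential family, and on a finite alphabet it is easy to compute numerically. A minimal sketch (the support and target mean below are invented for illustration):

```python
from math import exp

def maxent_mean(support, mu, tol=1e-10):
    """Maximum-entropy distribution on a finite `support` with mean `mu`.

    By Lagrange duality, the maximizer has the exponential-family form
    p_i proportional to exp(lam * x_i); since the resulting mean is
    monotone increasing in lam, we can find lam by bisection.
    """
    def mean(lam):
        w = [exp(lam * x) for x in support]
        z = sum(w)
        return sum(x * wi for x, wi in zip(support, w)) / z

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < mu:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [exp(lam * x) for x in support]
    z = sum(w)
    return [wi / z for wi in w]

# Invented example: alphabet {0, 1, 2, 3} with prescribed mean 2.2.
p = maxent_mean([0, 1, 2, 3], 2.2)
assert abs(sum(x * pi for x, pi in zip([0, 1, 2, 3], p)) - 2.2) < 1e-6
```

When the prescribed mean equals that of the uniform distribution, the multiplier is zero and the uniform distribution is recovered — the textbook special case of the duality described in the abstract.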
Everyone has a blog, or so I’ve been told. So here I am, with a blog of my own. Ostensibly, this is an “academic” endeavor, so I can ~~pontificate~~ share my thoughts and musings on subjects related to my research field — information theory, particularly as it pertains to decision-making, statistical learning and inference, control, and adaptation.
Why did I decide to call my blog The Information Structuralist? Partly, I was riffing on Dave Bacon’s The Quantum Pontiff and on Kush and Lav Varshney’s Information Ashvins, as well as on the structuralist school. But, more importantly, this name gets straight to the core of what this blog is all about.
The term “information structure” is used in game theory and, through the work of Hans Witsenhausen, in decentralized control to describe the state of knowledge in a multiagent system: an information structure specifies who knows what, when they know it, and what they may expect to know in the future. Kenneth Arrow in The Limits of Organization describes it thus:
By information structure … I mean not merely the state of knowledge existing at any moment of time but the possibility of acquiring relevant information in the future. We speak of the latter in communication terminology as the possession of an information channel, and the information to be received as signals from the rest of the world.
The upshot is that those aspects of information that were intentionally dismissed by Shannon — namely, its timeliness, structure, meaning, and value — must necessarily be brought to the foreground once we begin talking about ways information is (or will be) put to use. This goes sharply against the conventional wisdom that information is measured in the “universal currency” of bits. Once we summon forth all those demons Shannon had tried to banish, we must look beyond the bit and dabble in such unspeakable and eldritch things as sigma-algebras (and partial orders thereof), statistical experiments, information utility, quality of decisions, risk, and even learnability and combinatorial complexity. Fortunately, even when we thus expand our field of inquiry, we do not have to give up the time-tested tools Uncle Claude gave us — we merely need to learn to use them in new, heretofore unimagined, ways.
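Since sigma-algebras and their partial orders came up: on a finite state space, a sigma-algebra is just a partition of the states, and “knowing more” means having a finer partition. A toy sketch of that partial order (the states and observers below are invented):

```python
def refines(P, Q):
    """True iff partition P refines partition Q, i.e. every cell of P
    sits inside some cell of Q. On a finite state space this refinement
    relation is exactly the partial order on sigma-algebras: the agent
    with partition P knows at least as much as the agent with Q."""
    return all(any(cell <= c for c in Q) for cell in P)

# Invented toy example: four states of the world.
coarse = [{1, 2}, {3, 4}]   # an observer who only sees "low" vs "high"
fine = [{1}, {2}, {3, 4}]   # an observer who can also tell 1 from 2
assert refines(fine, coarse)
assert not refines(coarse, fine)
```

An information structure, in Witsenhausen’s sense, assigns such a partition (or sigma-algebra) to each agent at each decision time — which is precisely the “who knows what, and when” bookkeeping described above.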
And that is the raison d’être of my blog.