The Information Structuralist

Deadly ninja weapons: Blackwell’s principle of irrelevant information

Having more information when making decisions should always help, it seems. However, there are situations in which this is not the case. Suppose that you observe two pieces of information, {x} and {y}, which you can use to choose an action {u}. Suppose also that, upon choosing {u}, you incur a cost {c(x,u)}. For simplicity let us assume that {x}, {y}, and {u} take values in finite sets {{\mathsf X}}, {{\mathsf Y}}, and {{\mathsf U}}, respectively. Then it is obvious that, no matter which “strategy” for choosing {u} you follow, you cannot do better than {u^*(x) = \displaystyle{\rm arg\,min}_{u \in {\mathsf U}} c(x,u)}. More formally, for any strategy {\gamma : {\mathsf X} \times {\mathsf Y} \rightarrow {\mathsf U}} we have

\displaystyle  c(x,u^*(x)) = \min_{u \in {\mathsf U}} c(x,u) \le c(x,\gamma(x,y)).

Thus, the extra information {y} is irrelevant. Why? Because the cost you incur does not depend on {y} directly, though it may do so through {u}.

Interestingly, as David Blackwell has shown in 1964 in a three-page paper, this seemingly innocuous argument does not go through when {{\mathsf X}}, {{\mathsf Y}}, and {{\mathsf U}} are Borel subsets of Euclidean spaces, the cost function {c} is bounded and Borel-measurable, and the strategies {\gamma} are required to be measurable as well. However, if {x} and {y} are random variables with a known joint distribution {P}, then {y} is indeed irrelevant for the purpose of minimizing expected cost.

Warning: lots of measure-theoretic noodling below the fold; if that is not your cup of tea, you can just assume that all sets are finite and go with the poor man’s version stated in the first paragraph. Then all the results below will hold.


Divergence in everything: bounding the regret in online optimization

Let’s continue with our magical mystery tour through the lands of divergence.

(image yoinked from Sergio Verdú‘s 2007 Shannon Lecture slides)

Today’s stop is in the machine learning domain. The result I am about to describe has been floating around in various forms in many different papers, but it has been nicely distilled by Hari Narayanan and Sasha Rakhlin in their recent paper on a random walk approach to online convex optimization.


Bell Systems Technical Journal: now online

The Bell Systems Technical Journal is now online. Mmmm, seminal articles … . Shannon, Wyner, Slepian, Witsenhausen — they’re all here!

(h/t Anand Sarwate)