Having more information when making decisions should always help, it seems. However, there are situations in which this is not the case. Suppose that you observe two pieces of information, and , which you can use to choose an action . Suppose also that, upon choosing , you incur a cost . For simplicity let us assume that , , and take values in finite sets , , and , respectively. Then it is obvious that, no matter which “strategy” for choosing you follow, you cannot do better than . More formally, for any strategy we have
Thus, the extra information is irrelevant. Why? Because the cost you incur does not depend on directly, though it may do so through .
Interestingly, as David Blackwell has shown in 1964 in a three-page paper, this seemingly innocuous argument does not go through when , , and are Borel subsets of Euclidean spaces, the cost function is bounded and Borel-measurable, and the strategies are required to be measurable as well. However, if and are random variables with a known joint distribution , then is indeed irrelevant for the purpose of minimizing expected cost.
Warning: lots of measure-theoretic noodling below the fold; if that is not your cup of tea, you can just assume that all sets are finite and go with the poor man’s version stated in the first paragraph. Then all the results below will hold.
Let’s continue with our magical mystery tour through the lands of divergence.
Today’s stop is in the machine learning domain. The result I am about to describe has been floating around in various forms in many different papers, but it has been nicely distilled by Hari Narayanan and Sasha Rakhlin in their recent paper on a random walk approach to online convex optimization.