Alekh Agarwal and Sasha Rakhlin are organizing a workshop at this year’s NIPS. I’m on the program committee, so it is my duty (and distinct pleasure) to invite you all to peruse the full call for papers here, or at least to check out this key snippet:
We would like to welcome high-quality submissions on topics including but not limited to:
- Fundamental statistical limits with bounded computation
- Trade-offs between statistical accuracy and computational costs
- Computation-preserving reductions between statistical problems
- Algorithms to learn under budget constraints
- Budget constraints on other resources (e.g. bounded memory)
- Computationally aware approaches such as coarse-to-fine learning
Interesting submissions in other relevant topics not listed above are welcome too. Due to the time constraints, most accepted submissions will be presented as poster spotlights.
Oh, and did I mention that the workshop will take place in mid-December in Sierra Nevada, Spain?
Having more information when making decisions should always help, it seems. However, there are situations in which this is not the case. Suppose that you observe two pieces of information, $x$ and $y$, which you can use to choose an action $a$. Suppose also that, upon choosing $a$, you incur a cost $c(x,a)$. For simplicity let us assume that $x$, $y$, and $a$ take values in finite sets $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{A}$, respectively. Then it is obvious that, no matter which “strategy” for choosing $a$ you follow, you cannot do better than $a^*(x) = \arg\min_{a \in \mathsf{A}} c(x,a)$. More formally, for any strategy $\gamma : \mathsf{X} \times \mathsf{Y} \to \mathsf{A}$ we have
$$c(x, a^*(x)) = \min_{a \in \mathsf{A}} c(x,a) \le c(x, \gamma(x,y)) \qquad \text{for all } x \in \mathsf{X},\ y \in \mathsf{Y}.$$
Thus, the extra information $y$ is irrelevant. Why? Because the cost you incur does not depend on $y$ directly, though it may do so through $a$.
Interestingly, as David Blackwell showed in 1964 in a three-page paper, this seemingly innocuous argument does not go through when $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{A}$ are Borel subsets of Euclidean spaces, the cost function $c$ is bounded and Borel-measurable, and the strategies $\gamma$ are required to be measurable as well. However, if $x$ and $y$ are random variables with a known joint distribution $P$, then $y$ is indeed irrelevant for the purpose of minimizing expected cost.
Warning: lots of measure-theoretic noodling below the fold; if that is not your cup of tea, you can just assume that all sets are finite and go with the poor man’s version stated in the first paragraph. Then all the results below will hold.
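For the finite ("poor man's") version, the claim can be checked by brute force. The sketch below is only an illustration under made-up assumptions: the sets `X`, `Y`, `A` and the cost table are hypothetical toy values, with `x` playing the role of the relevant observation, `y` the extra one, and the cost depending on `x` and the action only. It enumerates every strategy that maps a pair of observations to an action and counts the cases where such a strategy beats the best action chosen from `x` alone.

```python
from itertools import product

# Hypothetical toy instance: these sets and costs are invented for illustration.
X = [0, 1, 2]            # values of the relevant observation x
Y = [0, 1]               # values of the extra observation y
A = ["left", "right"]    # available actions

# cost[(x, a)] depends on x and the action a only, never on y directly.
cost = {
    (0, "left"): 1.0, (0, "right"): 3.0,
    (1, "left"): 2.5, (1, "right"): 0.5,
    (2, "left"): 4.0, (2, "right"): 4.0,
}


def best_cost(x):
    """Smallest cost achievable at x when y is ignored."""
    return min(cost[(x, a)] for a in A)


def all_strategies():
    """Yield every map from (x, y) pairs to actions, as a dict."""
    pairs = list(product(X, Y))
    for choice in product(A, repeat=len(pairs)):
        yield dict(zip(pairs, choice))


def count_violations():
    """Count (strategy, x, y) triples where using y beats ignoring it."""
    violations = 0
    for gamma in all_strategies():
        for (x, y), a in gamma.items():
            if cost[(x, a)] < best_cost(x):
                violations += 1
    return violations


num_strategies = sum(1 for _ in all_strategies())
print(num_strategies, count_violations())  # prints: 64 0
```

With 6 observation pairs and 2 actions there are 2^6 = 64 strategies, and none of them ever improves on the minimum taken over actions at the observed `x`. The check is trivial by design: it just spells out the pointwise inequality, which is exactly the step that needs care once measurability enters the picture.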
Let’s continue with our magical mystery tour through the lands of divergence.
Today’s stop is in the machine learning domain. The result I am about to describe has been floating around in various forms in many different papers, but it has been nicely distilled by Hari Narayanan and Sasha Rakhlin in their recent paper on a random walk approach to online convex optimization.
It’s time to fire up the Shameless Self-Promotion Engine again, for I am about to announce a preprint and a paper to be published. Both deal with more or less the same problem — i.e., fundamental limits of certain sequential procedures — and both rely on the same set of techniques: metric entropy, Fano’s inequality, and bounds on the mutual information through divergence with auxiliary probability measures.
So, without further ado, I give you: (more…)