## COST: NIPS 2011 Workshop on Computational Trade-offs in Statistical Learning

Alekh Agarwal and Sasha Rakhlin are organizing a workshop at this year’s NIPS. I’m on the program committee, so it is my duty (and distinct pleasure) to invite you all to peruse the full call for papers here, or at least to check out this key snippet:

We would like to welcome high-quality submissions on topics including but not limited to:

• Fundamental statistical limits with bounded computation
• Trade-offs between statistical accuracy and computational costs
• Computation-preserving reductions between statistical problems
• Algorithms to learn under budget constraints
• Budget constraints on other resources (e.g. bounded memory)
• Computationally aware approaches such as coarse-to-fine learning

Interesting submissions in other relevant topics not listed above are welcome too. Due to the time constraints, most accepted submissions will be presented as poster spotlights.

Oh, and did I mention that the workshop will take place in mid-December in Sierra Nevada, Spain?

## Deadly ninja weapons: Blackwell’s principle of irrelevant information

Having more information when making decisions should always help, it seems. However, there are situations in which this is not the case. Suppose that you observe two pieces of information, ${x}$ and ${y}$, which you can use to choose an action ${u}$. Suppose also that, upon choosing ${u}$, you incur a cost ${c(x,u)}$. For simplicity let us assume that ${x}$, ${y}$, and ${u}$ take values in finite sets ${{\mathsf X}}$, ${{\mathsf Y}}$, and ${{\mathsf U}}$, respectively. Then it is obvious that, no matter which “strategy” for choosing ${u}$ you follow, you cannot do better than ${u^*(x) = \displaystyle{\rm arg\,min}_{u \in {\mathsf U}} c(x,u)}$. More formally, for any strategy ${\gamma : {\mathsf X} \times {\mathsf Y} \rightarrow {\mathsf U}}$ we have

$\displaystyle c(x,u^*(x)) = \min_{u \in {\mathsf U}} c(x,u) \le c(x,\gamma(x,y)).$

Thus, the extra information ${y}$ is irrelevant. Why? Because the cost you incur does not depend on ${y}$ directly, though it may do so through ${u}$.
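For the finite case, the inequality above can be checked by brute force. The sketch below is only an illustration: the sets `X`, `Y`, `U` and the cost `c` are made-up toy choices, not anything from Blackwell's paper. It enumerates every strategy ${\gamma : {\mathsf X} \times {\mathsf Y} \rightarrow {\mathsf U}}$ and confirms that none of them beats ${u^*(x)}$ at any pair ${(x,y)}$.

```python
from itertools import product

# Hypothetical finite sets, chosen only for illustration.
X = [0, 1, 2]
Y = ["a", "b"]
U = [0, 1]

def c(x, u):
    # Arbitrary illustrative cost; note that it does not depend on y.
    return (x - 2 * u) ** 2

def u_star(x):
    # Best action using x alone: argmin over u of c(x, u).
    return min(U, key=lambda u: c(x, u))

def all_strategies():
    # Enumerate every map gamma: X x Y -> U as a dict keyed by (x, y).
    pairs = list(product(X, Y))
    for actions in product(U, repeat=len(pairs)):
        yield dict(zip(pairs, actions))

def y_is_irrelevant():
    # Check c(x, u*(x)) <= c(x, gamma(x, y)) for every gamma, x, and y.
    return all(
        c(x, u_star(x)) <= c(x, gamma[(x, y)])
        for gamma in all_strategies()
        for x, y in product(X, Y)
    )

print(y_is_irrelevant())
```

With these toy sets there are only ${2^6 = 64}$ strategies, so exhaustive enumeration is cheap. Of course, the check succeeds for any choice of finite sets and cost, since ${u^*(x)}$ minimizes ${c(x,\cdot)}$ by construction.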

Interestingly, as David Blackwell showed in 1964 in a three-page paper, this seemingly innocuous argument does not go through when ${{\mathsf X}}$, ${{\mathsf Y}}$, and ${{\mathsf U}}$ are Borel subsets of Euclidean spaces, the cost function ${c}$ is bounded and Borel-measurable, and the strategies ${\gamma}$ are required to be measurable as well. However, if ${x}$ and ${y}$ are random variables with a known joint distribution ${P}$, then ${y}$ is indeed irrelevant for the purpose of minimizing expected cost.

Warning: lots of measure-theoretic noodling below the fold; if that is not your cup of tea, you can just assume that all sets are finite and go with the poor man’s version stated in the first paragraph. Then all the results below will hold.

## Divergence in everything: bounding the regret in online optimization

Let’s continue with our magical mystery tour through the lands of divergence.

(image yoinked from Sergio Verdú’s 2007 Shannon Lecture slides)

Today’s stop is in the machine learning domain. The result I am about to describe has been floating around in various forms in many different papers, but it has been nicely distilled by Hari Narayanan and Sasha Rakhlin in their recent paper on a random walk approach to online convex optimization.