## COST: NIPS 2011 Workshop on Computational Trade-offs in Statistical Learning

Alekh Agarwal and Sasha Rakhlin are organizing a workshop at this year’s NIPS. I’m on the program committee, so it is my duty (and distinct pleasure) to invite you all to peruse the full call for papers here, or at least to check out this key snippet:

We would like to welcome high-quality submissions on topics including but not limited to:

- Fundamental statistical limits with bounded computation
- Trade-offs between statistical accuracy and computational costs
- Computation-preserving reductions between statistical problems
- Algorithms to learn under budget constraints
- Budget constraints on other resources (e.g. bounded memory)
- Computationally aware approaches such as coarse-to-fine learning

Interesting submissions in other relevant topics not listed above are welcome too. Due to the time constraints, most accepted submissions will be presented as poster spotlights.

Oh, and did I mention that the workshop will take place in mid-December in Sierra Nevada, Spain?

## Missing all the action

**Update:** I fixed a couple of broken links.

I want to write down some thoughts inspired by Chernoff’s memo on backward induction that may be relevant to feedback information theory and networked control. Some of these points were brought up in discussions with Serdar Yüksel two years ago.

## Deadly ninja weapons: Blackwell’s principle of irrelevant information

Having more information when making decisions should always help, it seems. However, there are situations in which this is not the case. Suppose that you observe two pieces of information, $x$ and $y$, which you can use to choose an action $u$. Suppose also that, upon choosing $u$, you incur a cost $c(x,u)$. For simplicity let us assume that $x$, $y$, and $u$ take values in finite sets $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{U}$, respectively. Then it is obvious that, no matter which “strategy” for choosing $u$ you follow, you cannot do better than $u^*(x) = \arg\min_{u \in \mathsf{U}} c(x,u)$. More formally, for any strategy $\gamma : \mathsf{X} \times \mathsf{Y} \to \mathsf{U}$ we have

$$c(x, u^*(x)) = \min_{u \in \mathsf{U}} c(x,u) \le c(x, \gamma(x,y)).$$

Thus, the extra information $y$ is *irrelevant*. Why? Because the cost you incur does not depend on $y$ directly, though it may do so through $u$.

Interestingly, as David Blackwell showed in 1964 in a three-page paper, this seemingly innocuous argument does not go through when $\mathsf{X}$, $\mathsf{Y}$, and $\mathsf{U}$ are Borel subsets of Euclidean spaces, the cost function $c$ is bounded and Borel-measurable, and the strategies $\gamma$ are required to be measurable as well. However, if $x$ and $y$ are *random variables* with a known joint distribution $P$, then $y$ is indeed irrelevant for the purpose of minimizing *expected* cost.
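For the finite case described in the first paragraph, the claim can be checked by brute force. The snippet below is a minimal sketch, not anything from Blackwell's paper: the toy sets, the random cost table, and the function names `best_action` and `y_is_irrelevant` are all made up for illustration. It enumerates every strategy that maps a pair of observations to an action and confirms that none of them beats the rule that looks at the first observation only.

```python
import itertools
import random


def best_action(cost, x, actions):
    """Return a minimizer of cost[(x, u)] over u (ties broken by list order)."""
    return min(actions, key=lambda u: cost[(x, u)])


def y_is_irrelevant(cost, xs, ys, actions):
    """Check that no strategy gamma(x, y) ever incurs less than min_u cost[(x, u)]."""
    pairs = list(itertools.product(xs, ys))
    # A strategy is one assignment of an action to every (x, y) pair.
    for choice in itertools.product(actions, repeat=len(pairs)):
        gamma = dict(zip(pairs, choice))
        for (x, y) in pairs:
            optimal = cost[(x, best_action(cost, x, actions))]
            if optimal > cost[(x, gamma[(x, y)])]:
                return False
    return True


# Hypothetical toy instance: the cost depends on (x, u) only, never on y.
xs, ys, actions = [0, 1], ["a", "b"], ["left", "stay", "right"]
rng = random.Random(0)
cost = {(x, u): rng.random() for x in xs for u in actions}

print(y_is_irrelevant(cost, xs, ys, actions))
```

With three actions and four observation pairs there are only 81 strategies, so the enumeration is instant, and the check returns `True` for any cost table of this form because the left-hand side is a minimum over all actions. The exhaustive loop obviously has no analogue in the measurable setting, which is exactly where the subtlety arises.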

**Warning:** *lots of measure-theoretic noodling below the fold; if that is not your cup of tea, you can just assume that all sets are finite and go with the poor man’s version stated in the first paragraph. Then all the results below will hold.*

## Divergence in everything: bounding the regret in online optimization

Let’s continue with our magical mystery tour through the lands of divergence.

(image yoinked from Sergio Verdú’s 2007 Shannon Lecture slides)

Today’s stop is in the machine learning domain. The result I am about to describe has been floating around in various forms in many different papers, but it has been nicely distilled by Hari Narayanan and Sasha Rakhlin in their recent paper on a random walk approach to online convex optimization.

## Sincerely, your biggest Fano

It’s time to fire up the Shameless Self-Promotion Engine again, for I am about to announce a preprint and a paper to be published. Both deal with more or less the same problem — i.e., fundamental limits of certain sequential procedures — and both rely on the same set of techniques: metric entropy, Fano’s inequality, and bounds on the mutual information through divergence with auxiliary probability measures.

So, without further ado, I give you: (more…)
