Divergence in everything: Cramér-Rao from data processing
Gather round for another tale of the mighty divergence and its adventures!
(image yoinked from Sergio Verdú’s 2007 Shannon Lecture slides)
This time I want to show how the well-known Cramér–Rao lower bound (or the information inequality) for parameter estimation can be derived from one of the most fundamental results in information theory: the data processing theorem for divergence. What is particularly nice about this derivation is that, along the way, we get to see how the Fisher information controls the local curvature of the divergence between two members of a parametric family of distributions.
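Before the details, here is a rough outline of the two ingredients, in notation introduced here for convenience ($P_\theta$ is the parametric family, $J(\theta)$ its Fisher information, $\hat{\theta}(X)$ an estimator, and $Q_\theta$ the distribution of $\hat{\theta}(X)$ when $X \sim P_\theta$); take it as a sketch under the usual regularity conditions, not the full derivation. First, the Fisher information is the local curvature of the divergence:
\[
D(P_\theta \,\|\, P_{\theta+\delta}) \;=\; \tfrac{1}{2}\, J(\theta)\, \delta^2 \;+\; o(\delta^2) \qquad \text{as } \delta \to 0.
\]
Second, an estimator is just a (possibly randomized) transformation of the data, so the data processing theorem for divergence gives
\[
D(Q_\theta \,\|\, Q_{\theta+\delta}) \;\le\; D(P_\theta \,\|\, P_{\theta+\delta}).
\]
Combining these, with a bit more work to relate the left-hand divergence to the bias and variance of $\hat{\theta}$, recovers the familiar Cramér–Rao bound for unbiased estimators:
\[
\operatorname{Var}_\theta\!\big(\hat{\theta}(X)\big) \;\ge\; \frac{1}{J(\theta)}.
\]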
Missing all the action
Update: I fixed a couple of broken links.
I want to write down some thoughts inspired by Chernoff’s memo on backward induction that may be relevant to feedback information theory and networked control. Some of these points were brought up in discussions with Serdar Yüksel two years ago.
The lost art of writing
From the opening paragraph of Herman Chernoff’s unpublished 1963 memo “Backward induction in dynamic programming” (thanks to Armand Makowski for a scanned copy):
The solution of finite sequence dynamic programming problems involve a backward induction argument, the foundations of which are generally understood hazily. The purpose of this memo is to add some clarification which may be slightly redundant and whose urgency may be something less than vital.
Alas, nobody writes like that anymore.