Gather round for another tale of the mighty divergence and its adventures!
This time I want to show how the well-known Cramér–Rao lower bound (also called the information inequality) for parameter estimation can be derived from one of the most fundamental results in information theory: the data processing theorem for divergence. What is particularly nice about this derivation is that, along the way, we get to see how the Fisher information controls the local curvature of the divergence between two nearby members of a parametric family of distributions.
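The curvature statement can be checked concretely in a case where everything is computable in closed form. For the Gaussian location family with known variance, the divergence D(N(θ, σ²) ‖ N(θ+δ, σ²)) equals δ²/(2σ²), which is exactly (J(θ)/2)δ² with Fisher information J(θ) = 1/σ². A minimal numerical sketch (the function names here are my own, purely for illustration):

```python
import math

def kl_gauss(mu0, mu1, sigma):
    # KL divergence between N(mu0, sigma^2) and N(mu1, sigma^2),
    # in nats; for equal variances only the means contribute.
    return (mu0 - mu1) ** 2 / (2 * sigma ** 2)

theta, sigma = 0.0, 2.0
fisher = 1.0 / sigma ** 2  # Fisher information of the location family

for delta in (0.1, 0.01, 0.001):
    divergence = kl_gauss(theta, theta + delta, sigma)
    quadratic = 0.5 * fisher * delta ** 2  # (J(theta)/2) * delta^2
    print(delta, divergence, quadratic)
```

For this family the agreement is exact, not merely asymptotic; for a general smooth family the quadratic term is only the leading order, with an o(δ²) remainder.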
"The solution of finite sequence dynamic programming problems involve a backward induction argument, the foundations of which are generally understood hazily. The purpose of this memo is to add some clarification which may be slightly redundant and whose urgency may be something less than vital."
Alas, nobody writes like that anymore.