Divergence in everything: Cramér-Rao from data processing

Gather round for another tale of the mighty divergence and its adventures!

(image yoinked from Sergio Verdú’s 2007 Shannon Lecture slides)

This time I want to show how the well-known Cramér–Rao lower bound (also known as the information inequality) for parameter estimation can be derived from one of the most fundamental results in information theory: the data processing theorem for divergence. What is particularly nice about this derivation is that, along the way, we get to see how the Fisher information controls the local curvature of the divergence between two members of a parametric family of distributions.
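To fix notation before diving in, here is a minimal sketch (scalar parameter, unbiased estimator, the usual regularity conditions assumed) of the two standard statements involved: the Cramér–Rao bound and the second-order behavior of the divergence as the two parameter values approach each other:

$$\operatorname{Var}_\theta\!\big(\hat{\theta}(X)\big) \;\ge\; \frac{1}{J(\theta)}, \qquad J(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta}\log p_\theta(X)\right)^{2}\right],$$

$$D(P_\theta \,\|\, P_{\theta+\delta}) \;=\; \frac{J(\theta)}{2}\,\delta^2 \;+\; o(\delta^2) \qquad \text{as } \delta \to 0.$$

Here $J(\theta)$ is the Fisher information, $D(\cdot\,\|\,\cdot)$ is the Kullback–Leibler divergence (in nats), and $\hat{\theta}(X)$ is any unbiased estimator of $\theta$ based on $X \sim P_\theta$.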
