The Information Structuralist

ECE 299: regression with quadratic loss; stochastic simulation via Rademacher bootstrap

Posted in Control, Corrupting the Young, Statistical Learning and Inference by mraginsky on April 20, 2011

I gave the last lecture earlier today, wrapping up the semester. Here are the notes from the last two weeks:

Monday’s lecture was on stochastic gradient descent as an alternative to batch empirical risk minimization. I will post the notes soon.

Tagged with:

ECE 299: learning-theoretic bounds for vector quantizers; binary classification

Posted in Corrupting the Young, Statistical Learning and Inference by mraginsky on March 29, 2011

More learning-theoretic goodness:

  • Case study: empirical quantizer design, where I discuss beautiful work by Tamás Linder et al. that uses VC theory to bound the performance of empirically designed vector quantizers (which is engineering jargon for consistency of the method of k-means).
  • Binary classification: from the classic bounds for linear and generalized linear discriminant rules to modern techniques based on surrogate losses; voting methods; kernel machines; Convex Risk Minimization.
Tagged with:

ECE 299: Vapnik-Chervonenkis classes

Posted in Corrupting the Young, Statistical Learning and Inference by mraginsky on February 28, 2011

Having covered generalization bounds for abstract ERM via Rademacher averages, we now trace the historical development of the field:

  • Vapnik-Chervonenkis classes: shatter coefficients; VC dimension; examples of VC classes; Sauer-Shelah lemma; implication for Rademacher averages

Mmmmm, character-building!

Tagged with:

Statistical Learning Theory (ECE 299, Spring 2011)

Posted in Corrupting the Young, Statistical Learning and Inference by mraginsky on February 16, 2011

Now that the ITA Workshop is over and the ISIT deadline is officially past, I can resume semi-regular blogging.

This semester, I am teaching a graduate course on Statistical Learning Theory in my department. My aim is to introduce graduate students in electrical engineering to such things as Empirical Risk Minimization, generalization bounds, model selection, complexity regularization, minimax lower bounds &c., with certain examples of applications to information theory, signal processing, and adaptive control.

Unfortunately, I have not come across a textbook that would be suitable for teaching statistical learning theory to graduate students of engineering. Most texts out there are skewed towards either the computer science side of things or the mathematical statistics crowd. The only book that comes somewhat close to what I had in mind is M. Vidyasagar‘s Learning and Generalization, but its $169 price tag has stopped me from officially adopting it. Instead, I have been preparing my own lecture notes. As the class goes on, I will be posting additional notes here, but here is what I have covered so far:

  • Introduction: learning from examples in a probabilistic setting; goals of learning; basics of statistical decision theory; estimation and approximation errors
  • Concentration inequalities: Markov’s and Chebyshev’s inequalities; the Chernoff bounding trick; Hoeffding’s inequality; McDiarmid’s inequality; examples
  • Formulation of the learning problem: concept and function learning in the realizable case; PAC learning; model-free learning; Empirical Risk Minimization and its consistency
  • More on Empirical Risk Minimization: error bounds via Rademacher averages

Normally, I write these in frenetic two-hour bursts, so they are more than likely full of misconceptions, omissions, appalling lack of rigor, bugs, typos, and the like. Caveat lector!

Tagged with: