Information theory in economics, Part II: Robustness
As we have seen in Part I, the rational inattention framework of Christopher Sims aims to capture the best a rational agent can do when his capacity for processing information is limited. This rationally inattentive agent, however, has no reason to question his statistical model. In this post we will examine the robustness framework of Thomas Sargent, which deals with the issue of model uncertainty, but does not assume any capacity limitations.
Again, for simplicity I will stick to single-stage problems. The setting is exactly the same as before: we have an agent who must make a decision $u$ pertaining to some state of the world $X$ on the basis of some signal $Y$ correlated with $X$. If the quality of a decision procedure $\varphi$ is measured in terms of its expected cost $J(\varphi) = \mathbb{E}\left[c(X, \varphi(Y))\right]$, then the agent faces the optimization problem

$$\inf_{\varphi} J(\varphi) = \inf_{\varphi} \mathbb{E}\left[c(X, \varphi(Y))\right],$$

and the optimal decision function is given by

$$\varphi^*(y) = \mathop{\mathrm{arg\,min}}_{u}\, \mathbb{E}\left[c(X, u) \,\middle|\, Y = y\right].$$

In order to compute $\varphi^*$, the agent must know $P_X$ (the distribution of $X$) and the observation model $P_{Y|X}$ that relates the observed signal $Y$ to the unobserved state $X$. In other words, the agent must have a probabilistic model, embodied in the joint distribution $P = P_X \otimes P_{Y|X}$ of $X$ and $Y$.
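As an aside, the Bayes-optimal rule above is straightforward to compute in a toy discrete setting. Here is a minimal sketch in Python; the prior, observation model, and cost matrix are made-up numbers for illustration, not anything from the post:

```python
import numpy as np

# Toy discrete setting (illustrative numbers): state X in {0, 1},
# signal Y in {0, 1}, actions u in {0, 1}, and probability-of-error
# cost c(x, u) = 1 if u != x, else 0.
P_X = np.array([0.7, 0.3])                 # prior on X
P_Y_given_X = np.array([[0.8, 0.2],        # rows: x, columns: y
                        [0.1, 0.9]])
cost = np.array([[0.0, 1.0],               # cost[x, u]
                 [1.0, 0.0]])

# Joint distribution P(x, y) = P_X(x) * P(y | x)
P_joint = P_X[:, None] * P_Y_given_X

# Posterior P(x | y), then the Bayes-optimal decision for each y:
#   phi*(y) = argmin_u  E[c(X, u) | Y = y]
P_Y = P_joint.sum(axis=0)
posterior = P_joint / P_Y                  # posterior[x, y]
exp_cost_per_action = posterior.T @ cost   # shape (y, u)
phi_star = exp_cost_per_action.argmin(axis=1)
```

With these numbers, the agent simply follows the signal: $\varphi^*(0) = 0$ and $\varphi^*(1) = 1$.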
All of this is standard fare in the realm of Bayesian rationality. But now let’s suppose that the agent treats the model only as an approximation that was adopted for reasons of computational tractability, limited knowledge, etc. What should the agent do in this situation? Sargent’s proposal, inspired by ideas from robust control and game theory, is this: instead of simply optimizing the decision procedure for the “nominal” model $P$ (which may, after all, prove to be inaccurate), the agent should hedge his bets and allow for the fact that the “true” model (which we will denote by $Q$) may differ from the nominal model $P$, but the difference is bounded. While there are many ways of quantifying this model uncertainty, Sargent has opted for the divergence bound $D(Q \| P) \le R$. The parameter $R > 0$ is chosen by the agent and reflects his degree of confidence in $P$: the smaller this $R$, the more the agent will tend to trust his model-building skills. With the problem framed in this way, the natural strategy is the minimax one: if we define, for any distribution $Q$ of the pair $(X, Y)$ and for any strategy $\varphi$, the expected cost

$$J(Q, \varphi) \triangleq \mathbb{E}_Q\left[c(X, \varphi(Y))\right],$$

then the agent should solve the optimization problem

$$\inf_{\varphi}\, \sup_{Q \,:\, D(Q \| P) \le R} J(Q, \varphi). \qquad (1)$$
The idea here is that the resulting strategy will be robust against some amount of model uncertainty, as determined by the magnitude of $R$. We can also envision a malicious adversary, who tries to thwart the agent’s objective by choosing the worst possible joint distribution $Q$ of the state and the signal, subject to the divergence constraint $D(Q \| P) \le R$. Thus, we can view the quantity defined in (1) as the upper value of a zero-sum game between the agent and the adversary. The agent’s moves are the strategies $\varphi$ (so that mixed strategies would involve additional randomization on the agent’s part), while the adversary’s moves are the distributions $Q$ in the “divergence ball” of radius $R$ around the nominal distribution $P$.
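To get a feel for the adversary’s side of this game, here is a small numerical sketch (with made-up numbers). For a fixed strategy, exponentially tilting the nominal joint distribution toward high-cost outcomes produces distributions that sit farther and farther from the nominal model while inflicting a higher and higher expected cost, which is exactly the trade-off that the divergence ball constrains:

```python
import numpy as np

# Illustrative numbers only.  Fix the strategy phi(y) = y and consider
# the exponentially tilted family  Q_beta(x, y) ∝ P(x, y) exp(beta * c(x, phi(y))).
# Sweeping beta >= 0 moves Q_beta away from P toward costly outcomes.
P = np.array([[0.56, 0.14],
              [0.03, 0.27]])              # nominal joint P(x, y)
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0]])             # c(x, u); with phi(y) = y,
c_real = cost                             # c(x, phi(y)) is just cost[x, y]

def tilt(beta):
    w = P * np.exp(beta * c_real)
    return w / w.sum()

def divergence(Q):                        # D(Q || P)
    return float((Q * np.log(Q / P)).sum())

def exp_cost(Q):                          # J(Q, phi) = E_Q[c(X, phi(Y))]
    return float((Q * c_real).sum())

# Both the divergence from P and the expected cost grow with beta:
for beta in [0.0, 0.5, 1.0, 2.0]:
    Q = tilt(beta)
    print(beta, divergence(Q), exp_cost(Q))
```

At $\beta = 0$ the tilted distribution is just the nominal $P$ itself (zero divergence), and as $\beta$ grows the adversary buys more expected cost at the price of a larger divergence from $P$.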
Now let’s see what we can say about the optimal strategy in (1). We start by examining the inner maximization for a fixed $\varphi$. If we introduce a Lagrange multiplier $1/\beta$ (with $\beta > 0$) for the constraint $D(Q \| P) \le R$, then it can be shown that strong duality holds, so

$$\sup_{Q \,:\, D(Q \| P) \le R} J(Q, \varphi) = \inf_{\beta > 0} \left\{ \frac{R}{\beta} + \sup_{Q} \left[ J(Q, \varphi) - \frac{1}{\beta} D(Q \| P) \right] \right\},$$

where now the supremum is over all probability distributions $Q$ for $X$ and $Y$. We will now show that, for a fixed $\beta$, we can compute the supremum over $Q$ in closed form. To that end, we will need the following result, known as the Donsker-Varadhan formula:
Lemma 1 (Donsker-Varadhan) For any probability measure $\mu$ on some space $\mathsf{X}$ and any measurable function $f : \mathsf{X} \to \mathbb{R}$ such that $\mathbb{E}_\mu\left[e^f\right] < \infty$,

$$\log \mathbb{E}_\mu\left[e^f\right] = \sup_{\nu} \left\{ \mathbb{E}_\nu[f] - D(\nu \| \mu) \right\}, \qquad (2)$$

where the supremum is over all probability measures $\nu$ on $\mathsf{X}$.
Remark 1 This result is so fundamental and so simple that it has been rediscovered multiple times (e.g., by the machine learning community).
Proof: To keep things simple, I will present the proof for finite $\mathsf{X}$. Let us define the tilted distribution

$$\mu^f(x) \triangleq \frac{\mu(x)\, e^{f(x)}}{\mathbb{E}_\mu\left[e^f\right]}.$$

Then the supremum in (2) is achieved by $\nu = \mu^f$. Indeed, for any other $\nu$ we will have

$$\mathbb{E}_\nu[f] - D(\nu \| \mu) = \sum_x \nu(x) f(x) - \sum_x \nu(x) \log \frac{\nu(x)}{\mu(x)} = \log \mathbb{E}_\mu\left[e^f\right] - D(\nu \| \mu^f) \le \log \mathbb{E}_\mu\left[e^f\right],$$

where equality holds if and only if $\nu = \mu^f$.
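The formula is also easy to check numerically. The following sketch verifies (2) on a random finite example: the tilted distribution attains the supremum exactly, and no other distribution does better.

```python
import numpy as np

# Numerical check of the Donsker-Varadhan formula (2) on a random finite
# example: log E_mu[e^f] equals E_nu[f] - D(nu || mu) at the tilted
# distribution mu^f, and dominates it for every other nu.
rng = np.random.default_rng(0)
mu = rng.dirichlet(np.ones(5))            # random probability vector
f = rng.normal(size=5)                    # random function on 5 points

lhs = np.log(np.sum(mu * np.exp(f)))      # log E_mu[e^f]

# Tilted distribution mu^f(x) = mu(x) e^{f(x)} / E_mu[e^f]
mu_f = mu * np.exp(f)
mu_f /= mu_f.sum()

def objective(nu):                        # E_nu[f] - D(nu || mu)
    return np.sum(nu * f) - np.sum(nu * np.log(nu / mu))

assert abs(objective(mu_f) - lhs) < 1e-12                  # equality at mu^f
assert all(objective(rng.dirichlet(np.ones(5))) <= lhs + 1e-12
           for _ in range(1000))                           # supremum elsewhere
```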
Now we can use (2), with $\mu = P$ and $f = \beta\, c(\cdot, \varphi(\cdot))$, and write

$$\sup_{Q} \left[ J(Q, \varphi) - \frac{1}{\beta} D(Q \| P) \right] = \frac{1}{\beta} \log \mathbb{E}_P\left[ e^{\beta c(X, \varphi(Y))} \right].$$

Consequently, we can express the optimal value of the optimization problem (1) as

$$\inf_{\varphi}\, \inf_{\beta > 0}\, \frac{1}{\beta} \left\{ R + \log \mathbb{E}_P\left[ e^{\beta c(X, \varphi(Y))} \right] \right\}.$$
This is as far as we can get in general: we have no way of doing the optimization over the choice of the strategy $\varphi$ and the multiplier $\beta$ in closed form. But we can gain some intuition by focusing on some fixed value of $\beta$. To that end, let us define

$$\varphi^*_\beta \triangleq \mathop{\mathrm{arg\,min}}_{\varphi}\, \frac{1}{\beta} \log \mathbb{E}_P\left[ e^{\beta c(X, \varphi(Y))} \right]. \qquad (3)$$

Now let’s examine the quantity under the logarithm. It is actually a standard Bayesian optimal-cost problem for the nominal model $P$, except now instead of the original cost $c(x, u)$ we have the exponentiated cost $e^{\beta c(x, u)}$. So for each value of $y$, the optimal strategy $\varphi^*_\beta$ minimizes the conditional expected cost $\mathbb{E}\left[e^{\beta c(X, u)} \,\middle|\, Y = y\right]$:

$$\varphi^*_\beta(y) = \mathop{\mathrm{arg\,min}}_{u}\, \mathbb{E}\left[ e^{\beta c(X, u)} \,\middle|\, Y = y \right]. \qquad (4)$$
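The difference between the ordinary Bayes rule and the risk-sensitive rule (4) shows up once the cost structure is asymmetric. Here is an illustrative sketch with invented numbers: for a fifty-fifty posterior and a third, "hedging" action that is slightly worse on average but never catastrophic, exponentiating the cost flips the decision toward the hedge.

```python
import numpy as np

# Per-observation comparison of the Bayes rule with the risk-sensitive
# rule (4).  All numbers are made up for illustration: given the observed
# y, the posterior over X is fifty-fifty, and action u = 2 is a hedge
# that is slightly worse in expectation but never disastrous.
posterior = np.array([0.5, 0.5])          # P(X = x | Y = y)
cost = np.array([[0.0, 3.9, 2.2],         # cost[x, u] for x = 0
                 [4.0, 0.0, 2.2]])        # cost[x, u] for x = 1
beta = 1.0                                # risk-aversion factor

bayes_action = (posterior @ cost).argmin()                # argmin_u E[c(X,u) | y]
risk_action = (posterior @ np.exp(beta * cost)).argmin()  # argmin_u E[e^{beta c} | y]

# The Bayes rule gambles on the cheaper risky action; the risk-sensitive
# rule pays a small premium in average cost to avoid the large losses.
print(bayes_action, risk_action)
```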
So the robust optimization problem (1) is, actually, an ordinary Bayesian optimization problem in disguise, but with a different cost function. This may not seem like a big deal, but now we can actually see how cautious a robust strategy is compared to a non-robust one. Let’s suppose that, instead of optimizing the expected cost over all $\varphi$, we wanted to minimize the probability that the cost $c(X, \varphi(Y))$ exceeds some threshold $\lambda$. Then the problem is

$$\inf_{\varphi}\, \mathbb{P}\left[ c(X, \varphi(Y)) \ge \lambda \right].$$

The solution to this problem will, of course, depend on $\lambda$. But now we will see that a robust strategy will work for all $\lambda$ (even though it will not be optimal for each individual $\lambda$). To see this, we will exploit the fact that the graph of the step function with the step at $\lambda$, i.e., of the indicator function of the semi-infinite interval $[\lambda, \infty)$, lies below the graph of the shifted exponential function $t \mapsto e^{\beta(t - \lambda)}$ for any $\beta > 0$:

$$\mathbf{1}\left\{ t \ge \lambda \right\} \le e^{\beta (t - \lambda)}, \qquad \forall t. \qquad (5)$$
The two graphs touch precisely at the point $t = \lambda$, and below the threshold the bound is tighter for larger values of $\beta$. Therefore, using (5) and the definition of $\varphi^*_\beta$ in (3), we can write

$$\mathbb{P}\left[ c(X, \varphi(Y)) \ge \lambda \right] = \mathbb{E}_P\left[ \mathbf{1}\left\{ c(X, \varphi(Y)) \ge \lambda \right\} \right] \le e^{-\beta \lambda}\, \mathbb{E}_P\left[ e^{\beta c(X, \varphi(Y))} \right].$$

Moreover, this upper bound is minimized over $\varphi$ precisely by $\varphi^*_\beta$. The main thing to note here is the presence of the exponentially decaying factor $e^{-\beta \lambda}$. So, if an agent decides to use a robust strategy and if his model $P$ actually turns out to be correct, then he ends up ensuring not only a small average cost, but also a small probability of large excursions of the cost! For this reason, the strategy (4) is often called risk-sensitive, the problem of minimizing the expected exponential cost $\mathbb{E}_P\left[e^{\beta c(X, \varphi(Y))}\right]$ is called risk-sensitive optimization, and the parameter $\beta$ (the reciprocal of the Lagrange multiplier above) is called the risk-aversion factor.
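The exponential tail bound above is easy to verify numerically. The following sketch (random, purely illustrative numbers) checks the Chernoff-type inequality for several thresholds and risk-aversion factors:

```python
import numpy as np

# Numerical check of the tail bound: for any beta > 0 and threshold lam,
#   P[c >= lam] <= exp(-beta * lam) * E[exp(beta * c)].
# Random discrete distribution over costs, for illustration only.
rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(20))            # distribution over 20 outcomes
c = rng.uniform(0.0, 5.0, size=20)        # realized cost of each outcome

for lam in [1.0, 2.5, 4.0]:
    tail = p[c >= lam].sum()              # P[c >= lam]
    for beta in [0.5, 1.0, 2.0]:
        bound = np.exp(-beta * lam) * np.sum(p * np.exp(beta * c))
        assert tail <= bound + 1e-12      # the bound always dominates
```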
For a nice treatment of robustness and risk-sensitive optimization in the context of information theory and statistical estimation, check out this paper by Neri Merhav.