Information theory in economics, Part II: Robustness
As we have seen in Part I, the rational inattention framework of Christopher Sims aims to capture the best a rational agent can do when his capacity for processing information is limited. This rationally inattentive agent, however, has no reason to question his statistical model. In this post we will examine the robustness framework of Thomas Sargent, which deals with the issue of model uncertainty, but does not assume any capacity limitations.
Again, for simplicity I will stick to single-stage problems. The setting is exactly the same as before: we have an agent who must make a decision $U$ pertaining to some state of the world $X$ on the basis of some signal $Z$ correlated with $X$. If the quality of a decision procedure (strategy) $\gamma$, which maps each signal value $z$ to a decision $u = \gamma(z)$, is measured in terms of its expected cost $\mathbb{E}[c(X,\gamma(Z))]$, then the agent faces the optimization problem
$$\inf_\gamma \mathbb{E}[c(X,\gamma(Z))],$$
and the optimal decision function is given by
$$\gamma^*(z) = \operatorname*{arg\,min}_u \mathbb{E}[c(X,u) \mid Z = z].$$
In order to compute $\gamma^*$, the agent must know $P_X$ (the distribution of $X$) and the observation model $P_{Z|X}$ that relates the observed signal $Z$ to the unobserved state $X$. In other words, the agent must have a probabilistic model, embodied in the joint distribution $P = P_{XZ}$ of $X$ and $Z$.
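For a finite model, the optimal decision function can be found by direct enumeration. The sketch below is only an illustration: the 2×2 joint distribution `P`, the cost table `c`, and the helper names `bayes_strategy` and `expected_cost` are hypothetical choices, not part of the original discussion.

```python
import numpy as np

# Hypothetical nominal model: joint pmf P[x, z] over 2 states and 2 signals.
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
# Hypothetical cost table: c[x, u] is the cost of decision u in state x.
c = np.array([[0.0, 1.0],
              [3.0, 0.0]])

def bayes_strategy(P, c):
    """Return gamma[z] = argmin_u E[c(X, u) | Z = z] for a finite model."""
    # score[z, u] = sum_x P[x, z] * c[x, u]; dividing by P(Z = z) > 0
    # would not change the minimizing u, so the normalization is skipped.
    score = P.T @ c
    return score.argmin(axis=1)

def expected_cost(Q, c, gamma):
    """Expected cost E_Q[c(X, gamma(Z))] of strategy gamma under joint pmf Q."""
    n_x, n_z = Q.shape
    return sum(Q[x, z] * c[x, gamma[z]] for x in range(n_x) for z in range(n_z))

gamma_star = bayes_strategy(P, c)
print("optimal strategy:", gamma_star)
print("expected cost:   ", expected_cost(P, c, gamma_star))
```

In this toy model the signal is informative enough that the agent simply follows it: decision 0 after signal 0 and decision 1 after signal 1.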
All of this is standard fare in the realm of Bayesian rationality. But now let’s suppose that the agent treats the model $P$ only as an approximation that was adopted for reasons of computational tractability, limited knowledge, etc. What should the agent do in this situation? Sargent’s proposal, inspired by ideas from robust control and game theory, is this: instead of simply optimizing the decision procedure to the “nominal” model $P$ (which may, after all, prove to be inaccurate), the agent should hedge his bets and allow for the fact that the “true” model (which we will denote by $Q$) may differ from the nominal model $P$, but the difference is bounded. While there are many ways of quantifying this model uncertainty, Sargent has opted for the divergence bound $D(Q \| P) \le \rho$, where $D(\cdot \| \cdot)$ is the relative entropy (Kullback-Leibler divergence). The parameter $\rho \ge 0$ is chosen by the agent and reflects his degree of confidence in $P$: the smaller this $\rho$, the more the agent will tend to trust his model-building skills. With the problem framed in this way, the natural strategy is the minimax one: if we define, for any distribution $Q$ of $(X,Z)$ and for any strategy $\gamma$, the expected cost
$$J(Q,\gamma) = \mathbb{E}_Q[c(X,\gamma(Z))],$$
then the agent should solve
$$\inf_\gamma \sup_{Q :\, D(Q \| P) \le \rho} J(Q,\gamma). \qquad (1)$$
The idea here is that the resulting strategy will be robust against some amount of model uncertainty, as determined by the magnitude of $\rho$. We can also envision a malicious adversary, who tries to thwart the agent’s objective by choosing the worst possible joint distribution $Q$ of the state and the signal, subject to the divergence constraint $D(Q \| P) \le \rho$. Thus, we can view the quantity defined in (1) as the upper value of a zero-sum game between the agent and the adversary. The agent’s moves are the strategies $\gamma$ (so that mixed strategies would involve additional randomization on the agent’s part), while the adversary’s moves are the distributions $Q$ in the “divergence ball” of radius $\rho$ around the nominal distribution $P$.
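Now let’s see what we can say about the optimal strategy in (1). We start by examining the inner maximization for a fixed $\gamma$. If we introduce a Lagrange multiplier $\lambda > 0$ for the constraint $D(Q \| P) \le \rho$, then it can be shown that strong duality holds, so
$$\sup_{Q :\, D(Q \| P) \le \rho} J(Q,\gamma) = \inf_{\lambda > 0} \sup_Q \Big[ J(Q,\gamma) - \lambda \big( D(Q \| P) - \rho \big) \Big],$$
where $J(Q,\gamma) = \mathbb{E}_Q[c(X,\gamma(Z))]$ is the expected cost under $Q$, and where now the supremum is over all probability distributions $Q$ for $X$ and $Z$. We will now show that, for a fixed $\lambda$, we can compute the supremum over $Q$ in closed form. To that end, we will need the following result, known as the Donsker-Varadhan formula:
Lemma 1 For any probability distribution $\mu$ on a space $\Omega$ and any measurable function $f : \Omega \to \mathbb{R}$ with $\mathbb{E}_\mu[e^f] < \infty$,
$$\log \mathbb{E}_\mu[e^f] = \sup_\nu \big\{ \mathbb{E}_\nu[f] - D(\nu \| \mu) \big\}, \qquad (2)$$
where the supremum is over all probability distributions $\nu$ on $\Omega$.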
Remark 1 This result is so fundamental and so simple that it has been rediscovered multiple times (e.g., by the machine learning community).
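Proof: To keep things simple, I will present the proof for finite $\Omega$. Let us define the tilted distribution
$$\nu^*(\omega) = \frac{\mu(\omega)\, e^{f(\omega)}}{\mathbb{E}_\mu[e^f]}, \qquad \omega \in \Omega.$$
Then the supremum in (2) is achieved by $\nu^*$. Indeed, since $\log \frac{\nu(\omega)}{\nu^*(\omega)} = \log \frac{\nu(\omega)}{\mu(\omega)} - f(\omega) + \log \mathbb{E}_\mu[e^f]$, for any other $\nu$ we will have
$$\mathbb{E}_\nu[f] - D(\nu \| \mu) = \log \mathbb{E}_\mu[e^f] - D(\nu \| \nu^*) \le \log \mathbb{E}_\mu[e^f],$$
where equality holds if and only if $\nu = \nu^*$.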
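The Donsker-Varadhan formula, $\log \mathbb{E}_\mu[e^f] = \sup_\nu \{ \mathbb{E}_\nu[f] - D(\nu \| \mu) \}$, is easy to check numerically on a finite set. The sketch below is an illustration only: the reference distribution `mu` and the function `f` are drawn at random, and the helper names are hypothetical. It confirms that the tilted distribution attains the value $\log \mathbb{E}_\mu[e^f]$ and that randomly sampled competitors do not exceed it.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.dirichlet(np.ones(5))   # arbitrary reference pmf on 5 points
f = rng.normal(size=5)           # arbitrary real-valued function on those points

def kl(nu, mu):
    """Relative entropy D(nu || mu) in nats, for pmfs with full support."""
    return float(np.sum(nu * np.log(nu / mu)))

def dv_objective(nu):
    """The quantity E_nu[f] - D(nu || mu) being maximized over nu."""
    return float(nu @ f) - kl(nu, mu)

log_mgf = float(np.log(mu @ np.exp(f)))        # log E_mu[e^f]
nu_star = mu * np.exp(f) / (mu @ np.exp(f))    # tilted distribution

gap_at_tilt = log_mgf - dv_objective(nu_star)
max_other = max(dv_objective(rng.dirichlet(np.ones(5))) for _ in range(1000))

print("gap at the tilted distribution:", gap_at_tilt)
print("best random competitor vs. log E[e^f]:", max_other, log_mgf)
```

The gap at the tilted distribution should vanish up to floating-point error, while every random competitor falls short by its divergence from the tilted distribution.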
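Now we can use (2) and write
$$\sup_Q \Big[ J(Q,\gamma) - \lambda D(Q \| P) \Big] = \lambda \sup_Q \left[ \mathbb{E}_Q\!\left[ \frac{c(X,\gamma(Z))}{\lambda} \right] - D(Q \| P) \right] = \lambda \log \mathbb{E}_P\!\left[ \exp\!\left( \frac{c(X,\gamma(Z))}{\lambda} \right) \right].$$
Consequently, we can express the optimal value of the optimization problem (1) as
$$\inf_\gamma \inf_{\lambda > 0} \left\{ \lambda \log \mathbb{E}_P\!\left[ \exp\!\left( \frac{c(X,\gamma(Z))}{\lambda} \right) \right] + \lambda \rho \right\}.$$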
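As a sanity check on the duality argument, the following sketch evaluates the worst-case expected cost over a divergence ball in two ways for one fixed strategy: through the dual expression $\lambda \log \mathbb{E}_P[e^{c/\lambda}] + \lambda\rho$, and through the exponentially tilted distribution $Q_\lambda \propto P\, e^{c/\lambda}$ whose divergence from $P$ equals $\rho$. The four-point nominal distribution `p`, the cost values, and $\rho = 0.1$ are made up for illustration, and plain bisection over $\lambda$ is used purely for convenience.

```python
import numpy as np

# Hypothetical data: nominal pmf over the (x, z) pairs, flattened, and the
# cost c(x, gamma(z)) incurred at each pair by some fixed strategy gamma.
p = np.array([0.3, 0.1, 0.2, 0.4])
cost = np.array([0.0, 1.0, 3.0, 0.0])
rho = 0.1   # radius of the divergence ball (assumed)

def dual(lam):
    """lam * log E_P[exp(c / lam)] + lam * rho, with the max factored out."""
    m = cost.max()
    return m + lam * np.log(p @ np.exp((cost - m) / lam)) + lam * rho

def tilted(lam):
    """Exponentially tilted pmf Q_lam proportional to p * exp(c / lam)."""
    w = p * np.exp((cost - cost.max()) / lam)
    return w / w.sum()

def kl(q, p):
    """Relative entropy D(q || p) in nats, with the convention 0 log 0 = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# D(Q_lam || P) decreases as lam grows, so bisect for D(Q_lam || P) = rho.
lo, hi = 1e-2, 1e2
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if kl(tilted(mid), p) > rho:
        lo = mid    # too much tilting: move toward larger lam
    else:
        hi = mid
lam_star = 0.5 * (lo + hi)
q_star = tilted(lam_star)
worst_case = float(q_star @ cost)

print("nominal expected cost:   ", float(p @ cost))
print("worst-case expected cost:", worst_case)
print("dual value at lam_star:  ", float(dual(lam_star)))
```

The two evaluations should agree, and the worst-case cost should exceed the nominal one, which is the price of hedging against model misspecification.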
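This is as far as we can get: we have no way of doing the optimization over the choice of the strategy $\gamma$ and the Lagrange multiplier $\lambda$ in closed form. But we can gain some intuition by focusing on some fixed value of $\lambda$. To that end, let us define
$$V(\lambda) = \inf_\gamma \lambda \log \mathbb{E}_P\!\left[ \exp\!\left( \frac{c(X,\gamma(Z))}{\lambda} \right) \right]. \qquad (3)$$
Now let’s examine the quantity under the logarithm. It is actually a standard Bayesian optimum cost problem for the nominal model $P$, except now instead of the original cost $c(x,u)$ we have the exponentiated cost $\exp(c(x,u)/\lambda)$. So for each value of $\lambda$ the optimal strategy minimizes the expected cost $\mathbb{E}_P[\exp(c(X,\gamma(Z))/\lambda)]$:
$$\gamma^*_\lambda(z) = \operatorname*{arg\,min}_u \mathbb{E}_P\!\left[ \exp\!\left( \frac{c(X,u)}{\lambda} \right) \,\middle|\, Z = z \right]. \qquad (4)$$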
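To see how exponentiating the cost changes behavior, here is a small sketch that computes the minimizer of the conditional expected exponentiated cost, $\mathbb{E}_P[\exp(c(X,u)/\lambda) \mid Z = z]$, in a finite model for two values of the multiplier $\lambda$. The joint distribution `P`, the cost table `c`, and the function name `risk_sensitive_strategy` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical nominal model: joint pmf P[x, z] and cost table c[x, u].
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
c = np.array([[0.0, 1.0],
              [3.0, 0.0]])

def risk_sensitive_strategy(P, c, lam):
    """Return gamma[z] = argmin_u E_P[exp(c(X, u) / lam) | Z = z]."""
    # score[z, u] = sum_x P[x, z] * exp(c[x, u] / lam); the normalization
    # by P(Z = z) is skipped because it does not affect the argmin.
    score = P.T @ np.exp(c / lam)
    return score.argmin(axis=1)

cautious = risk_sensitive_strategy(P, c, lam=1.0)
near_bayes = risk_sensitive_strategy(P, c, lam=100.0)
print("lam = 1:  ", cautious)
print("lam = 100:", near_bayes)
```

In this toy model, a small multiplier ($\lambda = 1$) makes the agent take decision 1 after either signal, avoiding any chance of the large cost 3, while a large multiplier ($\lambda = 100$) makes $e^{c/\lambda} \approx 1 + c/\lambda$ and recovers the ordinary expected-cost minimizer, which follows the signal.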
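So the robust optimization problem (1) is, actually, an ordinary Bayesian optimization problem in disguise, but with a different cost function. This may not seem like a big deal, but now we can actually see how cautious a robust strategy is compared to a non-robust one. Let’s suppose that, instead of optimizing the expected cost $\mathbb{E}_P[c(X,\gamma(Z))]$ over all $\gamma$, we wanted to minimize the probability that the cost exceeds some threshold $\tau$. Then the problem is
$$\inf_\gamma P\big( c(X,\gamma(Z)) \ge \tau \big).$$
The solution to this problem will, of course, depend on $\tau$. But now we will see that a robust strategy will work for all $\tau$ (even though it will not be optimal for each individual $\tau$). To see this, we will exploit the fact that the graph of the step function with the step at $\tau$, i.e., of the indicator function of the semi-infinite interval $[\tau,\infty)$, lies below the graph of the shifted exponential function $t \mapsto e^{(t-\tau)/\lambda}$ for any $\lambda > 0$:
$$\inf_\gamma P\big( c(X,\gamma(Z)) \ge \tau \big) = \inf_\gamma \mathbb{E}_P\big[ \mathbf{1}\{ c(X,\gamma(Z)) \ge \tau \} \big] \le \inf_\gamma \mathbb{E}_P\!\left[ \exp\!\left( \frac{c(X,\gamma(Z)) - \tau}{\lambda} \right) \right] = e^{-\tau/\lambda} \inf_\gamma \mathbb{E}_P\!\left[ \exp\!\left( \frac{c(X,\gamma(Z))}{\lambda} \right) \right].$$
Moreover, the infimum in this upper bound is actually achieved by the strategy $\gamma^*_\lambda$ from (4), so that $P\big( c(X,\gamma^*_\lambda(Z)) \ge \tau \big) \le e^{-\tau/\lambda}\, \mathbb{E}_P\big[ \exp\big( c(X,\gamma^*_\lambda(Z))/\lambda \big) \big]$ for every $\tau$. The main thing to note here is the presence of the exponentially decaying factor $e^{-\tau/\lambda}$. So, if an agent decides to use a robust strategy and if his model actually turns out to be correct, then he ends up ensuring not only a small average cost, but also small probability of large excursions of the cost! For this reason, the strategy (4) is often called risk-sensitive, the problem of minimizing the expected exponential cost is called risk-sensitive optimization, and the Lagrange multiplier $\lambda$ is called the risk aversion factor (smaller values of $\lambda$ penalize large costs more heavily).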
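The tail bound can be checked directly in a finite model. The sketch below assumes a hypothetical 2×2 nominal distribution `P`, cost table `c`, and multiplier $\lambda = 1$; it computes the strategy that minimizes the expected exponentiated cost $\mathbb{E}_P[e^{c/\lambda}]$ and compares the exact probability that the cost reaches a threshold $\tau$ with the bound $e^{-\tau/\lambda}\, \mathbb{E}_P[e^{c/\lambda}]$. All names are illustrative.

```python
import numpy as np

# Hypothetical nominal model: joint pmf P[x, z] and cost table c[x, u].
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])
c = np.array([[0.0, 1.0],
              [3.0, 0.0]])
lam = 1.0   # assumed value of the multiplier

# Strategy minimizing the expected exponentiated cost under P.
gamma = (P.T @ np.exp(c / lam)).argmin(axis=1)

# realized[x, z] = c[x, gamma[z]]: the cost incurred at each (state, signal).
realized = c[:, gamma]
exp_moment = float(np.sum(P * np.exp(realized / lam)))   # E_P[exp(cost / lam)]

def tail_probability(tau):
    """Exact probability under P that the realized cost is at least tau."""
    return float(P[realized >= tau].sum())

def chernoff_bound(tau):
    """Upper bound exp(-tau / lam) * E_P[exp(cost / lam)] on that probability."""
    return float(np.exp(-tau / lam) * exp_moment)

for tau in [0.5, 1.0, 2.0, 3.0]:
    print(tau, tail_probability(tau), chernoff_bound(tau))
```

The bound is loose for small thresholds but decays exponentially in $\tau$, which is the behavior the text emphasizes.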