The Information Structuralist

Information theory in economics, Part I: Rational inattention

Posted in Echoes of Cybernetics, Economics, Games and Decisions, Information Theory by mraginsky on June 1, 2012

Economic activity involves making decisions. In order to make decisions, agents need information. Thus, the problems of acquiring, transmitting, and using information have occupied economists’ attention for some time now (there is even a whole subfield of “information economics”). It is not surprising, therefore, that information theory, the brainchild of Claude Shannon, would eventually make its way into economics. In this post and the one to follow, I will briefly describe two specific strands of information-theoretic work in economics: the rational inattention framework of Christopher Sims and the robustness ideas of Thomas Sargent. (As an interesting aside: Sims and Sargent shared the 2011 Nobel Memorial Prize in Economics, although not directly for their information-theoretic work, but rather for their work related to causality.)

In a nutshell, both Sims and Sargent aim to mitigate a significant shortcoming of the rational expectations hypothesis, namely that (to quote Cosma) “what game theorists [and economists] somewhat disturbingly call rationality is assumed throughout — … game players are assumed to be hedonistic yet infinitely calculating sociopaths endowed with supernatural computing abilities.” To put this in more charitable terms, the basic tenet of rational expectations is that all economic agents are continuously optimizing, have access to all relevant information, can react to it instantly, and have unlimited computational capabilities. This is, to put it mildly, a huge oversimplification that does not mesh well with empirical observations. In reality, we see all sorts of “inertia” and delayed-reaction effects (what Keynes referred to as “stickiness” of prices, wages, etc.). Moreover, even disregarding stickiness, there is no reason to believe that the models used by the agents are at all accurate (indeed, if the 2008 financial crisis has taught us anything, quite the opposite is true). Thus, two adjustments are needed to the rational expectations framework: one to account for the fact that economic agents and institutions have only limited resources and capacity for acquiring and processing information, and another to formalize the pervasiveness of model uncertainty.

This post will focus on rational inattention, which seeks to address the first issue. (A follow-up post will discuss robustness, which tackles the second issue.) The main tenet of rational inattention is that any economic agent can observe all relevant information only through a capacity-limited channel, where capacity is understood in the sense of Shannon. This capacity constraint is meant to capture the idea that, contrary to the rational expectations viewpoint, no one really keeps track of all available information, with perfect accuracy, all the time. This is the inattention part. The rational part captures the notion that an agent faced with an information capacity constraint will aim to optimize the observation channel through which he receives the relevant information, subject to that constraint. So, the question is — what should guide the optimization process?

Here is my take on what Sims is proposing. To keep things simple, I will limit the discussion to one-stage decision problems, although the same can be done for multistage problems too. Consider an agent who wishes to make a decision (or take an action) pertaining to some state of the world (modeled as a random variable {X} with a known and fixed distribution {P_X}) on the basis of some related signal {Z} under the mutual information constraint {I(X; Z) \le R}. The decision variable {U} is assumed to be some (possibly randomized) function of the signal {Z}, and the overall quality of the decision is measured in terms of expected cost, {{\mathbb E}[c(X,U)]}. Let’s assume for the moment that the signal space {{\mathsf Z}} is fixed. So now the agent has to optimize two objects: the observation channel {P_{Z|X}} and the decision procedure {P_{U|Z}}. Given {P_{Z|X}}, the best choice of {P_{U|Z}} is to minimize the expected posterior cost:

\displaystyle  U^* = \mathop{\text{arg\,min}}_{u} {\mathbb E}[c(X,u)|Z] \ \ \ \ \ (1)

(if there are several minimizers, select one at random). Now we want to optimize the observation channel {P_{Z|X}}. Clearly, the best such channel, under the given constraint {I(X;Z) \le R}, is the one that yields the largest reduction in expected cost relative to the best “no-information” expected cost {\min_u {\mathbb E}[c(X,u)]}:

\displaystyle  P^*_{Z|X} = \mathop{\text{arg\,max}}_{P_{Z|X}: I(X;Z) \le R}\left\{ \min_{u} {\mathbb E}[c(X,u)] - {\mathbb E}\left[\min_u {\mathbb E}[c(X,u)|Z]\right]\right\}.

Once the optimal observation channel {P^*_{Z|X}} is found, the optimal decision procedure {P^*_{U|Z}} is given by (1).
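
To make the two-step optimization concrete, here is a minimal numerical sketch for finite state and signal alphabets (the function names and the toy binary example are purely illustrative and are not taken from Sims): given a candidate observation channel {P_{Z|X}}, it applies rule (1) by minimizing the posterior expected cost, and reports the resulting reduction in expected cost (the bracketed objective above) along with the mutual information {I(X;Z)} that the channel consumes.

```python
import numpy as np

def posterior(p_x, channel):
    """P(x|z) and P(z) from the prior p_x and the channel P(z|x)."""
    joint = p_x[:, None] * channel             # joint P(x, z)
    p_z = joint.sum(axis=0)
    return joint / p_z, p_z                    # columns indexed by z

def expected_posterior_cost(p_x, channel, cost):
    """E[ min_u E[c(X,u) | Z] ]: the agent follows rule (1)."""
    p_x_given_z, p_z = posterior(p_x, channel)
    cost_given_z = p_x_given_z.T @ cost        # (z, u): E[c(X,u) | Z=z]
    return np.sum(p_z * cost_given_z.min(axis=1))

def mutual_information(p_x, channel):
    """I(X;Z) in nats for a discrete channel P(z|x)."""
    p_z = p_x @ channel
    with np.errstate(divide="ignore", invalid="ignore"):
        t = channel * np.log(channel / p_z)
    return np.sum(p_x[:, None] * np.nan_to_num(t))

# Toy problem: binary state, 0-1 cost, binary symmetric observation
# channel with crossover eps.  Sweeping eps shows how the cost
# reduction in the bracketed objective trades off against I(X;Z).
p_x = np.array([0.5, 0.5])
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
baseline = (p_x @ cost).min()                  # min_u E[c(X,u)], no signal
for eps in (0.5, 0.25, 0.05):
    ch = np.array([[1 - eps, eps], [eps, 1 - eps]])
    gain = baseline - expected_posterior_cost(p_x, ch, cost)
    print(f"eps={eps}: I(X;Z) = {mutual_information(p_x, ch):.3f} nats, "
          f"cost reduction = {gain:.3f}")
```

A full solution of the rational inattention problem would then search over channels satisfying {I(X;Z) \le R} for the one maximizing this reduction; a Lagrangian version of that search appears in the second sketch below.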

Thus, a rationally inattentive agent must maximize the value of available information for the purpose of decision-making. As we have seen before, the notion of the value of information is intimately related to rate-distortion theory. Indeed, if we now allow the agent to choose not only the observation channel and the decision procedure, but also the signal space {{\mathsf Z}}, then the smallest attainable cost will be given by the distortion-rate function (DRF)

\displaystyle  D(R) = \inf_{P_{U|X}:\, I(X;U) \le R}{\mathbb E}[c(X,U)]. \ \ \ \ \ (2)

More precisely, as proved by Stratonovich, the gap between the best no-information cost and the DRF is exactly the largest attainable reduction in expected cost under a mutual information constraint:

\displaystyle  \min_u {\mathbb E}[c(X,u)] - D(R) = \sup_{{\mathsf Z}} \sup_{P_{Z|X}: I(X;Z) \le R} \left\{ \min_u {\mathbb E}[c(X,u)] - {\mathbb E}\left[\min_u {\mathbb E}[c(X,u)|Z]\right] \right\},

where the first supremum is over the choice of signal space {{\mathsf Z}}, while the second is over all admissible observation channels with output in {{\mathsf Z}}. In fact, the supremum over {{\mathsf Z}} is attained by choosing {{\mathsf Z}} to be the space of beliefs {{\mathcal P}({\mathsf X})}, and then the optimum {Z^* \in {\mathcal P}({\mathsf X})} is the posterior distribution {P^*_{X|U}} induced by {P_X} and the test channel {P^*_{U|X}} that achieves the optimum in (2).
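
For those who want to see the DRF numerically: the Lagrangian form of (2) can be computed by the standard Blahut-Arimoto iteration, with the cost {c(x,u)} playing the role of the distortion measure. The sketch below (the toy problem and parameter values are again purely illustrative) sweeps the multiplier {\beta}, tracing out pairs {(I(X;U), {\mathbb E}[c(X,U)])} along the distortion-rate curve.

```python
import numpy as np

def blahut_arimoto(p_x, cost, beta, n_iter=500):
    """Lagrangian Blahut-Arimoto for the trade-off in (2):
    larger beta favors lower expected cost at a higher rate."""
    n_x, n_u = cost.shape
    q = np.full((n_x, n_u), 1.0 / n_u)         # test channel P(u|x)
    for _ in range(n_iter):
        r = p_x @ q                            # output marginal P(u)
        q = r * np.exp(-beta * cost)           # exponential tilt by cost
        q /= q.sum(axis=1, keepdims=True)      # renormalize each row
    r = p_x @ q
    rate = np.sum(p_x[:, None] * q * np.log(q / r))   # I(X;U), in nats
    avg_cost = np.sum(p_x[:, None] * q * cost)        # E[c(X,U)]
    return rate, avg_cost

# Same toy problem as before: uniform binary state, 0-1 cost.
p_x = np.array([0.5, 0.5])
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
for beta in (0.5, 2.0, 5.0):
    R, D = blahut_arimoto(p_x, cost, beta)
    print(f"beta={beta}: I(X;U) = {R:.3f} nats, E[c(X,U)] = {D:.3f}")
```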

This simple idea goes a long way, as anyone can see by browsing the work of Sims on the topic. For example, if all random objects are actually time series (or stochastic processes), then the rational inattention framework can indeed account for stickiness; it can account for discreteness phenomena in retail price or wage changes; and it can even (somewhat speculatively) shed light on transparency in monetary policy: why do financial markets often react so sharply to the Fed’s succinct, discrete summaries of its monetary policy changes, while reacting much more smoothly to other central banks that provide more frequent and more detailed summaries?

Of course, as any information theorist knows, the Shannon DRF is only an asymptotic measure of performance, and it needs to be related to an operational criterion via an appropriate coding theorem. In a communication system, the relevant operational criteria pertain to the quality of signal reconstruction at the receiver, and the mutual information constraint is only an asymptotic abstraction of the channel’s reliability by way of the law of large numbers. In economics, however, it is not at all clear which operational interpretation one should attach to the mutual information constraints faced by rationally inattentive agents. If we view an economic institution (say, a firm or an investment bank) as an organism, then it interfaces with its environment via its “sensory organs” (electronic communications; the eyes, ears, and brains of its analysts; etc.), and so the mutual information constraint may represent the institution’s “sensory capacity” (taking us all the way back to early cybernetics work on the information capacity of single neurons). But even then, any talk of capacity should be tied to coding, which inevitably introduces delays and layers of complexity. While Sims acknowledges the issues of delay and complexity in his papers, there is as yet no satisfactory operationalization of rational inattention.

3 Responses

  1. kvarsh said, on June 4, 2012 at 4:34 pm

    In our work, Lav and I treat decision making from the same perspective of a rational, but limited, decision maker. However, we encode the limitation in a one-stage Bayesian M-ary hypothesis testing problem as one of limited precision in the prior probability. The decision maker, following Radner’s paradigm of costly rationality, optimizes a quantizer of the prior probability space under a distortion function directly related to the Bayes risk of the decision. It turns out that this distortion function, which we call the Bayes risk error, is a Bregman divergence.

    No asymptotics are needed in our model, because we deal directly with decision-making performance expressed through Bayes risk, rather than with informational criteria that become valid performance measures only in limiting cases. The quantizer that we find is an operational encoding of the ideas expressed by Sims.
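
    For concreteness, here is a minimal sketch of the Bayes risk error for binary hypothesis testing (the ternary observation model and its likelihood values are made up purely for illustration): the excess risk from acting on a quantized prior, rather than the true one, is nonnegative, and it vanishes whenever the two priors induce the same likelihood-ratio test.

    ```python
    import numpy as np

    # Likelihoods of a ternary observation under the two hypotheses
    # (illustrative values only).
    lik0 = np.array([0.7, 0.2, 0.1])   # P(y | H0)
    lik1 = np.array([0.1, 0.3, 0.6])   # P(y | H1)

    def bayes_risk(p0, a0):
        """Error probability under true prior p0 when the Bayes test
        is designed for an assumed prior a0 (uniform costs)."""
        decide_h0 = a0 * lik0 >= (1 - a0) * lik1   # rule per observation
        p_err_h0 = lik0[~decide_h0].sum()          # error when H0 is true
        p_err_h1 = lik1[decide_h0].sum()           # error when H1 is true
        return p0 * p_err_h0 + (1 - p0) * p_err_h1

    def bayes_risk_error(p0, a0):
        """Excess risk from acting on a0 instead of the true prior p0."""
        return bayes_risk(p0, a0) - bayes_risk(p0, p0)

    for a0 in (0.5, 0.8, 0.95):
        print(f"true prior 0.3, assumed {a0}: "
              f"Bayes risk error = {bayes_risk_error(0.3, a0):.4f}")
    ```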

    • mraginsky said, on June 4, 2012 at 6:03 pm

      Kush, thanks for your comment.

      I am familiar with your work in this direction, and I agree that your notion of limited rationality is operationally meaningful in a nonasymptotic setting. I am not sure, however, that quantization of prior beliefs encodes the same notion of inattention as the one proposed by Sims — he assumes full knowledge of the prior, but puts limits on the agent’s ability to update the prior (via Bayesian updates) through observations by constraining the likelihood.

      Yuan and Clarke actually formulate this idea from the viewpoint of Bayesian inference: A. Yuan and B. Clarke, “An Information Criterion for Likelihood Selection,” IEEE Transactions on Information Theory, Vol. 45, No. 2, pp. 562-571, March 1999.

      • kvarsh said, on June 8, 2012 at 12:32 pm

        “Thinking through Categories” by Mullainathan and “Calibrated Learning and Correlated Equilibrium” by Foster and Vohra, although neither involves information-theoretically defined capacity constraints, are both interesting views on updating Bayesian priors.

