# The Information Structuralist

## Directed stochastic kernels and causal interventions

As I was thinking more about Massey’s paper on directed information and about the work of Touchette and Lloyd on the information-theoretic study of control systems (which we had started looking at during the last meeting of our reading group), I realized that the directed stochastic kernels that feature so prominently in the general definition of directed information are known in the machine learning and AI communities under another name, due to Judea Pearl: interventional distributions.

Let us recall the definition of a directed stochastic kernel. Consider ${N}$ random variables ${Z_1,\ldots,Z_N}$ with the obvious causal ordering. Factorize their joint distribution according to this causal ordering:

$\displaystyle P_{Z^N}(z^N) = \prod^N_{i=1}P_{Z_i|Z^{i-1}}(z_i|z^{i-1}). \ \ \ \ \ (1)$

For any partition of ${\{1,\ldots,N\}}$ into two disjoint sets ${S}$ and ${S^c}$, the directed stochastic kernel of ${Z^S = (Z_i)_{i \in S}}$ given ${Z^{S^c} = (Z_i)_{i \not\in S}}$ is given by

$\displaystyle \vec{P}_{Z^S|Z^{S^c}}(z^S|z^{S^c}) = \prod_{i \in S} P_{Z_i|Z^{i-1}}(z_i|z^{i-1}). \ \ \ \ \ (2)$
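To make (2) concrete, here is a small numerical sketch (not from the original discussion: binary alphabets and made-up conditional tables) for ${N = 3}$ and ${S = \{2\}}$. The point is that the directed kernel keeps only the causal factor ${P_{Z_2|Z_1}}$, whereas the ordinary conditional ${P_{Z_2|Z_1,Z_3}}$ also conditions on the "future" variable ${Z_3}$:

```python
import numpy as np

# Causal ordering Z1, Z2, Z3 with the factorization (1):
#   P(z1,z2,z3) = P(z1) P(z2|z1) P(z3|z1,z2)
# All numbers below are illustrative assumptions.
p1 = np.array([0.6, 0.4])                      # P(z1)
p2_1 = np.array([[0.9, 0.1], [0.2, 0.8]])      # P(z2|z1), indexed [z1, z2]
p3_12 = np.array([[[0.7, 0.3], [0.4, 0.6]],
                  [[0.5, 0.5], [0.1, 0.9]]])   # P(z3|z1,z2), indexed [z1, z2, z3]

joint = p1[:, None, None] * p2_1[:, :, None] * p3_12   # P(z1, z2, z3)

# Take S = {2}, S^c = {1, 3}.  The directed kernel (2) is the product of the
# causal factors for i in S, here just P(z2|z1); it ignores z3 entirely.
directed = p2_1                                # vec{P}(z2 | z1, z3), indexed [z1, z2]

# The ordinary conditional P(z2 | z1, z3), by contrast, is obtained by
# renormalizing the joint, and it does depend on z3:
p13 = joint.sum(axis=1)                        # P(z1, z3)
cond = joint / p13[:, None, :]                 # P(z2 | z1, z3), indexed [z1, z2, z3]

print(cond[0, :, 0], "vs directed", directed[0])
```

With these numbers the two kernels disagree, which is exactly the gap the directed information will quantify.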

We have already seen that this object figures prominently in the definition of the directed information between ${Z^S}$ and ${Z^{S^c}}$, viz.

$\displaystyle I(Z^S \rightarrow Z^{S^c}) = D\Big( P_{Z^S,Z^{S^c}} \Big \| \vec{P}_{Z^S|Z^{S^c}} \times P_{Z^{S^c}}\Big).$

As opposed to the usual mutual information ${I(Z^S; Z^{S^c})}$, which measures the amount of statistical dependence between ${Z^S}$ and ${Z^{S^c}}$, the directed information quantifies the amount of causal influence ${Z^S}$ has on ${Z^{S^c}}$.
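This distinction can be checked numerically. The following self-contained sketch (binary alphabets, invented conditional tables) computes the directed information ${I(Z^S \rightarrow Z^{S^c})}$ for ${S = \{2\}}$ as the expected log-ratio of the ordinary conditional to the directed kernel, alongside the ordinary mutual information ${I(Z_2; (Z_1, Z_3))}$:

```python
import numpy as np

# Binary Z1, Z2, Z3 under the causal factorization (1); numbers are made up.
p1 = np.array([0.6, 0.4])                          # P(z1)
p2_1 = np.array([[0.9, 0.1], [0.2, 0.8]])          # P(z2|z1)
p3_12 = np.array([[[0.7, 0.3], [0.4, 0.6]],
                  [[0.5, 0.5], [0.1, 0.9]]])       # P(z3|z1,z2)

joint = p1[:, None, None] * p2_1[:, :, None] * p3_12   # P(z1, z2, z3)

# S = {2}, S^c = {1, 3}: ordinary conditional vs the directed kernel (2).
p13 = joint.sum(axis=1)                            # P(z1, z3)
cond = joint / p13[:, None, :]                     # P(z2 | z1, z3)
directed = np.broadcast_to(p2_1[:, :, None], joint.shape)  # vec{P}(z2|z1,z3) = P(z2|z1)

# Directed information as the expected log-ratio (all entries positive here).
di = (joint * np.log(cond / directed)).sum()

# Ordinary mutual information I(Z2; (Z1, Z3)) for comparison.
p2 = joint.sum(axis=(0, 2))                        # P(z2)
mi = (joint * np.log(cond / p2[None, :, None])).sum()

print(di, mi)   # both nonnegative; here mi exceeds di by I(Z1; Z2)
```

In this example ${I(Z_2; (Z_1,Z_3)) - I(Z_2 \rightarrow (Z_1,Z_3)) = I(Z_1; Z_2) \ge 0}$, so statistical dependence can strictly exceed causal influence.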

Now let’s connect directed stochastic kernels to probabilistic graphical models, or Bayes networks. Staying with the same causal ordering, let us suppose that, for each ${i}$, we can split the set ${\{1,\ldots,i-1\}}$ into two disjoint subsets, ${\Pi_i}$ and ${\Pi^c_i}$, such that

$\displaystyle Z^{\Pi^c_i} \rightarrow Z^{\Pi_i} \rightarrow Z_i \ \ \ \ \ (3)$

is a Markov chain. Then for each ${i}$ the stochastic kernel ${P_{Z_i|Z^{i-1}}(\cdot|z^{i-1})}$ depends only on those components of ${z^{i-1}}$ that lie in ${\Pi_i}$, and we can rewrite the causal factorization (1) as

$\displaystyle P_{Z^N}(z^N) = \prod^N_{i=1} P_{Z_i|Z^{\Pi_i}}(z_i | z^{\Pi_i}). \ \ \ \ \ (4)$

We can represent the conditional independence relations (3) pictorially by means of a directed acyclic graph (DAG) with ${N}$ vertices, the ${i}$th vertex corresponding to ${Z_i}$, where there is an edge connecting vertex ${j \in \{1,\ldots,i-1\}}$ to vertex ${i}$ if and only if ${j \in \Pi_i}$. It is not hard to see that the resulting graph will, indeed, be acyclic. This DAG, or Bayes network, representation of (4), thanks to the efforts of Judea Pearl and many others, is now one of the main tools in machine learning and AI.
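A Bayes-net factorization like (4) is easy to mechanize. The sketch below (my own toy example, not from the post) stores a parent set ${\Pi_i}$ and a conditional table for each binary variable, and assembles the joint as the product in (4); the parent structure here is the Markov chain ${Z_1 \rightarrow Z_2 \rightarrow Z_3}$:

```python
import numpy as np
from itertools import product

# Bayes network over binary Z1, Z2, Z3 (indices 0, 1, 2), with parent sets
# Pi_1 = {}, Pi_2 = {1}, Pi_3 = {2}: the chain Z1 -> Z2 -> Z3.
parents = {0: [], 1: [0], 2: [1]}
tables = {
    0: np.array([0.6, 0.4]),                   # P(z1)
    1: np.array([[0.9, 0.1], [0.2, 0.8]]),     # P(z2|z1), indexed [z1, z2]
    2: np.array([[0.7, 0.3], [0.4, 0.6]]),     # P(z3|z2), indexed [z2, z3]
}

def joint(assignment):
    """Evaluate the product over i of P(z_i | z^{Pi_i}), as in (4)."""
    p = 1.0
    for i, pa in parents.items():
        idx = tuple(assignment[j] for j in pa) + (assignment[i],)
        p *= tables[i][idx]
    return p

total = sum(joint(z) for z in product([0, 1], repeat=3))
print(total)   # the factorized joint is properly normalized
```

The same function works for any DAG compatible with the causal ordering; only the `parents` dictionary and the tables change.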

Now, it turns out that, for a given assignment ${Z^{S^c} = z^{S^c}}$ of values to the vertices in ${S^c}$, we can represent the operation of forming the directed kernel ${\vec{P}_{Z^S|Z^{S^c}}}$ graphically as well. To do that, we simply locate the vertices ${i \in S^c}$, sever all edges coming into them, and consider the resulting DAG. After this operation is performed, the only vertices with incoming edges are the ones in ${S}$, and the product of the remaining conditional factors is exactly (2). Judea Pearl calls the resulting distribution of ${Z^S}$ (given ${Z^{S^c} = z^{S^c}}$) the interventional distribution, meaning that we have intervened in the causal model by disconnecting the vertices in ${S^c}$ from any causal influences upon them and then forcing them to take the values assigned by ${z^{S^c}}$.
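The severing operation matters precisely when an intervened vertex has incoming edges. Here is a sketch (invented numbers) on the classic confounded DAG ${Z_1 \rightarrow Z_2}$, ${Z_1 \rightarrow Z_3}$, ${Z_2 \rightarrow Z_3}$, with ${S^c = \{2\}}$: conditioning on ${Z_2}$ and intervening on ${Z_2}$ give different distributions over ${(Z_1, Z_3)}$, because the intervention cuts the edge ${Z_1 \rightarrow Z_2}$:

```python
import numpy as np

# DAG with a confounder: Z1 -> Z2, Z1 -> Z3, Z2 -> Z3 (binary variables).
p1 = np.array([0.6, 0.4])                          # P(z1)
p2_1 = np.array([[0.9, 0.1], [0.2, 0.8]])          # P(z2|z1)
p3_12 = np.array([[[0.7, 0.3], [0.4, 0.6]],
                  [[0.5, 0.5], [0.1, 0.9]]])       # P(z3|z1,z2), indexed [z1, z2, z3]

joint = p1[:, None, None] * p2_1[:, :, None] * p3_12   # P(z1, z2, z3)

# Observational conditional P(z1, z3 | z2): renormalize a slice of the joint.
p2 = joint.sum(axis=(0, 2))                        # P(z2)
observe = joint / p2[None, :, None]                # indexed [z1, z2, z3]

# Intervention on S^c = {2}: sever the edge Z1 -> Z2 and clamp Z2 = z2,
# leaving the directed kernel vec{P}(z1, z3 | z2) = P(z1) P(z3 | z1, z2).
intervene = p1[:, None, None] * p3_12              # indexed [z1, z2, z3]

# Both are proper distributions over (z1, z3) for each fixed z2, yet they
# disagree because Z1 confounds Z2 and Z3.
print(observe[:, 1, :].sum(), intervene[:, 1, :].sum())
```

If the edge ${Z_1 \rightarrow Z_2}$ were absent (i.e., ${P_{Z_2|Z_1}}$ did not depend on ${z_1}$), the two would coincide, which is the graphical face of "no incoming edges, nothing to sever."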

From that perspective, the directed information

$\displaystyle I(Z^S \rightarrow Z^{S^c}) = \mathop{\mathbb E} \left[ \log \frac{P_{Z^S|Z^{S^c}}(Z^S|Z^{S^c})}{\vec{P}_{Z^S|Z^{S^c}}(Z^S|Z^{S^c})} \right]$

represents the expected log-odds on ${Z^S}$ when we get to observe ${Z^{S^c}}$ as statistical evidence, versus the situation in which we intervene in the system by setting ${Z^{S^c}}$ to fixed values. The directed information is zero when there is no causal influence of ${Z^S}$ on ${Z^{S^c}}$, i.e., when it makes no difference whether we let the system evolve passively and record our beliefs about ${Z^S}$ given ${Z^{S^c}}$, or orchestrate the evolution of ${Z^S}$ by setting ${Z^{S^c}}$ ourselves.

Let me illustrate this point with a simple example of a control system that actually comes from Touchette and Lloyd, even though they never talk about interventions or directed kernels. Consider a system in some initial state ${X}$, which is driven into some final state ${X'}$ by the application of a control ${U}$. We have the causal ordering ${X, U, X'}$. Imagine two modes of controlling the system:

1. Open-loop — ${U}$ is chosen at random independently of ${X}$, and then the final state ${X'}$ is a given stochastic function of ${X}$ and ${U}$.
2. Closed-loop — ${U}$ is chosen stochastically as a function of ${X}$ (so the controller observes the initial state), and then, as before, ${X'}$ is determined from ${X}$ and ${U}$.

The causal factorization is

$\displaystyle P_{X,U,X'}(x,u,x') = \begin{cases} P_X(x)P_{U}(u)P_{X'|X,U}(x'|x,u), & \text{open-loop}\\ P_X(x)P_{U|X}(u|x)P_{X'|X,U}(x'|x,u), & \text{closed-loop} \end{cases}$

Let us assume that in both cases the distribution of the initial state ${X}$, as well as the state transition law ${P_{X'|X,U}}$, are the same. Let us inspect the directed kernel ${\vec{P}_{X,X'|U}}$. We have

$\displaystyle \vec{P}_{X,X'|U}(x,x'|u) = P_{X}(x) P_{X'|X,U}(x'|x,u)$

in both cases. But the directed information is

$\displaystyle I((X,X') \rightarrow U) = \begin{cases} \mathop{\mathbb E}\left[\log \displaystyle\frac{P_{X}(X)P_{X'|X,U}(X'|X,U)}{P_{X}(X)P_{X'|X,U}(X'|X,U)}\right] = 0, & \text{open-loop}\\ & \\ \mathop{\mathbb E}\left[\log \displaystyle \frac{P_{X|U}(X|U)}{P_X(X)}\right] = I(U;X), & \text{closed-loop} \end{cases}$

In the open-loop case, the control is chosen independently of the initial state ${X}$, so an equivalent causal ordering would have been ${U,X,X'}$. In that case, the initial and the final state have no causal influence on ${U}$ because the latter might as well have been chosen aeons before, but they are certainly correlated with it. In the closed-loop case, though, the initial and the final state do exert causal influence on the control due to the presence of the feedback link. Thus, the effect of the intervention in the closed-loop case is simply to sever the feedback link, which makes a difference relative to the setting in which the feedback link is present. Moreover, the more information the controller can extract from the initial state, the greater the causal influence.
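The open- versus closed-loop calculation above can be verified numerically. In this sketch (binary ${X}$, ${U}$, ${X'}$ with made-up transition tables, not from Touchette and Lloyd) the directed information ${I((X,X') \rightarrow U)}$ comes out to zero in the open-loop case and to ${I(X;U)}$ in the closed-loop case, exactly as derived:

```python
import numpy as np

def mi(pxy):
    """Mutual information (in nats) of a joint pmf given as a 2-D array."""
    px = pxy.sum(1, keepdims=True)
    py = pxy.sum(0, keepdims=True)
    mask = pxy > 0
    return (pxy[mask] * np.log(pxy[mask] / (px * py)[mask])).sum()

px = np.array([0.5, 0.5])                          # P_X (illustrative)
pu_x_closed = np.array([[0.9, 0.1], [0.1, 0.9]])   # P_{U|X}: controller sees X
pu_open = np.array([0.5, 0.5])                     # open loop: U independent of X
pxp_xu = np.array([[[0.8, 0.2], [0.3, 0.7]],
                   [[0.6, 0.4], [0.1, 0.9]]])      # P_{X'|X,U}, indexed [x, u, x']

def directed_info(pu_x):
    """I((X,X') -> U) for the causal ordering X, U, X'."""
    j = px[:, None, None] * pu_x[:, :, None] * pxp_xu   # P(x, u, x')
    pu = j.sum(axis=(0, 2))                             # P(u)
    p_xxp_u = j / pu[None, :, None]                     # P(x, x' | u)
    vec = px[:, None, None] * pxp_xu                    # vec{P}(x, x' | u)
    mask = j > 0
    return (j[mask] * np.log(p_xxp_u[mask] / vec[mask])).sum()

open_di = directed_info(np.tile(pu_open, (2, 1)))       # same P_U for each x
closed_di = directed_info(pu_x_closed)
pxu = px[:, None] * pu_x_closed                         # closed-loop joint of (X, U)

print(open_di, closed_di, mi(pxu))
```

Swapping in a less informative feedback channel ${P_{U|X}}$ (rows closer to uniform) shrinks both ${I(X;U)}$ and the closed-loop directed information together, matching the last remark above.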