Solved – Why is the Kalman Filter a specific case of a (dynamic) Bayesian network

bayesian networkkalman filtermachine learning

Question: How can this be the case? Why are Kalman filters so much more complicated than any other Bayesian network? Are there any Bayesian networks which are intermediate in complexity?

Perhaps one way to answer this question would be to state or give a reference to the simplest possible model of a Kalman filter possible.

Background:
I am only familiar with Kalman filters with regards to literature about stationary stochastic processes in continuous time, in which they appear to be immensely complicated objects which I frankly don't understand. From the description given in Loeve and Shiryaev, where it is referred to as the Kalman-Bucy filter and is more closely associated with signal processing in continuous time and topics like orthogonal Karhunen-Loeve expansion (which again I don't understand yet).

However, all of the other Bayesian networks I am familiar are discrete-time, discrete state-space objects that one probably could learn about during the first or second semester of a first introduction to probability theory. This is obviously not true of the Kalman-Bucy filter in Shiryaev.

Yet Wikipedia states that Kalman filters are a special case of dynamic Bayesian networks.

Best Answer

The remark is not referring to continuous-time--continuous-observation Kalman-Bucy filters, but to discrete-time Kalman filters. The confusion seems to be only due to the OP not knowing about the discrete-time version (which in my experience is most commonly meant when 'Kalman filter' is mentioned). See, for example, the Wikipedia article 'Kalman filter' or [1].

In the discrete-time case, the state-space of the nodes is indeed not discrete but $\mathbb{R}^m$ for the measurements and $\mathbb{R}^n$ for the observations. There are however other Bayesian networks with continuous state-space (for the variables) and Gaussian conditional distributions, too [e.g. 2].

The discrete-time linear-Gaussian dynamic-system model can be written as a dynamic Bayesian network as follows.

  • Time-slice $k$ consists of nodes $\mathbf{x}_k$ and $\mathbf{y}_k$ and there is an edge pointing from $\mathbf{x}_k$ to $\mathbf{y}_k$.
  • The intertemporal edges are from $\mathbf{x}_k$ to $\mathbf{x}_{k+1}$.
  • The conditional probability distributions are $\mathbf{y}_k \mid \mathbf{x}_{k+1} \sim \mathrm{N}(\mathbf{A}_k\,\mathbf{x}_k,\mathbf{Q}_k)$ and $\mathbf{y}_{k} \mid \mathbf{x}_k \sim \mathrm{N}(\mathbf{H}_k \, \mathbf{x}_k, \mathbf{R}_k)$ where all quantities except $\mathbf{x},\mathbf{y}$ are known matrices.

The Kalman filter is then an algorithm for sequentially updating the distributions of $\mathbb{x}_k$ given observed $\mathbb{y}_1,\ldots,\mathbb{y}_k$ in this dynamic Bayesian network. The only probability theory required is computing conditional distributions of (finite-dimensional) multivariate Gaussian distributions.

Caveat: There exists also something called 'Continuous-time Bayesian networks'[3] but I'm not aware of any connection between them and the Kalman-Bucy filter's model.

References

[1]: Simo Särkkä (2013). Bayesian Filtering and Smoothing. Cambridge University Press. Section 4.3. Available on the author's webpage. (Conflict-of-interest disclaimer: the author was my PhD advisor)

[2]: F.V. Jensen (2001), Bayesian Networks and Decision Graphs, Springer (p. 69) (Curiously,this book p. 65 claims that a "Kalman filter" is any hidden Markov model with only one variable having intertemporal 'relatives' but this is definitely nonstandard usage)

[3]: Nodelman, U., Shelton, C. R., & Koller, D. (2002, August). Continuous time Bayesian networks. In Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence (pp. 378-387).