Are there formal definitions of “state” and “state variable” in the context of state space models in control theory

control-theory, definition, linear-algebra, linear-control, ordinary-differential-equations

I'm taking a class on control theory and I thought I understood the state space representation of linear systems — it seemed like essentially just extra syntax (or "syntactic sugar", as programmers would call it) used in the process of turning an nth order linear differential equation into a system of n first order linear DEs. It seemed the "state variables" were additional syntax to "hide" the extra derivatives.
e.g. If I have the DE $$v''(t) = v'(t) + 2u(t) + 3v(t)$$
I can define the state variables
$$x_1 = v(t), x_2 = x_1'(t) = v'(t)$$
and rewrite the single 2nd order DE as the set of equations
$$ x_1' = x_2
$$

$$ x_2' = x_2 + 2u + 3x_1
$$

$$y = x_1
$$

where the first two equations are the "state" equations and the last is the output equation. And since this is a set of linear equations we could of course represent it in matrix form. The process is simple enough, at least for the sorts of problems we're dealing with in my class.
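To make the matrix form concrete, here is a minimal numerical sketch of the resulting $\dot{x} = Ax + Bu$, $y = Cx$ representation; the particular state and input values are arbitrary, chosen only to check the matrix product against the scalar equations:

```python
import numpy as np

# State-space matrices for v'' = v' + 2u + 3v with x1 = v, x2 = v':
#   x1' = x2
#   x2' = 3*x1 + x2 + 2*u
#   y   = x1
A = np.array([[0.0, 1.0],
              [3.0, 1.0]])
B = np.array([[0.0],
              [2.0]])
C = np.array([[1.0, 0.0]])

x = np.array([[1.5], [-0.5]])   # arbitrary state
u = np.array([[2.0]])           # arbitrary input

xdot = A @ x + B @ u
y = C @ x

# Check the matrix form against the scalar equations term by term
assert np.isclose(xdot[0, 0], x[1, 0])                              # x1' = x2
assert np.isclose(xdot[1, 0], x[1, 0] + 2*u[0, 0] + 3*x[0, 0])      # x2' = x2 + 2u + 3x1
assert np.isclose(y[0, 0], x[0, 0])                                 # y = x1
```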

But the thing I realized today that I don't get is: what makes an equation a "state" equation? What even is a state, mathematically, in the context of state spaces? I've seen informal definitions like,

A state variable is one of the set of variables that are used to describe the mathematical "state" of a dynamical system. Intuitively, the state of a system describes enough about the system to determine its future behaviour in the absence of any external forces affecting the system.
(Wikipedia)

and

The state variables represent values from inside the system that can change over time. (Wikibooks)

And I've seen specific examples where it intuitively makes sense to call one set of variables the state variables: e.g. if we're modeling an RLC circuit and we just care about the voltage across the resistor and are controlling the voltage of the power source, it makes sense intuitively to call the resistor voltage the output, the power source voltage the input, and the inductor current and capacitor voltage the state variables.
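As a sketch of what that RLC intuition looks like in state-space form, here is a hypothetical series RLC circuit driven by a source voltage, assuming the usual state choice (inductor current and capacitor voltage); the component values are arbitrary and purely illustrative:

```python
import numpy as np

# Hypothetical series RLC circuit: input u = source voltage,
# states x1 = inductor current i_L, x2 = capacitor voltage v_C,
# output y = resistor voltage R * i_L. Values are illustrative only.
R, L, C = 2.0, 0.5, 0.1

# Kirchhoff's voltage law:  L * i' = u - R*i - v_C
# Capacitor law:            C * v_C' = i
A = np.array([[-R / L,  -1.0 / L],
              [ 1.0 / C,  0.0   ]])
B = np.array([[1.0 / L],
              [0.0    ]])
Cout = np.array([[R, 0.0]])

i, vC, u = 0.3, 1.2, 5.0        # arbitrary operating point
xdot = A @ np.array([[i], [vC]]) + B * u
y = Cout @ np.array([[i], [vC]])

# Check the matrices against the component equations
assert np.isclose(xdot[0, 0], (u - R*i - vC) / L)   # inductor equation
assert np.isclose(xdot[1, 0], i / C)                # capacitor equation
assert np.isclose(y[0, 0], R * i)                   # output = resistor voltage
```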

But surely there has to be a more formal and less hand-wavy definition than that? What I'm really looking for is something in terms of linear and/or abstract algebra that doesn't rely on vague and nonmathematical terms like "inside the system".

Best Answer

There is an excellent paper by J.C. Willems titled "Paradigms and Puzzles in the Theory of Dynamical Systems" that provides one possible interpretation of how one can arrive at the notion of a state. Elements of this work can be seen in most modern texts that cover the intersection of control theory and dynamical systems theory. The work is long and abstract, so I will make no attempt to summarize it in full here. Instead I will briefly nail down the fundamental ideas and point out Willems' formal notion of state. The work builds a "behavioural" definition of a system.

In Definition II.1, Willems introduces the notion of a dynamical system. A dynamical system is a triple $(\mathbb{T}, \mathbb{W}, \mathfrak{B})$ with $\mathbb{T}$ the time axis (often a subset of $\mathbb{R}$), $\mathbb{W}$ the signal space (for linear systems, think of exponentially bounded signals), and $\mathfrak{B} \subseteq \mathbb{W}^\mathbb{T}$ the behaviour of the system. An element $f \in \mathfrak{B}$ is a map from $\mathbb{T}$ to $\mathbb{W}$ that is a solution of the dynamical system. Note that this notion captures all sorts of dynamical systems, including discrete-time or even finite state space systems. The behavioural definition of a dynamical system pays no attention to how these behaviours are generated; that is the job of the behavioural equations (e.g. ODEs). Notions of linearity and time-invariance can also be defined for these systems. Linearity is easy to define: a dynamical system is linear if $\mathbb{W}$ is a vector space and $\mathfrak{B}$ is a vector subspace of $\mathbb{W}^\mathbb{T}$ (in the natural way). I will assume our systems are linear and time-invariant.

To any dynamical system we can always add additional pieces of information, called latent variables. Let the space of latent variables be denoted by $\mathbb{L}.$ A dynamical system with latent variables is simply a tuple $(\mathbb{T}, \mathbb{W}, \mathbb{L}, \mathfrak{B}_f)$, where $\mathfrak{B}_f \subseteq (\mathbb{W} \times \mathbb{L})^\mathbb{T}$ is called the full behaviour of the system. From here, the notion of state can be defined. A state-space dynamical system is a dynamical system with latent variables $\mathbb{L} = \mathbb{X}$ (think state space) whose full behaviour $\mathfrak{B}_f$ satisfies the axiom of state. What is that?

Axiom of State. Let $(w_1, x_1), (w_2, x_2) \in \mathfrak{B}_f$ be two arbitrary full behaviours of the system and let $\bar t \in \mathbb{T}.$ If $x_1(\bar t) = x_2(\bar t)$ then the full behaviour, $$ (w(t), x(t)) := \left\{ \begin{array}{lll} (w_1(t), x_1(t)) & \quad & t < \bar t\\ (w_2(t), x_2(t)) & \quad & t \geq \bar t \end{array} \right. $$ is also a full behaviour in $\mathfrak{B}_f$.

The author does a fairly good job of unravelling this definition, and spends a great deal of time doing so. Here is just a snippet:

This axiom requires that any trajectory from $\mathfrak{B}_f$, arriving in a particular state can be concatenated with any trajectory from $\mathfrak{B}_f$, emanating from that same state. Thus, once the state at time zero is known, the future behavior is fixed and no additional information relevant for the future will be acquired by giving further details about the past trajectory.

This axiom captures the adage that the state is that which uniquely determines the future. This is precisely the notion Wikipedia hints at, and also the definition I have seen in every modern state-space control theory text that covers modelling. Really, that is the level of abstraction most control theorists care about: the state carries enough information to determine the future uniquely.
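Before the counterexample below, here is a minimal numerical sketch of a case where the axiom does hold: the scalar system $x' = -x$, with $x$ itself taken as the latent variable. Solutions are $x(t) = c\,e^{-t}$, and agreement at any single instant $\bar t$ forces the constants to be equal, so concatenating two trajectories at $\bar t$ just reproduces a solution already in the behaviour:

```python
import numpy as np

# Sketch: for x' = -x, solutions are x(t) = c * exp(-t).
# If x1(tbar) = x2(tbar), then c1 * exp(-tbar) = c2 * exp(-tbar), so c2 = c1.
tbar = 1.0
c1 = 3.0
c2 = c1 * np.exp(-tbar) / np.exp(-tbar)   # forced equal by agreement at tbar

t = np.linspace(-2.0, 4.0, 601)
x1 = c1 * np.exp(-t)
x2 = c2 * np.exp(-t)
concat = np.where(t < tbar, x1, x2)       # concatenation from the axiom

# The concatenated trajectory coincides with x1 everywhere,
# hence it is itself a solution: the axiom of state is satisfied.
assert np.allclose(concat, x1)
```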

That is enough theory; it is time for a simple example. Let $\mathbb{T} = \mathbb{R}$ and $\mathbb{W} = \mathbb{R}.$ Define the family of behaviours to be,

$$\mathfrak{B} = \left\{ v \in C^2(\mathbb{T}) \subseteq \mathbb{W}^\mathbb{T} \colon v''(t) = - v(t) \right\}.$$

That is, the behaviours of our system are the twice continuously differentiable functions that solve the second-order ODE. We start with the behavioural equation describing our system and carefully identify the space of solutions under consideration. From this we then add latent variables. You already know what one good choice of latent variables is, so let us make a bad choice to see how the axiom of state fails. Set $\mathbb{L} = \mathbb{R}$ and define the full behaviour of the system to be,

$$ \mathfrak{B}_f = \left\{ (v, x) \in (\mathbb{W}\times \mathbb{L})^\mathbb{T} \colon v \in \mathfrak{B}, x(t) = v(t) \right\}.$$

Note that I have taken the only latent variable to be the original behaviour $v(t)$ of the system. We know that we need two latent variables (states) to describe the evolution of this system, so we should find that the axiom of state is not satisfied.

Consider the two full behaviours,

$$(v_1(t), x_1(t)) = (\sin(t), \sin(t)),\quad (v_2(t),x_2(t)) = (-\sin(t), -\sin(t)).$$

Observe that $x_1(0) = x_2(0)$ so we satisfy the premise of the axiom of state. As a result, we should be able to define,

$$(v(t), x(t)) = \left\{ \begin{array}{lll} (v_1(t), x_1(t)) & \quad & t < 0\\ (v_2(t), x_2(t)) & \quad & t \geq 0 \end{array} \right.$$

and this ought to be a full behaviour of our original system. But it cannot be, since it is not twice continuously differentiable at $0$ (in fact, it is not even differentiable there).
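The failure of differentiability at $0$ can also be seen numerically, a minimal sketch: the one-sided difference quotients of the concatenated trajectory disagree at $0$, so it cannot belong to $\mathfrak{B}_f$.

```python
import numpy as np

# Concatenated trajectory from the example:
# v(t) = sin(t) for t < 0, and v(t) = -sin(t) for t >= 0.
def v(t):
    return np.sin(t) if t < 0 else -np.sin(t)

h = 1e-6
left_slope  = (v(0.0) - v(-h)) / h   # one-sided derivative from the left  ->  cos(0) =  1
right_slope = (v(h) - v(0.0)) / h    # one-sided derivative from the right -> -cos(0) = -1

# The one-sided derivatives disagree, so v is not differentiable at 0;
# hence (v, x) is not a full behaviour, and the axiom of state fails.
assert abs(left_slope - 1.0) < 1e-3
assert abs(right_slope + 1.0) < 1e-3
```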

I haven't discussed input-state-output systems here because it only adds more notation to keep track of and obscures the fundamental point being driven home here. That is also covered in that paper and it isn't a far jump from what I have already discussed.