Order of a non-linear autoregressive exogenous (NARX) model

discrete time, nonlinear system, regression, statistics, time series

Suppose I have a non-linear autoregressive exogenous (NARX) model of the kind
$$y(k+1)=f(y(k), y(k-1), …, y(k-s), u(k), u(k-1),…, u(k-t)) $$
where $y$ and $u$ represent, respectively, the output and the input of a discrete-time nonlinear system.

Is it correct to say that the order of the NARX model is $\max\{s+1, t+1\}$?

Here is the Wikipedia page on the NARX model: https://en.wikipedia.org/wiki/Nonlinear_autoregressive_exogenous_model

Here is the paper which references the NARX model. See equation (1).

Best Answer

Definitions and citations

There are several definitions of the "order" of a model (NARX, AR, or any other kind). They vary so much, and are stated explicitly so rarely, that frankly I had a lot of trouble finding them.

Let me mention what seem to be the two "central" ideas around model order(s).

1. Models do not have an order, but rather orders. Essentially, we distinguish the number of past inputs required from the number of past outputs required.
2. Models have an order which is a single number, capturing the number of previous time steps the model has to account for during regression.

Let me use examples to illustrate both views. Note that I skip the notion of delay, which measures how far back the dependence on past values begins. To put it more precisely, if $y(t) = f(y(t-2))$ for example, then $y(t)$ is a response delayed by one step: it could have depended on $y(t-1)$ but does not, so the delay of the process is $1$. We ignore delay and set it to zero everywhere. So all equations below are abridged to have zero delay, with other parameters unnecessary to our discussion set to convenient values.

In this paper, the idea of model order is discussed explicitly on the first page (page 459 of the relevant journal). I slightly abridge the paper's equations to make life simpler, and present an equation and a definition:

$$ y(i) = f[y(i-1),\ldots,y(i-m),u(i-1),\ldots,u(i-\theta-n)]^T $$

The values $\color{red}{m \text{ and } n }$ are often referred to as the model orders.
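To make the two-number view concrete, here is a minimal sketch of how the orders $m$ and $n$ determine the regressor handed to $f$. The function name `narx_regressors` and the toy data are my own (hypothetical, not from the paper), and the delay $\theta$ is assumed to be zero as everywhere else in this answer.

```python
def narx_regressors(y, u, m, n):
    """Build (regressor, target) pairs for y(i) = f[y(i-1..i-m), u(i-1..i-n)].

    Assumes zero delay (theta = 0), matching the simplification in this answer.
    """
    start = max(m, n)  # first index i for which all required lags exist
    rows, targets = [], []
    for i in range(start, len(y)):
        past_outputs = [y[i - j] for j in range(1, m + 1)]  # y(i-1), ..., y(i-m)
        past_inputs = [u[i - j] for j in range(1, n + 1)]   # u(i-1), ..., u(i-n)
        rows.append(past_outputs + past_inputs)
        targets.append(y[i])
    return rows, targets


# Toy data, purely illustrative.
y = [0, 1, 2, 3, 4, 5]
u = [10, 11, 12, 13, 14, 15]
rows, targets = narx_regressors(y, u, m=2, n=1)
print(rows[0], targets[0])  # regressor [y(1), y(0), u(1)] and target y(2)
```

Each regressor has $m + n$ entries, while the first usable time index is $\max\{m, n\}$: both numbers matter, which is why this view keeps them separate.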

In this paper, the notion is extended further. Randomness is brought into the regression in the form of a variable $e(t)$, and the resulting model is referred to as a NARMAX model. In such a case, the model order is described not by two numbers but by three. To put this in context, the relevant equation on page $2$ of the paper is:

$$ y(t) = f[y(t-1),y(t-2),...,y(t-n_y),u(t-1),...,u(t-n_u), e(t-1),...,e(t-n_e)] + e(t) $$

and the description given is :

For fixed model order $n_u,n_y,n_e$... (page $3$)

So when randomness comes in, the picture changes, and the model order must include the number $n_e$ as well.

Now we take a look at the book "Spectral analysis of time series" by Priestley. The book largely sets aside the idea of a control in the process, and also neglects linearity for extended periods. However, it clearly describes the order of non-linear models with precisely one number. Even linear models (including ARMA) are described with a single number. For example, the model

$$ X_t - a_1X_{t-1} - \ldots - a_kX_{t-k} = b\epsilon_t $$

is referred to as a model of order $k$ (where $\epsilon_t$ includes randomness). In particular, in parts, the book makes a control a "part of the state", as I see it, so that $X_{t}$ and $u_t$ aren't considered separate, but rather $u_t$ comes as a part of $X_t$. But this is applied in non-linear models as well. For example, the model $X_{t+1} = aX_t +b \epsilon_t + cX_t\epsilon_t$ is referred to as an order one model, despite the fact that the randomness is taken to order $1$.

However, the book mentions something even better. For certain models, such as the threshold and TAR model, the idea of a "model order" is tied to the AIC (Akaike information criterion). The Akaike information criterion's Wikipedia page is here, and it is described as a complexity number that balances the model's goodness of fit against the number of parameters. The goodness of fit, however, is a moot point for a NARX model that uses NO randomness, and therefore that particular term is actually zero. To quote from the Wiki page for likelihood functions: "Given no event (no data), the probability and thus likelihood is 1". So the point is: the order reduces to the number of parameters.
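For reference, here is the standard form of the AIC, written with my own notation ($p$ for the number of estimated parameters and $\hat{L}$ for the maximized likelihood; neither symbol comes from the book):

$$ \mathrm{AIC} = 2p - 2\ln\hat{L} $$

Under the assumption made above that the likelihood equals $1$ in the noise-free case, $\ln\hat{L} = 0$ and the criterion collapses to

$$ \mathrm{AIC} = 2p, $$

which depends on the number of parameters alone. This is only a sketch of the argument as I have stated it, not a claim about how the book itself carries out the computation.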

Summarizing thoughts, and my answer to your question

The citations of the given source do not refer to model orders very cleanly, or at least never as cleanly as the papers above. But let me summarize my thoughts, because we've clearly seen that opinion is split.

The point is that functions can be extended to make "empty" use of their arguments. For example, consider the function $f(x,y) = x$. This doesn't even make use of the second argument, so calling it a function of two variables is correct but just feels redundant!

However, that's exactly the point. See, let's look at your model, which is : $$ y(k+1) = f(y(k),y(k-1),...,y(k-s),u(k),u(k-1),...,u(k-t)) $$

Suppose without loss of generality that $s>t$. Then, we can add the variables $u(k-t-1),u(k-t-2),...,u(k-s)$, and let the function $f$ be the same. This is creating fake dependence on $u(k-t-1),u(k-t-2),...,u(k-s)$ : the truth is that like the variable $y$ in $f(x,y) = x$, they are empty and carry no meaning. When we do this, we get : $$ y(k+1) = f(y(k),y(k-1),...,y(k-s),u(k),u(k-1),...,u(k-s)) $$

where it makes far, far more sense to say the order is $s+1$ (which equals $\max\{s+1,t+1\}$). This kind of empty dependence allows us to use the notion of order as ONE number.
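The padding argument can be checked numerically. The following is a minimal sketch with a made-up map `f` (hypothetical, chosen only for illustration) that has $s = 2$ and $t = 0$; `f_padded` accepts $s + 1 = 3$ input lags and simply ignores the extra two. Both versions should produce exactly the same trajectory.

```python
def f(y_lags, u_lags):
    """Hypothetical NARX map with s = 2 (three output lags) and t = 0 (one input lag)."""
    y0, y1, y2 = y_lags
    (u0,) = u_lags
    return 0.5 * y0 - 0.2 * y1 * y2 + u0 ** 2


def f_padded(y_lags, u_lags):
    """Same map, but taking s + 1 = 3 input lags and ignoring the two 'empty' ones."""
    return f(y_lags, u_lags[:1])


def simulate(step, n_u_lags, u, y_init):
    """Iterate y(k+1) = step([y(k), y(k-1), y(k-2)], [u(k), ..., u(k-n_u_lags+1)])."""
    y = list(y_init)  # y(0), y(1), y(2)
    for k in range(len(y_init) - 1, len(u) - 1):
        y_lags = [y[k], y[k - 1], y[k - 2]]
        u_lags = [u[k - j] for j in range(n_u_lags)]
        y.append(step(y_lags, u_lags))
    return y


s, t = 2, 0
u = [0.1, -0.3, 0.2, 0.5, -0.1, 0.4]  # toy input sequence
y_init = [0.0, 0.1, 0.2]              # toy initial outputs

y_plain = simulate(f, t + 1, u, y_init)
y_pad = simulate(f_padded, s + 1, u, y_init)
order = max(s + 1, t + 1)

print(y_plain == y_pad, order)
```

The padded model carries more arguments but behaves identically, so describing it by the single number $\max\{s+1, t+1\}$ loses nothing except a little bookkeeping.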

Now, when is this kind of reasoning feasible? The answer is obvious : when adding the redundant variables doesn't change the context too much. When it's not too much effort. We usually measure effort in terms of "computational complexity". So, in some sense, if $s$ and $t$ are similar in computational order (note : I'm willing to explain what this means outside the post, but I'm keeping it imprecise inside) then one can augment the function with these additional variables, and say that the order of the given system is $\max\{s+1,t+1\}$.

So are you correct, keeping your source in mind? Yes, you are. Why? Because your source mentions quite clearly that the terms $n_x$ and $n_y$ are four and three respectively, as used in the applications. See, for example, page $9$ of $16$ for these details. Now, these are constants, so straight away they are not going to explode or change in any way. Therefore, it is reasonable to say that the additional "waste" parameter is not going to be a computational burden in any manner, and the "order" of the model can be given as $\max\{s+1,t+1\}$.

So when would such an idea fail? Let's say $t = s^2$. Then describing a model with $s$ time delay and $s^2$ control delay as a model of order $s^2+1$ is going to look very comical, because you are grossly overestimating the $s$ time delay (the true value is only the square root of what you report, and $s^2$ grows fast with $s$!). In such a case, you are clearly better off stating each parameter separately, and calling the collection the model order.
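To get a feel for the sizes involved, here is a small sketch that counts the "empty" lags introduced by padding. The helper name `unused_arguments` is mine (hypothetical), and it assumes padding is done exactly as in the argument above, by extending the shorter lag list to the length of the longer one.

```python
def unused_arguments(s, t):
    """Number of 'empty' lags added when both lag depths are padded to max(s, t)."""
    return abs(s - t)


# Bounded case: lag depths that differ by one cost a single wasted argument.
print(unused_arguments(3, 2))

# Unbounded case t = s^2: the waste grows quadratically with s.
for s in (2, 5, 10):
    t = s ** 2
    print(s, t, max(s + 1, t + 1), unused_arguments(s, t))
```

When the two depths are fixed and close, the waste is a constant; when $t = s^2$, almost every argument of the padded model on the output side is empty.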

Note also that I have mentioned only one reason why the model orders need not be kept separate. There may be a plethora of reasons why they should be kept separate, including the fact that increasing a single model order could lead to a computational blow-up of several orders. This is, in fact, the case in the papers mentioned. For them, the optimization between several AR schemes of low order suggests that even among small fluctuations of input and output order, computational gaps exist that must be sorted out.

So are you right? Don't worry, you are. But there is another right, and we will acknowledge that as well.
