Suppose I have a non-linear autoregressive exogenous (NARX) model of the kind
$$y(k+1)=f(y(k), y(k-1), \dots, y(k-s), u(k), u(k-1), \dots, u(k-t))$$
where $y$ and $u$ are, respectively, the output and the input of a discrete-time nonlinear system.
Is it correct to say that the order of the NARX model is $\max\{s+1, t+1\}$?
Here is the Wikipedia page on the NARX model: https://en.wikipedia.org/wiki/Nonlinear_autoregressive_exogenous_model
Here is the paper which references the NARX model. See equation (1).
Best Answer
Definitions and citations
There are several definitions of the "order" of a model (NARX, AR, or any other kind). They vary a lot and are stated explicitly so rarely that, frankly, I had a lot of trouble finding them.
Let me mention what seem to be the two "central" ideas around model order(s).
That models do not have an order, but rather orders. Essentially, we distinguish the amount of prior input required and the amount of prior output required.
That models have an order which is a single number, which captures the number of previous time steps the model has to account for during regression.
Let me use examples to support this. Note that I skip the notion of delay, which measures how far back the dependence on past values begins. To put it more precisely, if $y(t) = f(y(t-2))$, for example, then $y(t)$ responds one step late: it could have depended on $y(t-1)$ but does not, so the delay of the process is $1$. We ignore delay and set it to zero everywhere. So all equations are abridged to have delay zero, and any other parameters that are unnecessary to our discussion are set to convenient values.
and the description given is :
So when randomness comes in, we have a different look on things, and the model order must then include that number $n_e$ as well.
is referred to as a model of order $k$ (where $\epsilon_t$ includes randomness). In particular, in parts, the book makes a control a "part of the state", as I see it, so that $X_{t}$ and $u_t$ aren't considered separate, but rather $u_t$ comes as a part of $X_t$. But this is applied in non-linear models as well. For example, the model $X_{t+1} = aX_t +b \epsilon_t + cX_t\epsilon_t$ is referred to as an order one model, despite the fact that the randomness is taken to order $1$.
However, the book mentions something even better. For certain models, such as the threshold and TAR models, the "model order" is actually chosen using the AIC (Akaike information criterion). The Akaike information criterion's Wikipedia page is here, and it describes the AIC as a number that balances the model's goodness of fit against its number of parameters. Goodness of fit, however, is a moot point for a NARX model that uses no randomness, so that particular term is zero. To quote from the Wikipedia page for likelihood functions: "Given no event (no data), the probability and thus likelihood is 1". So the point is: the order reduces to the number of parameters.
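To make that last step concrete, here is the standard AIC formula with the argument above plugged in. The symbols $p$ (number of estimated parameters) and $\hat{L}$ (maximized likelihood) are my notation, not the book's, and setting $\hat{L} = 1$ is the assumption made in the paragraph above for a model with no randomness:

$$ \mathrm{AIC} = 2p - 2\ln\hat{L}, \qquad \hat{L} = 1 \;\Longrightarrow\; \mathrm{AIC} = 2p - 2\ln 1 = 2p. $$

Under that assumption the criterion depends only on the parameter count, which is the sense in which the order "reduces to the number of parameters".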
Summarizing thoughts, and my answer to your question
The citations of the given source do not make explicit references to model orders very cleanly, or at least never as cleanly as these papers. But let me summarize my thoughts, because we've clearly seen that opinion is split.
The point is that functions can be extended to make "empty" use of their arguments. For example, consider the function $f(x,y) = x$. This doesn't even make use of the second argument, so calling it a function of two variables is correct but just feels redundant!
However, that's exactly the point. Let's look at your model, which is: $$ y(k+1) = f(y(k),y(k-1),\dots,y(k-s),u(k),u(k-1),\dots,u(k-t)) $$
Suppose without loss of generality that $s>t$. Then we can add the variables $u(k-t-1),u(k-t-2),\dots,u(k-s)$ and let the function $f$ be the same. This creates fake dependence on $u(k-t-1),u(k-t-2),\dots,u(k-s)$: the truth is that, like the variable $y$ in $f(x,y) = x$, they are empty and carry no meaning. When we do this, we get: $$ y(k+1) = f(y(k),y(k-1),\dots,y(k-s),u(k),u(k-1),\dots,u(k-s)) $$
where it makes far, far more sense to say the order is $s+1$ (which equals $\max\{s+1,t+1\}$). This kind of empty dependence allows us to use the notion of order as ONE number.
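The padding argument can be checked numerically. Below is a minimal Python sketch, not taken from any of the cited sources: the function names (`narx_order`, `simulate`, `f_original`, `f_padded`) and the particular nonlinearity are made up for illustration. It simulates a model with $s = 2$, $t = 0$, then the same model padded to $t = 2$ with the extra input lags ignored, and compares the trajectories.

```python
import math


def narx_order(s, t):
    """Single-number order for a model using y(k..k-s) and u(k..k-t)."""
    return max(s + 1, t + 1)


def simulate(f, s, t, y_init, u):
    """Iterate y(k+1) = f([y(k),...,y(k-s)], [u(k),...,u(k-t)])."""
    start = max(s, t)  # first k at which every required lag exists
    if len(y_init) != start + 1:
        raise ValueError("need exactly max(s, t) + 1 initial outputs")
    y = list(y_init)
    for k in range(start, len(u)):
        y_lags = [y[k - i] for i in range(s + 1)]
        u_lags = [u[k - j] for j in range(t + 1)]
        y.append(f(y_lags, u_lags))
    return y


def f_original(y_lags, u_lags):
    # Hypothetical nonlinear map with s = 2 output lags and t = 0 input lags.
    return math.tanh(0.5 * y_lags[0] - 0.2 * y_lags[2]) + 0.1 * u_lags[0]


def f_padded(y_lags, u_lags):
    # Same map declared with t = 2: u(k-1) and u(k-2) are accepted but ignored.
    return f_original(y_lags, u_lags[:1])


u = [math.sin(0.3 * k) for k in range(20)]
y_init = [0.0, 0.1, 0.2]

traj_original = simulate(f_original, 2, 0, y_init, u)
traj_padded = simulate(f_padded, 2, 2, y_init, u)

print(traj_original == traj_padded)          # identical trajectories
print(narx_order(2, 0), narx_order(2, 2))    # same single-number order
```

Both runs produce the same outputs step for step, so declaring the model with $t = 2$ changes nothing except the bookkeeping, and both declarations give the order $\max\{s+1, t+1\} = 3$.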
Now, when is this kind of reasoning feasible? The answer is obvious : when adding the redundant variables doesn't change the context too much. When it's not too much effort. We usually measure effort in terms of "computational complexity". So, in some sense, if $s$ and $t$ are similar in computational order (note : I'm willing to explain what this means outside the post, but I'm keeping it imprecise inside) then one can augment the function with these additional variables, and say that the order of the given system is $\max\{s+1,t+1\}$.
So when would such an idea fail? Let's say $t = s^2$. Then calling a model with $s$ output lags and $s^2$ input lags a model of order about $s^2$ is going to look very comical, because you are grossly overstating the dependence on past outputs (by a factor of about $s$: think how fast $s^2$ grows with $s$!). In such a case, you are clearly better off stating each parameter separately, and calling the pair the model order.
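A quick count shows how lopsided this gets. The sketch below is only an illustration under my own bookkeeping (the helper name `padding_overhead` is hypothetical): it compares the number of lagged arguments the model really uses with the number implied by padding both lag lists up to the single order $\max\{s+1, t+1\}$.

```python
def padding_overhead(s, t):
    """Return (single order, arguments actually used, empty arguments added)."""
    order = max(s + 1, t + 1)
    used = (s + 1) + (t + 1)   # y(k..k-s) and u(k..k-t)
    padded = 2 * order         # both lag lists stretched to the same length
    return order, used, padded - used


print(padding_overhead(10, 9))     # similar lags: (11, 21, 1)
print(padding_overhead(10, 100))   # t = s^2:      (101, 112, 90)
```

With $s = 10$ and $t = 9$ the single order costs one empty argument. With $s = 10$ and $t = s^2 = 100$ it adds 90 empty output lags on top of the 11 real ones, which is why the pair $(s+1, t+1)$ is the more honest description there.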
Note also that I have given only one argument for merging the model orders into a single number. There may be a plethora of reasons why they should be kept separate, including the fact that increasing a single model order could lead to a computational blow-up of several orders. This is, in fact, the case in the papers mentioned. For them, the optimization over several AR schemes of low order suggests that even small changes in input and output order create computational gaps that must be sorted out.
So are you right? Don't worry, you are. But there is another right, and we will acknowledge that as well.