Solved – Can a neural net extrapolate output values?

neural networks

Let's assume we have a training set with $y \in \mathbb{R}$, so all the target values lie between $y_{min}$ and $y_{max}$. If we build a decision tree model, it cannot return a $y_{pred}$ outside that range (for any combination of input features), so a decision tree cannot extrapolate in terms of predicted values. Can a neural net regression model extrapolate and return $y_{pred}$ values outside the $y$ range of the training set? Does it depend on the activation function or not?

Below is my attempt to answer this question.

The output neuron of the model is just $\sum \Theta_i a_i$, where $\Theta_i$ is the weight of the $i$-th neuron in the previous hidden layer, and $a_i$ is the value of that neuron's activation function. If we use the logistic function then $a \in (-1;1)$. Thus the maximum possible $y_{pred} = \sum \Theta_i$, assuming all $a_i$ reach their maximum value around 1. But if we use a linear activation function, which places no restriction on the output values of $a$ ($a \in \mathbb{R}$), the model will return $y_{pred} \in \mathbb{R}$, which can be outside the $y$ range of the training set.
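To illustrate the bounded case, here is a minimal sketch of a one-hidden-layer net with logistic activations and a linear output neuron (the weights and network size are made up for the example): no matter how extreme the input, the output stays between the sum of the negative output weights and the sum of the positive ones.

```python
import numpy as np

# Hypothetical tiny net: one hidden layer with logistic activations,
# linear (identity) output neuron. All weights are random, for illustration only.
rng = np.random.default_rng(0)
Theta = rng.normal(size=5)       # output weights Theta_i for 5 hidden neurons
W = rng.normal(size=(5, 1))      # hidden-layer weights
b = rng.normal(size=5)           # hidden-layer biases

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    a = logistic(W @ np.atleast_1d(x) + b)   # each a_i lies in (0, 1)
    return Theta @ a                         # linear output: sum Theta_i * a_i

# Since 0 < a_i < 1, the output is squeezed between
#   sum of negative Theta_i  <  y_pred  <  sum of positive Theta_i
lower = Theta[Theta < 0].sum()
upper = Theta[Theta > 0].sum()

xs = np.linspace(-1000, 1000, 10001)        # far beyond any "training range"
preds = np.array([predict(x) for x in xs])
print(lower, preds.min(), preds.max(), upper)
```

Even for inputs as extreme as $\pm 1000$, every prediction stays inside $(\text{lower}, \text{upper})$.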

Is my line of reasoning correct, or are there some mistakes?

Best Answer

More generally an output neuron of a feed-forward neural net is defined as $f(\sum \Theta_i a_i)$, where $f$ is the link function of the output neuron (which was the identity function in your question) and the other notations are as in your question. The activation values of the neurons of the previous layer $a_i$ depend on the choice of their own link function (say $g$) and of the activation values of the layer before that.

You are right when you say that if all the link functions $g$ in the previous layer are bounded, then the network's output values will be bounded too, even if $f$ is the identity. (There was a small confusion, however: the logistic link is bounded in $(0,1)$; it is the hyperbolic tangent that is bounded in $(-1,1)$.)

Setting all link functions ($g$ and $f$) of the network to the identity function might seem like a good idea for allowing extrapolation beyond the observed response values in the training set, but that special case of a feedforward neural network is in fact just a linear regression model, in which case you don't need the neural net framework at all!
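To see why all-identity links collapse to linear regression, here is a short sketch (with arbitrary illustrative weights): two stacked linear layers compose into a single linear map plus bias, which is exactly what ordinary linear regression fits.

```python
import numpy as np

# A minimal sketch: with identity activations, stacking layers collapses
# to a single linear map plus a bias. Weights are random, for illustration.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # "hidden" layer
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # output layer

def two_layer_identity(x):
    return W2 @ (W1 @ x + b1) + b2       # f = g = identity

# Equivalent single linear model:
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=3)
print(two_layer_identity(x), W @ x + b)  # identical up to float rounding
```

Since the composition is itself linear, adding more identity-activated layers buys no extra expressive power over a plain linear regression.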

A better solution would be to use unbounded but non-linear link functions for $g$, like the rectifier $g(x) = \max(0, x)$. This keeps the nice "universal approximation" property of the neural net while allowing unbounded outputs.

Beware, however, that predictions will be less accurate in regions that were not represented in the training data. You should probably use regularization to prevent over-fitting and (hopefully) improve extrapolating prediction power.