Solved – Feedforward Neural Networks for Regression Confusion

neural-networks, nonlinear-regression, regression

I’m a bit confused about the concept of using feedforward neural networks trained via backpropagation to model a nonlinear relationship between the input and output variables in a regression setting. Can this be done? I.e., could such an ANN deduce a relationship like $y = \sin(x^2) + x^3$?

If it’s not easy to do this, does it have to do with the fact that in regression problems the output layer activation function is linear?

Thanks.

Best Answer

Yes, feedforward neural nets can be used for nonlinear regression, e.g. to fit functions like the example you mentioned. Learning proceeds the same as in other supervised problems (typically using backprop). One difference is that you need a loss function suited to regression (e.g. squared error). Another difference is the output layer. If the target outputs are general real numbers, then it makes sense to use a linear output layer, because most other activation functions are bounded. For example, sigmoidal units (which might be used for output in a classification problem) can only output values between 0 and 1.

If we connect the input layer directly to a linear output layer, network output will be a linear function of the input. So, we'll be performing linear regression. To fit nonlinear functions, we need one or more hidden layers with nonlinear activation functions.
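To make the distinction concrete, here's a minimal sketch (assuming PyTorch is available; the choice of 50 tanh hidden units is arbitrary and just for illustration) contrasting a direct input-to-output linear model with one that adds a nonlinear hidden layer:

```python
# Minimal sketch, assuming PyTorch; layer sizes are illustrative choices.
import torch.nn as nn

HIDDEN_UNITS = 50  # p, the number of units in the hidden layer

# Input connected directly to a linear output: this is just linear regression.
linear_model = nn.Linear(1, 1)

# One nonlinear hidden layer followed by a linear output unit:
# this can fit nonlinear functions of the input.
nonlinear_model = nn.Sequential(
    nn.Linear(1, HIDDEN_UNITS),   # hidden layer weights and biases
    nn.Tanh(),                    # nonlinear activation
    nn.Linear(HIDDEN_UNITS, 1),   # linear output unit: b + sum_i w_i h_i(x)
)
```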

For example, suppose we want to fit your example function. There's a single input $x \in \mathbb{R}$. We feed it through one or more nonlinear hidden layers, each containing multiple units. The output of the last hidden layer (with $p$ units) is $h(x) = [h_1(x), \dots, h_p(x)]$. There's a single, linear output unit with weights $w = [w_1, \dots, w_p]$ and bias $b$. So, the output of the entire network is:

$$f(x) = b + \sum_{i=1}^p w_i h_i(x)$$

We can see from this expression that the output is a weighted sum of nonlinear basis functions, where each basis function is the output of a unit in the last hidden layer. Say we minimize the squared error using backprop. What we're essentially doing is fitting the function by learning an adaptive set of basis functions and their weights.
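Putting the pieces together, here's a minimal training sketch (again assuming PyTorch; the architecture, learning rate, and number of steps are illustrative choices, not part of the answer) that fits $y = \sin(x^2) + x^3$ by minimizing squared error with backprop:

```python
# Minimal training sketch, assuming PyTorch; hyperparameters are arbitrary.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data sampled from the target function y = sin(x^2) + x^3.
x = torch.linspace(-2.0, 2.0, 200).unsqueeze(1)
y = torch.sin(x ** 2) + x ** 3

model = nn.Sequential(
    nn.Linear(1, 50),
    nn.Tanh(),
    nn.Linear(50, 1),   # linear output: a weighted sum of the hidden units
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()  # squared-error loss, appropriate for regression

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()     # backprop computes gradients of the squared error
    optimizer.step()

print(f"final training MSE: {loss.item():.4f}")
```

After training, each hidden unit acts as one of the learned basis functions $h_i(x)$, and the final linear layer holds the weights $w_i$ and bias $b$ that combine them.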