Solved – Neural network with bounded output

machine learning · neural networks

I am using a neural network with a linear activation function as the output function. My response is continuous and cannot be negative, nor can it exceed a certain value; it usually lies between 0 and 1000.

I have used a log-transformed response to avoid negative values, but when I run the neural network some predictions exceed 1000, which is unreasonable since those values are not possible.

What are my options if I want a bounded output while still using mini-batch stochastic gradient descent as the optimizer?

I have tried scaling the response to [-1, 1] with a tanh output activation, and also scaling it to [0, 1] with a sigmoid output activation. Both keep the predicted values inside the bounds 0 and 1000, but neither gives an overall better model in terms of RMSE and MAE.
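For reference, the scaling step described above can be sketched as a simple min-max transform, assuming the bounds 0 and 1000 from the question; the network is trained on the scaled targets and its outputs are mapped back afterwards:

```python
import numpy as np

LO, HI = 0.0, 1000.0  # known bounds of the response (taken from the question)

def scale(y):
    # map targets from [0, 1000] into [0, 1] for a sigmoid output unit
    return (y - LO) / (HI - LO)

def unscale(p):
    # map sigmoid outputs back to the original range; predictions stay in [0, 1000]
    return LO + p * (HI - LO)

y = np.array([0.0, 250.0, 999.0])
print(np.allclose(unscale(scale(y)), y))  # round trip recovers the targets
```

Because a sigmoid output always lies in (0, 1), the unscaled prediction can never leave (0, 1000), which is exactly the bounded behaviour observed.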

Are there other models one could try?
Are tanh and sigmoid good choices of output activation for a continuous response?

I'm also wondering if anyone has suggestions for improving a neural network with a linear output activation function.

Help would be appreciated.

Best Answer

A trick for a bounded output range is to scale the target values into (0, 1) and use a sigmoid output with a binary cross-entropy loss.

This is often used for image data, where all the pixel values lie in [0, 255].
Say $a=wh+b$ is the pre-activation of the last layer. For a sigmoid output with binary cross-entropy loss, $$E(a,t')=-\left[t'\log\sigma(a)+(1-t')\log(1-\sigma(a))\right],\quad \frac{\partial E}{\partial a}=\sigma(a)-t'$$ where $t'$ is the scaled target value. The derivative w.r.t. $a$ is just prediction minus target, much like the gradient of an unbounded (linear) output trained with MSE.
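The gradient identity above is easy to check numerically; a minimal sketch with NumPy (the values of `a` and `t` are arbitrary, chosen only for illustration) compares the closed form $\sigma(a)-t'$ against a central finite difference of the loss:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce(a, t):
    # binary cross-entropy evaluated on the pre-activation a, scaled target t in (0, 1)
    s = sigmoid(a)
    return -(t * np.log(s) + (1 - t) * np.log(1 - s))

a, t = 0.7, 0.31           # arbitrary pre-activation and scaled target
eps = 1e-6
numeric = (bce(a + eps, t) - bce(a - eps, t)) / (2 * eps)  # central difference
analytic = sigmoid(a) - t  # closed-form gradient from the answer
print(abs(numeric - analytic) < 1e-8)
```

Note that $t'$ need not be binary here; the cross-entropy is well defined for any target in (0, 1), which is what makes this trick usable for scaled continuous responses.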