Solved – neural network for regression – actual data vs predicted data

neural networks, regression

I'm building a neural network (for regression) in R using the nnet package.

My dependent variable ranges between -0.454 and 1.000, so I squared it to get a 0-1 range for the model, and then took the square root of the predictions to restore the original scale.

When I plot the actual data against the output of my model from the predict() function, I find that the predicted data points all take a single value (0.560).

It's my first time building a NN (previously I've stuck to linear regression models), so I'm not sure what's going on. Is this single value some kind of averaged output, or do I need to look into the structure/size of my network?

Thanks!

[scatter plot: NN predictions vs actual]

library(nnet)

# I() ensures ^ is treated as arithmetic squaring inside the formula
nnet.fit <- nnet(I(var_out^2) ~ var_in1 + var_in2 + var_in3, data = my_data, size = 2)
nnet.predict <- predict(nnet.fit)^0.5   # square root to back-transform to the original scale
plot(my_data$var_out, nnet.predict,
     main = "NN predictions vs actual",
     xlab = "Actual", ylab = "Predicted")

Best Answer

Disclaimer: I'm not very familiar with the specifics of the neural network implementation in R, but I have quite a lot of experience with neural networks in other languages, as well as with the theory behind these and other, more classical machine learning techniques.


Just looking at your data normalization approach, it seems you may be unintentionally mapping every negative value onto its positive equivalent, when what you really want is for each value to keep a unique representation while being mapped to the interval [0, 1].

I would instead suggest that you subtract the minimum data value (-0.454) from each data sample $x$ and then divide by the data range (1.454). Your normalization formula would then be $\hat{x} = (x + 0.454)/1.454$.
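In R this is only a couple of lines; a minimal sketch, assuming your data frame and column are named my_data and var_out as in your code:

# Min-max normalization: map [-0.454, 1.000] onto [0, 1]
x_min   <- min(my_data$var_out)            # -0.454
x_range <- max(my_data$var_out) - x_min    #  1.454
my_data$var_out_scaled <- (my_data$var_out - x_min) / x_range

# After predicting on the scaled target, invert the mapping:
# original <- scaled * x_range + x_min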

To see the error in your approach, imagine that you'd like to map the input value $x_1 = -0.4$ to the output $y_1 = -0.4$ and the input value $x_2 = 0.4$ to the output $y_2 = 0.4$; this amounts to training your regression model to perform an identity mapping. Under your normalization, an input of -0.4 is mapped to 0.4, and the corresponding output would then be 0.4 as well, instead of the desired -0.4. The same problem occurs for every negative value under your normalization.
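A quick check in the R console illustrates the collapse (a small illustration, not part of your original code):

x <- c(-0.4, 0.4)
sqrt(x^2)            # 0.4 0.4  -- both inputs collapse to the same value
(x + 0.454) / 1.454  # ~0.037 and ~0.587 -- distinct values, sign information preserved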


Regarding the nnet output being a constant value, check what type of non-linearity is used after the linear mapping in each layer. Some classical non-linearities (sigmoid, tanh) are known to suffer from saturating or vanishing gradients during training, which causes the network to stop learning. This can occur if the data isn't normalized to the correct range (modern practice is to map to the interval [-1, 1] so that gradients are spread evenly over positive and negative values), or if the weights aren't properly initialized before training begins.
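If you want to follow the [-1, 1] convention, a minimal sketch (reusing x_min and x_range from the normalization snippet above, and the same hypothetical my_data/var_out names) would be:

x01 <- (my_data$var_out - x_min) / x_range   # first map to [0, 1]
my_data$var_out_scaled <- 2 * x01 - 1        # then stretch/shift to [-1, 1]
# invert after prediction: original <- (scaled + 1) / 2 * x_range + x_min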

I would suggest printing/plotting the values of some of the network gradients periodically during training to observe if they saturate (all set to a constant value of 1 or inf) or vanish (all set to 0).
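I'm not sure the R nnet package exposes per-iteration gradients; as a rough proxy you could inspect the fitted weights instead, since saturated units often end up with very large-magnitude weights. A sketch, assuming the nnet.fit object from your code:

summary(nnet.fit)    # prints the weight of every connection in the network
hist(nnet.fit$wts,   # wts is the fitted weight vector stored on the nnet object
     main = "Fitted weight distribution",
     xlab = "Weight value")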