Solved – R – choosing the correct nnet model


Language: R

Background

data = 1800 observations (rows) x 5 variables (columns)

I am using library(caret) to train regression models with nnet(), and would like to know whether my interpretation and choice of model are valid.

My response variable is measured in hours (therefore continuous and non-negative).

Method

In this example I am training three models; the only difference among them is the maximum number of iterations (maxit), which I increase to try and 'smooth' the fitted values at the high end of the range.

I have also split my data into a training set df_train and a test set df_test using createDataPartition().

# repeated 10-fold cross-validation (10 folds is the trainControl default), 3 repeats
cvCtrl <- trainControl(method = "repeatedcv", repeats = 3)

modFit_1 <- train(v_response ~ ., method = "nnet", trControl = cvCtrl,
                  data = df_train, trace = TRUE, maxit = 1000, linout = 1)
modFit_2 <- train(v_response ~ ., method = "nnet", trControl = cvCtrl,
                  data = df_train, trace = TRUE, maxit = 2000, linout = 1)
modFit_3 <- train(v_response ~ ., method = "nnet", trControl = cvCtrl,
                  data = df_train, trace = TRUE, maxit = 4000, linout = 1)
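For reference, the RMSE values reported below can be computed on the held-out test set roughly like this (a sketch; df_test and the modFit_* objects are those created above, and v_response is the response column):

```r
# predict on the held-out test set and compute test-set RMSE
pred_1 <- predict(modFit_1, newdata = df_test)
rmse_1 <- sqrt(mean((pred_1 - df_test$v_response)^2))

# predicted vs actual plot; points on the 45-degree line are perfect predictions
plot(df_test$v_response, pred_1,
     xlab = "actual (hours)", ylab = "predicted (hours)")
abline(0, 1)
```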

The predicted vs. actual plots for each model, together with their RMSE values, are:

modFit_1: RMSE = 17.31634

modFit_2: RMSE = 16.73134

modFit_3: RMSE = 9.526294

Questions

  1. Since my response variable has to be non-negative, should I immediately take modFit_2 to be the 'best' model of the three (even though modFit_3 has a lower RMSE)?

  2. Is there a way to ensure the predicted values are non-negative (and are my modFit_2 predictions non-negative just by chance)?

Update

Distribution of v_response

[image: histogram of v_response]

Residual vs fitted plot

library(ggplot2)

df_diag <- data.frame(residuals = modFit_3$finalModel$residuals,
                      fitted    = modFit_3$finalModel$fitted.values)

ggplot(df_diag, aes(x = fitted, y = residuals)) +
  geom_point()

[image: residuals vs fitted]

Best Answer

Your plot of modFit_3 looks perfect to me: a straight 45° line, and the errors look homoscedastic. Please check homoscedasticity by plotting errors against predicted values. If that holds, can't you just use max(0, predicted) for the (very) few negative values?
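One detail when doing this in R: max(0, pred) collapses a vector to a single number, so the vectorised pmax() is what you want (a sketch; pred stands for the test-set predictions of whichever model you pick):

```r
pred <- predict(modFit_3, newdata = df_test)

# pmax() compares element-wise, so each negative prediction
# is clipped to zero while the rest are left untouched
pred_clipped <- pmax(0, pred)
```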

A general way to ensure non-negativity of the response is to log-transform it before training, that is

train(log(v_response) ~ ., method = "nnet", ...)

and then inverse-transform the predicted values. But this also has other effects, in particular it changes the error distribution, so be careful. For deciding about a log-transform it would also help to see the histogram of v_response.
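A sketch of the full round trip (assumes v_response is strictly positive; if it contains zeros, log1p()/expm1() would be the analogous pair):

```r
# train on the log scale; the network now predicts log-hours
modFit_log <- train(log(v_response) ~ ., method = "nnet",
                    trControl = cvCtrl, data = df_train,
                    trace = FALSE, maxit = 4000, linout = 1)

# back-transform: exp() of any real number is positive,
# so the predictions are non-negative by construction
pred_hours <- exp(predict(modFit_log, newdata = df_test))
```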

In your code you are using linout = 1, which gives a linear output activation. This is very similar to (though not equivalent to) linear regression, and it allows negative values. You could try other activation functions, such as a rectifier, but that is outside the scope of "nnet". I suspect that the default linout (a logistic output), which introduces nonlinearity into the model (actually the main purpose of a neural network), did not give you satisfactory results. But if your problem is linear by nature (it seems so, based on your near-perfect modFit_3 plot), why bother with nnet instead of trying linear regression?
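Trying that comparison is cheap within the same caret setup (a sketch reusing cvCtrl and the train/test split from the question):

```r
# plain linear regression with the same repeated-CV control
modFit_lm <- train(v_response ~ ., method = "lm",
                   trControl = cvCtrl, data = df_train)

pred_lm <- predict(modFit_lm, newdata = df_test)
rmse_lm <- sqrt(mean((pred_lm - df_test$v_response)^2))
```

If rmse_lm comes out close to the nnet values, the extra complexity of the network buys you little.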
