Solved – Neural Networks Regression Model

neural-networks, regression, tensorflow

I'm working on estimating pressure from a noisy sensor for an R&D project. I've achieved good results with GBM / Random Forest models using the excellent xgboost library. I'm trying to compare this model to a TensorFlow neural network regression. I know this may not be the optimal approach, as highlighted here, and that simpler models are recommended here.

Is there a good neural network model (not necessarily deep) that can be used for regression? I am aware that neural nets usually have "smooth" activation functions which can be used for regression; however, I don't know which model is useful and/or competitive with more classic "statistical" regression models (such as Lasso, ElasticNet, etc.).

Best Answer

These are different weight (coefficient) regularization methods. Weight regularization modifies the objective function that we minimize by adding terms that penalize large weights. Which specific regularizer is suitable depends on your data and use case.
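In general form (notation mine, not from the question), all of these fit the same template:

$$L_{\text{reg}}(w) = L_{\text{data}}(w) + \lambda\,\Omega(w)$$

where $L_{\text{data}}$ is the data-fit loss (e.g. mean squared error for regression), $\Omega(w)$ is the penalty ($\lVert w\rVert_1$ for the lasso, $\lVert w\rVert_2^2$ for ridge, and a weighted combination of the two for the elastic net), and $\lambda \ge 0$ controls how strongly large weights are penalized.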

Lasso (L1) regularization is widely used in domains with massive datasets, where efficient and fast algorithms are essential. The lasso is not robust to high correlations among predictors: it will arbitrarily choose one and ignore the others, and it breaks down when predictors are identical. The lasso penalty expects many coefficients to be close to zero, and only a small subset to be larger (and nonzero).

Ridge (L2) regularization is ideal when there are many predictors, all with non-zero coefficients drawn from a normal distribution. It shrinks the coefficients of correlated predictors equally towards zero, and is therefore suitable for cases with many predictors that each have a small effect on the outcome. L2 regularization prevents the coefficients of linear regression models with many correlated variables from being poorly determined and exhibiting high variance.

Elastic net is an extension of the lasso that is robust to extreme correlations among the predictors. It uses a mixture of the L1 (lasso) and L2 (ridge) penalties, and was first proposed for analyzing high dimensional data.
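As a concrete illustration (a minimal TensorFlow 2.x / Keras sketch; the layer sizes and the regularization factors of 1e-4 are arbitrary placeholders, not recommendations), all three penalties can be attached to a layer's weights through the kernel_regularizer argument:

    import tensorflow as tf
    from tensorflow.keras import layers, regularizers

    # Single-output regression network; each penalty is applied per layer
    # and added to the training loss automatically.
    model = tf.keras.Sequential([
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l1(1e-4)),                # lasso-style
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4)),                # ridge-style
        layers.Dense(1,
                     kernel_regularizer=regularizers.l1_l2(l1=1e-4, l2=1e-4)), # elastic-net-style
    ])
    model.compile(optimizer="adam", loss="mse")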

For further details on these three, see [1].

A final weight regularization method, specific to (deep) neural networks with multiple hidden layers, is the max norm constraint.

Max norm regularization has the similar goal of keeping the weights from becoming too large. Max norm constraints enforce an absolute upper bound on the magnitude of the incoming weight vector for every neuron, and projected gradient descent is used to enforce the constraint. In other words, any time a gradient descent step moves the incoming weight vector such that its L2 norm exceeds c (||w||_2 > c), we project the vector back onto the sphere (centered at the origin) with radius c.
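In Keras this is available as a kernel constraint (a minimal sketch; the layer size and the bound of 3.0 are arbitrary placeholders):

    from tensorflow.keras import layers, constraints

    # After every update, rescale each neuron's incoming weight vector
    # so that its L2 norm does not exceed 3.0.
    dense = layers.Dense(64, activation="relu",
                         kernel_constraint=constraints.MaxNorm(max_value=3.0))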

Alternative ways to prevent over-fitting

Note that you can regularize a neural net with other methods as well, such as:

i. modifying the number of hidden units, where a higher number of hidden units corresponds to a more complex model; you can then carry out model comparison by combining a goodness-of-fit measure (such as the maximized likelihood) with a complexity penalty, as in AIC or BIC.

ii. early stopping, where you stop training your model once you observe that generalization is deteriorating, i.e. the error on held-out data starts growing (see the sketch after this list).

iii. applying dropout to prevent co-adaptation of neurons (also in the sketch below). This has a lengthier explanation, so I will point you to the original dropout paper by Srivastava, Hinton, et al. instead.
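A minimal TensorFlow/Keras sketch combining (ii) and (iii); the data arrays (x_train, y_train, x_val, y_val), layer sizes, dropout rate, and patience are placeholders, not values from the question:

    import tensorflow as tf
    from tensorflow.keras import layers, callbacks

    # Small regression network with dropout between hidden layers.
    model = tf.keras.Sequential([
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),   # randomly drop 50% of activations during training
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1),       # single continuous output for regression
    ])
    model.compile(optimizer="adam", loss="mse")

    # Early stopping: halt once the validation loss has not improved
    # for 10 epochs and restore the best weights seen so far.
    early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                         restore_best_weights=True)

    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              epochs=500,
              callbacks=[early_stop])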