Solved – Are deep neural networks robust to outliers?

machine learning · neural networks · outliers

Tree-based models (such as gradient boosting or random forests) have many advantages, including robustness to collinearity and outliers.

I can see that deep neural networks (MLPs) are robust to collinearity. But are they robust to outliers, and why?

Best Answer

Relative to a standard multiple regression model, I believe an MLP is much more robust to outliers, for several reasons:

1) A multiple regression has only one shot at fitting the data. An MLP, by contrast, has many more opportunities to fit the data, since you can vary the number of nodes and hidden layers. This more flexible fitting mechanism should allow the MLP to down-weight the impact of outliers (whether in the response Y or in a predictor X);
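To make the flexibility point concrete, here is a minimal sketch (scikit-learn assumed; the answer names no library) of how the same MLP API spans a wide range of model capacities just by changing the architecture:

```python
from sklearn.neural_network import MLPRegressor

# A regression model has one fixed functional form; an MLP's capacity
# is a tuning knob. These two estimators share one API but differ
# enormously in flexibility.
shallow = MLPRegressor(hidden_layer_sizes=(4,))        # 1 hidden layer, 4 nodes
deep = MLPRegressor(hidden_layer_sizes=(64, 64, 64))   # 3 hidden layers, 64 nodes each
```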

2) MLP activation functions are typically the logistic sigmoid or the hyperbolic tangent (tanh). The former squashes intermediate outputs into (0, 1) and the latter into (-1, 1). Because these functions saturate, an extreme input cannot push a unit's output beyond those bounds, which further enhances the ability of MLPs to deal with non-linear events and outliers.
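A quick numerical sketch of the saturation argument: feed a typical pre-activation value and an outlier-driven one (the specific values 2.0 and 200.0 are just for illustration) through both activations, and the outputs barely differ.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

moderate = 2.0    # a typical pre-activation value
extreme = 200.0   # a pre-activation blown up by an outlier

# Despite a 100x larger input, the activated outputs are nearly identical,
# so the outlier's influence on downstream layers is capped.
print(sigmoid(moderate), sigmoid(extreme))   # both close to 1
print(np.tanh(moderate), np.tanh(extreme))   # both close to 1
```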

3) MLPs can incorporate regularization mechanisms, such as an L2 penalty on the weights. These should help resolve multicollinearity and reduce the impact of outliers.
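As a sketch of that (scikit-learn assumed, with synthetic data), the `alpha` parameter of `MLPRegressor` adds an L2 (weight decay) penalty; larger values shrink the weights and damp the pull that extreme observations exert on the fit:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
y[:5] += 50.0  # inject a few gross outliers into the response

# alpha is scikit-learn's L2 regularization strength for the MLP weights.
mlp = MLPRegressor(hidden_layer_sizes=(16,), alpha=1.0,
                   max_iter=2000, random_state=0).fit(X, y)
print(mlp.score(X, y))
```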

You can also use cross-validation to diagnose the impact of outliers on your MLP: a sharp drop in out-of-fold performance when outliers are present is a sign of their influence.
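One way this diagnosis can look in practice (again a sketch with scikit-learn and synthetic data): fit the same MLP on a clean and a contaminated copy of the response and compare the cross-validated scores.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

y_out = y.copy()
y_out[:10] += 100.0  # contaminate a few responses with gross outliers

mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clean = cross_val_score(mlp, X, y, cv=5)      # R^2 per fold, clean data
dirty = cross_val_score(mlp, X, y_out, cv=5)  # R^2 per fold, with outliers

# A large drop in (and spread of) the fold scores flags influential outliers.
print(clean.mean(), dirty.mean())
```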

However, if your main objective is to reduce the impact of outliers, there are more transparent ways to do so. Tree-based models are certainly one good option, as you mentioned. But there is also a whole family of robust regression models; some combine a robust loss with regularization mechanisms to resolve the multicollinearity issue as well. And those models are far easier to explain to a non-specialist audience.
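For instance, here is a minimal sketch (scikit-learn, synthetic data) of one member of that family: Huber regression, whose loss down-weights large residuals so a handful of gross outliers barely moves the fit, while ordinary least squares is dragged away. `HuberRegressor` also accepts an `alpha` L2 penalty, illustrating the combination with regularization.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # true slope 2, intercept 0
y[:5] += 30.0  # a handful of gross outliers

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)  # Huber loss down-weights large residuals

# The robust fit stays close to the true line; OLS is pulled by the outliers.
print(ols.intercept_, huber.intercept_)
print(ols.coef_[0], huber.coef_[0])
```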
