In econometrics, we would say that non-normality violates the assumptions of the Classical Normal Linear Regression model (CNLR), while heteroskedasticity violates the assumptions of both the CNLR and the Classical Linear Regression model (CLR).
But those who say that it "...violates OLS" are also justified: the name Ordinary Least Squares comes directly from Gauss and essentially refers to normal errors. In other words, "OLS" is not shorthand for least-squares estimation in general (which is a much broader principle and approach), but for the CNLR specifically.
OK, so much for history, terminology, and semantics. I understand the core of the OP's question as follows: "Why should we emphasize the ideal, if we have found solutions for the case when it is not present?" (The CNLR assumptions are ideal in the sense that they provide excellent least-squares estimator properties "off the shelf", without the need to resort to asymptotic results. Remember also that OLS is maximum likelihood when the errors are normal.)
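To make that last point concrete, here is a minimal sketch of why maximizing the normal likelihood reduces to minimizing the sum of squared residuals:

$$\ell(\beta,\sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{x}_i'\beta\right)^2$$

For any fixed $\sigma^2$, maximizing $\ell$ over $\beta$ is exactly the least-squares problem of minimizing $\sum_i (y_i - \mathbf{x}_i'\beta)^2$.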
As an ideal, it is a good place to start teaching. This is what we always do when teaching any subject: "simple" situations are "ideal" situations, free of the complexities one will actually encounter in real life and real research, complexities for which no definitive solutions exist.
And this is what I find problematic about the OP's post: he writes about robust standard errors and the bootstrap as though they were "superior alternatives", or foolproof solutions to the absence of the assumptions under discussion, about which the OP moreover writes
"...assumptions that people do not have to meet"
Why? Because there are some methods for dealing with the situation, methods that have some validity of course, but which are far from ideal? The bootstrap and heteroskedasticity-robust standard errors are not the solutions: if they indeed were, they would have become the dominant paradigm, sending the CLR and the CNLR to the history books. But they have not.
So we start from the set of assumptions that guarantees those estimator properties we have deemed important (it is another discussion whether the properties designated as desirable are indeed the ones that should be), so that we keep visible the fact that any violation of them has consequences that cannot be fully offset by the methods we have found for dealing with their absence. It would be really dangerous, scientifically speaking, to convey the feeling that "we can bootstrap our way to the truth of the matter", because, quite simply, we cannot.
So they remain imperfect solutions to a problem, not an alternative or definitively superior way of doing things. Therefore, we first have to teach the problem-free situation, then point to the possible problems, and then discuss possible solutions. Otherwise, we would elevate these solutions to a status they do not really have.
Here is the solution I arrived at for the problem above:
In brief, in my case the heteroskedasticity is caused by at least two different sources:
- Group differences, which OLS and the whole family of "single-level" regression models can hardly account for;
- Misspecification of the model's functional form: more specifically (as suggested by @Robert Long in the first place), the relation between the DV and the covariates is not linear.
Regarding the group differences causing heteroskedasticity, it was of great help to run the analysis separately on the data for single groups, and to see from the Breusch-Pagan test that the heteroskedasticity was gone in almost all groups when they were considered singly.
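A minimal sketch of this per-group check, assuming a data frame `dat` with outcome `y`, covariates `x1` and `x2`, and a grouping factor `group` (all names hypothetical, not my actual variables):

```r
library(lmtest)  # provides bptest()

# Fit the model and run the Breusch-Pagan test within each group separately
bp_pvalues <- sapply(split(dat, dat$group), function(d) {
  fit <- lm(y ~ x1 + x2, data = d)
  bptest(fit)$p.value
})
bp_pvalues  # large p-values in almost all groups: no within-group heteroskedasticity
```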
Fitting a random intercept model improved the error structure, but, as the commenters above noted, heteroskedasticity could still be detected. Even after including in the random part of the equation a variable that improved the error structure further, the problem could not be considered solved. (This key variable, coping strategies, describes well the habits of households in case of food shortages; indeed, these habits usually vary a great deal across geographical regions and ethnic groups.)
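A hedged sketch of these two fits with `lme4`, again with hypothetical names (`coping` stands in for the coping-strategies variable):

```r
library(lme4)

# Random intercept only
m1 <- lmer(y ~ x1 + x2 + coping + (1 | group), data = dat)

# Also letting the coping-strategies effect vary across groups
m2 <- lmer(y ~ x1 + x2 + coping + (1 + coping | group), data = dat)

# A funnel shape in this plot indicates remaining heteroskedasticity
plot(fitted(m2), resid(m2))
```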
Here comes the second point, the most important one: the relation between the DV (in its original form) and the covariates is not linear.
Several options are available at this stage:
- Use a nonlinear model to take the issue into account explicitly;
- Transform the DV, if you can find a suitable transformation; in my case, the square root of the DV (see the sketch after this list);
- Try models that do not assume a normally distributed error term (the GLM family).
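For the second option, a minimal sketch continuing the snippets above (the model formula is my choice, for illustration only):

```r
# Square-root transformation of the DV
dat$y_sqrt <- sqrt(dat$y)

m_sqrt <- lmer(y_sqrt ~ x1 + x2 + coping + (1 + coping | group), data = dat)

# Re-check the residual pattern after the transformation
plot(fitted(m_sqrt), resid(m_sqrt))
```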
In my view, the first option somewhat complicates the interpretation of the coefficients (this is a personal, project-dependent observation, simply because I want to keep things simple for this article) and, at least in my recent experience, it needs more computational power, which for complicated models with many random coefficients and many observations can make R crash.
Transforming the DV is surely the best solution, if it works and if you are luckier than I was. What do I mean? With a log-transformed DV the interpretation would be in terms of percentages, but what about the square-root transformation? How can I compare my results with other studies? Maybe standardizing the transformed variable could help by allowing the results to be interpreted in z-scores, but in my opinion that is just too much.
About the GLM and GLMM models I cannot say much: in my case, neither worked. The GLM does not properly account for the random differences between groups, and the GLMM reported convergence problems.
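For completeness, a hedged sketch of what such a GLMM attempt might look like; the Gamma family with a log link is my assumption for a positive, skewed DV, not necessarily what the original analysis used:

```r
# GLMM attempt: the Gamma family is chosen here only as a plausible example
m_glmm <- glmer(y ~ x1 + x2 + coping + (1 + coping | group),
                data = dat, family = Gamma(link = "log"))
# In my case, fits of this kind ended with convergence warnings
```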
Note that for my model the transformed DV does not work with OLS either, for the same reason as the GLM above: OLS cannot account for the random differences between groups.
However, there is at least one option left: assigning weights in the regression to correct for the heteroskedasticity without transforming the DV. Ergo: simple interpretation of the coefficients.
This is the result obtained by weighting with DV_sqrt while using the untransformed DV in a random coefficient model.
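A sketch of one way to express this weighting with `nlme`; `varFixed(~ y_sqrt)` declares the residual variance proportional to the square root of the DV, which is my reading of "weighting with DV_sqrt" and therefore an assumption:

```r
library(nlme)

m_w <- lme(y ~ x1 + x2 + coping,
           random = ~ 1 + coping | group,
           weights = varFixed(~ y_sqrt),  # residual variance proportional to y_sqrt
           data = dat)
summary(m_w)
```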
At this stage I can compare my coefficients' standard errors with their counterparts from the robust estimator.
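One way to run this comparison, as a sketch: the `clubSandwich` package computes cluster-robust standard errors for `nlme` and `lme4` model objects (the CR2 small-sample correction here is my choice):

```r
library(clubSandwich)

coef_test(m_w, vcov = "CR2")  # cluster-robust standard errors
summary(m_w)$tTable           # model-based standard errors, for comparison
```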
Regarding the direct use of robust estimators in cases like mine, without trying to understand the source of the problem, I would like to suggest this reading: G. King and M. E. Roberts (2014), "How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It".
Best Answer
I am no expert in the wellbeing literature, but I guess that a viable route is to transform the outcome variable into a dummy variable. It could equal 0 if the original y is lower than the median and 1 if it is higher than the median.
I think this transformation is a good starting point. Usually, wellbeing averages around 7.5/10 whatever the country and whatever the study; this is why the distribution is skewed and why a dummy based on the median of y is better.
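A minimal sketch of this dichotomization, assuming a data frame `dat` with outcome `y` and covariates `x1`, `x2` (names hypothetical):

```r
# 1 if above the median, 0 otherwise
dat$y_dummy <- as.integer(dat$y > median(dat$y, na.rm = TRUE))

# A simple logit model on the dichotomized outcome
m_logit <- glm(y_dummy ~ x1 + x2, data = dat, family = binomial)
summary(m_logit)
```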
Of course, the second step would be to use the ordered logit model... but beware of all the related problems. As someone has already suggested, it is better to start from the UCLA website to look for information on this econometric model.
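A hedged sketch of such an ordered logit fit with `MASS::polr`, assuming the original y takes a limited number of ordered values (e.g. 0 to 10 wellbeing scores):

```r
library(MASS)

dat$y_ord <- factor(dat$y, ordered = TRUE)
m_ologit <- polr(y_ord ~ x1 + x2, data = dat, Hess = TRUE)
summary(m_ologit)
```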