Solved – Should I normalize the dependent variable for linear regression?

multiple regression, normalization, regression

If I want to perform a multiple linear regression of the dependent variable $Y$ on the independent variables $X_1$, $X_2$, etc., should I normalize only the $X_i$ variables? Or should I also normalize the dependent variable $Y$?

If I normalize $Y$, how will I interpret the predicted values? Won't the predicted values be in normalized form? How should I denormalize them to get values on the original scale?

Also, which normalization method is best if my $X_i$ variables are a mixture of continuous and categorical variables?

Best Answer

Without seeing your data (especially the residuals of the final regression model) and further context, it is hard to provide you with a definitive answer.

However, for regression in general, your dependent variable does not have to be normally distributed. The model's residuals, on the other hand, are assumed to be normally distributed (this matters mainly for the validity of tests and confidence intervals, not for computing the fit itself). Look at it this way: one of the assumptions of linear regression is that your independent variables explain the variation in the dependent variable in such a way that the errors around the model's fitted values (that is, the residuals) are normally distributed.