Solved – Is a skewed target variable bad for a linear regression model?

logarithm, multiple regression, skewness

I'm currently learning linear regression with multiple variables using gradient descent (from a machine learning course on Coursera).
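For reference, here is a minimal NumPy sketch of multivariable linear regression fitted by batch gradient descent, roughly as such courses present it. The function names `gradient_descent` and `predict`, the learning rate `alpha`, the iteration count, and the feature-scaling step are illustrative assumptions, not the asker's actual code.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent for least-squares linear regression (sketch).

    X: (m, n) feature matrix WITHOUT an intercept column.
    y: (m,) target vector.
    Returns (theta, mu, sigma): theta[0] is the intercept; mu and sigma are
    the feature means/standard deviations used for scaling.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    Xs = (X - mu) / sigma                        # feature scaling helps convergence
    Xb = np.column_stack([np.ones(len(y)), Xs])  # prepend intercept column
    theta = np.zeros(Xb.shape[1])
    m = len(y)
    for _ in range(n_iters):
        # Gradient of the cost J = (1 / 2m) * sum of squared errors
        grad = Xb.T @ (Xb @ theta - y) / m
        theta -= alpha * grad
    return theta, mu, sigma

def predict(X, theta, mu, sigma):
    """Apply the same scaling as in training, then the linear model."""
    Xs = (X - mu) / sigma
    return theta[0] + Xs @ theta[1:]
```

With scaled features and enough iterations, the predictions should agree with the closed-form least-squares fit (e.g. `np.linalg.lstsq`), which is a handy sanity check on a hand-written loop.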

My goal is to predict housing prices in a specific city. My model currently has an $R^2 = 0.83$, and it does an OK job at predicting prices. My question concerns the skewed form of the price data.
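For context, $R^2$ here is presumably the usual coefficient of determination, $1 - SS_{res}/SS_{tot}$, computed on the price scale. A minimal sketch (the function name `r_squared` is hypothetical; the 0.83 figure is the asker's and is not reproduced here):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot
```

Predicting the sample mean for every house gives $R^2 = 0$, and perfect predictions give $R^2 = 1$.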

When I plot the error residuals, they seem to follow a normal distribution. However, the original prices are heavily right skewed. I'm trying to understand if this is something I should adjust for, or if it doesn't matter.
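One way to put a number on "seem to follow a normal distribution", beyond eyeballing a histogram, is the correlation between the sorted residuals and standard-normal quantiles, i.e. how straight a normal Q-Q plot is. This is a sketch, not a formal test; the function name and the choice of Blom plotting positions are assumptions.

```python
import numpy as np
from statistics import NormalDist

def normal_qq_correlation(residuals):
    """Correlation between sorted residuals and standard-normal quantiles.

    Values close to 1 mean the normal Q-Q plot is nearly a straight line,
    i.e. the residuals look roughly normal.
    """
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    # Blom plotting positions (i - 3/8) / (n + 1/4) for i = 1..n
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    std_normal = NormalDist()
    q = np.array([std_normal.inv_cdf(p) for p in probs])
    return float(np.corrcoef(q, r)[0, 1])
```

Applied to the residuals this should be close to 1 if they look normal; applied to heavily right-skewed prices it will typically be noticeably lower.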

Should I use a logarithmic transformation or the like on my target variable (house price)?
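If you do try the transformation the question asks about, a minimal sketch looks like the following: fit ordinary least squares to `log(price)` (prices must be strictly positive), then map predictions back. The function names are hypothetical. Note that `exp` of the predicted log-price estimates roughly the conditional median rather than the mean; Duan's smearing factor, the average of `exp(residual)`, is one common correction toward the mean.

```python
import numpy as np

def fit_log_linear(X, price):
    """Least-squares fit of log(price) on X, with an intercept."""
    A = np.column_stack([np.ones(len(price)), X])
    log_price = np.log(price)               # requires price > 0
    coef, *_ = np.linalg.lstsq(A, log_price, rcond=None)
    resid = log_price - A @ coef            # residuals on the log scale
    return coef, resid

def predict_price(X, coef, resid=None):
    """Back-transform predictions to the price scale."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    log_pred = A @ coef
    if resid is None:
        return np.exp(log_pred)             # roughly the conditional median
    # Duan's smearing estimate: approximate the conditional mean
    return np.exp(log_pred) * np.mean(np.exp(resid))
```

Whether this is worthwhile is a modelling choice (for example, if effects on price seem multiplicative rather than additive); the skewness of price on its own does not require it.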

Best Answer

When I plot the error residuals, they seem to follow a normal distribution. However, the original prices are heavily right skewed. I'm trying to understand if this is something I should adjust for, or if it doesn't matter.

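This doesn't matter. The normality assumption in linear regression is about the errors, that is, the distribution of the response conditional on the predictors, not about the marginal distribution of the response itself. A skewed price can simply reflect skewed predictors (for example, a few very large houses), so only the distribution of the residuals matters. For more, see: "What if residuals are normally distributed, but y is not?"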
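A small simulation illustrates the point: if a predictor is right-skewed and the model is linear with normal errors, the target inherits the skew while the residuals stay symmetric. The data are synthetic and the names (`size`, `price`, `sample_skewness`) and parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_skewness(v):
    """Moment-based sample skewness: mean((v - mean)^3) / std^3."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

rng = np.random.default_rng(42)
n = 5000
# Hypothetical right-skewed predictor (e.g. floor area with a few very large houses)
size = rng.lognormal(mean=0.0, sigma=0.75, size=n)
# Linear model with normally distributed errors
price = 50 + 100 * size + rng.normal(scale=10, size=n)

# Ordinary least-squares fit with an intercept
A = np.column_stack([np.ones(n), size])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
resid = price - A @ coef

print(f"skewness of price:     {sample_skewness(price):.2f}")
print(f"skewness of residuals: {sample_skewness(resid):.2f}")
```

The price skewness comes out clearly positive while the residual skewness stays near zero, which is exactly the situation described in the question.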
