Solved – Wrong predictions in linear regression

regression, residuals

My predictions are far from the actual values, and I do not understand what I am doing wrong. I have 20 dummy (binary) independent variables and a continuous dependent variable. The dataset is small, approximately 1000 points. The best result so far comes from a generalized linear model with a Poisson family and all interaction terms included. Only after introducing the interaction terms do some of my predictions (fewer than 50%) roughly follow the true values. What else could I be missing to explain the remaining data? Could it simply be that my independent variables are not good predictors for the rest of the data?

Below I show the predicted (y) vs. observed (x) dependent variable, the residuals (y) vs. the observed dependent variable (x), and the residuals (y) vs. the predicted dependent variable (x).

Best Answer

The third plot is the most useful. Your data show heteroskedasticity, that is, the variance of the error term is not constant. This is clear from the third plot: the residuals should scatter randomly around zero with roughly constant spread, and they do not. The other plots are not very useful to me at this point. Also plot $\sqrt{|\hat{e}|}$ (where $\hat{e}$ is the residual) against $\hat{y}$ and check it for a trend. I suggest using a generalized least squares (GLS) estimator, or trying a transformation of $y$, either a log or a Box-Cox transformation (both require $y > 0$).
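To make the diagnostic concrete, here is a minimal sketch of the check on synthetic data, not on the asker's data. Everything in it is an assumption made for illustration: the simulated design (1000 rows, 20 binary predictors, multiplicative noise), the helper names `ols_fit` and `spread_trend`, and the use of a plain correlation between $\sqrt{|\hat{e}|}$ and $\hat{y}$ as a numeric stand-in for eyeballing the plot.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit with an intercept; returns (fitted values, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    return fitted, y - fitted

def spread_trend(fitted, resid):
    """Correlation between sqrt(|residual|) and the fitted value.

    A value near zero suggests roughly constant spread; a clearly
    positive value suggests the spread grows with the prediction.
    """
    return np.corrcoef(fitted, np.sqrt(np.abs(resid)))[0, 1]

# Synthetic data (an assumption, chosen to mimic the setting in the question):
# 20 dummy predictors and a positive response with multiplicative noise.
rng = np.random.default_rng(0)
n, p = 1000, 20
X = rng.integers(0, 2, size=(n, p)).astype(float)
beta = np.linspace(-0.5, 0.5, p)
y = np.exp(1.0 + X @ beta + rng.normal(0.0, 0.5, size=n))

# Fit on the raw scale, then on the log scale, and compare the trend.
corr_raw = spread_trend(*ols_fit(X, y))
corr_log = spread_trend(*ols_fit(X, np.log(y)))

print(f"raw y: corr(sqrt|e|, y_hat) = {corr_raw:.2f}")
print(f"log y: corr(sqrt|e|, y_hat) = {corr_log:.2f}")
```

In this simulation the trend should be clearly positive on the raw scale and close to zero after the log transform, which is the pattern a variance-stabilizing transformation is meant to produce. On real data, plotting `np.sqrt(np.abs(resid))` against `fitted` is more informative than a single correlation, and `scipy.stats.boxcox` can be tried in place of `np.log` when the log is not the right transformation.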