Solved – Improving a linear regression: Add predictors or change model

multiple regression, predictor, time series

I am trying to model a time series variable $Y_{t}$ with $4$ physical predictor variables. I used the following linear regression:

$Y_{t}=\beta_{0}+\beta_{1}f_{1}(X_{1,t})+\beta_{2}f_{2}(X_{2,t})+\beta_{3}f_{3}(X_{3,t})+\beta_{4}f_{4}(X_{4,t})+\epsilon_{t}$, where each $f_{i} \in \{\log, \tan, \sin, \cos, x \mapsto 1/x, \mathrm{Id}\}$.

My best model gives me an adjusted coefficient of determination $R^2_{\text{adjusted}}$ of $0.87$.
But I know that this indicator is not completely reliable. So I would like to know if I can improve my model.
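For concreteness, here is a minimal sketch in Python/NumPy of fitting such a transformed-predictor regression by ordinary least squares and computing $R^2_{\text{adjusted}}$. The predictors, transforms, and coefficients are entirely synthetic stand-ins, not the actual physical variables from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# four synthetic "physical" predictors (hypothetical stand-ins for X1..X4)
X1 = rng.uniform(1, 10, n)
X2 = rng.uniform(0, 5, n)
X3 = rng.uniform(0, 2 * np.pi, n)
X4 = rng.uniform(1, 4, n)

# one chosen transform f_i per predictor, as in the question
F = np.column_stack([np.log(X1), X2, np.sin(X3), 1.0 / X4])

# true coefficients (known here only because the data is simulated) plus noise
y = 2.0 + F @ np.array([1.5, -0.8, 2.0, 3.0]) + rng.normal(0, 0.3, n)

# ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), F])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# adjusted R^2 penalises plain R^2 for the number of predictors p
resid = y - A @ beta
p = F.shape[1]
r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 3))
```

Because the simulated noise is small relative to the signal, the adjusted $R^2$ here comes out high; on real data the same computation applies unchanged.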

If so, how can I tell whether I should:

  1. Change the form of the regression: add polynomial terms or other dependencies?
  2. Look for more predictors $X_{i}$ that I may not have discovered yet?

Any ideas would be nice!

Edit: My true purpose

$Y$ is a value measured each day, though not at the same hour. What I really want to do is study the variation of $Y$ over time, "cleaned" of the external variables $X_{i}$ that are "blurring" my data.

So, thanks to my regression, I want to find the part of $Y$ that depends on these varying external factors, and then remove that part. In the end, if my estimate $\hat{Y}$ is correct, $Y-\hat{Y}$ will correspond to my data "unblurred", and I will be free to study a "normalized" or "cleaned" version of $Y$ over time.
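That cleaning step can be sketched as follows, under purely synthetic assumptions: one hypothetical external factor plus a slow trend standing in for the time signal of interest. After removing the regression fit, a Durbin–Watson statistic on $Y-\hat{Y}$ gives a rough check for leftover serial structure:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 365
t = np.arange(n)

# hypothetical external factor, and a slow trend we actually want to study
X1 = rng.uniform(1, 10, n)
y = 0.01 * t + 1.5 * np.log(X1) + rng.normal(0, 0.2, n)

# regress Y on the external factor only
A = np.column_stack([np.ones(n), np.log(X1)])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

cleaned = y - y_hat  # the "unblurred" series to study over time

# Durbin-Watson statistic on the cleaned series: values far below 2
# indicate remaining serial structure (here, the trend we deliberately kept)
dw = np.sum(np.diff(cleaned) ** 2) / np.sum(cleaned ** 2)
print(round(dw, 2))
```

In this toy setup the statistic is far below 2 because the trend was left in the residual on purpose; on real data, a value near 2 would suggest the cleaned series has little serial correlation left.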

I tested my model on another set of $Y$ data measured a year ago, and it also fits, with an $R^2_{\text{adjusted}}$ around $0.8$. Graphically, I can see that the model is not completely insane, but not perfect either.

Best Answer

In my field (social science using cross-sectional surveys), an adjusted R squared of .87 would be much too large. That would be a sure sign that you have done something meaningless, like predicting something with a second measure of itself. So whether or not you need to improve your model depends on the context, which you did not give us.

If you are looking for alternative transformations of your explanatory/right-hand-side/predictor variables, you could consider fractional polynomials:

Royston P, Altman DG. (1994): Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with discussion). Applied Statistics, 43:429-467.

Royston P, Ambler G, Sauerbrei W. (1999): The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology, 28:964-974.

Royston P, Sauerbrei W. (2004): A new approach to modelling interactions between treatment and continuous covariates in clinical trials by using fractional polynomials. Statistics in Medicine, 23:2509-2525.

Royston P, Sauerbrei W. (2007): Improving the robustness of fractional polynomial models by preliminary covariate transformation: a pragmatic approach. Computational Statistics and Data Analysis, 51:4240-4253.

Royston P, Sauerbrei W (2008): Multivariable Model-Building - A pragmatic approach to regression analysis based on fractional polynomials for continuous variables. Wiley.

Sauerbrei W. (1999): The use of resampling methods to simplify regression models in medical statistics. Applied Statistics, 48:313-329.

Sauerbrei W, Meier-Hirmer C, Benner A, Royston P. (2006): Multivariable regression model building by using fractional polynomials: Description of SAS, STATA and R programs. Computational Statistics & Data Analysis, 50:3464-3485.

Sauerbrei W, Royston P. (1999): Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society A, 162:71-94.

Sauerbrei W, Royston P, Binder H (2007): Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine, 26:5512-5528.
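To give a flavour of what these references propose, here is a minimal degree-1 fractional polynomial (FP1) search sketched in Python/NumPy on synthetic data: the conventional power set $\{-2,-1,-0.5,0,0.5,1,2,3\}$, with $x^{0}$ read as $\log x$ by convention, selecting the power that minimises the residual sum of squares. This only illustrates the idea; it is not the full Royston–Altman procedure (no covariate scaling, degree-2 terms, or significance-based function selection):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0.5, 10, n)
y = 4.0 / np.sqrt(x) + rng.normal(0, 0.05, n)  # true form: x^(-1/2)

# standard FP1 power set; power 0 denotes log(x) by convention
powers = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]

def fp_term(x, p):
    """One fractional-polynomial basis term x^p, with p=0 meaning log(x)."""
    return np.log(x) if p == 0 else x ** p

best = None  # (power, sse) of the best-fitting term so far
for p in powers:
    A = np.column_stack([np.ones(n), fp_term(x, p)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((y - A @ beta) ** 2))
    if best is None or sse < best[1]:
        best = (p, sse)

print(best[0])  # recovers -0.5, the true power, since the noise is small
```

The mfp-style software described by Sauerbrei et al. (2006) automates this kind of search, including higher degrees and multivariable model building.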
