I have time series data and want to perform multiple regression, with three time series as independent variables in the model. Before doing any analysis, do I need to remove the trend from all of the series (dependent and independent variables)?
Solved – Do we need to remove the trend in dependent and independent variables in multiple regression analysis
multiple regression, multivariate analysis, regression
Related Solutions
You said you want "the relative contribution and co-variation among the variables." Multiple regression can give you what you're looking for, if you take advantage of what it has to offer. A correlation matrix is a very small part of what regression can yield. Assuming the predictors are relatively independent (which can be tested using tolerance or variance inflation factor statistics from a regression), the strength of the standardized coefficients will indicate the strength of each predictor's association with the outcome. T-statistics and partial and semipartial correlations can add information about this as well.
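As a sketch of that independence check, tolerance and VIF can be computed in base R by regressing each predictor on the others (the data below are simulated purely for illustration; the car package's vif() does the same thing in one call):

```r
# Simulated data standing in for real predictors
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$x3 <- 0.5 * d$x1 + rnorm(100)          # x3 is partly collinear with x1
d$y  <- d$x1 + d$x2 + d$x3 + rnorm(100)

# VIF for each predictor: regress it on the other predictors;
# VIF = 1 / (1 - R^2), tolerance = 1 / VIF
vif <- sapply(c("x1", "x2", "x3"), function(v) {
  others <- setdiff(c("x1", "x2", "x3"), v)
  r2 <- summary(lm(reformulate(others, response = v), data = d))$r.squared
  1 / (1 - r2)
})
vif   # values near 1 suggest little collinearity; above 5-10 is a warning sign
```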
Since you have multiple dependent and independent variables, a multivariate analysis would be one way to proceed. A multivariate analysis will model the relationship between your dependent and independent variables, and as an outcome you will be able to test whether those factors are significant in your model. This is useful if you want to assess the significance of the factors within such a model; if instead you are interested in the relationship between the covariates and a single response, you can run separate regressions the way you describe.
Suppose, though, that you want to construct a model for both responses simultaneously, and assess the significance of the factors in *that* model. Then you can use multivariate analysis of covariance (MANCOVA). MANCOVA will provide you with each factor's contribution to the variance in the responses, as well as its significance. Note that MANCOVA can produce type I, II, and III sums of squares (SS); which one is appropriate depends on the balance of your data. See here for more information on the types of SS: http://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/
If you are using R, you can determine the statistical significance of your factors by performing multivariate regression and using this as input in the manova function. The code would go something like:
# Fit a multivariate regression model, then test the type I SS
fit <- lm(cbind(Abundance, Richness) ~ Temp_1 + Rain_1 + Sunlight_1 +
            Temp_2 + Rain_2 + Sunlight_2 + Temp_3 + Rain_3 + Sunlight_3 +
            Temp_4 + Rain_4 + Sunlight_4, data = yourData)
summary(manova(fit), test = "Hotelling-Lawley")
A more thorough overview of how to perform such an analysis is provided here:
http://www.uni-kiel.de/psychologie/rexrepos/posts/multRegression.html
Note that separate univariate regressions return the same slopes as the multivariate regression. Also note that test statistics other than "Hotelling-Lawley" are available for the MANOVA test of type I SS, and that you can also test type II SS.
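For type II SS, one option is car::Anova() (this assumes the car package is installed; the simulated data below stand in for `yourData`, with a shortened formula for brevity):

```r
library(car)

# Simulated stand-in data: two responses, two predictors
set.seed(42)
dat <- data.frame(Temp = rnorm(60), Rain = rnorm(60))
dat$Abundance <- 2 * dat$Temp + rnorm(60)
dat$Richness  <- dat$Rain + rnorm(60)

# Multivariate fit, then type II multivariate tests
fit <- lm(cbind(Abundance, Richness) ~ Temp + Rain, data = dat)
Anova(fit, type = "II", test.statistic = "Hotelling-Lawley")
```

The same call accepts "Pillai", "Wilks", or "Roy" as the test statistic.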
As you pointed out, PCA is another multivariate data analysis method. It may be interesting for exploratory data analysis, though you will not obtain the same sort of significance testing discussed above. If you perform PCA on your data, a biplot may be a good way to investigate interesting relationships. SAS provides a rather clear discussion of interpreting the biplot: http://support.sas.com/documentation/cdl/en/imlsug/62558/HTML/default/viewer.htm#ugmultpca_sect2.htm
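In base R, such an exploratory PCA and biplot might look like this (again with simulated data standing in for the real variables):

```r
# Exploratory PCA on the predictors, base R only
set.seed(7)
X <- data.frame(Temp = rnorm(50), Rain = rnorm(50), Sunlight = rnorm(50))

pc <- prcomp(X, scale. = TRUE)   # scale. = TRUE standardizes the variables
summary(pc)                      # proportion of variance per component
biplot(pc)                       # scores and loadings on one plot
```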
Best Answer
Yes, you do need to detrend all your variables (both X and Y); otherwise you will have a misspecified model and a classic example of a "spurious" regression. Clive Granger defined spurious regressions as regressions that often have a very high R-squared (close to 1.00) yet no economic meaning. Spurious regressions also typically have highly autocorrelated residuals (Durbin-Watson statistic often < 1.2). The reason such regressions have no meaning is that many time series simply grow over time, so your model typically has no more explanatory power than a simple trend variable that counts 1, 2, 3, ...
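You can see this for yourself by regressing two independent random walks on each other; the sketch below simulates the effect and computes the Durbin-Watson statistic directly from the residuals:

```r
# Spurious regression between two independent random walks
set.seed(123)
n <- 500
y <- cumsum(rnorm(n))    # random walk 1
x <- cumsum(rnorm(n))    # random walk 2, independent of y

fit <- lm(y ~ x)
summary(fit)$r.squared   # often surprisingly large despite no true relationship

# Durbin-Watson statistic from the residuals;
# values well below 2 indicate strong positive autocorrelation
e  <- resid(fit)
dw <- sum(diff(e)^2) / sum(e^2)
dw
```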
Some people may argue that you do not need to detrend your variables if the residuals of such a regression are stationary and not too autocorrelated. In that case, you would have a successful cointegration model (Clive Granger also published papers on this subject).
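The standard way to check this is the Engle-Granger two-step idea: regress the levels on each other, then test the residuals for stationarity. A sketch, assuming the tseries package is installed and using a simulated cointegrated pair:

```r
library(tseries)

# Two series sharing one stochastic trend, so they are cointegrated
set.seed(99)
n <- 300
common <- cumsum(rnorm(n))
x <- common + rnorm(n)
y <- 2 * common + rnorm(n)

fit <- lm(y ~ x)
adf.test(resid(fit))   # stationary residuals suggest cointegration
```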
However, one may have reservations about cointegration models built on variables that are not detrended at all. Such a model structure is probably not the best one for extracting reliable information about the variables you have tested.
Thus, detrending all your variables (preferably on the same or a similar basis) is a good foundation for developing a well-specified model.
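One simple way to detrend every series on the same basis is to take residuals from a regression on a linear time trend, then run the multiple regression on the detrended series (simulated trending data below; differencing is a common alternative):

```r
# Detrend each series against the same linear time trend, then regress
set.seed(2024)
n <- 200
t <- 1:n
y  <-  0.05 * t + rnorm(n)
x1 <-  0.04 * t + rnorm(n)
x2 <- -0.03 * t + rnorm(n)

detrend <- function(z) resid(lm(z ~ t))   # residuals from a trend regression

fit <- lm(detrend(y) ~ detrend(x1) + detrend(x2))
summary(fit)
```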