Can I use the correlation between two variables when the observations on each variable are autocorrelated?

Tags: correlation, granger-causality, r, time series

I have two variables:

  • urban areas
  • protected areas.

My observations are the urban area and the protected area in each year. The values are cumulative, so the observations within each variable are autocorrelated.

Can I use the ordinary correlation, such as the one returned by cor() in R, to measure the correlation between these two variables? If not, which indicator or method can I use?

I have a scatter plot: the horizontal axis is the urban area in a given year, and the vertical axis is the protected area in that same year. Both variables increase as the years pass, and they appear to be linearly related. My aim is to find an indicator that measures this linear relationship. I have already tried linear regression, with urban area as the independent variable and protected area as the dependent variable. I fitted the model to 14 pairs (one per year); the coefficients pass the t-test, the model passes the t-test, and $R^2$ exceeds 0.9.

I want to study the relationship between urban development and protected-area development. The scatter plot below shows urban and protected area pairs on a global scale for 1950-2014 at 5-year intervals (except for 2010 and 2014).

I want to test two questions: First, are these two areas (urban and protected areas) both increasing over the research period? Second, does urbanization (here I mean the development of urban area) cause the development of protected areas?

For the first question, I want to use some form of correlation analysis, such as correlation, linear regression or the MIC value. However, because my data are time series, I am not sure whether a correlation can be calculated from them, which is why I am asking. I also do not know of other methods for measuring the strength of the linear relationship between two time series.

For the second question, I want to use the Granger causality test to test statistically for a causal relationship between these two areas. I know that a Granger causality result cannot establish causation with certainty. In my opinion, the drivers of urban-area and protected-area development are both complex, and some of them may be shared. At this stage, I simply want to test for a causal relationship between these two variables.

[Figure: scatter plot between urban and farm land; each point is a variable pair in a specific year]

Best Answer

You can certainly calculate the correlation between two time series. That's the short answer.

When, as is true here and often elsewhere, there is a marked trend in both series, the correlation is likely to be extremely high. In general, it's not especially helpful. It's not as if there were serious doubt that there would be an apparent association; that's easily imagined from looking at the graphs of the time series and thinking about the corresponding scatter plot. The P-value from conventional calculations is certainly not applicable, as independence of observations clearly does not hold.
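To illustrate how a shared trend alone drives the correlation up, here is a minimal sketch in Python (standard library only). Everything in it is an assumption made for illustration: the two series are simulated, not the urban and protected area data, and the helper names pearson and diff are hypothetical. Comparing the correlation of the levels with the correlation of the year-to-year changes is a common informal check; it is not something the calculation above relies on.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def diff(series):
    """First differences: change from each observation to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Simulated data (an assumption, not the data from the question):
# two series that both trend upward but whose noise terms are independent.
rng = random.Random(1)  # fixed seed so the run is reproducible
n = 14                  # same number of time points as in the question
a = [1.0 * t + rng.gauss(0, 0.5) for t in range(n)]
b = [2.0 * t + rng.gauss(0, 0.5) for t in range(n)]

r_levels = pearson(a, b)                # dominated by the common trend
r_changes = pearson(diff(a), diff(b))   # trend removed by differencing

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

The correlation of the levels comes out close to 1 because both series rise with time, while the correlation of the changes reflects only the unrelated noise. Neither number comes with a trustworthy conventional P-value.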

The correlation throws absolutely no light on questions of process or causation. It's just a descriptive measure of strength of linear association.

What appear to be the same or similar data as in the question are posted in the thread "How to interpolate a variable with frequency of 5 years to annual data?". As an exercise, I calculated the correlation between the variables parea and urea there as 0.9957, and between their logarithms as 0.9911.
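A calculation of this kind can be sketched as follows in Python (standard library only). The numbers are made up for illustration and are not the parea and urea values from the linked thread; the function name pearson is likewise hypothetical. The sketch only shows that the coefficient describes linear association on whatever scale is used, so it can change when both variables are log-transformed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up values (an assumption): y grows as the square of x, so the
# relationship is curved on the raw scale but linear on the log scale.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [v ** 2 for v in x]

r_raw = pearson(x, y)
r_log = pearson([math.log(v) for v in x], [math.log(v) for v in y])

print(f"raw scale: {r_raw:.4f}")
print(f"log scale: {r_log:.4f}")
```

Here the correlation is roughly 0.98 on the raw scale and 1 on the log scale, because log(y) is exactly twice log(x). With real data the two values can differ in either direction.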

In fact, many of the classic examples of high but spurious correlations arise from situations where two time series both show marked trends, but for quite different reasons, including apocryphally the price of rum and the number of Methodist ministers. Here there seems likely to be substantive association, but that's not the main question.
