Solved – use the correlation between two variables when observations on each variable are autocorrelated

correlation, granger-causality, r, time-series

I have two variables:

urban areas
protected areas.

My observations are the urban area and the protected area in each year. These observations are cumulative, so the observations within each variable are autocorrelated.

Can I use the ordinary correlation, such as that returned by cor() in R, to measure the correlation between these two variables? If not, which indicator or method can I use?

I have a scatter plot: the horizontal variable is the urban area in a specific year, and the vertical variable is the protected area in that same year. Both variables increase as the years pass, and they appear to be linearly related. My purpose is to find an indicator that measures this linear relationship. I have already tried a linear regression with urban area as the independent variable and protected area as the dependent variable, using the 14 yearly pairs: the coefficients and the model pass the t-test, and $R^2$ exceeds 0.9.

I want to study the relationship between urban development and protected area development. The scatter plot below shows urban and protected area pairs on a global scale for 1950-2014 at 5-year intervals (except for 2010 and 2014).

I want to test two questions: First, are these two areas (urban and protected areas) both increasing over the research period? Second, does urbanization (here I mean the development of urban area) cause the development of protected areas?

For the first question, I want to use some kind of correlation analysis, such as correlation, linear regression or the MIC value. However, because my data are time series, I am not sure whether correlation can be calculated for them, which is why I am asking. In addition, I don't know of other methods that could measure the strength of a linear relationship between two time series.

For the second question, I want to use the Granger causality test to test the causal relationship between these two areas statistically. I know that a Granger causality result cannot establish causation with certainty. In my opinion, the drivers of urban area growth and protected area growth are both complex, and some of them may be shared. At this level, I simply want to test the causal relationship between these two variables.

scatter plot between urban and farm land, the point is a variable pair in a specific year

Best Answer

You can certainly calculate the correlation between two time series. That's a short answer.

When there is a marked trend in both series, as is true here and often elsewhere, the correlation is likely to be extremely high. In general, it's not especially helpful. It's not as if there were serious doubt that there would be an apparent association; that's easily imagined from looking at the graphs of the time series and thinking about the corresponding scatter plot. The P-value from conventional calculations is certainly not applicable, as independence of observations clearly does not hold.

The correlation throws absolutely no light on questions of process or causation. It's just a descriptive measure of strength of linear association.

What appear to be the same or similar data as in the question are posted in the thread "How to interpolate a variable with frequency of 5 years to annual data?" As an exercise I calculated the correlation between the variables parea and urea there as 0.9957, and between their logarithms as 0.9911.

In fact, many of the classic examples of high but spurious correlations arise from situations where two time series both show marked trends, but for quite different reasons, including apocryphally the price of rum and the number of Methodist ministers. Here there seems likely to be substantive association, but that's not the main question.
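To illustrate the point about trends, here is a minimal sketch using simulated data (not the urban and protected area data from the question). The series `a` and `b`, the seed, and the increment range are assumptions chosen only for illustration. Two cumulative series are built independently of each other, yet their levels correlate strongly simply because both keep increasing; the correlation of their year-to-year increments carries no such guarantee.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n_years <- 14      # same number of points as in the question

# Two independent cumulative series: each is a running total of
# positive increments, so both trend upwards by construction.
a <- cumsum(runif(n_years, min = 1, max = 3))
b <- cumsum(runif(n_years, min = 1, max = 3))

# Correlation of the trending levels.
r_levels <- cor(a, b)

# Correlation of the increments, i.e. after differencing out the trend.
r_increments <- cor(diff(a), diff(b))

print(c(levels = r_levels, increments = r_increments))
```

Differencing is only one way to look past a common trend, and it answers a different question (do the increments move together?) from the correlation of the levels.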

Related Solutions

Solved – How to identify that relationship between two variables are monotonic or not

Your understanding of Spearman's rank-correlation measure seems wrong. There is no monotonicity assumption in applying it. In fact it was designed exactly for the purpose of measuring how monotonic the relationship is: if it is 1 (resp. -1), then a higher x value means a higher (resp. lower) y value.

The closer the value is to 0, the more you can say "the relationship (if any) is non-monotonic".
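As a small sketch of this behaviour (the vectors `u`, `v_up` and `v_down` are made up for illustration), a strictly increasing but curved relationship gives a Spearman correlation of 1 even though the Pearson correlation is below 1, and a strictly decreasing one gives -1:

```r
u <- seq(0, 5, by = 0.1)
v_up <- exp(u)       # strictly increasing in u, but far from linear
v_down <- -u^3       # strictly decreasing in u on this range

rho_up <- cor(u, v_up, method = "spearman")      # depends on ranks only
rho_down <- cor(u, v_down, method = "spearman")
r_up <- cor(u, v_up, method = "pearson")         # measures linear association

print(c(spearman_up = rho_up, spearman_down = rho_down, pearson_up = r_up))
```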

Now if you want to distinguish a weak non-monotonic relationship from a strong non-monotonic relationship, we need to get a little bit creative: One option is to compare the square of Spearman's rank correlation with the R-squared from a rank-regression with non-linear parts such as squares or splines:

$$ \text{rank}(y_i) = \alpha + \beta_1 \text{rank}(x_i) + \beta_2 \text{rank}(x_i)^2 + \varepsilon_i $$

A low squared Spearman's rank correlation but high R-squared from such a regression indicates a strong non-monotonic relationship.

A small example for illustration:

x <- seq(0, pi, by = 0.01)
y <- sin(x)
plot(y ~ x)

cor(rank(y), rank(x))^2 # almost 0
summary(lm(rank(y) ~ poly(rank(x), 2))) # Multiple R-squared:  0.9375

# Safer (works in almost any case):
library(splines)
summary(lm(rank(y) ~ ns(rank(x), df = 6))) # Multiple R-squared:  0.9986

So here, we would speak of a strong non-monotonic relationship.

Solved – How to understand mutual Granger causality

Consider the following examples:

Your time series are the gross domestic products (GDP) of France and Germany. As the two countries' economies are strongly interacting, a strong French economy is likely to give rise to an improvement in the German economy and vice versa. Thus, knowing the GDP for France allows you to better predict the German GDP and vice versa. The time series Granger-cause each other.
Your time series are the populations of two species in a complex ecosystem, which are predator and prey towards each other. A high population of the prey is good for the predator and a high population of the predator is bad for the prey. Again, knowing either population helps to predict the other and both time series Granger-cause each other.

In general, mutual Granger causality occurs whenever two systems are mutually interacting with each other, which is the default interaction. One-directional Granger causality occurs in the rare case where you have a one-directional interaction between systems (for example the weather influences the performance of your wind turbine, but not vice versa).

Also note that two systems that do not interact, or only interact in one direction, but are influenced by a third one may be mutually Granger-causal.
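The idea of mutual Granger causality can be sketched with simulated data. Everything below is an assumption made for illustration: the coupled series `p` and `q`, their coefficients, the single lag, and the hypothetical helper `granger_p()`. The helper hand-rolls a one-lag Granger test in base R by comparing a model that predicts a series from its own past with a model that also uses the past of the other series; packaged implementations exist (for example `grangertest()` in the `lmtest` package) and would normally be preferable.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
n <- 200

# Two mutually interacting series: each depends on its own previous
# value and on the previous value of the other series.
p <- numeric(n)
q <- numeric(n)
for (i in 2:n) {
  p[i] <- 0.5 * p[i - 1] + 0.4 * q[i - 1] + rnorm(1)
  q[i] <- 0.5 * q[i - 1] + 0.4 * p[i - 1] + rnorm(1)
}

# Hypothetical helper: p-value of a one-lag Granger test of
# "cause helps to predict effect", via an F-test on nested models.
granger_p <- function(cause, effect) {
  m <- length(effect)
  d <- data.frame(
    effect_now  = effect[2:m],
    effect_lag1 = effect[1:(m - 1)],
    cause_lag1  = cause[1:(m - 1)]
  )
  restricted   <- lm(effect_now ~ effect_lag1, data = d)
  unrestricted <- lm(effect_now ~ effect_lag1 + cause_lag1, data = d)
  anova(restricted, unrestricted)[2, "Pr(>F)"]
}

p_to_q <- granger_p(cause = p, effect = q)
q_to_p <- granger_p(cause = q, effect = p)
print(c(p_to_q = p_to_q, q_to_p = q_to_p))
```

Small p-values in both directions are what mutual Granger causality looks like in practice. As noted above, the same pattern can also arise when a third, unobserved system drives both series, so the test by itself does not distinguish these cases.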
