Solved – Partial Cross-correlation in R

cross correlationpartial-correlationr

I think the title is fairly self-explanatory. I want to compute the cross-correlation between two time series controlled for the values at other lags. I can't find any existing code to do this, either in R or any other language, and I'm not at all confident enough in my knowledge of statistics (or R) to try to write something myself. It would be analogous to the partial autocorrelation function, just for the cross-correlation instead of the autocorrelation.

If it helps at all, my larger objective is to look for lagged correlations between different measurements of a physical system (to start with, flux and photon index from gamma ray measurements of blazars), with the goal of building a general linear model to try to predict flaring events.

Best Answer

EDIT

After performing the same experiment as below but using pacf both in its matrix and univariate versions, the results as Ben says in his comments, are, despite close in magnitude and sign, inconsistent with regards to the univariate PACF.

After looking at the algorithm to compute the partial cross-correlation (Wei, 1990) implemented in R, I came to the conclusion that it is not the same one as the one used for a univariate PACF. This is because it uses a system of equations to solve the covariance matrix structure betweeen x & y, and so results from the matrix pacf may differ slightly from the standard version of pacf. This is also why Ben got an error when trying pacf(cbind(x,x)), because the covariance matrix is singular (perfectly collinear) and the equation system has no explicit solution. See the code of my own answer to this question in another post PCCF

PACF

Numeric results:

> pacf(ts(cbind(x,y)), lag.max = 10)$acf
, , x

              x           y
1  -0.003727913  0.09929345
2  -0.019429467  0.09901536
3  -0.094662076 -0.08654108
4  -0.115294825  0.05975806
5  -0.102362997  0.03858411
6  -0.149191058 -0.12823916
7   0.033546670  0.01312224
8   0.014324840  0.10429490
9  -0.040499352  0.09112193
10 -0.016651355  0.19938648

, , y

              x           y
1  -0.072296936 -0.08924322
2  -0.078223312  0.06256214
3  -0.031792998  0.08157497
4  -0.037541212  0.02807893
5  -0.031197429  0.17576577
6  -0.163777892  0.04700697
7  -0.096232203  0.17346797
8   0.074688604 -0.16685102
9   0.142284523  0.21034469
10 -0.005083913 -0.08787128

> pacf(x,lag.max = 10)$acf
, , 1

              [,1]
 [1,] -0.003651251
 [2,] -0.027034679
 [3,] -0.107610793
 [4,] -0.115848824
 [5,] -0.104851323
 [6,] -0.153474662
 [7,]  0.023496026
 [8,] -0.002487242
 [9,] -0.024917174
[10,] -0.004873933

I'll leave the next (earliest) explanation for informative purposes, although it was not what Ben was asking for:

===================================================================

set.seed(1)
x=rnorm(100)
y=rnorm(100)
acf(ts(cbind(x,y)), lag.max = 10)$acf
ccf(x,y,lag.max = 10)$acf

The output of the acf is:

, , 1

              [,1]          [,2]
 [1,]  1.000000000 -0.0009943199
 [2,] -0.003651251  0.0931958052
 [3,] -0.027020987  0.0838231520
 [4,] -0.107330672 -0.0846397759
 [5,] -0.112573012  0.0625416141
 [6,] -0.093379369  0.0102113266
 [7,] -0.124715177 -0.1158511804
 [8,]  0.064980847  0.0140146020
 [9,]  0.043035067  0.0842043416
[10,]  0.025965230  0.0805874051
[11,]  0.024651027  0.1577061350

, , 2

               [,1]         [,2]
 [1,] -0.0009943199  1.000000000
 [2,] -0.0770969065 -0.089335795
 [3,] -0.0752198505  0.063139650
 [4,] -0.0285379520  0.053629242
 [5,] -0.0352903171  0.016052174
 [6,] -0.0165617885  0.169277776
 [7,] -0.1527458267  0.008381153
 [8,] -0.0518997943  0.166212699
 [9,]  0.0909969284 -0.179222389
[10,]  0.1243573098  0.243623031
[11,] -0.0077009885 -0.075363415

The output of the ccf is (I highlighted row 11 for explanation purposes):

, , 1

               [,1]
 [1,]  0.1577061350
 [2,]  0.0805874051
 [3,]  0.0842043416
 [4,]  0.0140146020
 [5,] -0.1158511804
 [6,]  0.0102113266
 [7,]  0.0625416141
 [8,] -0.0846397759
 [9,]  0.0838231520
[10,]  0.0931958052
**[11,] -0.0009943199**
[12,] -0.0770969065
[13,] -0.0752198505
[14,] -0.0285379520
[15,] -0.0352903171
[16,] -0.0165617885
[17,] -0.1527458267
[18,] -0.0518997943
[19,]  0.0909969284
[20,]  0.1243573098
[21,] -0.0077009885

In order to compare both outputs, one must consider that acf computes the univariate ACFs for x & y on the diagonal of the output (that is[,,1][,1] and [,,2][,2]) and the cross-correlation in the anti-diagonal (that is[,,1][,2] and [,,2][,1]).

On the other hand, the built-in ccf computes the cross-correlation between both series and their lagas, and outputs it in one single matrix, with lag 0 (instantaneous correlation) being its midpoint value (in our example, value [11,]). This value is the first value of each anti-diagonal matrix in the acf results.

From here it is straightforward to see that:

  • if we are comparing the lags of x vs y we will be looking at the first anti diagonal matrix ([,,1][,2]), and reading the ccf from the 11th value up until the 1st one;
  • and for the comparison of the lags of y vs x, we will be looking at the second ([,,2][,1]) anti-diagonal matrix, and correspondingly at the ccf we will read from the 11th until the 21st value. This yields the same exact results when you compare the outputs.
Related Question