Suppose $v$ is a vector of length $n$ with entries from the set $\{0,1\}$ (i.e. a vector of ones and zeros).
A paper I am reading defines the "auto-correlation sequences" $$v*v$$ where $*$ denotes the correlation operator.
1) What is an auto-correlation sequence of a vector?
2) What is the correlation operator? (I'm assuming it can be applied to two distinct vectors too)
My first guess was that to auto-correlate a vector you try all the possible rotations (cyclic shifts) of the vector and measure the cosine of the angle between each rotated vector and the original. However, Mathematica's `CorrelationFunction` applied to $\{1,0\}$ returns $1$ at lag $0$ and $-\frac{1}{2}$ at lag $1$. That shoots down my theory: the lag-1 rotation $\{0,1\}$ is orthogonal to $\{1,0\}$, so I would expect a correlation of $0$. So what is Mathematica doing here?
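The rotation-and-cosine idea described in the question can be sketched in Python as follows. The function names are hypothetical, and $v$ is assumed to be a nonzero vector (otherwise the cosine is undefined). For $\{1,0\}$ it gives $0$ at shift $1$, not the $-\frac12$ that Mathematica reports, which is exactly the mismatch being asked about.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two nonzero real vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # ZeroDivisionError if either vector is zero

def rotational_cosines(v):
    """Cosine between v and each cyclic shift of v, for shifts 0 .. n-1."""
    n = len(v)
    # v[k:] + v[:k] is v cyclically shifted left by k positions
    return [cosine(v, v[k:] + v[:k]) for k in range(n)]

print(rotational_cosines([1, 0]))  # [1.0, 0.0]
```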
Best Answer
The sample correlation of vectors $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_n)$ is
$$\rho_{(X,Y)} = \frac{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y) }{S_XS_Y},$$ where $\bar X, \bar Y$ are the respective sample means and $S_X, S_Y$ are the respective sample standard deviations.
Roughly speaking, the sample autocorrelation at lag $\ell$ of a vector $(X_1, \dots, X_n)$ is the sample correlation of the truncated vector $(X_1, \dots, X_{n-\ell})$ and the lagged vector $(X_{1+\ell}, X_{2+\ell}, \dots, X_n)$, both of length $n-\ell$.
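A minimal Python sketch of the sample correlation and of this rough lagged version follows; the function names are hypothetical. Note that the rough version needs at least two overlapping entries and breaks down when either piece is constant. For the question's $\{1,0\}$ at lag $1$, both pieces have a single entry, so it is undefined, which is one practical drawback of this rough definition.

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx = sum(x) / n  # sample mean of x
    my = sum(y) / n  # sample mean of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)  # ZeroDivisionError if either sequence is constant

def naive_autocorrelation(x, lag):
    """Sample correlation of (X_1..X_{n-lag}) with (X_{1+lag}..X_n)."""
    return sample_correlation(x[:len(x) - lag], x[lag:])

print(naive_autocorrelation([1, 0, 1, 0, 1, 0], 1))  # approximately -1
```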
Various refinements are used in specific applications. Perhaps the one you are looking for is of the following form:
$$\rho_\ell = \frac{\sum_{i=1}^{n-\ell} (X_i - \bar X)(X_{i+\ell} - \bar X) }{(n-1)S_X^2},$$ Notice that $\bar X$ and $S_X^2$ are based on the entire sequence, so the denominator equals $\sum_{i=1}^{n} (X_i - \bar X)^2$. Also, when $\ell=0$ the numerator equals the denominator, so $\rho_0 = 1.$ See the Wikipedia article on autocorrelation, last bullet under Estimation.
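Here is a minimal Python sketch of this estimator; the function name `autocorrelation` is hypothetical. Applied to $\{1,0\}$, it reproduces the values reported in the question: $1$ at lag $0$ and $-\frac12$ at lag $1$.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation at the given lag, using the mean and the sum of
    squared deviations of the WHOLE sequence:

        rho_lag = sum_{i=1}^{n-lag} (x_i - xbar)(x_{i+lag} - xbar)
                  / sum_{i=1}^{n} (x_i - xbar)^2
    """
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    xbar = sum(x) / n
    dev = [a - xbar for a in x]        # deviations from the overall mean
    denom = sum(d * d for d in dev)    # equals (n - 1) * S_X^2
    if denom == 0:
        raise ValueError("constant sequence: autocorrelation undefined")
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return num / denom

print(autocorrelation([1, 0], 0))  # 1.0
print(autocorrelation([1, 0], 1))  # -0.5
```

By hand: $\bar X = \frac12$, so the deviations are $\frac12$ and $-\frac12$. The lag-1 numerator is $\frac12 \cdot (-\frac12) = -\frac14$ and the denominator is $\frac14 + \frac14 = \frac12$, giving $-\frac12$. Because the mean is subtracted first, the centered vectors are no longer the orthogonal pair $\{1,0\}$ and $\{0,1\}$.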
As I recall, this is the form used by the R function `acf`.