What options are there for finding the time lag between different time series? I'm looking at market data here – for example, if sugar does badly in a year, it's likely that soda will be hit the next year. I've come across cross-correlation, but am not sure how to go about using it. Are there any libraries that perform fast cross-correlation, if that is the only way to go?
Solved – Time lag between correlated signals
correlation, cross correlation, time series
Related Solutions
Sorta.
Cross-correlation and convolution are closely linked. Cross-correlation of $f(t)$ and $g(t)$ is the same as the convolution of $\bar{f}(-t)$ and $g(t)$, where $\bar{f}$ is the complex conjugate of $f$.
For certain types of $f$, called Hermitian functions, cross-correlation and convolution would produce exactly the same results. Thus, you're correct that convolution and cross-correlation can sometimes be interchanged. Even if your function is not Hermitian, you might be able to get away with using either method, depending on your goal.
However, neither cross-correlation nor convolution necessarily involves a Fourier transform. Both operations are defined as happening purely in the time domain, and a naive implementation would just operate there.
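As a rough numerical check of that relationship, here is a minimal sketch using NumPy. The arrays `f`, `g`, `f_sym` and `g_real` are made-up test data, and the discrete, finite-length setting is an assumption for illustration. Note that `np.correlate` conjugates its second argument, so `f` is passed second.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arbitrary complex sequences (illustrative data only).
f = rng.normal(size=8) + 1j * rng.normal(size=8)
g = rng.normal(size=8) + 1j * rng.normal(size=8)

# Cross-correlation of f and g: c[k] = sum_n conj(f[n]) * g[n + k].
xcorr = np.correlate(g, f, mode="full")

# Convolution of conj(f)(-t) with g(t): reverse and conjugate f first.
conv = np.convolve(np.conj(f[::-1]), g, mode="full")

same = np.allclose(xcorr, conv)

# A real, symmetric sequence is a simple discrete example of a Hermitian f:
# reversing and conjugating leaves it unchanged, so the two operations agree
# without any flipping.
f_sym = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
g_real = rng.normal(size=5)
hermitian_same = np.allclose(
    np.correlate(g_real, f_sym, mode="full"),
    np.convolve(f_sym, g_real, mode="full"),
)

print(same, hermitian_same)
```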
That said, the Convolution Theorem says that convolution in one domain is equivalent to element-wise multiplication in the other. That is, $$\mathscr{F}(f\ast g) = \mathscr{F}(f) \cdot \mathscr{F}(g)$$ where $\mathscr{F}$ is the Fourier transform$^1$. With a little bit of rearrangement, one can instead write
$$f \ast g = \mathscr{F}^{-1}\big(\mathscr{F}(f) \cdot \mathscr{F}(g)\big)$$ which uses the Fourier transform to compute convolution. Similar logic lets one compute the cross-correlation in the same way: $$ f \star g = \mathscr{F}^{-1} \bigg( \overline{\mathscr{F}(f)} \cdot \mathscr{F}(g)\bigg)$$
This may seem like a round-about way of performing convolution, but it can often be more efficient. Convolving two sequences of length $n$ in the time domain requires $O\bigl(n^2\bigr)$ time. However, the Fourier transform can be performed in $O\bigl(n \log n\bigr)$ time$^2$ each while the pointwise multiplication takes $O(n)$ time. If your sequences are large and of approximately equal size, this approach can be faster.
1. You may need to correct for a normalizing factor of $2\pi$ or its square root, depending on how you defined the Fourier transform.
2. In addition to the asymptotic speed-up, many FFT implementations are incredibly well-tuned, so this works both in theory and in practice! FFTW is a good place to start if you're curious about that.
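To make the FFT route concrete, here is a minimal sketch that computes the cross-correlation through `np.fft.rfft` / `np.fft.irfft` and reads off the lag of the peak. The helper name `estimate_lag` and the synthetic series are assumptions for illustration; zero-padding to length `len(a) + len(b) - 1` is used so the circular correlation the FFT produces matches the ordinary (linear) one.

```python
import numpy as np

def estimate_lag(a, b):
    """Lag k (in samples) maximising sum_t a[t] * b[t + k].

    A positive result means b lags behind a. Hypothetical helper,
    not part of any library.
    """
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    n = len(a) + len(b) - 1            # zero-pad to avoid wrap-around
    fa = np.fft.rfft(a, n)
    fb = np.fft.rfft(b, n)
    # Inverse transform of conj(F(a)) * F(b) gives the cross-correlation.
    xcorr = np.fft.irfft(np.conj(fa) * fb, n)
    # Non-negative lags come first; negative lags are stored at the end.
    lags = np.concatenate((np.arange(0, len(b)), np.arange(-(len(a) - 1), 0)))
    return int(lags[np.argmax(xcorr)])

# Synthetic check: b is a noisy copy of a, delayed by 3 samples.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = np.concatenate((np.zeros(3), a[:-3])) + 0.1 * rng.normal(size=200)

lag = estimate_lag(a, b)
print(lag)
```

On this toy data the estimate should recover the 3-sample delay. SciPy's `scipy.signal.correlate` also offers an FFT-based method if you would rather not write the padding and index bookkeeping yourself.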
I want to prove that, overall, signal B is correlated to signal A.
If you want to prove that, you could calculate the empirical correlation and estimate its statistical significance under the assumption of $i.i.d.$ observations. However, time series data is notorious for not satisfying the $i.i.d.$ assumption; the conditional means and/or variances of time series usually change with time. Hence, you need some model to describe the relation between A and B and their time development (including possibly the time development of the relationship itself). Once you have built a model and validated its assumptions, you may proceed to model-based inference. For example, you may test the model's overall significance or significance of particular coefficients or their combinations. That way you may establish (or fail to establish) significant relationships between A and B. (You may think of the $i.i.d.$ case as being a very simple model that reflects constancy of means and variances (and higher order moments) and also constancy of the relationship between A and B.)
This may be too general to be directly useful, but it should provide a framework within which to think and develop further discussion. Unfortunately, I do not yet understand your problem sufficiently well to suggest a concrete model to work with.
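To illustrate why the $i.i.d.$ assumption matters, here is a small simulation sketch. It draws pairs of series that are independent by construction and counts how often the naive test (comparing $|r|$ with roughly $1.96/\sqrt{n}$) declares them correlated. The random-walk example, the sample sizes and the name `rejection_rate` are assumptions chosen for illustration, not something taken from the question.

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials = 200, 500
# Approximate 5% two-sided threshold for |r| when observations are i.i.d.
crit = 1.96 / np.sqrt(n)

def rejection_rate(make_series):
    """Share of independent pairs whose sample correlation exceeds crit."""
    hits = 0
    for _ in range(trials):
        a, b = make_series(), make_series()   # independent by construction
        r = np.corrcoef(a, b)[0, 1]
        hits += int(abs(r) > crit)
    return hits / trials

# White noise: the i.i.d. assumption holds.
iid_rate = rejection_rate(lambda: rng.normal(size=n))

# Random walks: successive observations are strongly dependent.
walk_rate = rejection_rate(lambda: np.cumsum(rng.normal(size=n)))

print(iid_rate, walk_rate)
```

For white noise the false-alarm rate should stay near the nominal 5%, while for random walks it is typically far higher, even though no relationship exists. That gap is the reason a model of the time dependence is needed before testing.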
Best Answer
The cross-correlation between two time series can be computed, but it is of little (if any) value in assessing the time delay, because statistical tests for the cross-correlation coefficients require normality, independence of successive observations, and more. One can pre-whiten the "cause" series X via an ARIMA model to create a surrogate x, and then apply the same ARMA coefficients to the Y variable (appropriately made stationary) to get a surrogate y. The cross-correlation coefficients between x and y are then meaningful and may suggest the time lag between the originally measured series (Y and X). This is referred to as Transfer Function Model Identification.
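A simplified sketch of the pre-whitening idea, using only NumPy, might look like the following. It is not a full ARIMA identification: it assumes both series are already stationary and substitutes a pure AR($p$) filter fitted by least squares for the ARIMA model. The function names (`fit_ar`, `ar_filter`, `prewhitened_ccf`) and the simulated data are hypothetical.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) coefficients for a zero-mean series x."""
    # Column i holds x[t-1-i] for t = p .. len(x)-1.
    X = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef

def ar_filter(z, coef):
    """Residuals z[t] - sum_i coef[i] * z[t-1-i] for t = p .. len(z)-1."""
    p = len(coef)
    pred = sum(coef[i] * z[p - i - 1 : len(z) - i - 1] for i in range(p))
    return z[p:] - pred

def prewhitened_ccf(x, y, p=1, max_lag=10):
    """Cross-correlation of the filtered series at lags 0..max_lag (x leading y)."""
    x = x - x.mean()
    y = y - y.mean()
    coef = fit_ar(x, p)                   # model fitted on the "cause" series only
    ex, ey = ar_filter(x, coef), ar_filter(y, coef)   # same filter on both
    ex = (ex - ex.mean()) / ex.std()
    ey = (ey - ey.mean()) / ey.std()
    n = len(ex)
    return np.array([np.dot(ex[: n - k], ey[k:]) / n for k in range(max_lag + 1)])

# Simulated example: x is AR(1), and y responds to x two steps later.
rng = np.random.default_rng(1)
n = 500
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + e[t]
y = np.zeros(n)
y[2:] = 0.5 * x[:-2]
y += 0.5 * rng.normal(size=n)

ccf = prewhitened_ccf(x, y, p=1, max_lag=10)
best_lag = int(np.argmax(np.abs(ccf)))
print(best_lag)
```

In this toy setup the largest pre-whitened cross-correlation should sit at lag 2, matching the delay built into `y`. A rough benchmark for "noticeably non-zero" is $\pm 2/\sqrt{n}$. For real data, a proper ARIMA fit with differencing where needed would replace the AR($p$) shortcut.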