[Math] How to correlate the timestamps of 2 systems

correlation

Whenever I've done (simple) correlation in the past, I've always had 2 sets of data that had "connected" axes:

Time of Day  |  Am I Hungry?
================================
   7 AM      |     No
   8 AM      |     Yes
   9 AM      |     Yes
   ...
   11 PM     |     Yes
   12 AM     |     No

Now it's easy to see: was I hungry at 8 AM? Yes. Obviously these two data sets will not be correlated, because my hunger waxes and wanes throughout the day (I don't get steadily hungrier or less hungry as time goes on).

I now have a problem where I have 2 different software systems that are showing bizarre errors in their logs. Each log is showing its own set of bizarre errors, and I want to see how closely they are correlated.

For instance, App Log #1 produces "Fizz Errors", whereas App Log #2 produces "Buzz Errors". I want to see if there is a correlation of Fizz Errors to Buzz Errors, because I know what produces Fizz Errors and want to know if they are also causing Buzz Errors on the other system. For each Fizz/Buzz error, I have a specific timestamp (given in YYYY-MM-DD HH:MM:ss format).

However, since each axis represents timestamps given in seconds, the two logs don't necessarily share the same plot points. For instance, there might have been a Fizz Error at 2013-04-02 21:46:58, but no Buzz Error at that exact time. So as opposed to the above example, where I had an "Am I Hungry" reading for every hour of the day, I don't have the same luxury here.

So I ask: how do I correlate these two sets of timestamps so I can see if they tend to crop up at the same times? Thanks in advance.

Best Answer

You are looking for a time correlation function of the two datasets with unknown offset. The simple-minded approach is to offset one dataset with respect to the other by a variable amount, then look for a correlation between the two. For each dataset, let "event happening" be $1$ and "event not happening" be $-1$. Then, if the events are perfectly correlated, the product of the two is constant at $1$ once you find the correct offset. If you look at the data, you may well see a typical duration for an event; you can then take half of that duration as your search step.
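
The offset scan described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the question: time is cut into fixed 10-second bins, the Buzz timestamps are made up (each one 20 seconds after a Fizz timestamp), and the helper names `to_signal`, `mean_product` and `best_lag` are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def to_signal(timestamps, start, n_bins, bin_seconds):
    """Encode timestamp strings as +1 (event in bin) or -1 (no event in bin)."""
    signal = [-1] * n_bins
    for ts in timestamps:
        elapsed = (datetime.strptime(ts, FMT) - start).total_seconds()
        idx = int(elapsed // bin_seconds)
        if 0 <= idx < n_bins:
            signal[idx] = 1
    return signal

def mean_product(a, b, lag):
    """Average of a[i] * b[i + lag] over the bins where both signals exist."""
    products = [a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b)]
    return sum(products) / len(products)

def best_lag(a, b, max_lag):
    """Scan lags in [-max_lag, max_lag]; return (lag, score) with the top score."""
    scores = {lag: mean_product(a, b, lag) for lag in range(-max_lag, max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

# Made-up example data: every Buzz error trails a Fizz error by 20 seconds.
start = datetime(2013, 4, 2, 21, 46, 0)
fizz = ["2013-04-02 21:46:58", "2013-04-02 21:48:10", "2013-04-02 21:51:35"]
buzz = ["2013-04-02 21:47:18", "2013-04-02 21:48:30", "2013-04-02 21:51:55"]

a = to_signal(fizz, start, n_bins=60, bin_seconds=10)
b = to_signal(buzz, start, n_bins=60, bin_seconds=10)
print(best_lag(a, b, max_lag=6))  # (2, 1.0): two bins, i.e. 20 seconds
```

Note that when errors are rare, most bins are $-1$ in both logs, so the score is fairly high at every offset. What matters is how far the peak stands out from the other offsets, not its absolute value.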

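You can be more formal about this by taking the Fourier transforms of the two datasets and computing their cross-correlation from them: for real data, the transform of the correlation is the product of one transform with the complex conjugate of the other. Section 13.2 of Numerical Recipes has a short discussion, but you will need chapter 12 to make sense of it. Other numerical analysis books will discuss it as well.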
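
As a rough sketch of the Fourier route, the code below computes the correlation at every offset at once. It is not the Numerical Recipes routine; it is a small textbook radix-2 FFT written out so the example has no dependencies. Two assumptions to flag: the signals are binned as in the simple approach, and they are encoded as $1$/$0$ rather than $1$/$-1$, so that zero-padding is harmless and each correlation value simply counts coincident bins. The function names are hypothetical.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def cross_correlation(a, b):
    """Return c with c[lag % n] = sum_i a[i] * b[i + lag], for real a and b.

    Both inputs are zero-padded to a power of two >= len(a) + len(b) so the
    circular correlation cannot wrap real data around onto itself.
    """
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = fft([complex(v) for v in a] + [0j] * (n - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (n - len(b)))
    spectrum = [x.conjugate() * y for x, y in zip(fa, fb)]
    return [v.real / n for v in fft(spectrum, inverse=True)]

def best_lag_fft(a, b, max_lag):
    """Lag in [-max_lag, max_lag] with the most coincident event bins."""
    c = cross_correlation(a, b)
    return max(range(-max_lag, max_lag + 1), key=lambda lag: c[lag % len(c)])

# Made-up 0/1 signals: 60 bins, Buzz events two bins after each Fizz event.
a = [0] * 60
b = [0] * 60
for i in (5, 13, 33):
    a[i] = 1
    b[i + 2] = 1

print(best_lag_fft(a, b, max_lag=6))  # 2
```

Negative offsets live at the end of the returned list (index `lag % n`), which is why the padding matters. For a handful of offsets the direct scan is simpler; the transform pays off when the logs are long and the range of plausible offsets is wide.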
