Solved – Finding correlation, triggers and patterns between two (or more datasets)

correlationexcelpattern recognitiontime series

I am trying to find correlations between multiple data sets, for this example i am using two pieces of data, the closing stock price for Coca-Cola and Pepsi for the last year.

I've been pointed in the direction of using the =CORREL() function in excel which has given me an indication that there is a correlation as it returns the value 0.869783523 but does not give me any clue as to triggers or patterns in the data which is what i am actually interested in.

Is there function or a workflow that i can implement in excel to give me a more detailed read out giving me an idea of triggers and/or patterns?

Best Answer

I'm not an econometrician, so forgive any abuse of terminology.

From your question and further comments, you are not looking for classic correlation, you are looking for some kind of causal, or at least temporal correlation. At first, the data you have seems to be just a vector of two variables; Coke and Pepsi closing stock prices. Looking at the Pearson product-moment correlation coefficient , which is what CORREL calculates, will give you the linear relationship of one variable with the other, but no more.

If you are looking for a trigger or some other relationship, you need more information—you need some other variable or variables which affect both of your "top-level" outcomes which can explain some relationship.

You may actually already have some data, in that if you have closing prices, you probably have the closing dates as well. A first step may be to simply see if one closing price tends to lag the other, in which case some form of regression may be appropriate.

Also, and this is not based on anything authoritative outside of a hunch, so take it for what it is worth, you may wish to regress Coke on time ane Pepsi, and then Pepsi on time and Coke, and see if one or the other regressions have the company variable as significant as well as the time variable.

Your best bet, in my opinion, though would be to try and develop some database of statistics that may affect both (time, S&P close on same date, CPI on same date, spot/future price of corn (for corn syrup) on given date) and see if you can develop a model using those statistics.

Related Question