Solved – Time series and cross correlation

I have two time series:

The number of subscriptions per hour over 10 days, for a website.
The times at which TV ads were shown on a specific TV channel, e.g. TV AD1 shown on CNBC at 10:25 AM on Monday, 12 August.
The issue with the second series is that it contains only the date and time, and nothing else.

I would like to compute a cross-correlation to see the impact, or the lack of impact, of the TV ads on the number of subscriptions.

But with only the time of the adverts, is this cross correlation doable? If yes, how?

I'm not asking for a specific solution designed by any of you, but rather for links where I can find some ideas, or any hints, as I'm stuck at the moment.

Best Answer

The actual date/time/channel of an advert is an observation (a transaction). A time series is a bucketing of transactions. For each type of advert, I would take the advert times and bucket them into hours to create a number of possibly "causal" time series. The impact of an advert may depend on the hour of the day, the day of the week, or whether it falls on or "nearly" on a holiday. Since you only have 10 days, it is not feasible to try to estimate day-of-week or holiday effects. The cross-correlation between each of these discrete advert time series and the subscription series can be computed as a descriptive statistic, but it shouldn't be relied on, as your predictor variables are discrete counts and your subscription data may be autocorrelated. I would create 23 predictor series reflecting the hour of the day (one fewer than 24, so that one hour serves as the baseline) and include these, together with the advert time series computed above, in an ARMAX model. Care should be taken to identify and deal with any subscription readings that reflect Pulses, Level Shifts, Time Trends or Seasonal Pulses, as these would be assignable to omitted variables that you have not controlled for. Hope this helps.
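The bucketing step described above can be sketched roughly as follows. This is only an illustration under assumptions, not the answerer's code: the advert timestamps, the year, the start of the 10-day window, and the helper names hourly_counts and hour_dummies are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_counts(ad_times, start, n_hours):
    """Count adverts falling in each hourly bucket, starting at `start`."""
    counts = Counter(int((t - start).total_seconds() // 3600) for t in ad_times)
    return [counts.get(h, 0) for h in range(n_hours)]

def hour_dummies(start, n_hours):
    """23 indicator columns for hours 1..23; hour 0 is the baseline."""
    rows = []
    for h in range(n_hours):
        hour = (start + timedelta(hours=h)).hour
        rows.append([1 if hour == k else 0 for k in range(1, 24)])
    return rows

# Hypothetical data: a 10-day window and three advert airings
start = datetime(2013, 8, 12, 0, 0)
n_hours = 10 * 24
ad_times = [
    datetime(2013, 8, 12, 10, 25),
    datetime(2013, 8, 12, 10, 50),
    datetime(2013, 8, 13, 21, 5),
]

ad_series = hourly_counts(ad_times, start, n_hours)  # one count per hour
dummies = hour_dummies(start, n_hours)               # 240 rows x 23 columns
```

The resulting advert counts and hour-of-day indicators line up hour for hour with the subscription series, so they could serve as the exogenous inputs of an ARMAX model.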

Related Solutions

Solved – Cross-correlation of two non-stationary time series

Assessing how two series are related, instantaneously or otherwise, requires forming a useful ROBUST model. This is called a Transfer Function, and sometimes a PDL model or a Dynamic Regression. So my answer deals with forming a useful model.

Box and Jenkins suggested pre-filtering, where the differencing operator identified as part of the ARIMA process is used as part of the filter, to identify the model form via the resulting cross-correlation. See https://onlinecourses.science.psu.edu/stat510/node/75 and http://www.math.cts.nthu.edu.tw/download.php?filename=569_fe0ff1a2.pdf&dir=publish&title=Ruey+S.+Tsay-Lec1 (ignoring the discussion of the CORNER METHOD) and http://autobox.com/cms/images/dllupdate/TFFLOW.png . However, if both X and Y are I(1), it is still possible that Y and X are related without any differencing at all, which suggests that alternative approaches, incorporating differencing in the TF model or not, should be on the table.
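The pre-filtering (prewhitening) idea can be illustrated with a small NumPy sketch. It assumes the input series is adequately described by an AR(1) model with no differencing, which is a simplification chosen purely for illustration; the simulated data and the helper names fit_ar1, ar1_filter and ccf are hypothetical.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares AR(1) coefficient of a mean-centred series."""
    xc = x - x.mean()
    return float(xc[:-1] @ xc[1:] / (xc[:-1] @ xc[:-1]))

def ar1_filter(z, phi):
    """Apply the filter (1 - phi*B) to a mean-centred series."""
    zc = z - z.mean()
    return zc[1:] - phi * zc[:-1]

def ccf(x, y, max_lag):
    """Correlation between x[t-k] and y[t] for k = 0..max_lag."""
    out = []
    for k in range(max_lag + 1):
        xs = x if k == 0 else x[:-k]
        out.append(float(np.corrcoef(xs, y[k:])[0, 1]))
    return np.array(out)

# Simulated data (assumed): x is AR(1), and y responds to x two steps later
rng = np.random.default_rng(0)
n = 500
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + e[t]
y = np.zeros(n)
y[2:] = 2.0 * x[:-2]
y += rng.standard_normal(n)

phi = fit_ar1(x)
alpha = ar1_filter(x, phi)  # prewhitened input
beta = ar1_filter(y, phi)   # output passed through the same filter
pw_ccf = ccf(alpha, beta, max_lag=6)
best_lag = int(np.argmax(np.abs(pw_ccf)))
print(best_lag, np.round(pw_ccf, 2))
```

Because the same filter is applied to both series, the autocorrelation of the input no longer smears the cross-correlation across neighbouring lags, which is what makes the lag structure readable.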

Cross-correlations between two series are DESCRIPTIVE but not INFERENTIAL (see http://finpko.faculty.ku.edu/myssi/FIN938/Yule.Spurious%20Regression.JRSS_1926.pdf ). Apparently GOOGLE folks don't read the classics when they suggest cross-correlations as a way of assessing predictability en route to causation.

Why don't you post your data, and I will try to help you.

Solved – Preparing data for cross-correlation time series

When attempting to detect cross-correlation between two time series, the first thing you should do is make sure the time series are stationary (i.e. have a constant mean, variance, and autocorrelation).

This matters because a correlation measures a linear relationship between two variables. A trend in either series interferes with gauging the true correlation between them: an apparent correlation may simply reflect the trends rather than a genuine relationship.
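A small simulation makes the point concrete. This is a sketch on made-up data: the two series below are generated independently and share nothing except that both trend upward, and the slopes and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(500)

# Two independent series that both happen to trend upward
x = 0.1 * t + rng.standard_normal(500)
y = 0.2 * t + rng.standard_normal(500)

corr_levels = np.corrcoef(x, y)[0, 1]                   # dominated by the trends
corr_diffs = np.corrcoef(np.diff(x), np.diff(y))[0, 1]  # trends removed by differencing
print(corr_levels, corr_diffs)
```

The correlation of the raw levels comes out close to 1, while the correlation of the first differences is close to 0, even though the noise terms are unrelated in both cases.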

In this regard, first use the (augmented) Dickey-Fuller test to screen for stationarity (it would help if you specified the software package you are using; I am using Python in this instance). Suppose you have two time series x and y:

import statsmodels.tsa.stattools as ts  # provides adfuller
xdf = ts.adfuller(x, 1)  # second argument is maxlag
ydf = ts.adfuller(y, 1)

Here's some sample output:

xdf
(-3.0704779047168596, 0.028816508715839483, 0, 106, {'1%': -3.4936021509366793, '5%': -2.8892174239808703, '10%': -2.58153320754717}, -723.247574137278)
ydf
(-2.949959856756157, 0.03983919029636401, 1, 105, {'1%': -3.4942202045135513, '5%': -2.889485291005291, '10%': -2.5816762131519275}, -815.3639322514784)
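For reference, the result tuple can be unpacked and compared against the critical values. The sketch below simply reuses the sample result for x shown above, so it runs without statsmodels; the variable names are my own choice.

```python
# Sample result for x, copied from the output above
xdf = (-3.0704779047168596, 0.028816508715839483, 0, 106,
       {'1%': -3.4936021509366793, '5%': -2.8892174239808703,
        '10%': -2.58153320754717}, -723.247574137278)

# With default settings, adfuller returns: test statistic, p-value,
# lags used, observations used, critical values, best information criterion
stat, pvalue, usedlag, nobs, crit, icbest = xdf

# The unit-root null is rejected at a given level when the
# test statistic lies below that level's critical value
rejects = {level: stat < value for level, value in crit.items()}
print(pvalue, rejects)
```

For this sample the null is rejected at the 5% and 10% levels but not at the 1% level, which is consistent with a p-value of about 0.029.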

In this instance, both p-values are below 0.05, so the unit-root null hypothesis is rejected at the 5% level and the series do not need to be differenced for stationarity. Had the p-values been above 0.05, it would be necessary to difference the series. The following tutorial might help you.

Now, it is a matter of calculating the cross-correlation between x and y, and generating the lags:

import numpy as np

# Calculate correlations
cc1 = np.correlate(x - x.mean(), y - y.mean())[0]  # Remove means; the default mode gives the lag-0 value
cc1 /= (len(x) * x.std() * y.std())  # Normalise by number of points and product of standard deviations
cc2 = np.corrcoef(x, y)[0, 1]  # Pearson correlation, for comparison
print(cc1, cc2)
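The same normalisation can be extended from lag 0 to every lag by switching np.correlate to mode="full". The following is a sketch under assumptions: both series are taken to have equal length, the helper name cross_correlation is hypothetical, and the data are simulated so that y follows x by two steps.

```python
import numpy as np

def cross_correlation(x, y):
    """Normalised cross-correlation of equal-length x and y at every lag.

    A positive lag k pairs x[t] with y[t + k], i.e. x leading y by k steps.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    raw = np.correlate(y - y.mean(), x - x.mean(), mode="full")
    lags = np.arange(-(n - 1), n)
    return lags, raw / (n * x.std() * y.std())

# Simulated data (assumed): y is x delayed by two steps, plus a little noise
rng = np.random.default_rng(2)
x = rng.standard_normal(200)
y = np.empty(200)
y[2:] = x[:-2]
y[:2] = rng.standard_normal(2)
y += 0.1 * rng.standard_normal(200)

lags, cc = cross_correlation(x, y)
peak_lag = int(lags[np.argmax(cc)])
lag0 = float(cc[lags == 0][0])
print(peak_lag, lag0)
```

The lag-0 entry matches np.corrcoef(x, y)[0, 1]. Values at large lags are based on few overlapping points yet are still divided by the full length, so they shrink towards zero.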

Upon obtaining the cross-correlation coefficient, the lags can be generated and the autocorrelations calculated:

# Generating lags
lg = 108
x = np.random.randn(lg)