Solved – How to perform Time Series Analysis on daily data

predictive-models, r, time series

Sorry if my question is silly but I am extremely new to Data Science and Time series analysis.

So I have TV program viewership data for the last year and want to predict the next 2 weeks. The data has the form:

Year, Date, Week_day, Channel, Program, start_time, end_time, length, avg_impressions

avg_impressions is the average of all impressions of the program, excluding the breaks. For example, if there were 10 impressions until the first break, 20 impressions between the first and second breaks, and 30 impressions from the second break to the end, avg_impressions will be 20.

I want to do a time series analysis for the prediction. The data has 211,720 entries for 14 different channels. The data is from 10/10/2015 to 9/9/2016.

I am not sure how to convert `Impressions` into a time series object. I tried:

pts <- ts(train_agg$Impressions, start=c(2015, 10, 10), end=c(2016, 9, 9), frequency=357)

Is this format correct? I want to check for seasonality in the data. (I read that I could use tbats on the ts object for that.)

Can someone please explain how to specify start dates and end dates for the vector? I read the answers from here, here and here but I am still not sure if I am doing it right.

My R code for what I have done so far:

train_data <- head(prog, 207595)
test_data <- tail(prog, 4126)
colnames(train_data)[12] <- "Impressions"
colnames(test_data)[12] <- "Impressions"
train_agg <- aggregate(Impressions~date_in_days, data=subset(train_data, Channel=="NBC" & Hour==19), mean)
test_agg <- aggregate(Impressions~date_in_days, data=subset(test_data, Channel=="NBC" & Hour==19), mean)

pts <- ts(train_agg$Impressions, start=c(2015,10,10), end=c(2016,9,9), frequency=357)
plot.ts(pts)

library(forecast)  # provides msts(), tbats(), forecast() and accuracy()
pts.msts <- msts(pts,seasonal.periods=c(7,357))
model <- tbats(pts.msts)
plot(forecast(model, h=7))
forecast(model, h=7)
accuracy(model)

Is this the right way? I am totally lost. Can someone please give me pointers as to what I should do?

Best Answer

For starters, your data is too short. To estimate a seasonal pattern (e.g. monthly effects within the year) you need about 3 full cycles, and you have less than one year. Can you get more data? If not, you are left making a lot of assumptions, which can be dangerous. Post your data, the country it comes from, and the start date.
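To make the point about series length concrete, here is a minimal base-R sketch. It assumes one value per calendar day over the date range given in the question, and the `impressions` values are simulated placeholders, not real viewership data. It also shows a two-element `start` for `ts()`, which takes a time unit and a position within that unit rather than year, month and day.

```r
# Date range taken from the question: 10 Oct 2015 through 9 Sep 2016.
dates <- seq(as.Date("2015-10-10"), as.Date("2016-09-09"), by = "day")
n_days <- length(dates)

# Placeholder values standing in for daily mean impressions (simulated).
set.seed(1)
impressions <- 100 + 10 * sin(2 * pi * seq_len(n_days) / 7) + rnorm(n_days)

# With frequency = 7 the time unit is a week and each observation is a day.
# `start` is c(unit, position within unit), so c(1, 1) means week 1, day 1.
pts <- ts(impressions, start = c(1, 1), frequency = 7)

# How many full cycles of each candidate seasonality does the series cover?
weekly_cycles <- n_days / 7
yearly_cycles <- n_days / 365.25

print(frequency(pts))
print(round(c(weekly = weekly_cycles, yearly = yearly_cycles), 2))
```

Under these assumptions the series covers many weekly cycles but less than one annual cycle, so a day-of-week pattern can be estimated from the data while an annual pattern cannot be checked against a second year.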

You can solve this problem with regression. Consider using 11 monthly dummies, 6 day-of-the-week dummies and holiday dummies. Not all of them may be significant, and not all effects may be constant over time (e.g. June is high and then becomes low). You need to look for outliers and build a dummy for each of them. You need to look for lead and lag effects around holidays. You need to consider day-of-the-month effects, week-of-the-month effects, long weekends, the Friday before a Monday holiday and the Monday after a Friday holiday. You might have a trend or multiple trends. You might also have a change in the general volume, called a level shift.
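The dummy-variable regression described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the `daily` data frame is simulated, the `holidays` vector is a hypothetical list of US dates (the country is not stated in the question), and lead/lag effects, outliers, trends and level shifts are left out. Passing factors to `lm()` produces the 6 weekday and 11 month dummies automatically.

```r
# Simulated daily series over the date range in the question (placeholder data).
dates <- seq(as.Date("2015-10-10"), as.Date("2016-09-09"), by = "day")
n <- length(dates)
wd <- as.integer(format(dates, "%u"))  # ISO weekday: 1 = Monday ... 7 = Sunday
set.seed(42)
daily <- data.frame(
  date = dates,
  impressions = 100 + 15 * (wd >= 6) + rnorm(n, sd = 5)
)

# Hypothetical holiday list; replace it with the calendar of the real country.
holidays <- as.Date(c("2015-11-26", "2015-12-25", "2016-01-01", "2016-07-04"))

# Build the regressors from a vector of dates.
add_calendar <- function(df) {
  df$weekday <- factor(as.integer(format(df$date, "%u")), levels = 1:7,
                       labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
  df$month <- factor(as.integer(format(df$date, "%m")), levels = 1:12)
  df$holiday <- as.integer(df$date %in% holidays)
  df
}
daily <- add_calendar(daily)

# Factors expand to 6 weekday dummies and 11 month dummies plus the intercept.
fit <- lm(impressions ~ weekday + month + holiday, data = daily)
print(summary(fit)$coefficients)

# Predict the 14 days after the end of the sample.
future <- add_calendar(
  data.frame(date = seq(max(daily$date) + 1, by = "day", length.out = 14))
)
pred <- predict(fit, newdata = future)
print(data.frame(date = future$date, forecast = round(pred, 1)))
```

With less than one year of data each month appears only once, so the month coefficients here cannot be distinguished from a trend or a level shift. That is the reason for asking for more history before relying on them.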