I would like to apply one or more data mining techniques to a dataset, in order to see the effect advertising has on sales.
I am working from this dataset. It has 36 consecutive entries of monthly data for both sales and advertising – it's very small.
I exported the dataset to a ".csv". I deleted the date column, because I will use R's ts (time series object). The ".csv" now looks like this:
Advertising,Sales
12,15
20.5,16
21,18
..., ..., ...
23.4,17
16.4,1
The example coded below works. However, I had to split the matrix into two lists, because of the HoltWinters()
function. I would prefer to analyse Advertising and Sales together at the latter stages. What other data mining techniques may be more beneficial?
data <- read.csv("./advertising_sales.csv", header=TRUE)
data_ts <- ts(data, start = c(2011,1), frequency = 12)
print(data_ts) # to check data has been correctly added
> Jan 2012 13 17.3 21
+ Feb 2012 14 25.3 29
+ ...
+ Nov 2013 35 23.4 17
+ Dec 2013 36 16.4 1
plot(decompose(data_ts))
data_ts_ad <- data_ts[,1] #assign advertising as list, for HoltWinters
data_ts_sa <- data_ts[,2] # assign sales as list, for HoltWinters
#do HoltWinters for advertising
plot(HoltWinters(data_ts_ad))
data_ts_ad.hw <- HoltWinters(data_ts_ad)
predict(data_ts_ad.hw,n.ahead=9)
> Jan Feb Mar Apr May Jun Jul Aug
+ 2014 18.52852 25.47521 27.16683 36.41340 38.14678 33.04452 33.22488 32.12758
Sep
+ 2014 32.58964
plot(data_ts_ad,xlim=c(2010,2014))
lines(predict(data_ts_ad.hw, n.ahead=24), col=2)
#do HoltWinters for sales
plot(HoltWinters(data_ts_sa))
data_ts_sa.hw <- HoltWinters(data_ts_sa)
predict(data_ts_sa.hw,n.ahead=9)
> Jan Feb Mar Apr May Jun Jul Aug
+ 2014 11.05723 23.27877 50.06859 57.22696 61.50669 26.35195 62.26159 70.83347
Sep
+ 2014 23.18957
plot(data_ts_sa,xlim=c(2010,2014))
lines(predict(data_ts_sa.hw, n.ahead=24), col=2)
I recently came across a book called R and data mining: Examples and Case Studies by Yanchang Zhao. It has excellent worked examples and this is where I have found inspiration. However, I can't get my small brain to think which techniques can be applied to this dataset.
I am new to R, so please try dumb-down your answers slightly.
EDIT: Output of data_ts is given below.
dput(data_ts)
structure(c(12, 20.5, 21, 15.5, 15.3, 23.5, 24.5, 21.3, 23.5,
28, 24, 15.5, 17.3, 25.3, 25, 36.5, 36.5, 29.6, 30.5, 28, 26,
21.5, 19.7, 19, 16, 20.7, 26.5, 30.6, 32.3, 29.5, 28.3, 31.3,
32.2, 26.4, 23.4, 16.4, 15, 16, 18, 27, 21, 49, 21, 22, 28, 36,
40, 3, 21, 29, 62, 65, 46, 44, 33, 62, 22, 12, 24, 3, 5, 14,
36, 40, 49, 7, 52, 65, 17, 5, 17, 1), .Dim = c(36L, 2L), .Dimnames = list(
NULL, c("Advertising", "Sales")), .Tsp = c(2006, 2008.91666666667,
12), class = c("mts", "ts", "matrix"))
Best Answer
Given that you have a time series, with possible influences of trend and seasonality on sales, I recommend that you look for time series techniques that can handle causal effects such as advertising. This thread should be a good starting point, although your focus appears not to be forecasting.
Try something like this:
This will build an ARIMAX model for sales, with advertising as an external variable. You can then do
summary(model)
to see, e.g., parameter estimates.We see that ARIMAX believes that each unit of advertising increases sales by 1.64. You can plot:
If you have future values
data_ts_ad_future
for your advertising, you can forecast and plot point forecasts and prediction intervals: