Solved – Time series with multiple variables and different start date

r, time series

I have a dataset with sales values for multiple products. Some products have data starting from Jan 2014, but others start later, and the start date varies from product to product. I pivoted the data so that each product ID is a column name and then converted the data frame to a TS starting from Jan 2014. However, products that start later than Jan 2014 (for example, in July 2014) have NA as their first 6 values.
The forecast code, which uses a for loop and the forecast function from the forecast package, cannot forecast sales for all the products because of the NAs, and I get NaN or 0s as the output for these products.

My questions:

  1. Is there some other method which can be used in my case? I am thinking of dealing with each product's data separately and converting it to a TS based on its own start date, but I am not sure how to do that.

Sample code:

    library(dplyr)
    library(tidyr)
    library(forecast)

    set.seed(354)
    # 100 products x 50 months (Jan 2014 to Feb 2018); Date and Sales are recycled
    df <- data.frame(Product_Id = rep(1:100, each = 50),
                     Date = seq(from = as.Date("2014/1/1"), to = as.Date("2018/2/1"), by = "month"),
                     Sales = rnorm(100, mean = 50, sd = 20))

    # Drop the first months of a few products so that they start later
    df <- df[-c(251:256, 301:312, 2551:2562, 2651:2662, 2751:2762), ]

    # One column per product; products that start late get leading NAs
    df_new2 <- df %>% select(Product_Id, Date, Sales) %>% spread(Product_Id, Sales)

    # Convert to a time series (drop the Date column so it is not treated as a series)
    df_new2 <- ts(df_new2[, -1], start = c(2014, 1), frequency = 12)

    # Loop to forecast all the products together
    h <- 12                # forecast horizon
    ns <- ncol(df_new2)    # number of series
    fcast2 <- matrix(NA, nrow = h, ncol = ns)
    for (i in 1:ns) {
      fcast2[, i] <- forecast(df_new2[, i], h = h, robust = TRUE, lambda = "auto", biasadj = TRUE)$mean
    }
    View(fcast2)

fcast2 does not contain valid forecasts for all the products.
The language I am using is R.

Any idea or suggestion would be appreciated.

Best Answer

Is there some other method which can be used in my case? I am thinking of dealing with each product's data separately and converting it to a TS based on its own start date, but I am not sure how to do that.
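A minimal sketch of the per-product conversion described in the question, assuming long-format data with the question's column names (Product_Id, Date, Sales), one row per month, and no gaps after a product's first observation. The helper name make_product_ts and the toy data are made up for illustration. The forecasting step only runs if the forecast package is installed.

```r
# Toy long-format data: product 1 starts Jan 2014, product 2 starts Jul 2014.
# (Illustrative values only; column names follow the question's sample.)
set.seed(354)
dates <- seq(as.Date("2014-01-01"), as.Date("2018-02-01"), by = "month")
df <- rbind(
  data.frame(Product_Id = 1, Date = dates,         Sales = rnorm(50, 50, 20)),
  data.frame(Product_Id = 2, Date = dates[-(1:6)], Sales = rnorm(44, 50, 20))
)

# Hypothetical helper: build one monthly ts for a single product,
# starting at that product's own first date.
make_product_ts <- function(d) {
  d <- d[order(d$Date), ]
  first <- as.POSIXlt(d$Date[1])
  ts(d$Sales, start = c(first$year + 1900, first$mon + 1), frequency = 12)
}

# One ts per product, each with its own start and no leading NAs
series <- lapply(split(df, df$Product_Id), make_product_ts)
start(series[["2"]])  # 2014 7, i.e. July 2014

# Forecast each product separately (skipped if forecast is not installed)
if (requireNamespace("forecast", quietly = TRUE)) {
  h <- 12
  fc <- sapply(series, function(y) as.numeric(forecast::forecast(y, h = h)$mean))
  # fc is an h x (number of products) matrix, one column per product
}
```

Because every series is built on its own, no column has to be padded with NA to line up with the others.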

It may be an old question, but you could still forecast each time series individually with Rob Hyndman's forecast package, which you already use. However, you should encode the NAs as 0 and treat each series as an intermittent demand time series, that is, a time series that is allowed to contain zero values in between. To evaluate such forecasts, use the MASE instead of the MAPE, as the MASE can deal with zero values: https://robjhyndman.com/papers/mase.pdf
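To illustrate the point about error measures, here is a rough base-R sketch. The mase function is a hand-written helper (not a forecast package function), and the series and the train/test split are invented. It encodes the leading NAs as 0, as suggested above, and compares MASE with MAPE on a naive forecast.

```r
# Mean absolute scaled error: out-of-sample MAE divided by the in-sample MAE
# of a naive forecast (lag = 1) or a seasonal naive forecast (e.g. lag = 12).
mase <- function(actual, predicted, train, lag = 1) {
  scale <- mean(abs(diff(train, lag = lag)))
  mean(abs(actual - predicted)) / scale
}

# Hypothetical product that starts in July: six leading NAs encoded as 0
y <- c(rep(NA, 6), 12, 0, 7, 0, 0, 9, 4, 0, 11, 0, 5, 8)
y[is.na(y)] <- 0

train  <- y[1:14]
actual <- y[15:18]
naive_fc <- rep(train[length(train)], length(actual))  # repeat the last value

mase_value <- mase(actual, naive_fc, train)

# MAPE divides by the actual values, so a zero in `actual` breaks it
mape_value <- mean(abs((actual - naive_fc) / actual))

is.finite(mase_value)  # TRUE
is.finite(mape_value)  # FALSE
```

The MAPE is undefined here because one of the actual values is zero, while the MASE stays finite as long as the training series is not constant.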

In addition, you could try a multiple-equation approach for all the sales data in your time series and model it as a vector autoregression (VAR) or a vector error correction model (VECM).

The best packages for this are the vars package from Bernhard Pfaff (https://www.pfaffikus.de/rpacks/vars/) and the tsDyn package from Stigler for VECM.
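As a rough sketch of the VAR route, assuming two made-up product series (a and b) with different start dates: the series are first cut to the window where both are observed, since a VAR needs complete rows, and then a VAR(1) is fitted with vars::VAR. The lag order p = 1 is an arbitrary choice for illustration, and the model step only runs if the vars package is installed.

```r
set.seed(354)
# Two hypothetical products with different start dates
a <- ts(rnorm(50, 50, 20), start = c(2014, 1), frequency = 12)
b <- ts(rnorm(44, 50, 20), start = c(2014, 7), frequency = 12)

# Keep only the common window, so that no row contains NA
y <- ts.intersect(a, b)
start(y)  # the common window begins where the later series starts

if (requireNamespace("vars", quietly = TRUE)) {
  fit <- vars::VAR(y, p = 1, type = "const")  # VAR(1) with an intercept
  fc  <- predict(fit, n.ahead = 12)
  fc$fcst$a[, "fcst"]  # 12 point forecasts for product a
}
```

With 100 products and about 50 monthly observations, a single VAR over all series would have more coefficients per equation than observations, so in practice this is only workable on small groups of related products.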

Perhaps this answer also helps other people. Feel free to contact me if you have any questions.