Forecasting multivariate time series (with categorical variables) in R

Tags: categorical data, forecasting, multivariate analysis, r, time series

I want to forecast future sales (the next 20 days) using the sample dataset below.

[Image: sample of the daily sales data]

This is just a sample; the actual data runs from Jan 2014 to Dec 2016. Sales tend to increase over time, and they are usually higher when variable (c) is 'no'.

The time series forecasting examples I found on Stack Overflow and other websites were either univariate, or multivariate without any categorical variables. I definitely don't want to ignore variable (c) here.

I tried random forest and XGBoost, but the results were really bad. I believe this is because tree-based models don't extrapolate well: their predictions are piecewise constant, so they cannot continue a trend beyond the range of the training data. I am fairly sure a regression-based model would work well here, but linear regression can't handle the categorical variable, and it won't capture the seasonality and day-of-week pattern.

To sum up, I would appreciate a recommendation for a forecasting model that satisfies these three requirements:

  1. Captures the increasing sales trend
  2. Captures the seasonality and day-of-week pattern
  3. Handles categorical variables as well as numerical ones

Thank you.

Best Answer

A linear regression function like lm in R will correctly interpret your categorical variable: if it is stored as a factor (or character) column, lm creates the dummy variables for you. Since it only has two states, you might recode it as 0/1 anyway.
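As an illustration of the point above, here is a minimal sketch in base R that also covers the trend and seasonality requirements with ordinary regressors: a time index for the trend, a day-of-week factor, and one sine/cosine (Fourier) pair for annual seasonality. Everything here is an assumption for demonstration: the data are simulated (not the asker's), the names t_idx, weekday and c_var are hypothetical, and the future values of c are assumed to be "yes" on all 20 days, since any regression forecast needs the regressors' future values.

```r
# Hypothetical simulated data standing in for the real dataset (not real data):
# daily sales with an upward trend, annual seasonality, a day-of-week pattern,
# and a yes/no variable c that raises sales when it is "no".
set.seed(1)
dates <- seq(as.Date("2014-01-01"), as.Date("2016-12-31"), by = "day")
n <- length(dates)
t_idx <- seq_len(n)
weekday <- factor(format(dates, "%u"))   # "1" = Monday ... "7" = Sunday
c_var <- factor(sample(c("yes", "no"), n, replace = TRUE), levels = c("yes", "no"))
sales <- 100 + 0.05 * t_idx + 5 * sin(2 * pi * t_idx / 365.25) +
  c(0, 2, 4, 6, 8, 15, 10)[as.integer(weekday)] +
  ifelse(c_var == "no", 12, 0) + rnorm(n, sd = 3)
df <- data.frame(sales, t_idx, weekday, c_var)

# One Fourier pair models annual seasonality; lm() expands the factors
# weekday and c_var into 0/1 dummy columns ("yes" is the baseline level).
fit <- lm(sales ~ t_idx + sin(2 * pi * t_idx / 365.25) +
            cos(2 * pi * t_idx / 365.25) + weekday + c_var, data = df)
coef(fit)[c("t_idx", "c_varno")]   # trend per day, effect of c == "no"

# Forecast the next 20 days. Future values of c must be supplied;
# here they are assumed to be "yes" on every day (a hypothetical scenario).
future_dates <- max(dates) + 1:20
future <- data.frame(
  t_idx = n + 1:20,
  weekday = factor(format(future_dates, "%u"), levels = levels(weekday)),
  c_var = factor(rep("yes", 20), levels = levels(c_var))
)
fc <- predict(fit, newdata = future, interval = "prediction")
head(fc)   # columns: fit, lwr, upr
```

This model treats the errors as independent. If the residuals of the real data are autocorrelated (which is common for daily sales), the prediction intervals will be too narrow, and that is where the ARIMAX approach comes in.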

I think what you want is an ARIMAX model (ARIMA with eXogenous variables). Alternatively, you might want to use a state-space model, which also allows you to include regressors (even with time-varying coefficients). For the first approach, if you are using R, you might turn to the forecast package (its Arima function). For the second, you might look at the dlm or KFAS packages.
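A minimal sketch of the ARIMAX idea, assuming only base R: it uses stats::arima with its xreg argument (a regression with ARMA errors) instead of forecast::Arima, so it runs without extra packages. The data are simulated with AR(1) errors, and the helper make_xreg, the variable names, and the "yes" values of c for the forecast horizon are all hypothetical. The key step is converting the factors into a numeric regressor matrix with model.matrix, because arima() only accepts numeric regressors.

```r
# Hypothetical simulated data (not real): trend + weekday pattern + yes/no c,
# with AR(1) errors so there is autocorrelation for the ARMA part to model.
set.seed(1)
dates <- seq(as.Date("2014-01-01"), as.Date("2016-12-31"), by = "day")
n <- length(dates)
t_idx <- seq_len(n)
weekday <- factor(format(dates, "%u"))   # "1" = Monday ... "7" = Sunday
c_var <- factor(sample(c("yes", "no"), n, replace = TRUE), levels = c("yes", "no"))
err <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 3))
sales <- 100 + 0.05 * t_idx + c(0, 2, 4, 6, 8, 15, 10)[as.integer(weekday)] +
  ifelse(c_var == "no", 12, 0) + err

# Hypothetical helper: build a numeric regressor matrix from the factors.
make_xreg <- function(t_idx, weekday, c_var) {
  X <- model.matrix(~ t_idx + weekday + c_var)
  X[, -1, drop = FALSE]   # drop the intercept column; arima() fits its own mean
}
X <- make_xreg(t_idx, weekday, c_var)

# Regression on trend, weekday dummies and c, with AR(1) errors.
fit <- arima(sales, order = c(1, 0, 0), xreg = X)

# Forecast 20 days ahead; the future regressors (including c) must be supplied.
future_dates <- max(dates) + 1:20
X_new <- make_xreg(n + 1:20,
                   factor(format(future_dates, "%u"), levels = levels(weekday)),
                   factor(rep("yes", 20), levels = levels(c_var)))
fc <- predict(fit, n.ahead = 20, newxreg = X_new)
fc$pred   # point forecasts
fc$se     # standard errors of the forecasts
```

Annual seasonality could be added the same way, as sine/cosine columns in the regressor matrix. The forecast package's Arima and forecast functions take an xreg argument in the same spirit, but check their documentation for the exact interface.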
