Solved – Hourly predictions using time series

forecasting, multiple-seasonalities, time series

I'd like to build a model based on time series. I have a dataset with records every 30 minutes for three months.

What is the difference between modeling these data with the following kinds of models?

Extracting hour/week-day/month and using them as features in machine learning algorithms
Using ARMA models

My data contains weather information. One of the scenarios I am working on is predicting "the use of bikes", which is related to information like weather/temperature/wind/time (day/hour; I think that month doesn't make sense) … In such scenarios, should I use a time series ARMA model, or just extract hour/week-day/month and use them as features for algorithms like trees/random forests?

Can anyone explain the difference, or point to a paper/book to check?

Note: I am a self-learner and didn't attend any data science class. Apologies if this is obvious.

Best Answer

Well, the difference is... that they are different methods. ("Can any one explain the difference between apples and oranges?")

ARIMA models are explained in any introductory time series book. (I'll never tire of recommending this free open source online forecasting textbook.) If you want to include weather info, you'd need ARIMA models with eXplanatory or eXternal information, or ARIMAX models. These are also standard.
Trees/CARTs/Random Forests are explained in any Data Science textbook, or even the Wikipedia pages. These will, of course, model explanatory variables "as-is". Your idea of using days, hours and months as features does make sense in this context. However, simply feeding independent dummies for "9-10am", "10-11am" and so forth into your model may or may not account for the fact that your observations in the 9-10am and the 10-11am time buckets will be more highly correlated than the ones in the 9-10am and the 1-2pm buckets.
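
One possible way to address the adjacency issue with independent hour dummies is a cyclical (sine/cosine) encoding of the time of day. The answer does not prescribe this; it is a common alternative, sketched here in base R under assumptions. The timestamps, the column names and the `hour_distance` helper are all made up for illustration.

```r
# Toy half-hourly timestamps for one week (illustrative, not the asker's data).
start  <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
stamps <- seq(start, by = "30 min", length.out = 48 * 7)

# Extract calendar features from the timestamps.
lt <- as.POSIXlt(stamps, tz = "UTC")
features <- data.frame(
  hour    = lt$hour + lt$min / 60,  # 0, 0.5, ..., 23.5
  weekday = lt$wday,                # 0 = Sunday, ..., 6 = Saturday
  month   = lt$mon + 1              # 1..12
)

# Cyclical encoding: place the time of day on a circle, so that
# neighbouring times (including 23:30 and 00:00) get similar values.
features$hour_sin <- sin(2 * pi * features$hour / 24)
features$hour_cos <- cos(2 * pi * features$hour / 24)

# Hypothetical helper: Euclidean distance between two hours in the
# encoded (sin, cos) space.
hour_distance <- function(h1, h2) {
  a <- c(sin(2 * pi * h1 / 24), cos(2 * pi * h1 / 24))
  b <- c(sin(2 * pi * h2 / 24), cos(2 * pi * h2 / 24))
  sqrt(sum((a - b)^2))
}

print(head(features))
print(c(nine_vs_ten = hour_distance(9, 10), nine_vs_one = hour_distance(9, 13)))
```

With this encoding, 9am is closer to 10am than to 1pm, which plain dummies cannot express. Tree-based models can use either representation, so treat this as something to test rather than a guaranteed improvement.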

A couple of random thoughts:

ARIMA(X) will have a hard time dealing with the multiple seasonalities involved (year-over-year, intra-week with people commuting to work Mon-Fri but not Sat/Sun, intra-day with more people biking during the day). You could in principle model these seasonalities using dummies in your ML models. Alternatively, there are a couple of approaches to multiple seasonalities in the context of Exponential Smoothing/State Space models.
Weather is of course highly correlated with time-of-year and time-of-day: it's hotter in summer and during the day than in winter and during the night. If you already model seasonality as above, you may find that adding weather information does not improve the forecasts very much beyond what seasonality already does.
If you want to forecast something using the weather, remember that you will need weather forecasts, too! Don't assess your out-of-sample forecasts based on how they work with actual weather - you won't know tomorrow's actual weather when you do "production" forecasting. The uncertainty in weather forecasts adds an additional source of uncertainty in your bicycling forecasts. In particular, weather forecasts are not very reliable for more than 15 days out, so they won't be very helpful for forecasting bike rides that far out. (Incidentally, getting historical weather data is far easier and cheaper than getting historical weather forecasts.)
You may want to look at the electricity price or load forecasting literature - that use case deals with many of your challenges (high frequency data, multiple seasonalities, weather influence). I haven't read this review yet, but it may be helpful.
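
To make two of the points above concrete (seasonal dummies, and scoring with weather forecasts rather than actual weather), here is a minimal base R sketch on simulated data. Everything in it is an assumption for illustration: the data-generating pattern, the variable names, the size of the weather forecast error, and the choice of a plain linear model. Year-over-year seasonality is left out because a few weeks or months of data cannot identify it.

```r
set.seed(1)

# Toy data: 5 weeks of half-hourly ride counts (48 slots per day).
n_days  <- 35
slot    <- rep(0:47, times = n_days)                 # half-hour slot in the day
weekday <- rep(rep(0:6, each = 48), times = n_days / 7)  # 0..4 taken as Mon-Fri
day_id  <- rep(seq_len(n_days), each = 48)

# Temperature: a daily cycle plus a random level for each day.
temp <- 15 + 5 * sin(pi * slot / 47) + rnorm(n_days, sd = 3)[day_id]

# Rides: intra-day shape + weekday uplift + temperature effect + noise.
rides <- 10 + 8 * sin(pi * slot / 47) + ifelse(weekday < 5, 5, 0) +
  1.5 * temp + rnorm(length(slot), sd = 1)

bikes <- data.frame(rides, temp,
                    slot = factor(slot), weekday = factor(weekday))

# Hold out the last week; fit dummies for slot and weekday plus temperature.
is_test <- day_id > 28
fit <- lm(rides ~ slot + weekday + temp, data = bikes[!is_test, ])

mae <- function(actual, predicted) mean(abs(actual - predicted))

# Score 1: plug in actual temperatures (not available in production).
test_actual <- bikes[is_test, ]
mae_actual_weather <- mae(test_actual$rides,
                          predict(fit, newdata = test_actual))

# Score 2: plug in simulated weather forecasts (actual + forecast error).
test_forecast <- test_actual
test_forecast$temp <- test_forecast$temp +
  rnorm(nrow(test_forecast), sd = 3)
mae_forecast_weather <- mae(test_forecast$rides,
                            predict(fit, newdata = test_forecast))

print(c(actual_weather = mae_actual_weather,
        forecast_weather = mae_forecast_weather))
```

The second score is the one that resembles production use. How much worse it is depends entirely on the assumed forecast error, which is why archived weather forecasts are needed for an honest evaluation.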

Related Solutions

Solved – Using a time series model to forecast future values in R

The difficulty you may have with auto.arima (and arima) is that I believe you'll have to do some futzing around to accomplish your task. The predict method for arima predicts n.ahead steps beyond the end of your training data. But your training data is from 1970-2000, while you're wanting to predict in 2011-2012 (I assume). It wouldn't make sense to tell it (with monthly data) to forecast 130 months beyond the end of your training data. And in fact it sounds like you do have, say, 2011 data and want to use that to predict 2012.

I wish I knew enough about the process to offer an answer beyond, "Now you need to take the ARIMA coefficients and create your own predict.arima function." If your process is ARI (no MA), you could use R's ar, whose predict function actually does allow you to enter new data (say 2011) to predict from. You could do the I part (differencing) yourself with R's diff function. No exogenous variables, though.
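
As a rough sketch of that ARI route in base R: difference the series by hand, fit a pure AR model with `ar`, and pass the differenced recent data to `predict` via `newdata`. The series below are simulated, and the names (`train`, `recent`, `level_pred`) and the fixed AR(1) order are assumptions for illustration only.

```r
set.seed(42)

# Toy training series with ARIMA(1,1,0) structure (simulated).
train <- cumsum(as.numeric(arima.sim(model = list(ar = 0.6), n = 300)))

# The "I" part by hand: difference once, then fit a pure AR model.
d_train <- diff(train)
ar_fit  <- ar(d_train, order.max = 1, aic = FALSE)  # AR(1), Yule-Walker

# A later stretch of data (say, the most recent year), not used for fitting.
recent   <- 500 + cumsum(as.numeric(arima.sim(model = list(ar = 0.6), n = 24)))
d_recent <- diff(recent)

# predict() for "ar" objects accepts newdata, so the forecast starts from
# the end of `recent` rather than from the end of the training data.
d_pred <- predict(ar_fit, newdata = d_recent, n.ahead = 12, se.fit = FALSE)

# Undo the differencing: add the cumulated predicted changes to the last level.
level_pred <- tail(recent, 1) + cumsum(as.numeric(d_pred))
print(round(level_pred, 2))
```

This reuses the coefficients estimated on the old data without refitting, so it only makes sense if the process can be assumed not to have changed in between.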

Solved – Time series forecasting using R

There is NO such thing as "most efficient methods for forecasting in R". You as a forecaster need to figure out which model is good for the question you are answering. First of all, what is vek2? Let's first use auto.arima from the forecast package:

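> library(forecast)            # provides auto.arima(), ets() and forecast()
> x <- ts(x, frequency=7)      # seasonal period of 7 observations
> y <- auto.arima(x)
> plot(forecast(y, h=60))      # forecast 60 steps ahead
> lines(fitted(y), col="blue") # overlay the in-sample fit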

The model fits the data reasonably well. Note that the fit includes the seasonality parameter estimated by auto.arima. Don't forget to double-check the residuals and model adequacy before using it. Now let's try ets as well.

> fit <- ets(x)
> plot(forecast(fit,h=60))
> lines(fitted(fit), col="red")

This model fits well too. If all the assumptions hold, you then need to compare these two models.
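
One way to make that comparison is a holdout test: fit both models on the earlier part of the series and score their forecasts on the part held back. The sketch below assumes the forecast package is installed. Because the original x is not shown, it uses the built-in AirPassengers series as a stand-in; the split point and the choice of mean absolute error are likewise assumptions.

```r
library(forecast)

# Stand-in series (built-in monthly data); replace with your own ts object.
series <- AirPassengers
train  <- window(series, end = c(1958, 12))
test   <- window(series, start = c(1959, 1))
h      <- length(test)

# Fit both candidate models on the training part only.
fit_arima <- auto.arima(train)
fit_ets   <- ets(train)

# Mean absolute error of the point forecasts on the held-out part.
mae <- function(fc) mean(abs(test - fc$mean))
holdout_mae <- c(arima = mae(forecast(fit_arima, h = h)),
                 ets   = mae(forecast(fit_ets, h = h)))
print(holdout_mae)
```

A single split can be misleading, so a rolling-origin evaluation over several split points would give a more stable picture.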
