I have a binary time series: 2160 hourly observations (0 = didn't happen, 1 = happened) over 90 days.
After these 90 days, I want to forecast when the next 1 will occur, and also to extend that forecast over the next month.
Best Answer
One approach might be to assume that the Bernoulli sequence is driven by a latent Normal random variable through the probit transformation. That is, the realized $X_t \sim \mathrm{Bernoulli}(p_t)$, where $p_t = \Phi(Y_t)$ (equivalently $\Phi^{-1}(p_t) = Y_t$) and $Y \sim N(\mu, \Sigma)$. This way you can place whatever time-series structure you like (e.g. ARIMA) on your latent $Y$ variable and then use standard time-series techniques (e.g. Holt-Winters) to predict future observations. It should be possible to code something like this up in Stan or JAGS, but you might not get great predictions given the "glass darkly" view the Bernoulli process gives you of the latent state.
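To make the forecasting step concrete, here is a minimal sketch in NumPy/SciPy. It assumes the latent process is a Gaussian AR(1) with hypothetical parameters ($\mu$, $\phi$, $\sigma$); in practice you would estimate both the parameters and the final latent state from the observed 0/1 series (e.g. in Stan), whereas here we simply simulate them to show how the probit link turns latent forecasts into event probabilities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical latent Gaussian AR(1): Y_t = mu + phi*(Y_{t-1} - mu) + eps_t.
# Observed: X_t ~ Bernoulli(p_t) with probit link p_t = Phi(Y_t).
mu, phi, sigma = -1.0, 0.8, 0.5
T = 2160  # 90 days of hourly data

y = np.empty(T)
y[0] = mu
for t in range(1, T):
    y[t] = mu + phi * (y[t - 1] - mu) + rng.normal(0, sigma)

p = norm.cdf(y)          # event probabilities
x = rng.binomial(1, p)   # observed 0/1 series

# h-step-ahead forecast of the latent state given the last value y_T:
#   E[Y_{T+h}] = mu + phi^h * (y_T - mu)
#   Var[Y_{T+h}] = sigma^2 * (1 - phi^(2h)) / (1 - phi^2)
h = np.arange(1, 24 * 30 + 1)  # next month, hourly
mean_h = mu + phi**h * (y[-1] - mu)
var_h = sigma**2 * (1 - phi ** (2 * h)) / (1 - phi**2)

# Marginal event probability: if Z ~ N(m, v), then E[Phi(Z)] = Phi(m / sqrt(1 + v)).
p_forecast = norm.cdf(mean_h / np.sqrt(1 + var_h))
```

The closing identity is what lets you convert a Gaussian forecast distribution into a forecast of $P(X_{T+h} = 1)$ without simulation; for short horizons the forecast tracks the last latent state, and for long horizons it decays toward the stationary event rate.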
Well, the difference is... that they are different methods. ("Can any one explain the difference between apples and oranges?")
ARIMA models are explained in any introductory time series book. (I'll never tire of recommending this free open source online forecasting textbook.) If you want to include weather info, you'd need ARIMA models with eXogenous (explanatory or external) regressors, i.e. ARIMAX models. These are also standard.
Trees/CARTs/Random Forests are explained in any Data Science textbook, or even the Wikipedia pages. These will, of course, model explanatory variables "as-is". Your idea of using days, hours and months as features does make sense in this context. However, simply feeding independent dummies for "9-10am", "10-11am" and so forth into your model may not account for the fact that observations in the 9-10am and 10-11am time buckets will be more highly correlated than those in the 9-10am and 1-2pm buckets.
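One common workaround for that dummy-encoding issue is a cyclical (sine/cosine) encoding of the hour, which places adjacent hours close together in feature space. A small sketch, using a hypothetical hourly index:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly index; in practice this would be your observed data's index.
idx = pd.date_range("2024-01-01", periods=24 * 7, freq="h")
df = pd.DataFrame({"hour": idx.hour, "dow": idx.dayofweek})

# Independent dummies: each hour becomes its own unrelated category.
dummies = pd.get_dummies(df["hour"], prefix="h")

# Cyclical encoding: adjacent hours map to nearby (sin, cos) points,
# so 9am and 10am end up closer in feature space than 9am and 1pm.
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
```

Tree-based models can often recover smooth hour effects from raw dummies given enough data, but the cyclical encoding bakes the adjacency structure in directly and also handles the midnight wrap-around (hour 23 is next to hour 0).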
A couple of random thoughts:
ARIMA(X) will have a hard time dealing with the multiple seasonalities involved (year-over-year, intra-week with people commuting to work Mon-Fri but not Sat/Sun, intra-day with more people biking during the day). You could in principle model these seasonalities using dummies in your ML models. Alternatively, there are a couple of approaches to multiple seasonalities in the context of Exponential Smoothing/State Space models.
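One standard way to feed multiple seasonalities into either an ARIMAX model or an ML model is via Fourier terms: a few sine/cosine pairs per seasonal period. A minimal sketch with hypothetical period lengths and numbers of harmonics:

```python
import numpy as np
import pandas as pd

def fourier_terms(t, period, K, label):
    """Build K sine/cosine pairs for a seasonal cycle of the given period
    (measured in observations, here hours)."""
    cols = {}
    for k in range(1, K + 1):
        cols[f"{label}_sin{k}"] = np.sin(2 * np.pi * k * t / period)
        cols[f"{label}_cos{k}"] = np.cos(2 * np.pi * k * t / period)
    return pd.DataFrame(cols)

t = np.arange(24 * 7 * 52)  # one year of hourly observations
X = pd.concat([
    fourier_terms(t, 24, 3, "day"),           # intra-day cycle
    fourier_terms(t, 24 * 7, 3, "week"),      # intra-week cycle
    fourier_terms(t, 24 * 365.25, 2, "year"), # year-over-year cycle
], axis=1)
# X can serve as exogenous regressors in ARIMAX, or as features in a
# tree/forest model alongside weather variables.
```

The number of harmonics per cycle (here 3, 3 and 2) is a tuning choice: more harmonics capture sharper seasonal shapes at the cost of more parameters.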
Weather is of course highly correlated with time-of-year and time-of-day: it's hotter in summer and during the day than in winter and during the night. If you already model seasonality as above, you may find that adding weather information does not improve the forecasts very much beyond what seasonality already does.
If you want to forecast something using the weather, remember that you will need weather forecasts, too! Don't assess your out-of-sample forecasts based on how they work with actual weather - you won't know tomorrow's actual weather when you do "production" forecasting. The uncertainty in weather forecasts adds an additional source of uncertainty in your bicycling forecasts. In particular, weather forecasts are not very reliable for more than 15 days out, so they won't be very helpful for forecasting bike rides that far out. (Incidentally, getting historical weather data is far easier and cheaper than getting historical weather forecasts.)
You may want to look at the electricity price or load forecasting literature - that use case deals with many of your challenges (high frequency data, multiple seasonalities, weather influence). I haven't read this review yet, but it may be helpful.