Well, the difference is... that they are different methods. ("Can any one explain the difference between apples and oranges?")
ARIMA models are explained in any introductory time series book. (I'll never tire of recommending this free open source online forecasting textbook.) If you want to include weather info, you'd need ARIMA models with eXplanatory or eXternal information, or ARIMAX models. These are also standard.
Trees/CARTs/Random Forests are explained in any Data Science textbook, or even the Wikipedia pages. These will, of course, model explanatory variables "as-is". Your idea of using days, hours and months as features does make sense in this context. However, simply feeding independent dummies for "9-10am", "10-11am" and so forth into your model may or may not account for the fact that your observations in the 9-10am and the 10-11am time buckets will be more highly correlated than the ones in the 9-10am and the 1-2pm buckets.
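One common way to address the "adjacent time buckets are similar" issue is a cyclical sin/cos encoding of the hour instead of (or in addition to) independent dummies. A small sketch with made-up hour values:

```python
import numpy as np
import pandas as pd

hours = pd.DataFrame({"hour": np.arange(24)})

# Independent dummies: every hour is equally "far" from every other hour.
dummies = pd.get_dummies(hours["hour"], prefix="h")

# Cyclical encoding: adjacent hours get nearby feature values, so
# 9-10am and 10-11am end up closer than 9-10am and 1-2pm.
hours["hour_sin"] = np.sin(2 * np.pi * hours["hour"] / 24)
hours["hour_cos"] = np.cos(2 * np.pi * hours["hour"] / 24)

def dist(a, b):
    return np.hypot(hours.hour_sin[a] - hours.hour_sin[b],
                    hours.hour_cos[a] - hours.hour_cos[b])

print(dist(9, 10) < dist(9, 13))  # True: neighboring hours are closer
```

Tree-based models can often recover such orderings from a raw integer hour feature anyway, so whether this helps depends on the model class.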
A couple of random thoughts:
ARIMA(X) will have a hard time dealing with the multiple seasonalities involved (year-over-year, intra-week with people commuting to work Mon-Fri but not Sat/Sun, intra-day with more people biking during the day). You could in principle model these seasonalities using dummies in your ML models. Alternatively, there are a couple of approaches to multiple seasonalities in the context of Exponential Smoothing/State Space models.
Weather is of course highly correlated with time-of-year and time-of-day: it's hotter in summer and during the day than in winter and during the night. If you already model seasonality as above, you may find that adding weather information does not improve the forecasts very much beyond what seasonality already does.
If you want to forecast something using the weather, remember that you will need weather forecasts, too! Don't assess your out-of-sample forecasts based on how they work with actual weather - you won't know tomorrow's actual weather when you do "production" forecasting. The uncertainty in weather forecasts adds an additional source of uncertainty in your bicycling forecasts. In particular, weather forecasts are not very reliable for more than 15 days out, so they won't be very helpful for forecasting bike rides that far out. (Incidentally, getting historical weather data is far easier and cheaper than getting historical weather forecasts.)
You may want to look at the electricity price or load forecasting literature - that use case deals with many of your challenges (high frequency data, multiple seasonalities, weather influence). I haven't read this review yet, but it may be helpful.
First off, do not look to standard time series forecasting algorithms. These presuppose exactly one observation per time bucket, e.g., per day, week or month (and this observation may be "zero"). What you have, in contrast, is zero or multiple observations per time bucket. In addition, standard forecasting methods expect the time series to be continuous, but you may well have "holes" in the series where some teams do not work on some tasks or task types.
Instead, I would use standard regression models, "regression" being in the Machine Learning sense: predicting a numerical output. Just feed in your predictors and build models as usual.
If you suspect time dynamics, model these. Maybe your teams are less productive on Fridays? Then feed in Boolean dummies for day of week. Perhaps they are less productive during summer? Feed in a Fourier transform of the day of year. Possibly start with a multiple linear regression as a benchmark before trying more complex methods.
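To make the feature engineering concrete, here is a sketch of the day-of-week dummies and Fourier terms for annual seasonality (the dates and the number of harmonics are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2023-01-01", periods=365, freq="D")

# Boolean dummies for day of week (drop one level to avoid collinearity).
dow = pd.get_dummies(pd.Series(dates.dayofweek, index=dates),
                     prefix="dow", drop_first=True)

# Fourier terms for annual seasonality: a smooth encoding that needs
# far fewer parameters than 365 day-of-year dummies.
doy = dates.dayofyear.to_numpy()
fourier = pd.DataFrame(
    {f"sin{k}": np.sin(2 * np.pi * k * doy / 365.25) for k in (1, 2)}
    | {f"cos{k}": np.cos(2 * np.pi * k * doy / 365.25) for k in (1, 2)},
    index=dates,
)

X = pd.concat([dow, fourier], axis=1)
print(X.shape)  # (365, 10): 6 weekday dummies + 4 Fourier columns
```

These columns can go straight into a multiple linear regression as the benchmark, and then into whatever more complex model you try afterwards.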
Think about what all those zeros in Hours are: did people really finish a task in zero time, or is that really a missing piece of information, or was the task open and they did not work on it that day? As always, understanding your data is usually much more important than tweaking models. You may want to look at zero-inflated models.
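As a hedged sketch of the zero-inflated idea: statsmodels ships a `ZeroInflatedPoisson` model that mixes a point mass at zero (e.g., "did not work on the task that day") with a count distribution. The data below are synthetic and the setup is intercept-only on the inflation part, just to show the mechanics:

```python
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)

# Structural zeros ("not worked on") mixed with genuine Poisson counts.
is_zero = rng.random(n) < 0.3
hours = np.where(is_zero, 0, rng.poisson(np.exp(0.5 + 0.3 * x)))

X = np.column_stack([np.ones(n), x])
res = ZeroInflatedPoisson(hours, X).fit(disp=0)
print(len(res.params))  # 3: inflation intercept + 2 count-model coefficients
```

Whether this is appropriate depends on what your zeros actually mean, which is exactly the question above.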
In your question, you show TeamID and TaskID. I hope you actually have task (and/or team) features so you can predict something, because TaskID sounds like an ID that was used for one task and will therefore not be used again - so you would not be able to forecast for a new TaskID. But again, this is standard ML.
Finally, the MAPE has major shortcomings, especially if you have zeros, whose treatment makes quite a difference. Either use the MAPE directly as your objective function, or, if you use a "standard" loss function like the MSE or the likelihood, post-process your predictions to find the point prediction that minimizes the expected MAPE. That said, I have never seen a business problem that was better solved using a MAPE-optimal forecast rather than an MSE-optimal forecast.
You should always start with an extremely simple forecast, like the historical average. This can already be surprisingly hard to beat.
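A minimal sketch of this benchmark, on made-up daily rental counts, so every later model has something concrete to beat:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=28, freq="D")
rentals = pd.Series(rng.poisson(20, size=28), index=idx)

# Hold out the last week; forecast every day with the historical mean.
train, test = rentals.iloc[:21], rentals.iloc[21:]
benchmark = np.full(len(test), train.mean())

mse = np.mean((test.to_numpy() - benchmark) ** 2)
print(round(mse, 2))
```

Any model that cannot beat this number on the holdout is not earning its complexity.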
After that, as user 2974951 writes, you could apply Croston's method, which is tailored to intermittent demand, i.e., demand with many zeros. It is not statistically sound, but it is a frequent benchmark (and implemented in most software packages).
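Croston's method is simple enough to sketch by hand: exponentially smooth the nonzero demand sizes and the intervals between them separately, then forecast their ratio. Initialization conventions vary between implementations; this is one common choice, on made-up data:

```python
import numpy as np

def croston(demand, alpha=0.1):
    """Classic Croston: smooth nonzero demand sizes and inter-demand
    intervals separately; forecast = smoothed size / smoothed interval."""
    demand = np.asarray(demand, dtype=float)
    nonzero = np.flatnonzero(demand)
    if len(nonzero) == 0:
        return 0.0
    z = demand[nonzero[0]]    # smoothed demand size
    p = nonzero[0] + 1.0      # smoothed inter-demand interval
    last = nonzero[0]
    for t in nonzero[1:]:
        z = alpha * demand[t] + (1 - alpha) * z
        p = alpha * (t - last) + (1 - alpha) * p
        last = t
    return z / p

series = [0, 0, 3, 0, 0, 0, 2, 0, 4, 0]
print(round(croston(series, alpha=0.1), 3))  # 1.007
```

In practice you would use a packaged implementation, but the idea really is this small.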
You probably have seasonality - if only an intra-daily seasonality, with people probably renting more at certain times of day. Take a look at Seasonal Exponential Smoothing.
However, this simple seasonality is probably intermixed with an intra-weekly seasonality, with weekends probably differing from weekdays. This is a case of multiple seasonalities. The tag wiki contains pointers to special models for this, like TBATS.
As seanv507 writes, you may also have effects of holidays, which you could model with a regression on holiday dummies. You could either use weekday and hour dummies (potentially with interaction terms between them) to capture the seasonalities, or run a regression on holidays and model the residuals using a (multiply) seasonal time series method.
Note that these are in increasing order of complexity. Per above, in forecasting simple methods often work surprisingly well.
Finally, note that your choice of error measure will influence what the best forecast is. If you use the MAE (which elicits the conditional median), a forecast that is biased low will look better than an unbiased expectation forecast - which is what all the methods above aim for. So I would very much recommend using the MSE or RMSE. More information on this effect, with pointers to the literature, can be found in section 2.12.2 of Petropoulos et al. (2021), "Forecasting: theory and practice".
And as always, I recommend this excellent free online forecasting textbook: Forecasting: Principles and Practice by Hyndman & Athanasopoulos, 2nd ed. or 3rd ed.