Solved – Best way to deal with forecasting with noisy data

arima, exponential-smoothing, forecasting, r, volatility-forecasting

I have a bunch of sales data. It comes from distributors of 2000 different items, who service customers ranging from big companies and large distributors to a number of small independent stores. Some items sell in good volume, while others sell fewer than 100 units in a year. What's more, the method for determining true demand is imperfect: if an item is out of stock, a customer who orders weekly will keep reordering until they get the stock, which inflates the measured demand because their order is counted more than once. Conversely, going by units sold excludes demand from customers who did not reorder out-of-stock items.

Because of all this (as well as some other factors), the data, whilst showing some trend over long periods, looks incredibly "noisy" to me. Any trend over a short forecasting period would be wiped out.

Here's an example of a couple of different plots.
[Figure: visual representation of noisy data]

ETS from the forecast package in R could give some insight, as could ARIMA (not that I fully know how to use it yet). Or, in cases like this where there is so little information in the data, will a Simple Exponential Smoothing technique yield results that are as good as it gets?
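
To make the Simple Exponential Smoothing option concrete, here is a minimal sketch in plain Python (hand-rolled, not a call into R's forecast package). It smooths a series, scores the one-step-ahead forecasts, and compares the result with the naive "last value" forecast. The `sales` numbers are made up for illustration, and the grid search over `alpha` and the choice of mean absolute error are assumptions, not something taken from the question.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing: forecast for the period after the last observation."""
    level = series[0]  # assumption: initialise the level at the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def mean_abs_error(series, alpha):
    """Mean absolute one-step-ahead error; each forecast is made before its point is seen."""
    level = series[0]
    errors = []
    for y in series[1:]:
        errors.append(abs(y - level))            # the current level is the forecast for y
        level = alpha * y + (1 - alpha) * level  # then update the level with y
    return sum(errors) / len(errors)


# Hypothetical weekly sales for one slow-moving item (made-up numbers).
sales = [3, 0, 5, 2, 0, 4, 1, 6, 0, 3, 2, 5, 0, 4, 3, 1]

# Crude grid search over the smoothing weight (0.05, 0.10, ..., 0.95).
candidates = [i / 20 for i in range(1, 20)]
best_alpha = min(candidates, key=lambda a: mean_abs_error(sales, a))

# alpha = 1 reduces SES to the naive forecast "next value = last value".
naive_mae = mean_abs_error(sales, 1.0)
tuned_mae = mean_abs_error(sales, best_alpha)

print(f"best alpha: {best_alpha}, tuned MAE: {tuned_mae:.2f}, naive MAE: {naive_mae:.2f}")
print(f"next-period forecast: {ses_forecast(sales, best_alpha):.2f}")
```

A small `best_alpha` means the forecast is close to a long-run average, which is roughly what one would expect when most of the week-to-week movement is noise. Scoring on the same data used to pick `alpha` is optimistic, so a proper comparison would hold out the most recent periods.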

Best Answer

Time series data often exhibit autoregressive structure (ARIMA) or deterministic structure (daily/weekly/monthly effects), sometimes both. Additionally, there may be anomalies in the data (pulses, level shifts, seasonal pulses, local time trends). Sorting out the best combination while taking into account time-varying parameters and possibly time-varying error variance can be a daunting task (for both humans and computers!). You should "listen to the data", or start from a well-understood prior, and use that to identify and modify an appropriate model rather than trying to shoehorn the data into a predetermined guessed solution or a set of guessed solutions (auto.arima). As G.E.P. Box once supposedly said, "All models are wrong but some are useful". Similar comments could be made here about responses to questions like yours.
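
As a rough illustration of what a "pulse" looks like in practice, the sketch below flags one-off spikes using a robust z-score built from the median and the median absolute deviation (MAD). This is a simple stand-in chosen for brevity, not the intervention-detection procedure the answer has in mind, and it does nothing about level shifts, seasonal pulses, or local trends. The `orders` series and the 3.5 threshold are assumptions for illustration only.

```python
from statistics import median


def flag_pulses(series, threshold=3.5):
    """Return the indices whose robust z-score (median/MAD based) exceeds the threshold."""
    med = median(series)
    mad = median(abs(y - med) for y in series)
    if mad == 0:
        return []  # no spread to scale by; this sketch flags nothing in that case
    # 0.6745 rescales the MAD so the score is comparable to a z-score under normality.
    return [i for i, y in enumerate(series)
            if abs(0.6745 * (y - med) / mad) > threshold]


# Hypothetical weekly order counts; the spike could be repeated re-orders
# placed during a stock-out, as described in the question.
orders = [12, 11, 13, 12, 40, 12, 11, 13, 12, 11]

pulses = flag_pulses(orders)
print("suspected pulse positions:", pulses)
```

Flagged points are candidates to inspect or model explicitly, not values to delete automatically: a spike may be real demand rather than double counting.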
