Short Time Series Forecasting

forecasting, machine learning, predictive-models, small-sample, time series

I am in the unusual situation of working with a short time series. I want to predict the number of cases of a particular procedure (let's say knee surgery) that can be done in a hospital in a month.

The sample is small: I have only two years of data (24 monthly data points).

Plotting the data showed no trend, and no seasonality is visible because I have so few data points.

Can anyone suggest the best approach in this case?

Best Answer

Time series models need enough observations to detect seasonality or trend. If you lack data, or if there simply is no trend or seasonality, you cannot expect the model to find one. Let $X_{t+1}$ be the value you want to predict and $\hat{X}_{t+1}$ your prediction. In such cases, a time-series model will most of the time return one of the following:

  • The last observed value (random walk) $\hat{X}_{t+1} = X_t$
  • The last observed value plus a constant (random walk with drift) $\hat{X}_{t+1} = X_t + c$
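  • A weighted average of past observations (exponential smoothing) $\hat{X}_{t+1} = \sum_{k=0}^{n}\alpha_k X_{t-k}$, where in simple exponential smoothing the weights decay geometrically, $\alpha_k = \alpha(1-\alpha)^k$ with $0 < \alpha < 1$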
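The three baselines above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function names and the 24 monthly counts in `cases` are hypothetical, the drift $c$ is estimated as the average month-to-month change, and the smoothing level is initialised at the first observation (one common choice among several).

```python
# Minimal sketch of the three baseline forecasts (hypothetical names and data).


def random_walk(x: list[float]) -> float:
    """Random walk: X_hat_{t+1} = X_t."""
    return x[-1]


def random_walk_drift(x: list[float]) -> float:
    """Random walk with drift: X_hat_{t+1} = X_t + c.

    c is estimated as the average month-to-month change, (X_t - X_1) / (t - 1).
    """
    c = (x[-1] - x[0]) / (len(x) - 1)
    return x[-1] + c


def simple_exp_smoothing(x: list[float], alpha: float) -> float:
    """Simple exponential smoothing; the final level is used as X_hat_{t+1}.

    The level starts at the first observation, so X_{t-k} ends up with
    weight alpha * (1 - alpha) ** k and the first value keeps the remainder.
    """
    level = x[0]
    for value in x[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly knee-surgery counts for two years (24 points).
cases = [30, 28, 35, 32, 31, 29, 27, 33, 34, 30, 31, 29,
         32, 30, 36, 33, 30, 31, 28, 35, 33, 32, 30, 31]

print(random_walk(cases))        # 31
print(random_walk_drift(cases))  # ≈ 31.04
print(simple_exp_smoothing(cases, alpha=0.3))
```

Smaller values of `alpha` spread the weight over more past months; 0.3 is an arbitrary illustrative value, not a recommendation.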

You may accept the model returned by an algorithm, or test one that seems more pertinent to you. While exploring your data, you can make your own assumptions and test them using your judgement. For instance:

  • Compare years 1 and 2. On average, has the number of cases increased? Can you make the hypothesis that this increase will continue in year 3? Is it reasonable to assume that the number of cases in month $t+1$ will be the same as in month $t-11$ (the same month one year earlier), adjusted for the increase? $\hat{X}_{t+1} = (1+\beta)X_{t-11}$, where the growth factor $1+\beta$ may be either the same-month ratio $X_{t-11} / X_{t-23}$ or the yearly ratio $\sum_{k=0}^{11} X_{t-k} / \sum_{k=12}^{23} X_{t-k}$
  • Do the cases follow the same pattern across the year in both year 1 and year 2? A monthly pattern (e.g., from January to February you observe a decrease of $\gamma$%)? A quarterly pattern (e.g., a decrease of $\sigma$% during summer)? Can you assume that such patterns will persist in year 3?
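The two checks in the list above can be sketched as follows. Everything here is hypothetical and illustrative: the function names, the `cases` data, and the choice to compare month-to-month changes by their direction (up, down or flat in both years) as a rough test of a recurring monthly pattern. The forecast function implements $\hat{X}_{t+h} = (1+\beta)X_{t+h-12}$ with either growth factor from the first bullet.

```python
# Minimal sketch (hypothetical names and data) of the year-over-year checks.


def seasonal_growth_forecast(x: list[float], h: int = 1, per_month: bool = False) -> float:
    """Forecast X_{t+h}, 1 <= h <= 12, as (1 + beta) * X_{t+h-12}.

    per_month=True:  1 + beta = X_{t+h-12} / X_{t+h-24} (same month, two years apart).
    per_month=False: 1 + beta = (year-2 total) / (year-1 total).
    Needs at least 24 observations.
    """
    n = len(x)
    if n < 24 or not 1 <= h <= 12:
        raise ValueError("need at least 24 observations and 1 <= h <= 12")
    same_month_last_year = x[n - 12 + h - 1]  # X_{t+h-12}
    if per_month:
        growth_factor = same_month_last_year / x[n - 24 + h - 1]  # divide by X_{t+h-24}
    else:
        growth_factor = sum(x[-12:]) / sum(x[-24:-12])
    return growth_factor * same_month_last_year


def month_to_month_changes(year: list[float]) -> list[float]:
    """Percentage change between consecutive months within one year."""
    return [100 * (b - a) / a for a, b in zip(year, year[1:])]


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def same_direction_share(year1: list[float], year2: list[float]) -> float:
    """Share of month-to-month moves going the same way (up, down or flat) in both years."""
    c1 = month_to_month_changes(year1)
    c2 = month_to_month_changes(year2)
    agree = sum(1 for a, b in zip(c1, c2) if _sign(a) == _sign(b))
    return agree / len(c1)


cases = [30, 28, 35, 32, 31, 29, 27, 33, 34, 30, 31, 29,
         32, 30, 36, 33, 30, 31, 28, 35, 33, 32, 30, 31]
year1, year2 = cases[:12], cases[12:]

print(seasonal_growth_forecast(cases, h=1))                  # yearly-ratio version
print(seasonal_growth_forecast(cases, h=1, per_month=True))  # same-month-ratio version
print(same_direction_share(year1, year2))
```

The same-month ratio rests on two single observations, so it reacts strongly to noise; the yearly ratio averages over twelve months and is usually the steadier of the two.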

Also, do you have any additional data (features) associated with the time series? Machine learning is useful when you have many observations and a large number of features in which you cannot spot a simple pattern. With 24 points, I strongly doubt that you will find anything complex, so you may as well do a manual analysis.

Keep in mind that these assumptions can hardly be tested. You do not have a long enough history (e.g., 3 years) to build a "model" on the first two years and test it on the last one.
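The point about testing can be made concrete with a rolling-origin, one-step-ahead check, where each month is forecast using only the months before it. This is a minimal sketch under stated assumptions: the rolling-origin scheme, the helper names and the `cases` data are hypothetical, and the seasonal rule uses the yearly-total growth factor from the list above. With 24 points, the random walk can be scored on 23 months, while the year-over-year rule needs 24 earlier months and cannot be scored on any.

```python
from typing import Callable

# Minimal sketch (hypothetical names and data): how many months can each rule
# be scored on, using only the data available before each month?


def naive(history: list[float]) -> float:
    """Random walk: next value = last observed value."""
    return history[-1]


def seasonal_growth(history: list[float]) -> float:
    """(1 + beta) * X_{t-11}, with 1 + beta = year-2 total / year-1 total.

    Only defined once 24 months of history exist.
    """
    factor = sum(history[-12:]) / sum(history[-24:-12])
    return factor * history[-12]


def one_step_errors(
    x: list[float],
    forecaster: Callable[[list[float]], float],
    min_history: int,
) -> list[float]:
    """Rolling-origin one-step-ahead absolute errors: forecast x[i] from x[:i]."""
    return [abs(x[i] - forecaster(x[:i])) for i in range(min_history, len(x))]


cases = [30, 28, 35, 32, 31, 29, 27, 33, 34, 30, 31, 29,
         32, 30, 36, 33, 30, 31, 28, 35, 33, 32, 30, 31]

naive_errors = one_step_errors(cases, naive, min_history=1)
seasonal_errors = one_step_errors(cases, seasonal_growth, min_history=24)

print(len(naive_errors))     # 23
print(len(seasonal_errors))  # 0: no month has 24 earlier months
print(sum(naive_errors) / len(naive_errors))  # mean absolute error of the random walk
```

A third year of data would give the year-over-year rule twelve scorable months, which is the kind of range the answer has in mind.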

Bottom line: with limited data, you can only perform limited analysis.
