Solved – Adding noise to time series data to increase training data

forecasting, lstm, neural networks, noise, time series

I am dealing with a weekly time series forecasting problem and am currently investigating the use of an LSTM to make a multi-step forecast for a univariate time series. I actually have a multivariate time series, but I want to know whether an LSTM can make good predictions in the univariate case first. The logic being that if it can't make good predictions in the univariate case, it probably can't cope with the multivariate one. As an aside, is this assumption correct, or am I doing my model a disservice by not also including the extra features?

The problem I have is that my forecast horizons are relatively long, $h = 1, 4, 13, 26$ weeks, and my input data are limited: my series are often short, with 100-300 data points. To increase the number of samples I can get from this data, I currently frame the task as a supervised learning problem. That is, instead of saying "here are 300 sequential data points, predict the next $h$", I say "here are a fixed number of lags $n$, predict the next $h$", where each sample is generated using a sliding window.
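For concreteness, here is a minimal sketch of that sliding-window framing (the function `make_windows` and its parameter names are just illustrative, not from any particular library):

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Turn a 1-D series into (X, y) pairs using a sliding window.

    X[i] holds n_lags consecutive values, y[i] holds the next `horizon` values.
    """
    X, y = [], []
    for start in range(len(series) - n_lags - horizon + 1):
        X.append(series[start:start + n_lags])
        y.append(series[start + n_lags:start + n_lags + horizon])
    return np.array(X), np.array(y)

# Example: 300 weekly points, 52 lags, 13-week horizon -> 236 samples
series = np.random.randn(300)
X, y = make_windows(series, n_lags=52, horizon=13)
print(X.shape, y.shape)  # (236, 52) (236, 13)
```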

Given the relatively long forecast horizons, this still doesn't produce enough samples, and the count also depends on the number of lags used, which I currently vary with the target forecast horizon.

I would like to know whether it is possible to increase the number of samples by generating artificial ones. I had thought about taking my time series $T$ and creating two more time series, $T_{\text{noise}}$ and $T_{\text{smoothed}}$, which would triple the number of samples available. My thinking is that two extra time series, modified in opposite directions but still related to the original, would give the LSTM more sequence information while also helping to combat overfitting by varying the sequences enough.
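A minimal sketch of what I have in mind, assuming Gaussian jitter scaled by the series' standard deviation and a simple moving-average filter for the smoothed copy (the function name and parameters are just placeholders):

```python
import numpy as np

def augment_series(series, noise_scale=0.05, smooth_window=5, seed=0):
    """Return a jittered copy and a smoothed copy of a 1-D series.

    noise_scale is relative to the series' standard deviation;
    smooth_window is the width of a centred moving-average filter.
    """
    rng = np.random.default_rng(seed)
    t_noise = series + rng.normal(0.0, noise_scale * series.std(), size=series.shape)
    kernel = np.ones(smooth_window) / smooth_window
    t_smoothed = np.convolve(series, kernel, mode="same")
    return t_noise, t_smoothed

# Each copy can then go through the same sliding-window step as the original,
# tripling the number of (lags, horizon) samples.
```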

I assume this technique must have been tried, since it is common in computer vision / image classification to artificially add noise to combat overfitting and to enlarge the available dataset, but I haven't found much on the subject for time series.

Is my idea flawed? I realise it can come across as a bit of a chicken-and-egg scenario: I want to predict new data points as a sequence, yet in order to extend an existing time series to help my network train I technically need to do just that. However, by adding noise to the time series and framing the problem as "here are the last $x$ points, predict the next $y$", can I mitigate this?

Disclaimer: for those who will say "consider another forecasting method", I am. At the same time I want to provide a thorough review of several different methods. For example, if I can establish that an LSTM forecasts well on any series from my dataset with a minimum of 1000 data points, and I can verify that using this noise/smoothing process, then it is an avenue I would like to explore.

Best Answer

I am not sure whether it will work, but I have seen this approach in:

https://github.com/chickenbestlover/RNN-Time-series-Anomaly-Detection

It uses an RNN autoencoder for prediction and then detects anomalies. Before training the autoencoder, it augments the data by adding several different levels of noise. Please check it.
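As a rough illustration of that idea (this is not the repository's code, just a sketch of adding several noise levels, with illustrative names and default ratios):

```python
import numpy as np

def augment_with_noise_levels(series, noise_ratios=(0.01, 0.05, 0.1, 0.2), seed=0):
    """Create several noisy copies of a series, one per noise level.

    Each copy adds zero-mean Gaussian noise whose standard deviation is a
    fraction (noise_ratio) of the original series' standard deviation.
    """
    rng = np.random.default_rng(seed)
    copies = []
    for ratio in noise_ratios:
        noise = rng.normal(0.0, ratio * series.std(), size=series.shape)
        copies.append(series + noise)
    return copies  # train on the original plus all noisy copies
```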
