Time Series – Synthetic Multivariate Time Series for Anomaly Detection Using Copula Simulation

anomaly detection, copula, simulation, synthetic data, time series

I built an anomaly detection classifier that worked perfectly on the anomaly detection task in my dataset (a multivariate time series). Now I'm trying to understand what its weaknesses are, and my idea is to generate a synthetic multivariate time series that I can manipulate mathematically by changing its distribution. I found a copula-based method in a library called Copulas, but I do not understand how to generate distinct yet coherent data for the training set and the test set.
If I fit the library to an already existing dataset and it simulates the same distribution with different values, why should I expect my classifier to perform differently?

Best Answer

You need to identify the process that you're simulating. For instance, if it's VAR(1): $$\vec x_t=\vec c + \mathbf \Phi_1 \vec x_{t-1}+\vec\varepsilon_t,\\ \vec\varepsilon_t\sim\mathcal N(0,\mathbf \Sigma)$$

You can easily simulate $\vec\varepsilon_t$ from $\mathcal N(0,\mathbf \Sigma)$ using your favorite software library or a Cholesky decomposition of $\mathbf \Sigma$. If the disturbances are not normal, you can use a copula to sample them from other distributions.
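As a rough illustration of the two sampling options and the VAR(1) recursion, here is a minimal NumPy sketch. The parameter values for $\vec c$, $\mathbf \Phi_1$ and $\mathbf \Sigma$ are made up for the example, the function names are hypothetical, and the choice of Laplace marginals for the copula case is just one possible assumption. It does not use the Copulas library.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not estimated from any dataset).
c = np.array([0.1, -0.2])
Phi = np.array([[0.5, 0.1],
                [0.0, 0.3]])   # eigenvalues 0.5 and 0.3, so the VAR(1) is stable
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

def sample_normal_disturbances(n, Sigma, rng):
    """Draw n vectors from N(0, Sigma) via the Cholesky factor of Sigma."""
    L = np.linalg.cholesky(Sigma)              # Sigma = L @ L.T
    z = rng.standard_normal((n, Sigma.shape[0]))
    return z @ L.T

def sample_laplace_copula_disturbances(n, Sigma, rng):
    """Gaussian copula with zero-mean Laplace marginals.

    The copula correlation is taken from Sigma and the marginal variances
    match diag(Sigma); the resulting covariance is only approximately Sigma.
    """
    d = np.sqrt(np.diag(Sigma))
    corr = Sigma / np.outer(d, d)
    z = sample_normal_disturbances(n, corr, rng)
    # Standard normal CDF maps each coordinate to a uniform on (0, 1).
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    u = np.clip(u, 1e-12, 1.0 - 1e-12)
    b = d / math.sqrt(2.0)                     # Laplace variance is 2 * b**2
    # Laplace inverse CDF applied coordinate-wise.
    return -b * np.sign(u - 0.5) * np.log1p(-2.0 * np.abs(u - 0.5))

def simulate_var1(c, Phi, eps, x0=None):
    """Run x_t = c + Phi @ x_{t-1} + eps_t over the rows of eps."""
    n, k = eps.shape
    x = np.empty((n, k))
    prev = np.zeros(k) if x0 is None else np.asarray(x0, dtype=float)
    for t in range(n):
        x[t] = c + Phi @ prev + eps[t]
        prev = x[t]
    return x

eps_normal = sample_normal_disturbances(20000, Sigma, rng)
eps_laplace = sample_laplace_copula_disturbances(20000, Sigma, rng)
x = simulate_var1(c, Phi, eps_normal)
```

Swapping `eps_normal` for `eps_laplace` in the last line gives a series with the same dynamics but heavier-tailed disturbances, which is one way to change the distribution in a controlled manner.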

Next, you induce anomalies by replacing some of the generated $\vec\varepsilon_t$ with outliers or non-random patterns, such as mean shifts.
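A minimal sketch of that step, under assumed settings: a handful of point outliers plus one mean-shift window, with a boolean label per time step so the detector can be scored afterwards. The helper `inject_anomalies`, its parameters, and all numeric values are hypothetical choices for illustration.

```python
import numpy as np

def inject_anomalies(eps, rng, n_outliers=5, outlier_scale=8.0,
                     shift_start=None, shift_len=0, shift=None):
    """Return a contaminated copy of eps and a boolean anomaly label per row."""
    eps = eps.copy()
    n, k = eps.shape
    labels = np.zeros(n, dtype=bool)

    # Point outliers: replace a few disturbances with large values of random sign.
    idx = rng.choice(n, size=n_outliers, replace=False)
    signs = rng.choice([-1.0, 1.0], size=(n_outliers, k))
    eps[idx] = outlier_scale * eps.std(axis=0) * signs
    labels[idx] = True

    # Mean shift: add a constant vector to the disturbances over a window.
    if shift_start is not None and shift_len > 0:
        window = slice(shift_start, shift_start + shift_len)
        eps[window] += shift
        labels[window] = True
    return eps, labels

rng = np.random.default_rng(1)

# Illustrative VAR(1) parameters (assumptions, not from any dataset).
c = np.array([0.1, -0.2])
Phi = np.array([[0.5, 0.1],
                [0.0, 0.3]])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

clean_eps = rng.multivariate_normal(np.zeros(2), Sigma, size=500)
dirty_eps, labels = inject_anomalies(
    clean_eps, rng, n_outliers=5,
    shift_start=300, shift_len=50, shift=np.array([3.0, 0.0]),
)

# Feed the contaminated disturbances through the VAR(1) recursion.
x = np.empty_like(dirty_eps)
prev = np.zeros(2)
for t in range(len(dirty_eps)):
    x[t] = c + Phi @ prev + dirty_eps[t]
    prev = x[t]
```

Note that the labels mark where the disturbances were altered. Because of the autoregressive term, each injected shock also affects the following values of `x`, so you may want to treat a few steps after each labeled point as affected when scoring the classifier.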