Wavelets are useful for detecting singularities in a signal (see, for example, the paper here, in particular Figure 3 for an illustration, and the references cited in that paper). I guess singularities can sometimes be anomalies?
The idea here is that the continuous wavelet transform (CWT) has maxima lines that propagate across frequencies (scales): the longer the line, the stronger the singularity. See Figure 3 in the paper to see what I mean! Note that there is free Matlab code related to that paper; it should be here.
Additionally, I can give you some heuristics on why the discrete wavelet transform (DWT) is interesting for a statistician (the preceding example is about the continuous one; excuse the non-exhaustiveness):
- There is a wide class of realistic signals (those in Besov spaces) that the wavelet transform turns into a sparse sequence. (compression property)
- There is a wide class of (quasi-stationary) processes that it transforms into a sequence with almost uncorrelated features. (decorrelation property)
- Wavelet coefficients contain information that is localized in time and in frequency (at different scales). (multi-scale property)
- Wavelet coefficients of a signal concentrate on its singularities.
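As a minimal sketch of the last point, the snippet below computes one level of Haar detail coefficients in plain Python for a smooth sine wave with a single jump. The Haar wavelet, the test signal, and the helper name `haar_detail` are my own illustrative assumptions; they are not taken from the paper mentioned above.

```python
import math

def haar_detail(x):
    """One level of Haar detail coefficients: (x[2k] - x[2k+1]) / sqrt(2)."""
    return [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2) for k in range(len(x) // 2)]

n = 64
jump_at = 33  # assumed jump location; odd, so the jump falls inside the pair (32, 33)
signal = [math.sin(2 * math.pi * t / n) + (1.0 if t >= jump_at else 0.0)
          for t in range(n)]

detail = haar_detail(signal)

# Index of the largest detail coefficient in absolute value.
largest = max(range(len(detail)), key=lambda k: abs(detail[k]))
print(largest, round(abs(detail[largest]), 3))
```

The detail coefficients are small wherever the signal is smooth and large only at the pair that straddles the jump. A jump that sits exactly on a pair boundary would only show up at coarser levels, which is one reason several scales are used in practice.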
I think the key is the "unexpected" qualifier in your graph. To detect the unexpected, you need to have an idea of what's expected.
I would start with a simple time series model such as AR(p) or ARMA(p,q). Fit it to the data, adding seasonality as appropriate. For instance, your SAR(1)(24) model could be: $y_{t}=c+\phi y_{t-1}+\Phi_{24}y_{t-24}+\Phi_{25}y_{t-25}+\varepsilon_t$, where $t$ is time in hours. So, you'd be predicting the value on the graph for the next hour. Whenever the prediction error $e_t=y_t-\hat y_t$ is "too big", you throw an alert.
When you estimate the model you'll get the variance $\sigma^2_\varepsilon$ of the error $\varepsilon_t$. Depending on your distributional assumptions, such as normality, you can set the threshold from a probability: under normality, $|e_t|<3\sigma_\varepsilon$ holds about 99.7% of the time, so you alert when $|e_t|>3\sigma_\varepsilon$, or, one-sided, when $e_t>3\sigma_\varepsilon$.
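Here is a rough sketch of that alerting rule in plain Python. To keep it short, it uses a plain AR(1) instead of the seasonal model, fits it by least squares, and runs on simulated data with one injected spike; the helper name `fit_ar1`, the simulated series, and the training window are all assumptions made for illustration.

```python
import math
import random

def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1] + eps; returns (c, phi, sigma)."""
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    phi = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
           / sum((a - mx) ** 2 for a in x))
    c = mz - phi * mx
    resid = [b - (c + phi * a) for a, b in zip(x, z)]
    sigma = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))
    return c, phi, sigma

# Simulated data (an assumption for illustration): AR(1) noise plus one spike.
random.seed(0)
y = [0.0]
for _ in range(299):
    y.append(0.7 * y[-1] + random.gauss(0.0, 1.0))
y[200] += 15.0  # the "unexpected" observation

c, phi, sigma = fit_ar1(y[:150])  # estimate on a clean training window

# One-step-ahead prediction errors e_t = y_t - y_hat_t on the monitored part;
# alert when |e_t| exceeds 3 * sigma.
alerts = [t for t in range(150, len(y))
          if abs(y[t] - (c + phi * y[t - 1])) > 3 * sigma]
print(alerts)
```

The seasonal lags $y_{t-24}$ and $y_{t-25}$ would enter the same way, as extra regressors in the fit. Note that the hour right after a spike may also be flagged, because the spike itself feeds into the next prediction.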
The number of visitors is probably quite persistent, but super seasonal. It might work better to try seasonal dummies instead of the multiplicative seasonality; then you'd try ARMAX, where X stands for exogenous variables, which could be anything like a holiday dummy, hour dummies, weekend dummies, etc.
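To make the dummies concrete, here is a small sketch of how one row of exogenous regressors could be built from a timestamp. The helper name `dummy_row` and the `holidays` argument are hypothetical; one hour is left out as the baseline so the hour dummies are not collinear with the intercept $c$.

```python
from datetime import datetime

def dummy_row(ts, holidays=frozenset()):
    """Exogenous regressors for one timestamp: 23 hour dummies, weekend, holiday."""
    hour = [1 if ts.hour == h else 0 for h in range(1, 24)]  # hour 0 is the baseline
    weekend = 1 if ts.weekday() >= 5 else 0                   # Saturday=5, Sunday=6
    holiday = 1 if ts.date() in holidays else 0               # user-supplied dates
    return hour + [weekend, holiday]

row = dummy_row(datetime(2024, 1, 6, 14))  # a Saturday, 14:00
print(len(row), row[13], row[23])
```

Stacking one such row per hour gives the X matrix that an ARMAX fit would take alongside the series itself.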
Best Answer
Twitter algorithm is based on
I'm sure there have been many techniques and advances since 1983! I have tested it on my internal data, and Twitter's anomaly detection does not identify obvious outliers. I would use other approaches as well to test for outliers in time series. The best that I have come across is Tsay's outlier detection procedure, which is implemented in SAS/SPSS/Autobox and SCA software, all of which are commercial systems. There is also the tsoutliers package, which is great but needs the specification of an `arima` model in order to work efficiently. I have had issues with its default `auto.arima` with regard to optimization and model selection.
Tsay's article is a seminal work in outlier detection in time series. The International Journal of Forecasting, a leading journal in forecasting research, mentioned in an article linked above (also see below) that Tsay's article is one of the most cited and most influential papers. The diffusion of this important work and other outlier detection algorithms into forecasting software (especially open source software) is a rarity.