Solved – Shape detection for time series data

clustering, curves, time series

I have a large collection of time series: measurements taken every 15 minutes (96 measurements per day) over the span of one year at various locations.

I've broken up each time series into 365 smaller time series, one for each day of the year. Looking at these daily series, there are clearly many distinct shapes that a single day can take.
Some look sinusoidal, some are constant, some look like a random stochastic process, some look parabolic, and some look like U's.
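As a minimal sketch of the daily split described above, one year of readings for a single location can be reshaped into a matrix with one row per day. The vector `x` is simulated here and all variable names are hypothetical; real data would also need gaps and daylight-saving days handled first.

```r
# Hypothetical data: one year of 15-minute readings for one location
# (365 days x 96 readings per day), simulated for illustration only.
set.seed(1)
readings_per_day <- 96
n_days <- 365
x <- rnorm(n_days * readings_per_day)

# Fill row by row so that row d holds the 96 readings of day d.
daily <- matrix(x, nrow = n_days, ncol = readings_per_day, byrow = TRUE)

dim(daily)  # rows are days, columns are times of day
```

A call such as `matplot(t(daily[1:5, ]), type = "l")` then overlays the first five daily curves for a quick visual check of their shapes.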

What I would like to do is use an algorithm that can find these common shapes. I thought about clustering and using the cluster centroids to define the common shapes, but wanted to check with the community whether this is a sound approach. So far, I've looked at Dynamic Time Warping as a distance measure, but it seems to require a lot of computation. I've also found http://mox.polimi.it/it/progetti/pubblicazioni/quaderni/13-2008.pdf from SE.

I also saw "Is it possible to do time-series clustering based on curve shape?", but that question is from 2010 and might be outdated.
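To make the centroid idea concrete, here is a rough sketch that uses plain Euclidean k-means from base R as a cheap stand-in for a DTW-based clustering. The three synthetic shapes, the noise level, and the choice of three clusters are all assumptions made for illustration; Euclidean distance will not align curves that are shifted in time.

```r
set.seed(42)
t_idx <- seq(0, 1, length.out = 96)

# Hypothetical daily curves: n noisy copies of a given shape, one per row.
make_days <- function(shape, n) {
  t(replicate(n, shape + rnorm(96, sd = 0.2)))
}
days <- rbind(
  make_days(sin(2 * pi * t_idx), 30),   # sinusoidal days
  make_days(4 * (t_idx - 0.5)^2, 30),   # U-shaped days
  make_days(rep(0, 96), 30)             # flat days
)

# z-normalise each day so that clusters reflect shape rather than level.
# Note: a perfectly constant day has zero variance and needs special handling.
z <- t(scale(t(days)))

# k = 3 is assumed here; in practice k has to be chosen, e.g. by inspection.
fit <- kmeans(z, centers = 3, nstart = 20)

# Each row of fit$centers is a length-96 centroid, i.e. a candidate shape.
table(fit$cluster)
```

Plotting the rows of `fit$centers` against time of day shows whether the centroids look like recognisable shapes or like averaged-out noise.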

Another idea I had was to take eigendecompositions of matrices constructed as follows:

Matrix $M_i$ contains all time series observed on day $i$; every row of $M_i$ is a time series of length 96. I would then do 365 eigendecompositions, one per day, and use the eigenvectors as common shapes. Does this sound reasonable?
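One way to read this idea, sketched below under stated assumptions: since $M_i$ is generally not square, the eigendecomposition is taken of its 96 x 96 sample covariance matrix, which is what principal component analysis does. The matrix `M_i` here is simulated with 50 hypothetical locations, and this is only one possible interpretation of the proposal.

```r
set.seed(7)
t_idx <- seq(0, 1, length.out = 96)

# Hypothetical M_i: 50 locations observed on one day, one row per location.
# Each row is a random mix of a sinusoid and a U shape plus a little noise.
n_loc <- 50
M_i <- t(sapply(seq_len(n_loc), function(k) {
  rnorm(1) * sin(2 * pi * t_idx) +
    rnorm(1) * (t_idx - 0.5)^2 +
    rnorm(96, sd = 0.05)
}))

# M_i is 50 x 96, so it has no eigendecomposition of its own.
# Its 96 x 96 covariance across time points is symmetric and does.
eig <- eigen(cov(M_i), symmetric = TRUE)

# Share of the total variance carried by each eigenvector.
explained <- eig$values / sum(eig$values)
round(explained[1:3], 3)

# Columns of eig$vectors are length-96 curves; keep the two leading ones.
shapes <- eig$vectors[, 1:2]
```

Keep in mind that eigenvectors are mutually orthogonal and have arbitrary sign, so they describe directions of variation around the mean curve rather than typical daily shapes.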

Thanks!

Best Answer

I wouldn't go too deep into clustering the time series based on a complex curve analysis: you probably have a lot of noise in your data, and you will probably end up with strange (meaningless) clusters.

I think an easier way is to discover the major patterns in your data first, which will most probably be based on trends and seasonality (days of the week, weekends, holidays...). You can find them by plotting some statistics for each day (mean, morning trend, evening trend...) against time (day of the year, day of the week, day of the month...) on the x-axis. This will give you the baseline of your data, and therefore your basic clusters.

For example, in R, if you have the date in column 1 and the daily mean in column 2, you can plot the baseline weekday pattern with:

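# Column 1 holds the date (as a Date or POSIXt object), column 2 the daily mean.
data[,3] <- as.factor(weekdays(data[,1]))
# Plotting a factor against a numeric vector draws one boxplot per weekday.
plot(data[,3], data[,2], main='Mean by Day of Week')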
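The per-day statistics mentioned above could be computed along these lines. This is only a sketch: the 365 x 96 matrix `daily` is simulated, the year 2015 is arbitrary, "trend" is interpreted as a least-squares slope, and the morning and evening windows are assumed.

```r
set.seed(3)
# Hypothetical input: a 365 x 96 matrix with one row per day.
daily <- matrix(rnorm(365 * 96), nrow = 365, byrow = TRUE)
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = 365)

# Least-squares slope of the readings within a window.
slope <- function(y) {
  x <- seq_along(y)
  cov(x, y) / var(x)
}

stats <- data.frame(
  date          = dates,
  day_mean      = rowMeans(daily),
  morning_slope = apply(daily[, 25:48], 1, slope),  # 06:00 to 12:00 (assumed)
  evening_slope = apply(daily[, 69:92], 1, slope)   # 17:00 to 23:00 (assumed)
)
stats$weekday <- factor(weekdays(stats$date))

# Median of each statistic per weekday gives a rough baseline.
baseline <- aggregate(
  cbind(day_mean, morning_slope, evening_slope) ~ weekday,
  data = stats, FUN = median
)
```

The same data frame can be fed to `plot(stats$weekday, stats$morning_slope)` to compare any of the statistics across weekdays.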

The next step can be to identify your outliers and check whether you can find patterns there.
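One simple way to flag outlying days, sketched here as an assumption rather than a recommendation, is to compare each daily mean with the median for its weekday and measure the gap in units of the median absolute deviation. The data, the two planted anomalies, and the cutoff of 4 are all hypothetical.

```r
set.seed(11)
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = 365)

# Hypothetical daily means with two planted anomalies.
day_mean <- rnorm(365, mean = 10)
day_mean[c(50, 200)] <- c(25, -8)

weekday <- factor(weekdays(dates))

# Robust centre and spread of the daily mean within each weekday.
center <- ave(day_mean, weekday, FUN = median)
spread <- ave(day_mean, weekday, FUN = mad)

# Robust z-score; the cutoff of 4 is an arbitrary choice for this sketch.
robust_z <- (day_mean - center) / spread
outlier_days <- dates[abs(robust_z) > 4]
```

The flagged dates can then be checked against a calendar of holidays or known events to see whether they form a pattern of their own.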

If you prefer to work in the reverse order, first running automatic analyses like the ones you suggested or more time-series-specific methods such as LB_Keogh or kml, that is fine. But you will still need to come back to a meaningful interpretation of the findings using the logic above.
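For reference, LB_Keogh is a lower bound on the DTW distance that is much cheaper to compute than DTW itself, so it can be used to discard poor matches early. Below is a minimal base-R sketch, assuming equal-length series and a warping window of `r` points on each side; the function and argument names are my own, not taken from any package.

```r
# LB_Keogh lower bound on the window-constrained DTW distance between a
# query q and a candidate cand of the same length.
lb_keogh <- function(q, cand, r) {
  n <- length(q)
  stopifnot(length(cand) == n, r >= 0)
  idx <- seq_len(n)

  # Upper and lower envelope of the query within +/- r points.
  upper <- sapply(idx, function(i) max(q[max(1, i - r):min(n, i + r)]))
  lower <- sapply(idx, function(i) min(q[max(1, i - r):min(n, i + r)]))

  # Only the parts of the candidate outside the envelope contribute.
  above <- pmax(cand - upper, 0)
  below <- pmax(lower - cand, 0)
  sqrt(sum(above^2 + below^2))
}

# Hypothetical example: two daily curves that differ by a small time shift.
t_idx <- seq(0, 1, length.out = 96)
a <- sin(2 * pi * t_idx)
b <- sin(2 * pi * (t_idx - 0.05))

lb_keogh(a, b, r = 0)  # with no warping allowed this is the Euclidean distance
lb_keogh(a, b, r = 8)  # a wider window gives a smaller (looser) bound
```

In a nearest-neighbour search, the full DTW distance only needs to be computed for candidates whose bound is below the best distance found so far.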
