Solved – Time-series classification – very poor results

Tags: classification, time series

I am working on a time series classification problem where the input is time series voice usage data (in seconds) for the first 21 days of a cell phone account. The corresponding target variable is whether or not that account cancelled in the 35-45 day range. So it is a binary classification problem.

I am getting very poor results from all of the methods I have tried so far (to varying degrees). First I tried k-NN classification (with various modifications) and got extremely bad results. This led me to extract features from the time series – e.g. mean, variance, max, min, total zero days, total trailing zero days, difference between the first-half average and the second-half average, etc. The most predictive features seemed to be total zero days and total trailing zero days (across several classification algorithms). This performed the best, but performance was still not very good.
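The kind of per-series summary features described above can be sketched roughly like this (a minimal illustration; the function and feature names are mine, not from any particular library):

```python
import numpy as np

def extract_features(series):
    """Summary features for one usage series (hypothetical helper)."""
    s = np.asarray(series, dtype=float)
    half = len(s) // 2
    trailing_zeros = 0
    for v in s[::-1]:  # count zero-usage days at the tail of the series
        if v == 0:
            trailing_zeros += 1
        else:
            break
    return {
        "mean": s.mean(),
        "variance": s.var(),
        "max": s.max(),
        "min": s.min(),
        "zero_days": int((s == 0).sum()),
        "trailing_zero_days": trailing_zeros,
        # second-half average minus first-half average
        "half_diff": s[half:].mean() - s[:half].mean(),
    }
```

Each account's 21-day series would then be reduced to one such feature row before feeding a standard classifier.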

My next strategy was to oversample the negative instances in my training set, since there were so few of them. This resulted in more correct cancellation predictions, but at the expense of more false positives.
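Plain random oversampling of the rare class can be sketched as follows (an assumption about the exact scheme used; the label convention and seed are illustrative):

```python
import numpy as np

def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate minority-class rows (with replacement) until classes balance.
    A sketch of plain random oversampling, not any specific library routine."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    minority = np.where(y == minority_label)[0]
    majority = np.where(y != minority_label)[0]
    n_extra = len(majority) - len(minority)
    if n_extra <= 0:
        return X, y
    extra = rng.choice(minority, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]
```

The trade-off observed above is expected: duplicating rare-class examples shifts the decision boundary toward predicting that class more often, which raises recall but also the false-positive rate.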

I'm starting to think that perhaps the time series usage data itself is simply not very predictive (though common sense says it should be). Perhaps there is some latent variable that I am not considering. Looking at the data also shows some strange behaviour: e.g. some accounts show very little or decreasing usage (or sometimes none at all) yet do not cancel, while others ramp up their usage and still cancel. Perhaps this contradictory behaviour does not produce a very clear decision boundary for a classifier.

Another possible source of error is that many training examples are very sparse (i.e. many days with 0 usage). One idea I have not tried yet is to split each time series into segments and generate features per segment, but I do not have high hopes.
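The segment idea mentioned above could look something like this (a minimal sketch; the segment count and per-segment statistics are arbitrary choices, not tested on this problem):

```python
import numpy as np

def segment_features(series, n_segments=3):
    """Split a series into roughly equal segments and summarize each one.
    Returns [mean, max, zero-day count] per segment, concatenated."""
    s = np.asarray(series, dtype=float)
    feats = []
    for seg in np.array_split(s, n_segments):
        feats.extend([seg.mean(), seg.max(), float((seg == 0).sum())])
    return feats
```

For a 21-day series with three segments this yields nine features, preserving some coarse temporal ordering that global statistics like the overall mean discard.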

Best Answer

I've had pretty good success applying k-NN with Dynamic Time Warping (DTW) as the distance metric.

My research (pdf) suggests that this approach is very difficult to beat. The schematic below is from my Python implementation of KNN and DTW on GitHub; you can also view it in an IPython Notebook.

[Figure: schematic of KNN with DTW]
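A minimal, self-contained sketch of the technique (this is the textbook dynamic-programming DTW plus a brute-force nearest-neighbour vote, not the linked GitHub implementation):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of insertion, deletion, and match moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_dtw_predict(X_train, y_train, x, k=1):
    """Label x by majority vote among its k DTW-nearest training series."""
    dists = [dtw_distance(t, x) for t in X_train]
    nearest = np.argsort(dists)[:k]
    votes = [y_train[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Because DTW aligns series elastically, two usage curves with the same shape but shifted in time land close together, which is exactly what plain Euclidean k-NN misses.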

If your training data set is very large, I suggest performing hierarchical clustering on the distance matrix, then sampling from the resulting clusters to produce your smaller training data set. The hierarchical clustering ensures the sample contains time series representing a broad range of the characteristics in your data.
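That subsampling step could be sketched with SciPy's hierarchy tools as follows (a sketch only; the cluster count, per-cluster sample size, and linkage method are illustrative choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def sample_by_hclust(dist_matrix, n_clusters=5, per_cluster=2, seed=0):
    """Cluster a square pairwise-distance matrix hierarchically, then draw
    a few series from each cluster to build a smaller, diverse training set."""
    rng = np.random.default_rng(seed)
    # linkage expects a condensed distance vector, not a square matrix
    Z = linkage(squareform(np.asarray(dist_matrix)), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    keep = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        take = min(per_cluster, len(members))
        keep.extend(rng.choice(members, size=take, replace=False))
    return sorted(keep)
```

The returned indices select rows of the original training set; since every cluster contributes, rare but distinct usage shapes survive the downsampling.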
