Machine Learning – Using Correlation Matrix as Features in Classification

classification, correlation, machine-learning, python, svm

Can I use the correlations between the training data as features and, if so, how would I test the test data with the model coefficients?

I will try to explain in more detail.

If the training data are

X = [X1, X2, ..., Xn]
and Xi = [Xi1, Xi2, ..., Xi100]

where X is the full training set and Xi is one sample of the data

and

K = [Xcorr(1,1) ... Xcorr(1,n)
     ...             ...
     Xcorr(n,1) ... Xcorr(n,n)]

where Xcorr(i,j) is the correlation between samples Xi and Xj.

K will look something like this:

K = [1 .2 .3 .4
     .2 1 .5 .6
     .3 .5 1 .7
     .4 .6 .7 1]
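
For concreteness, here is a minimal sketch (assuming NumPy and the usual Pearson correlation; the data are made up) of how such a K could be computed:

import numpy as np

# hypothetical training data: n samples, each of length 100
n = 4
X = np.random.randn(n, 100)

# pairwise Pearson correlation between the rows (samples);
# K is n x n with ones on the diagonal
K = np.corrcoef(X)
print(K.shape)   # (4, 4)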

Can I use this K to train my model?

If it is possible, how would I test my test data, which will be

X = [X1, X2,....., Xn]

Best Answer

I think I may now understand the proposed algorithm, and it seems reasonable, but I don't think it is the way I would go about it.

The correlation in question is not the usual Pearson correlation, but the sort used in signal processing, which is related to convolutions and autocorrelations:

$$(f \star g)(\tau) = \int_{-\infty}^{\infty} \overline{f(t)}\, g(t + \tau)\, dt$$

(source: Wikipedia)

This function describes the similarity between the signals at different lags (offsets in time or space).
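
As a quick illustration, here is a sketch using NumPy (the signals are made up; for real-valued signals, np.correlate evaluates this sliding-lag correlation):

import numpy as np

# two hypothetical 1-D signals of length 100
f = np.random.randn(100)
g = np.random.randn(100)

# the full cross-correlation function: one value per lag,
# 2*len(f) - 1 lags in total
xcorr = np.correlate(f, g, mode="full")
print(xcorr.shape)   # (199,)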

As I understand it, the idea is to compute the correlation functions between the test signal and each of the training signals and then take a weighted sum, which (if the genetic algorithm worked well) will hopefully pick out the parts of the correlation function that contain discriminative information. So the feature is actually a similarity metric based on the correlation function.

This means the features of the new dataset are measures of the similarity between the test signal and each of the training signals. To generate the features for the test cases, it should be sufficient to compute the correlation function between each test signal and each training signal and form the weighted sums as before, as in the sketch below.
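
Here is a sketch of this reading of the algorithm (the weight vector w stands in for whatever the genetic algorithm learned; all names and data are hypothetical):

import numpy as np

def similarity_feature(test_signal, train_signal, w):
    # weighted sum over the lags of the cross-correlation function
    xcorr = np.correlate(test_signal, train_signal, mode="full")
    return np.dot(w, xcorr)

X_train = np.random.randn(20, 100)      # 20 hypothetical training signals
x_test = np.random.randn(100)           # one test signal
w = np.random.randn(2 * 100 - 1)        # stand-in for the GA-learned weights

# one similarity feature per training signal
features = np.array([similarity_feature(x_test, xi, w) for xi in X_train])
print(features.shape)   # (20,)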

However, I think there is a much simpler approach: compute the correlation function between the signal and all of the training signals, concatenate the results into one very long vector, and use that as the input to a support vector machine. The performance bounds for the SVM don't depend on the dimensionality of the input vector, so provided you tune the regularisation parameter properly, over-fitting shouldn't be a problem, and the SVM will identify the correct weighting for each lag of the correlation functions for you.

As you can use the dual formulation of the SVM, there will be one parameter per training signal rather than one per element of the feature vector, which in this case will be reasonably efficient. This is likely to be much more effective than a GA, where there is little or no control over over-fitting.
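
A sketch of this simpler approach using scikit-learn (the data and labels are made up; in practice the regularisation parameter C should be tuned by cross-validation):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 100))   # 20 hypothetical training signals
y_train = np.array([0, 1] * 10)            # made-up binary labels

def correlation_features(signal, reference_signals):
    # concatenate the full cross-correlation with every reference signal
    return np.concatenate(
        [np.correlate(signal, ref, mode="full") for ref in reference_signals]
    )

# one long feature vector per signal: 20 * (2*100 - 1) = 3980 dimensions
F_train = np.array([correlation_features(x, X_train) for x in X_train])

clf = SVC(kernel="linear", C=1.0)          # C controls over-fitting
clf.fit(F_train, y_train)

# a new test signal is featurised against the same training signals
x_test = rng.standard_normal(100)
print(clf.predict(correlation_features(x_test, X_train).reshape(1, -1)))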
