Partitioning for 10-fold cross-validation using neural networks in MATLAB

cross-validation, MATLAB, neural networks

I am working on an assignment that requires recognizing one of the six basic human emotions from facial-expression data.

The data set looks like this:

  • input data: Nx136, where N is the total number of examples and 136 is the dimensionality of the feature vector, formed by concatenating the (x, y) coordinates of 68 facial points.
  • target data: Nx1, containing the emotion labels, which are integers from 1 to 6.

I would like to evaluate my neural network with 10-fold cross-validation, using my network parameters and an optimal learning rule.

I understand that I need to partition my data into 10 non-overlapping folds. In my case there are 448 samples to work with, and each sample corresponds to a target label (based on the column number). I assume the folds have to be almost the same size, so I will have 8 folds with 45 samples and 2 with 44 samples. The target labels are ordered (first all the label 1s, then all the label 2s, and so on), and so are the corresponding facial-coordinate samples. Each facial expression has a different number of samples; in my case [47 60 68 113 58 102].
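The fold sizes quoted above follow from integer division: 448 = 10 × 44 + 8, so 8 folds get one extra sample (45) and the remaining 2 get 44. A minimal Python sketch of that arithmetic; `fold_sizes` is a hypothetical helper name, and in MATLAB the same values come from `floor(n/k)` and `mod(n, k)`.

```python
def fold_sizes(n_samples, k):
    """Split n_samples into k near-equal, non-overlapping fold sizes.

    The first (n_samples % k) folds receive one extra sample.
    """
    base, extra = divmod(n_samples, k)
    return [base + 1] * extra + [base] * (k - extra)


# Class counts from the question; labels 1..6 are stored in this order.
class_counts = [47, 60, 68, 113, 58, 102]
n = sum(class_counts)
sizes = fold_sizes(n, 10)
print(n, sizes)  # 448 [45, 45, 45, 45, 45, 45, 45, 45, 44, 44]
```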

Should I aim for each fold to contain an equal mix of every label group, e.g. roughly 8 of each?

I am stuck implementing a generic function that ensures an equal distribution of samples for each target label in every partition. It is difficult because the folds are not all the same size (two have 44 samples, the rest 45) and because each label has a different number of samples.

Could anyone suggest a logical way of doing this that would work for any k-fold partitioning in MATLAB?

Best Answer

You can use Optunity's cross-validation facilities to generate the folds. Whether you stratify (i.e. distribute the instances of each class evenly over the folds) is up to you; the software supports it.
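If you would rather write the partitioning yourself, one common way to stratify is to group the sample indices by label, shuffle within each label, and deal them round-robin into the k folds, carrying the deal position over from one label to the next. That keeps the overall fold sizes within one of each other and each label's count per fold within one as well. The sketch below is a minimal Python illustration of that idea, not Optunity's implementation; `stratified_kfold` is a hypothetical name. In MATLAB, `cvpartition(group, 'KFold', k)` from the Statistics and Machine Learning Toolbox also produces stratified folds.

```python
import random
from collections import Counter


def stratified_kfold(labels, k, seed=0):
    """Assign each sample a fold index in 0..k-1, stratified by label.

    Indices are grouped by label, shuffled within each label, then dealt
    round-robin into folds. The deal position carries over between labels,
    so fold sizes differ by at most one, and so do per-label counts.
    """
    rng = random.Random(seed)
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)

    folds = [0] * len(labels)
    position = 0
    for lab in sorted(by_label):
        indices = by_label[lab]
        rng.shuffle(indices)  # randomize which samples land in which fold
        for idx in indices:
            folds[idx] = position % k
            position += 1
    return folds


# Labels ordered as in the question: 47 ones, then 60 twos, and so on.
class_counts = [47, 60, 68, 113, 58, 102]
labels = [lab for lab, c in enumerate(class_counts, start=1) for _ in range(c)]
folds = stratified_kfold(labels, 10)
print(sorted(Counter(folds).values(), reverse=True))
# [45, 45, 45, 45, 45, 45, 45, 45, 44, 44]
```

For fold i, train on the samples whose fold index differs from i and test on those equal to i. The same loop translates to MATLAB with 1-based fold numbers, `mod(position, k) + 1`.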

With unequal numbers of instances per class you can't expect perfectly equal coverage in every fold; you will likely end up with some folds holding one extra instance of a given class compared to the others.
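The off-by-one effect can be made concrete by splitting each class count from the question over 10 folds: integer division gives a base count per fold, and the remainder has to be spread one sample at a time over that many folds. A minimal Python sketch; `per_class_fold_counts` is a hypothetical helper, not part of Optunity.

```python
def per_class_fold_counts(class_counts, k):
    """For each class label (1-based), list how many samples each fold gets.

    divmod gives a base count per fold plus a remainder; the remainder is
    spread as one extra sample over that many folds.
    """
    result = {}
    for label, count in enumerate(class_counts, start=1):
        base, extra = divmod(count, k)
        result[label] = [base + 1] * extra + [base] * (k - extra)
    return result


for label, counts in per_class_fold_counts([47, 60, 68, 113, 58, 102], 10).items():
    print(label, counts)
```

Only label 2 (60 samples) divides evenly, at 6 per fold. Label 1 (47 samples), for example, gives seven folds with 5 instances and three with 4.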