MATLAB: Training and group matrices for classifying data

Tags: analysis, classify, data, discriminant, discriminate, group, multivariate, training

I am getting this error when trying to classify matrix data:
Error using classify (line 220) TRAINING must have more observations than the number of groups.
My classification data matrix is 10×5, my training matrix is 2×5, and my group vector is of length 2:
classificationFeatureValues =
1.0e+004 *
0.0006 0.0761 0.0065 3.7003 0.0113
0.0005 0.0683 0.0063 3.3502 0.0114
0.0006 0.0761 0.0065 3.7003 0.0113
0.0005 0.0683 0.0063 3.3502 0.0114
0.0006 0.0761 0.0065 3.7003 0.0113
0.0005 0.0683 0.0063 3.3502 0.0114
0.0006 0.0761 0.0065 3.7003 0.0113
0.0005 0.0683 0.0063 3.3502 0.0114
0.0006 0.0761 0.0065 3.7003 0.0113
0.0005 0.0683 0.0063 3.3502 0.0114
training =
1.0e+004 *
0.0005 0.0683 0.0063 3.3502 0.0113
0.0006 0.0761 0.0065 3.7003 0.0114
group =
1 2
I can't seem to find the error here…
Steve

Best Answer

The immediate cause of the error is that CLASSIFY requires TRAINING to have strictly more observations than groups: with 2 training rows and 2 groups, there are no observations left over to estimate the within-group covariance. More generally, in order to use CLASSIFY, each class should have enough training points, Ntrni (i = 1:c), to obtain accurate estimates of the mean and covariance matrix. The typical rule of thumb is that the number of training vectors should be much greater than the number of estimated parameters. For each class of p-dimensional vectors, the Bayesian quadratic classifier requires
Ntrni >> numel(mean) + numel(cov) = p + p*(p+1)/2 = p*(p+3)/2
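The parameter count above is easy to check numerically. A minimal sketch in Python (the function name is mine, not from the thread; in MATLAB the same count is p + p*(p+1)/2):

```python
# Parameter count for the per-class Bayesian quadratic classifier:
# a p-vector mean plus a symmetric p-by-p covariance matrix.
def quadratic_param_count(p):
    return p + p * (p + 1) // 2  # = p*(p+3)/2

# For the poster's 5-dimensional features, each class alone needs
# far more than this many training vectors:
print(quadratic_param_count(5))
```

With p = 5 this is 20 parameters per class, so 2 training rows total cannot possibly suffice.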
In addition, each class should have enough testing points, Ntsti, to obtain accurate performance estimates on nontraining data (generalization). For classification, the errors are assumed to be Binomially distributed with approximate standard deviation
stdvei = sqrt(ei*(1-ei)/Ntsti), ~0.05 <= ei <= ~0.95
It is desirable that stdvei << ei; squaring both sides of sqrt(ei*(1-ei)/Ntsti) << ei shows this requires Ntsti >> (1-ei)/ei. Since (1-ei)/ei <= 19 for ei >= ~0.05, the typical rule of thumb is
Ntsti >> 19 >= (1-ei)/ei
(Note that max(ei*(1-ei)) = 0.25, attained at ei = 0.5.) For smaller errors Ntsti should be larger; check a stats handbook for a more accurate estimate of stdvei when ei < 0.05.
If N is not large enough to obtain an adequate Ntrni/Ntsti division, cross-validation or bootstrapping should be used.
For 10-fold cross-validation of the 3-class Fisher iris data, with 50 4-dimensional inputs per class, Ntrni = 45 and the ratio per class is
r = 2*Ntrni/(p*(p+3)) = 90/(4*7) ≈ 3.2
For the Bayesian linear classifier, the pooled covariance matrix is estimated yielding
3*Ni >> 3*p + p*(p+1)/2 = p*(p+7)/2
r = 6*Ni/(p*(p+7)) = 270/(4*11) = 6.1
For an LMSE (e.g., backslash) linear classifier
Ni >> p + 1
r = 45/5 = 9
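The three ratios above can be recomputed in a few lines; a Python sketch using the iris numbers from the text (variable names are mine):

```python
# Training-points-to-parameters ratios for the Fisher iris example:
# c = 3 classes, p = 4 features, Ntrni = 45 per class under 10-fold CV.
p, Ntrni, c = 4, 45, 3

r_quadratic = 2 * Ntrni / (p * (p + 3))      # per-class mean + covariance
r_linear    = 2 * c * Ntrni / (p * (p + 7))  # means + one pooled covariance
r_lmse      = Ntrni / (p + 1)                # weights only

print(round(r_quadratic, 1), round(r_linear, 1), round(r_lmse, 1))
```

The progression 3.2 → 6.1 → 9 shows why the simpler models are safer when training data is scarce.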
Therefore I suggest that you use
1. Raw data (i.e., NOT medians or means)...increases Ni
2. A backslash LMSE classifier... decreases no. of estimated parameters
W*[ones(1,Ntrn); traininput] = target  % target: columns of eye(c) for c classes
W = target/[ones(1,Ntrn); traininput]  % mrdivide gives the LMSE solution; Ntrn = sum(Ntrni)
output = W*[ones(1,size(input,2)); input]
3. Bootstrapping or cross-validation
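To illustrate the backslash-style LMSE classifier in item 2, here is a sketch in Python/NumPy; the toy two-class data is a stand-in, not the poster's features, and in MATLAB the whole fit is the one-liner W = target/[ones(1,Ntrn); traininput]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: c = 2 classes, p = 3 features, 20 training vectors per class,
# stored as columns (matching the MATLAB snippet's orientation).
c, p, Ni = 2, 3, 20
class0 = rng.normal(0.0, 1.0, size=(p, Ni))
class1 = rng.normal(3.0, 1.0, size=(p, Ni))
traininput = np.hstack([class0, class1])   # p x Ntrn
Ntrn = traininput.shape[1]

# Targets: columns of eye(c), one column per training vector.
group = np.repeat([0, 1], Ni)
target = np.eye(c)[:, group]               # c x Ntrn

# Solve W * [ones; traininput] = target in the least-squares sense
# (the MATLAB mrdivide step).
A = np.vstack([np.ones((1, Ntrn)), traininput])      # (p+1) x Ntrn
W = np.linalg.lstsq(A.T, target.T, rcond=None)[0].T  # c x (p+1)

# Classify: pick the row (class) with the largest output.
output = W @ A
predicted = output.argmax(axis=0)
print((predicted == group).mean())  # training accuracy
```

Only c*(p+1) weights are estimated, which is why Ni >> p + 1 is the much milder requirement quoted above.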
Hope this helps.
Greg