Solved – Issues in pattern recognition and plots

machine learning · MATLAB · pattern recognition

I am trying to compare the performance of classification results using a Bayes classifier, k-NN, and principal component analysis (PCA).

I have doubts about the following (please excuse my lack of programming skills; I am a biologist, not a programmer, so I find the MATLAB documentation hard to follow):

  1. In the Matlab code

    Class = knnclassify(Sample, Training, Group, k)
    Group = [1;2;3]   % where 1, 2, 3 represent Classes A, B, C respectively
    

    What goes in Sample, given that my data is 100 rows by 1 column for each class? So Group 1 contains data like $[0.9;0.1;……n]$ where $n=100$. Would Sample be a vector containing a random mixture of data points from the three classes? The same question applies to the Training matrix.

  2. I cannot follow the crossval() and cvpartition() functions as described in the MATLAB documentation. I would be obliged if a simpler version were provided here.

  3. Is there code for a multi-class ROC? Internet resources only cover the binary-class ROC. How do I proceed with an ROC plot when there are three or more classes, as in my case?

  4. Is it possible to obtain a surface plot of the confusion matrix returned by confusionmat()?
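For question 1, a minimal sketch of the expected shapes, using made-up stand-in data for the three classes (the variable names classA, classB, classC are illustrative only): Training stacks all labelled observations, Group gives the label of each row of Training, and Sample holds the new observations to be classified.

```matlab
% Stand-in data: one 100x1 column vector per class (assumed values)
classA = rand(100,1);          % Class A observations
classB = rand(100,1) + 1;      % Class B observations
classC = rand(100,1) + 2;      % Class C observations

Training = [classA; classB; classC];                     % 300x1: all observations
Group    = [ones(100,1); 2*ones(100,1); 3*ones(100,1)];  % 300x1: label per row
Sample   = [0.5; 1.4; 2.2];    % new points to classify

% k-NN classification with k = 3
Class = knnclassify(Sample, Training, Group, 3)
```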

Best Answer

I gave a webinar titled An Introduction to Classification with MATLAB. You can download the code and the dataset from the MATLAB File Exchange:

http://www.mathworks.com/matlabcentral/fileexchange/28770-introduction-to-classification

I'm attaching some code directly that might be helpful:

%%  Use a Naive Bayes Classifier to develop a classification model

% Some of the features exhibit significant correlation; however, it's
% unclear whether the correlated features will be selected for our model

% Start with a Naive Bayes Classifier

% Use cvpartition to separate the dataset into a test set and a training set
% cvpartition will automatically ensure that feature values are evenly
% divided across the test set and the training set

% Create a cvpartition object that defines the holdout partition
c = cvpartition(Y,'holdout',.2);

% Create a training set

X_Train = X(training(c,1),:);
Y_Train = Y(training(c,1));

%%  Train a Classifier using the Training Set

Bayes_Model = NaiveBayes.fit(X_Train, Y_Train, 'Distribution','kernel');

%%  Evaluate Accuracy Using the Test Set

clc

% Generate a confusion matrix
[Bayes_Predicted] = Bayes_Model.predict(X(test(c,1),:));
[conf, classorder] = confusionmat(Y(test(c,1)),Bayes_Predicted);
conf

% Calculate what percentage of the Confusion Matrix is off the diagonal
Bayes_Error = 1 - trace(conf)/sum(conf(:))
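On question 4: confusionmat returns an ordinary numeric matrix, so a surface-style plot can be drawn directly from conf, for example with bar3 (surf(conf) works the same way). A minimal sketch:

```matlab
% 3-D bar plot of the confusion matrix; tick labels follow the
% class order that confusionmat returned in classorder
figure
bar3(conf)
set(gca, 'XTickLabel', classorder, 'YTickLabel', classorder)
xlabel('Predicted class'); ylabel('True class'); zlabel('Count')
```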


%%  Naive Bayes Classification using Forward Feature Selection

% Create a cvpartition object that defines the folds
c2 = cvpartition(Y,'kfold',10);

% Set options
opts = statset('display','iter');

fun = @(Xtrain,Ytrain,Xtest,Ytest)...
      sum(Ytest~=predict(NaiveBayes.fit(Xtrain,Ytrain,'Distribution','kernel'),Xtest));

[fs,history] = sequentialfs(fun,X,Y,'cv',c2,'options',opts)
White_Wine.Properties.VarNames(fs)
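On question 2: the one-line form of crossval may be easier to follow than the full documentation example. A minimal sketch, reusing the same X and Y as above: you hand crossval a prediction function, and with the 'mcr' option it runs 10-fold cross-validation by default and returns the misclassification rate directly.

```matlab
% predfun trains on one fold and predicts the held-out fold
predfun = @(Xtrain, Ytrain, Xtest) ...
          predict(NaiveBayes.fit(Xtrain, Ytrain, 'Distribution','kernel'), Xtest);

% 'mcr' = misclassification rate, averaged over 10 folds by default
mcr = crossval('mcr', X, Y, 'predfun', predfun)
```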

And here is an illustration of how to calculate an ROC curve. Please note: this example uses a bagged decision tree rather than a Naive Bayes classifier.

%%  Run Treebagger Using Sequential Feature Selection
tic
f = @(X,Y)oobError(TreeBagger(50,X,Y,'method','classification','oobpred','on'),'mode','ensemble');
opt = statset('display','iter');
[fs,history] = sequentialfs(f,X,Y,'options',opt,'cv','none');
toc
%%  Evaluate the accuracy of the model using a performance curve

% Y_Test, Predicted and Class_Score come from evaluating the model on a
% held-out test set (not shown in this excerpt)
Test_Results = dataset(Y_Test, Predicted, Class_Score);
[xVal,yVal,~,auc] = perfcurve(Test_Results.Predicted, ...
    Test_Results.Class_Score(:,4),'6'); 

plot(xVal,yVal)
xlabel('False positive rate'); ylabel('True positive rate')
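On question 3: perfcurve itself is binary, but the standard workaround for three or more classes is one-vs-rest: draw one ROC curve per class, treating that class as the positive label and scoring with that class's column of posterior probabilities. A minimal sketch against the Naive Bayes model fitted above (this assumes the posterior columns follow Bayes_Model.ClassLevels, the class order stored on the model):

```matlab
Y_Test = Y(test(c,1));                            % true test-set labels
Scores = posterior(Bayes_Model, X(test(c,1),:));  % one column per class
classNames = Bayes_Model.ClassLevels;             % class order of the columns

figure; hold on
for i = 1:numel(classNames)
    % ROC for class i vs. the rest
    [fpr, tpr, ~, auc] = perfcurve(Y_Test, Scores(:,i), classNames(i));
    plot(fpr, tpr)
end
hold off
xlabel('False positive rate'); ylabel('True positive rate')
title('One-vs-rest ROC curves')
```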