MATLAB: Questions on the Classification Learner app


Hello everyone.

I have started using the Classification Learner app and I have some questions I would like to ask. I will use MATLAB's ovarian cancer data-set as an example to illustrate my issues.

1) In the case where we might be missing the response class for an observation (e.g. if the response type comes from histology and histology was not performed for that specific observation, but the predictors' data is available), is it preferable to set the missing observation's response to another, extra, class (e.g. 'unknown') or is it better not to use the observation at all?

2) When enabling PCA to reduce the dimensionality of the observations (in the ovarian cancer data-set, PCA reduces the number of predictors from 4000 to 215 and is using 7/215 features), can we know which features (obs in the ovarian cancer data-set) are the ones that PCA has kept?

3) When exporting a trained model to make predictions for new data, and PCA was used during training, what extra arguments do we need to use when calling: newPredictions = myExportedModel.predictFcn(newData) to ensure that the function knows that PCA was used when training myExportedModel?

Many thanks in advance for your help!

Regards, Ioannis

Best Answer

1. It depends on the classifier and on your data and application. For example, if you are using a linear classifier to predict whether cancer is present or not, adding a third category ('Unknown') is not going to help, so it is better to leave those observations out of training. If instead you are trying to group all of the data into clusters, then marking them as NaN or 'Unknown' can help.
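As a minimal sketch of the option of not using such observations, the rows with a missing response can be filtered out before the data is loaded into the app. The table T, its variable names, and the values are made up for illustration; an empty label stands in for a missing histology result.

```matlab
% Toy table (hypothetical data): two predictors and a class label.
% An empty label stands for an observation without a histology result.
x1 = [1.2; 0.7; 3.1; 2.4];
x2 = [5.0; 4.1; 6.3; 5.8];
label = categorical({'Cancer'; 'Normal'; ''; 'Cancer'});
T = table(x1, x2, label);

% Empty strings become <undefined> elements in a categorical array.
isMissingLabel = isundefined(T.label);

% Keep only the rows whose response is known for training.
labeled = T(~isMissingLabel, :);

% Set the unlabeled rows aside, e.g. to predict their class later.
unlabeled = T(isMissingLabel, :);
```

Here labeled is what you would select as the data set in the app, while unlabeled can be passed to the exported model afterwards.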

2. Principal component analysis is a quantitatively rigorous method for reducing the dimensionality of the data. The method generates a new set of variables, called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other, so there is no redundant information. The principal components as a whole form an orthogonal basis for the space of the data. For example, in the cancer dataset, if you start with x predictors, PCA in MATLAB reduces them to y (<= x) components. These components are not columns of your original data; they are new columns that MATLAB derives from the predictors, so PCA does not keep a subset of the original features. If you want to see the coefficients that define the 7 retained components in the trained classifier, you can use the following command

   >> trainedClassifier.PCACoefficients
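To make the meaning of such a coefficient matrix concrete, here is a small sketch that calls the pca function directly on random data (it does not use the app's export). Each column of coeff defines one component as a weighted sum of all original predictors, so the closest thing to "which features were kept" is to look at which predictors carry the largest weights in a component. The data and variable names are assumptions made for illustration.

```matlab
rng(0);                               % reproducible random data
X = randn(100, 6);                    % 100 observations, 6 predictors
X(:, 3) = 5 * X(:, 3);                % give predictor 3 the largest variance

% coeff is predictors-by-components; mu holds the column means used for centering.
[coeff, score, ~, ~, ~, mu] = pca(X, 'NumComponents', 2);

% Each component score is a linear combination of the centered predictors.
reconstructedScore = (X - mu) * coeff;

% Rank the original predictors by the size of their weight in component 1.
[~, order] = sort(abs(coeff(:, 1)), 'descend');
topPredictor = order(1);
```

Inspecting the largest absolute entries of a column in this way tells you which original predictors contribute most to that component.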

Also, to see how to use the trained classifier, display the following field. It contains a description of how this particular model should be used and how to predict the response variable from the input data

   >> trainedClassifier.HowToPredict

3. The exported model knows whether PCA was used and handles the conversion itself, so you just need to pass the observations you want to test; no extra arguments are needed. If you want to check whether the trained classifier used PCA, you can consult the 'HowToPredict' field suggested above.

See the following documentation page, which explains PCA:

https://www.mathworks.com/help/stats/principal-component-analysis-pca.html
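As a rough illustration of why no extra arguments are needed in point 3, the sketch below builds a model by hand whose prediction function already contains the PCA step. This is not the code the app generates; the struct name myModel, the choice of fitcdiscr, and the random data are assumptions made for the example.

```matlab
rng(0);                                      % reproducible random data
X = [randn(50, 5); randn(50, 5) + 2];        % two classes, 5 predictors
y = [zeros(50, 1); ones(50, 1)];

% Fit PCA on the training data and keep 2 components.
[coeff, ~, ~, ~, ~, mu] = pca(X, 'NumComponents', 2);

% Train the classifier on the component scores.
mdl = fitcdiscr((X - mu) * coeff, y);

% Store a prediction function that applies the same centering and
% projection to new data before calling the classifier.
myModel.predictFcn = @(Xnew) predict(mdl, (Xnew - mu) * coeff);

% New data is passed in the original predictor space, with no PCA arguments.
yhat = myModel.predictFcn(X(1:3, :));
```

Because the centering values and coefficients are captured inside the function handle, the caller only supplies observations with the original predictors.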

Related Solutions

MATLAB: After training, how to test data in the Classification Learner app?

Hi, Nilima Gautam

In the Classification Learner app, your data is split by either holdout or k-fold validation, depending on your selection. The split produces a training dataset and a validation dataset. When you train your model in the app, it uses the training dataset to fit the model and then uses the validation dataset to test it and report the accuracy to you. In other words, the accuracy you get in the app is the accuracy of your model on the validation dataset. With holdout validation, when you open the confusion matrix in the app, you will notice that the number of observations is smaller than in the original dataset (because it is the validation dataset split out of your original dataset).

However, after you export your model to the workspace as a variable (trainedModel), you can use it to predict on any sample:
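The validation accuracy described above can also be computed at the command line. The sketch below uses fitctree and the fisheriris example data as stand-ins for whatever model and data you use in the app, so treat it as an illustration of the idea rather than what the app runs internally.

```matlab
load fisheriris                          % meas: predictors, species: labels
rng(1);                                  % reproducible partitions

mdl = fitctree(meas, species);           % stand-in for a model chosen in the app

% 5-fold validation: every observation is used for validation exactly once.
kfoldModel = crossval(mdl, 'KFold', 5);
kfoldAccuracy = 1 - kfoldLoss(kfoldModel);

% Holdout validation: 25% of the observations are held back for validation.
holdoutModel = crossval(mdl, 'Holdout', 0.25);
holdoutAccuracy = 1 - kfoldLoss(holdoutModel);
```

Both accuracies are estimates on data the model was not fitted to, which is the same kind of number the app reports.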

signalTemp2 = trainedModel.predictFcn(outSample);

Alternatively, before you load your dataset into the Classification Learner app, you can split off part of it first; some people call this part the testing dataset. The function to use is cvpartition. The workflow would be:

% dataset is a table whose response variable is dataset.label
c = cvpartition(dataset.label,'HoldOut',0.1); % hold out 10% as testing data
trainIdx = training(c);                       % logical index of the remaining 90%
testingdata = dataset(~trainIdx,:);
training_validation_data = dataset(trainIdx,:);
% Open the Classification Learner app
% Select training_validation_data to train and validate
% Export the model to the workspace as trainedModel
% Classify the testing dataset using trainedModel
signalTemp2 = trainedModel.predictFcn(testingdata);
% Perform the evaluation yourself, for example loss, accuracy, confusion matrix...
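The final evaluation step could look like the following sketch. Since an app-exported trainedModel is not available in a standalone script, a tree trained with fitctree on the fisheriris example data stands in for it; those choices, and the variable names, are assumptions made for illustration.

```matlab
load fisheriris                              % meas: predictors, species: labels
rng(1);                                      % reproducible partition

c = cvpartition(species, 'HoldOut', 0.1);    % hold out 10% as testing data
trainIdx = training(c);
testIdx = test(c);

% Stand-in for the model you would train and export from the app.
mdl = fitctree(meas(trainIdx, :), species(trainIdx));

% Predict on the held-out testing data and compare with the true labels.
yTrue = species(testIdx);
yPred = predict(mdl, meas(testIdx, :));

accuracy = mean(strcmp(yPred, yTrue));               % fraction classified correctly
[confMat, classOrder] = confusionmat(yTrue, yPred);  % rows: true, columns: predicted
```

With an exported model, yPred would instead come from trainedModel.predictFcn(testingdata), and the accuracy and confusion matrix are computed the same way.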

MATLAB: How to give inputs to classifier

If you have extracted the trained SVM from the Classification Learner app, this is how you do it:

% Suppose you trained it to classify, say, images.
% trainedModel is the struct you export from the app.
trainedClassifier = trainedModel.ClassificationSVM;
imagepred = predict(trainedClassifier,imagefeatures);
% imagepred holds the predicted labels (usually of type 'categorical'); for more information, look up the predict function.

If this is not your problem, please provide more details :)
