Outlier detection using k-means in a binary classification problem

k-means, MATLAB, outliers

I'm running k-means on each class of a binary classification problem and removing samples that lie far from their cluster centroids (21 features, so a 21-dimensional problem) before feeding the data set to a neural network. Now that the neural network model is designed, I want to use it on a new data set (out of sample).
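A minimal MATLAB sketch of the training-stage filtering described above. It assumes synthetic data, a hypothetical k = 3 clusters per class, and a hypothetical cutoff at the 95th percentile of each class's nearest-centroid distances; the question does not say which k or cutoff is actually used. It needs the Statistics and Machine Learning Toolbox for kmeans. The key point is that each class's centroids C and cutoff are saved in a (hypothetical) model struct so they can be reused later.

```matlab
% Sketch: per-class k-means outlier removal on the training set.
% Hypothetical choices: synthetic data, k = 3, cutoff = 95th percentile.
rng(0);                                    % reproducible example
X = [randn(100, 21); randn(100, 21) + 3];  % 200 samples, 21 features
y = [zeros(100, 1); ones(100, 1)];         % binary labels
k = 3;                                     % hypothetical clusters per class
classes = unique(y);
model = struct('class', {}, 'C', {}, 'cutoff', {});
keep = false(size(y));
for c = 1:numel(classes)
    in = (y == classes(c));
    % D(i, j): squared Euclidean distance (kmeans default) from sample i to centroid j
    [~, C, ~, D] = kmeans(X(in, :), k, 'Replicates', 5);
    dmin = min(D, [], 2);                  % distance to the nearest centroid
    cutoff = prctile(dmin, 95);            % hypothetical cutoff rule
    % save what the out-of-sample stage will need
    model(c).class = classes(c);
    model(c).C = C;
    model(c).cutoff = cutoff;
    keep(in) = dmin <= cutoff;             % drop samples beyond the cutoff
end
Xclean = X(keep, :);                       % data passed on to the network
yclean = y(keep);
```

Xclean and yclean would then go to the neural network, while model is kept for the out-of-sample data.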

As you know, parameters estimated during pre-processing must be reused on out-of-sample data. For example, min-max normalization (x - min(x)) / (max(x) - min(x)) applies the min(x) and max(x) computed on the training data to the out-of-sample data as well. Which k-means parameters should I reuse for the out-of-sample data, and how can I do that in MATLAB?
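To illustrate the normalization analogy, a small sketch with made-up numbers: the per-feature minimum and maximum are computed on the training data only and then reused unchanged on the out-of-sample data. The variable names are hypothetical, and the implicit expansion it relies on needs MATLAB R2016b or later.

```matlab
% Sketch: min-max scaling whose parameters come from the training set only.
Xtrain = [1 10; 2 20; 3 30];   % hypothetical training data
Xtest  = [2 15; 4 40];         % hypothetical out-of-sample data
mn = min(Xtrain, [], 1);       % per-feature minimum, stored from training
mx = max(Xtrain, [], 1);       % per-feature maximum, stored from training
XtrainN = (Xtrain - mn) ./ (mx - mn);
XtestN  = (Xtest  - mn) ./ (mx - mn);   % reuse the training mn and mx
```

Out-of-sample values can land outside [0, 1] (here the second test row scales to 1.5), which is expected when new data exceed the training range.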

Thanks.

Best Answer

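You have to calculate the distance of your test (out-of-sample) samples from the previously computed centroids (the centroid matrix C, which is the second output of MATLAB's kmeans function) using the same distance metric as in training (kmeans uses squared Euclidean distance by default; see pdist, or pdist2 for distances between two sets of points), and then apply the same cutoff values to the distances obtained.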
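A minimal sketch of this procedure, assuming a model struct like one saved during training (the centroids and cutoffs below are made-up numbers) and squared Euclidean distance, the kmeans default. Because class labels are usually unknown for out-of-sample data, it assumes, as a hypothetical rule, that a sample is kept if it falls within the cutoff of the nearest centroid of either class; if labels are known, compare each sample only with its own class.

```matlab
% Sketch: screen out-of-sample data with centroids and cutoffs saved from training.
% Hypothetical stored model: centroids C (k-by-p) and a cutoff per class,
% with cutoffs expressed in squared Euclidean distance (kmeans' default metric).
model = struct('C', {[0 0; 1 1], [5 5]}, 'cutoff', {0.5, 1.0});
Xnew = [0.2 0.1; 5.5 5.0; 10 10];   % hypothetical out-of-sample rows

isInlier = false(size(Xnew, 1), 1);
for c = 1:numel(model)
    C = model(c).C;
    D = zeros(size(Xnew, 1), size(C, 1));
    for j = 1:size(C, 1)
        % same metric as training; pdist2(Xnew, C, 'squaredeuclidean') is equivalent
        D(:, j) = sum((Xnew - C(j, :)).^2, 2);
    end
    % hypothetical rule: keep a row if it is within this class's cutoff
    isInlier = isInlier | (min(D, [], 2) <= model(c).cutoff);
end
XnewClean = Xnew(isInlier, :);      % rows passed on to the trained network
```

Here the first two rows are kept and the third, far from every stored centroid, is flagged as an outlier.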
