I have a normalized data table of 3568 rows and 24 columns. I calculate the Mahalanobis distance for each row of data using the code below. But how can I use the Mahalanobis distances I found to remove outliers? Is there any principle, such as removing rows whose distance is above or below a certain percentage? Please advise, as I am trying to create several scenarios for my dataset.
For example,
- scenario 0: just clean missing data, with no outlier removal
- scenario 1: remove outliers using the mean method
- scenario 2: remove outliers using Mahalanobis distance
Thank you for all your help.
% DATA = 3568 x 24 table
k = size(DATA);
n = k(1); % row
m = k(2); % column
Y = DATA;
a = zeros(1,m); % one observation
b = zeros(n-1,m); % new table dif dimension
c = zeros(1,m);
d_mahal_DATA = zeros(n,1); % Mahalanobis
format short e
for i=1:n
    if i==1
        a(i,:)=Y(i,:);
        c = removerows(Y(i,:));
        Y(1,:)=[];
        d_mahal_DATA(i,:) = mahal(c,Y);
    elseif i>1
        a(i,:)=Y(1,:); % row 1:i
        c = removerows(Y(1,:)); % row i only
        Y(1,:)=[]; % row i+1 onwards
        b = [a(1:i-1,:);Y]; % row 1:i-1;i+1:end (skip row i)
        d_mahal_DATA(i,:) = mahal(c,b);
    end
end
d_mahal_DATA % size 3568 x 1
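The loop above scores each row against all the other rows. A more compact sketch of the same leave-one-out idea, followed by one possible way to turn the distances into a removal rule, is shown here. It is only an illustration under stated assumptions: `X` is small random stand-in data rather than the real `DATA`, the names `d2`, `cutoff`, `isOutlier` and `X_clean` are made up for this example, and the 97.5% chi-square cutoff is a common convention that assumes roughly multivariate normal data, not a fixed rule. Note that `mahal` returns the squared distance, which is why it is compared with a chi-square quantile.

```matlab
% Sketch only: X stands in for the normalized 3568 x 24 DATA matrix.
rng(0);                 % reproducible stand-in data (assumption)
X = randn(200, 5);
X(1,:) = 10;            % plant one obvious outlier for illustration
[n, p] = size(X);

% Leave-one-out distances: row i is scored against all other rows.
d2 = zeros(n, 1);
for i = 1:n
    others = X([1:i-1, i+1:n], :);      % every row except row i
    d2(i) = mahal(X(i,:), others);      % mahal returns the SQUARED distance
end

% Assumed cutoff: 97.5% chi-square quantile with p degrees of freedom.
cutoff = chi2inv(0.975, p);
isOutlier = d2 > cutoff;                % logical flag per row
X_clean = X(~isOutlier, :);             % rows kept for the outlier-removal scenario
```

Changing the probability passed to `chi2inv` (for example 0.99 or 0.999) makes the rule stricter or looser, so each choice could serve as a separate scenario.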
Best Answer