MATLAB: How to put zero or nan instead of rejecting the data in Chauvenet-Script

chauvenet outliers

Hello, my task is to detect outliers in a large dataset using Chauvenet's criterion. The Chauvenet test says: a reading may be rejected if the probability of obtaining its particular deviation is less than 1/(2n). In other words, it computes the probability of each data point's deviation from the mean and rejects the point from the list if that deviation is too large. My question is how to replace the bad data with 0 or NaN instead of rejecting it.
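As a rough illustration of the criterion (a sketch, not part of the script below): for n readings, a point is rejected when the two-sided probability of its deviation falls below 1/(2n), which corresponds to a critical z-score zc. The snippet computes zc with base-MATLAB erfcinv rather than norminv, on the assumption that the Statistics Toolbox may not be available; the sample size n = 10 is made up for the example.

```matlab
% Chauvenet threshold for a hypothetical sample size (n is an assumption).
n  = 10;
p  = 1/(2*n);               % allowed two-sided tail probability, here 0.05
% For a standard normal Z, P(|Z| > zc) = erfc(zc/sqrt(2)), so invert that:
zc = sqrt(2)*erfcinv(p);    % about 1.96 for n = 10
% Equivalent toolbox form, as used in the script: norminv(1 - p/2, 0, 1)
```

With n = 10, any reading more than about 1.96 standard deviations from the mean would be flagged.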
I have the following script:
function [ data_bio2, data_percent_rejected, data_cv ] = chauvenet( x )
% keep only positive entries (x>0.0 drops zeros and any negative values)
data_zeros=find(x==0.0);
data_nonzeros=find(x>0.0);
data_bio2 = x(data_nonzeros);
% compute length, mean, std, min max of non-zero data
data_length2=length(data_bio2);
data_mean2 =mean(data_bio2);
data_standard2 = std(data_bio2);
data_max2 = max(data_bio2);
data_min2 = min(data_bio2);
% Part three - Identify outliers using Chauvenet's criterion
% Z-score the data and compute the two-sided critical z-score for Chauvenet's criterion
data_probability = 1/(2*length(data_nonzeros)); % rejection probability 1/(2n)
data_zscore = (data_bio2 - data_mean2)/(data_standard2);
data_ptest = 1 - data_probability/2; % split 1/(2n) across both tails
zc=norminv(data_ptest, 0, 1); % critical z-score (norminv needs the Statistics Toolbox)
% Hence, reject data whose deviation from the mean exceeds zc*std
data_limit = zc * data_standard2;
data_cv = data_bio2( data_zscore >= -zc & data_zscore <= zc );
data_cvlength = length(data_cv);
index_rejected = find(data_zscore > zc | data_zscore < -zc);
%!!! index_rejected: these are the indices of the rejected values in your data vector
data_rejected = data_bio2(data_zscore > zc | data_zscore < -zc);
index_rejected_original = data_nonzeros(index_rejected); %!!!FLAG THOSE LINES!!!
biomass_rejected_original = x(index_rejected_original); % values of the input x at those positions
%!!!index/biomass_rejected_original: these are the lines/biomasses
%of your original data file that need to be flagged
% percent of data rejected by Chauvenet's criterion
data_percent_rejected = (1- data_cvlength/length(data_bio2))* 100
% compute histogram using linear bin-size
[M,Y]=hist(data_bio2,1000);
[M_cv]=hist(data_cv,Y);
end
So, how can I change the script to put zero or NaN in place of my bad data instead of removing it from the list? Thank you in advance!

Best Answer

If I understand your code correctly, this will replace your ‘data_rejected’ selections with NaN:
data_bio2(index_rejected) = NaN;
I would replace them with NaN instead of zero because zero could enter into your calculations and be considered a valid number. NaN will not be considered a valid number.
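Note that data_bio2 has already had the zero entries stripped, so index_rejected refers to positions in data_bio2, not in the original input x. If the goal is an output of the same length as x, one possible approach is sketched below. The function name chauvenet_nan and its outputs are hypothetical, and erfcinv is used in place of norminv on the assumption that the Statistics Toolbox may not be installed; the threshold is the same value.

```matlab
function [x_flagged, is_outlier, percent_rejected] = chauvenet_nan(x)
% CHAUVENET_NAN  Hypothetical variant of the script above: marks outliers
% with NaN in place instead of removing them. x is a numeric vector.
x_flagged = double(x);
valid = x_flagged > 0;                 % same filter as the original script
n     = nnz(valid);                    % number of readings actually tested
mu    = mean(x_flagged(valid));
sigma = std(x_flagged(valid));
zc    = sqrt(2)*erfcinv(1/(2*n));      % same value as norminv(1 - 1/(4*n), 0, 1)
% Logical mask, same size as x; entries excluded by the filter are never flagged
is_outlier = valid & abs(x_flagged - mu) > zc*sigma;
x_flagged(is_outlier) = NaN;           % length and positions of x are preserved
percent_rejected = 100*nnz(is_outlier)/n;
end
```

A call such as [xf, bad] = chauvenet_nan(x) would then return xf with NaN at the rejected positions and bad as the matching logical mask. Because NaN propagates through mean and std, later statistics need to skip those entries, for example mean(xf(~isnan(xf))).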