Dear experts, I have a list of variables from which I need to remove duplicates based on the value in column 2. Variables with a '1' in column 2 are of better quality than variables with a '0'.
1) In case of duplicate variables, I want to keep the variable that has the value 1 in the second column. When several duplicates have a 1, only one of them should be kept, chosen at random. See example below: Here I want to keep the variable BG1028 where the data in the third column is 1.3. For BG1030, I want to keep the variable with 3.0 or 0.3 in the third column.
2) In case of duplicate variables that all have a zero in the second column, only one of them should be kept, chosen at random. See example below: I need to keep one variable of BG1027 (random choice).
I hope this is clear. I'm puzzling over how to do this. This is the code I came up with so far, with help from Kirby Fear.
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
        {'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
        {'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];

% Storing ppn column 2 as numerical values
bPpn = cell2mat(cellfun(@(c)str2double(c),ppn(:,2),...
    'UniformOutput',false));

% Get names of duplicates
chooseNames = ppn([strcmp(ppn(1:end-1,1),ppn(2:end,1));false],1);

% Loop over chooseNames and keep one at random.
if numel(chooseNames)>0
    for j=1:numel(chooseNames)
        dupidx = find(strcmp(chooseNames{j},ppn(:,1)));
        dupidx(randi(numel(dupidx))) = [];
        ppn(dupidx,:) = [];
    end
end
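For reference, rules 1) and 2) could be sketched roughly as follows. This is only a minimal sketch under some assumptions: column 2 holds only '0' or '1' as text, and the names flag, names, grp, keep and result are made up here for illustration. It groups the rows by name, prefers rows flagged 1 within each group, and picks one candidate at random.

```matlab
% Sample data from the question: name, quality flag, value (all stored as text).
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},...
        {'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},...
        {'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];

% Hypothetical helper variables (not part of the original code).
flag = str2double(ppn(:,2));                % quality flag as numbers (1 = better)
[names,~,grp] = unique(ppn(:,1),'stable');  % grp(i) = group number of row i

keep = zeros(numel(names),1);               % row index kept for each name
for g = 1:numel(names)
    rows = find(grp == g);                  % all rows sharing this name
    best = rows(flag(rows) == 1);           % rows with the better quality flag
    if isempty(best)
        best = rows;                        % no '1' in this group: choose among all rows
    end
    keep(g) = best(randi(numel(best)));     % random pick among the candidates
end

result = ppn(sort(keep),:);                 % one row per name, in original row order
```

After this runs, result should contain exactly one row per name. Because the pick is random, groups with several equally good candidates can give a different row on each run; calling rng with a fixed seed beforehand makes the choice repeatable.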
Best Answer