MATLAB: Remove duplicate rows in table

I have a table with four columns and roughly 45,000 rows (example below). The first column is the name of a statistical test (there are several hundred different tests). For every statistical test, values in the 4th column are duplicated (at 0.25 and 0.5). Can anyone advise how to delete the first of these rows (the first of the 0.25 rows and the first of the 0.5 rows) for every statistical test?

'Perm t-test equal [250ms,500ms 92, 108]: Avg: 11_right   FCL'	-1.349	0.185	0.492
'Perm t-test equal [250ms,500ms 92, 108]: Avg: 11_right   FCL'	-1.457	0.155	0.496
'Perm t-test equal [250ms,500ms 92, 108]: Avg: 11_right   FCL'	-1.544	0.134	0.500
'Perm t-test equal [500ms,900ms 92, 108]: Avg: 11_right   FCL'	-1.544	0.129	0.500
'Perm t-test equal [500ms,900ms 92, 108]: Avg: 11_right   FCL'	-1.615	0.112	0.503
'Perm t-test equal [500ms,900ms 92, 108]: Avg: 11_right   FCL'	-1.665	0.100	0.507

Best Answer

Follow the demo.

T is a table
T.Test contains the test names which can be strings, character vectors, categoricals, or numeric.
T.col4 is column 4 of the table (col4 is its variable name).

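The demo removes one row from each duplicated pair where column 4 equals 0.25 or 0.50, separately for each test. The tests do not have to be in order. Note that the code selects row 2 of rowNums, so it drops the second row of each pair, as the output shows. To drop the first row of each pair instead, index row 1: rmRows = rowNums(1, ~isnan(rowNums(2,:)));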

% Create table
rng('default') % for reproducibility
T = table(repelem({'A';'B';'C'},5,1),rand(15,1), rand(15,1), repmat([0;.25;.25;.5;.5],3,1),...
    'VariableNames',{'Test','col2','col3','col4'});
T.col4([7,14]) = .33; 
disp(T)
    Test      col2        col3      col4
    _____    _______    ________    ____

    {'A'}    0.81472     0.14189       0
    {'A'}    0.90579     0.42176    0.25
    {'A'}    0.12699     0.91574    0.25
    {'A'}    0.91338     0.79221     0.5
    {'A'}    0.63236     0.95949     0.5
    {'B'}    0.09754     0.65574       0
    {'B'}     0.2785    0.035712    0.33
    {'B'}    0.54688     0.84913    0.25
    {'B'}    0.95751     0.93399     0.5
    {'B'}    0.96489     0.67874     0.5
    {'C'}    0.15761     0.75774       0
    {'C'}    0.97059     0.74313    0.25
    {'C'}    0.95717     0.39223    0.25
    {'C'}    0.48538     0.65548    0.33
    {'C'}    0.80028     0.17119     0.5
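% For each test, find up to the first two rows where col4 is 0.25, and likewise for 0.50
[testID, testNames] = findgroups(T.Test);
rowNum1 = arrayfun(@(i) {find(testID==i & T.col4==0.25, 2)}, unique(testID));
rowNum2 = arrayfun(@(i) {find(testID==i & T.col4==0.50, 2)}, unique(testID));
% Pad each result to 2 elements with NaN and concatenate into a 2-by-N matrix
% (padarray requires the Image Processing Toolbox)
rowNums = cell2mat(cellfun(@(c){padarray(c,[2-numel(c),0],NaN,'post')},[rowNum1', rowNum2']));
% Row 2 holds the second occurrence; NaN there means the value was not duplicated
rmRows = rowNums(2, ~isnan(rowNums(2,:)));
% remove rows from table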
T(rmRows, : ) = []
T = 11x4 table
    Test      col2        col3      col4
    _____    _______    ________    ____

    {'A'}    0.81472     0.14189       0
    {'A'}    0.90579     0.42176    0.25
    {'A'}    0.91338     0.79221     0.5
    {'B'}    0.09754     0.65574       0
    {'B'}     0.2785    0.035712    0.33
    {'B'}    0.54688     0.84913    0.25
    {'B'}    0.95751     0.93399     0.5
    {'C'}    0.15761     0.75774       0
    {'C'}    0.97059     0.74313    0.25
    {'C'}    0.48538     0.65548    0.33
    {'C'}    0.80028     0.17119     0.5
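
For readers without the Image Processing Toolbox (which padarray needs), the same idea can be sketched with a plain loop. This is an alternative under stated assumptions, not the answerer's method: it removes the first row of each duplicated pair, as the question asks, and compares col4 with a tolerance. The names targets and tol, and the tolerance value, are illustrative choices.

```matlab
% Rebuild the demo table from the answer
rng('default') % for reproducibility
T = table(repelem({'A';'B';'C'},5,1), rand(15,1), rand(15,1), ...
    repmat([0;.25;.25;.5;.5],3,1), ...
    'VariableNames', {'Test','col2','col3','col4'});
T.col4([7,14]) = .33;

targets = [0.25, 0.50];  % col4 values whose duplicates should be thinned (assumed)
tol     = 1e-9;          % assumed tolerance for floating-point comparison

testID = findgroups(T.Test);
rmRows = [];
for g = 1:max(testID)
    for v = targets
        % Up to the first two rows of this test whose col4 matches v
        idx = find(testID == g & abs(T.col4 - v) < tol, 2);
        if numel(idx) == 2           % v is duplicated within this test
            rmRows(end+1) = idx(1);  % mark the first occurrence for removal
        end
    end
end

% Remove the marked rows; unduplicated values are left untouched
T(rmRows, :) = [];
```

With the demo table, rmRows comes out as [2 4 9 12], leaving 11 rows. If col4 in the real data was computed rather than typed in, the tolerance comparison is likely safer than an exact == test.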

Related Solutions

MATLAB: Select the row that contains the minimum of a column

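Use an auxiliary variable so that zeros in column 6 are excluded from the minimum:

tmp = results(:,6);           % copy the column so results is not modified
tmp(tmp==0) = NaN;            % min ignores NaN by default
[minVal, rowInd] = min(tmp)   % rowInd is the row of results holding the minimum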
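
To make the last step explicit, here is a small self-contained sketch that also pulls out the full row. The results matrix is made-up data for illustration only, and the assumption that zeros in column 6 mean "ignore this row" is read off the snippet rather than stated in the source.

```matlab
% Hypothetical data: column 6 holds the values to minimize, zeros are ignored
results = [1 2 3 4 5 0.7;
           6 7 8 9 1 0;
           2 4 6 8 3 0.2;
           9 8 7 6 5 0.4];

tmp = results(:,6);
tmp(tmp == 0) = NaN;           % exclude zeros; min skips NaN by default
[minVal, rowInd] = min(tmp);   % smallest nonzero value and its row index
minRow = results(rowInd, :);   % the full row containing that minimum
```

Here rowInd is 3 and minRow is the third row of results. If several rows tie for the minimum, min returns the index of the first one.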

MATLAB: Subsetting equal data from an array to different arrays

a = [    0.0010    0.0310       NaN    0.5873
    0.0010    0.0590       NaN    0.8092
    0.0050    0.0310    3.0958    0.7419
    0.0050    0.0310    3.8532    0.7570
    0.0050    0.0310    6.4800    0.6803
    0.0050    0.0310   24.3356    0.6091
    0.0050    0.0310   37.2512    0.6321
    0.0050    0.0310   37.2633    0.5996
    0.0050    0.0310   75.1829    0.6125
    0.0050    0.0310   93.9991    0.6680
    0.0050    0.0590    2.2801    0.8573
    0.0050    0.0590    2.7944    0.8585
    0.0050    0.0600    2.7647    0.8750
    0.0050    0.0600   18.1790    0.8311
    0.0050    0.0600   27.5549    0.8176
    0.0050    0.0600   27.6349    0.8064];
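% c(k) is the group index of row k, one group per unique (column 1, column 2) pair
[~,~,c] = unique(a(:,1:2),'rows');
% Collect the rows of each group into its own cell of out
out = accumarray(c,(1:numel(c))',[],@(x){a(x,:)});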
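
The grouping above can be checked on a smaller matrix. The sketch below uses made-up data and two additions that are not in the original snippet: it keeps the first output of unique (named keys here) so each cell can be matched to its key pair, and it sorts the row indices inside the function, since accumarray does not promise a particular order within a group when the subscripts are unsorted.

```matlab
% Hypothetical data: group rows by the pair (column 1, column 2)
a = [1 10 0.5;
     1 10 0.6;
     1 20 0.7;
     2 10 0.8;
     2 10 0.9];

% keys lists the unique pairs; c maps each row of a to its pair
[keys, ~, c] = unique(a(:,1:2), 'rows');

% One cell per unique pair, each holding the matching rows in original order
out = accumarray(c, (1:numel(c))', [], @(x) {a(sort(x), :)});

% Show each group next to its key
for k = 1:numel(out)
    fprintf('key = [%g %g], rows = %d\n', keys(k,1), keys(k,2), size(out{k},1));
end
```

out{k} then contains every row of a whose first two columns equal keys(k,:).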
