MATLAB: Table with a few identical column names

identicaltable

Hello Matlab Community,
I got a not-so-regular data table (I converted it to .csv) containing identically named columns, and I have to calculate mean values both horizontally and vertically.
The schame is the following: The first cells of each row is a string, the others are doubles.
a b b b c c [...]
x
x
y
z
z
z
[...]
The output table should be:
a b c [...]
x
y
z
[...]
Where the values are the mean values, like the x row is the means of
-all the x rows (like the 'grup by' command in sql) and
-all the b, c, … values
I know that in standard database management I would have to re-structure and normalize the table, but is there any solution in Matlab to solve this?
All I could try is to group the rows:
function [ C ] = CsvAvg( in_csv )
T = readtable(in_csv);
C = table;
C(1,1)=T(1,1);
for i=2:height(T)-1
if strcmp(strjoin(table2cell(T(i,1))),strjoin(table2cell(T(i-1,1))))==0
C(i,1)=T(i,1);
for j=2:width(T)
%
end
else C(i,1)=cell2table(cellstr(' '));
end
end
end
I noticed that Matlab re-name the identical column names (which is correct in most cases).
Thank you in advance and Im sorry for my lack of English language skills. Im a beginner in Matlab programming too.

Best Answer

Grouping the rows is trivially achieved with varfun in just one line:
newT = varfun(@mean, T, 'GroupingVariables', 1)
The tricky bit is indeed the grouping of the columns as it's not something that's really supported by tables. Probably the easiest way is to use a custom function with rowfun. This can do the row grouping at the same time. Something like:
function varargout = groupingfunction(columngroups, columns)
%this function to be used with rowfun with the option 'SeparateInputs', false
%columngroup: row vector of integers from 1:n indicating how to group the column together (all 1 columns together, all 2 columns together, etc.)
%columns contains the content of the columns of the table
columngroups = repmat(columngroups, size(columns, 1), 1); %replicate grouping for each row
varargout = num2cell(accumarray(columngroups(:), columns(:), [], @mean));
end
You can then use that with rowfun:
columgroups = [1 1 1 2 2 3 3 3 4 4 4 4 5 6 6 6]; %generate that however you want. Indicates how to group the columns together, ignoring the first column
columnames = {'a', 'b', 'c', 'd', 'e', 'f'}; %as many as there are unique values in columngroups
newT = rowfun(@(cols) groupingfunction(columngroups, cols), T, 'GroupingVariables', 1,'SeparateInputs', false, 'NumOutputs', numel(unique(columngroups)), 'OutputVariableNames', columnnames);
A possible way of generating columngroups and columnnames, assuming they're all named prefix_number
prefixes = regexp(T.Properties.VariableNames(2:end), '.*?(?=(_\d+)?$)', 'match', 'once');
[columnnames, ~, columngroups] = unique(prefixes);
edit, now that you've posted some demo data: The above works without issue on your demo data, as long as you transpose the columngroups vector returned by unique:
T = readtable('test.csv');
prefixes = regexp(T.Properties.VariableNames(2:end), '.*?(?=(_\d+)?$)', 'match', 'once');
[columnnames, ~, columngroups] = unique(prefixes);
newT = rowfun(@(cols) groupingfunction(columngroups.', cols), T, 'GroupingVariables', 1,'SeparateInputs', false, 'NumOutputs', numel(unique(columngroups)), 'OutputVariableNames', columnnames);