MATLAB: Merging specific Excel files according to a string in the filename

excelfor loopisequalmerging files

After searching through the forums I've found answers relating to how to merge Excel files together, but none that relate specifically to what I wish to do. I have thousands of Excel files, from a sample set of 120 samples, all in one folder. There are thus 120 sets of files which are separated by time and within each set they all have the same particular string in their filename which allows me to identify them, based on the time the measurements in each file start and end. I wish to merge each set of files together into one file but as the number of files in each set is variable, I wonder if there is a way to count the number of occurrences and only merge those occurrences together. hist and unique doesn't work because I have both alphabetical and numerical characters for the identifiers.
What I've done so far is first load all of the files, tabulate the filenames and split up the filenames so that the unique identifiers are isolated. I adapted a
Files=dir('\\fscbeng\Project\Lab_Measurements\Results\20190424-1\*.csv'); % Looks out for Excel spreadsheets specifically
num_files=1:length(Files);
Cell_logs = cell(1,num_files(end));
for r = num_files
filename=([Files(r).folder '\' Files(r).name]);
fileID = fopen(filename,'r');
Cell_logs(r) = textscan(fileID, '%s', 'delimiter', '\r'); % Stores each Excel file in a separate cell
fclose(fileID);
end
Filenames = table({Files.name}.');
for i = 1:size(Filenames,1)
x = cell2mat(Filenames{i,1});
Filenames{i,1} = {x(1:7)};
Filenames{i,2} = {x(8:23)};
Filenames{i,3} = {x(24:25)};
end
Filenames.Properties.VariableNames = {'Sensor_type' 'UID' 'Sensor_position'};
Sensor_position contains the identifier as follows:
Filenames.Sensor_position
ans =
1320×1 cell array
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'80'
'81'
'81'
'81'
'81'
'81'
...
So there are 11 files for each sensor position, so one way I've thought of (but don't know how to fully execute as I haven't done this before) is a for loop with a logical expression where:
v=Filenames.Sensor_position;
for i = 1:length(v)-1
if isequal(v(i), v(i+1))
... % command to continue until the values differ
else
... % the consecutive values have been evaluated to be different so the column of values would be cut here and the files merged up until the particular row number
end
end
Idea being that when the for loop encounters the next value in the column which differs from the previous value it would split it off into a new variable (if I include an else condition) or else earmark each section of the Filenames.Sensor_position so that the file merger only happens for those earmarked files, if that makes any sense. The other option I've thought of is a find command which would process only those files with a particular string (e.g. '80', '81' etc.) but it would require me to specify each sensor position in turn (there are 120 of them) which seems quite clunky to me.

Best Answer

folder = '\\fscbeng\Project\Lab_Measurements\Results\20190424-1';
filelist = dir(fullfile(folder, '*.csv'));
filedata = cell(numel(filelist), 2);
for fileidx = 1:numel(filelist)
filecontent = fileread(fullfile(folder, filelist(fileidx).name)); %read whole file at once
[header, data] = regexp(filecontent, '^.*=+ DATA[^\n\r]+[\n\r]+[^\n\r+]+[\n\r]+', 'match', 'split', 'once'); %identify line with '==== DATA'. Keep everything up to that and the next line as header. After that it's data
filedata{fileidx, 1} = header;
filedata{fileidx, 2} = data{2}; %data{1} is text before the regexp match which will always be empty since we match from the start
end
filenames = char({filelist.name});
filenames = filenames(:, 1:25);
filedata = cell2table([mat2cell(filenames, ones(1, numel(filelist)), [7, 16, 2]), filedata], ...
'VariableNames', {'Sensor_type', 'UID', 'Sensor_position', 'Header', 'Data'});
merged = rowfun(@(header, data) {[header{1}, data{:}]}, filedata, 'InputVariables', {'Header', 'Data'}, 'GroupingVariables', 'Sensor_position', 'OutputVariableNames', 'Content');
The above simply concatenates horizontally the header of the first file with the data of each file with the same Sensor_position. It doesn't add any newline between each, relying on the fact that your data already ends with a newline (at least in the sample file you shared). At the end you get a new table with the unique Sensor_position and the merged data. To actually write it to files you can use a loop (or another rowfun):
destfolder = '\\somewhere\somefolder'
for row = 1:height(merged)
fid = fopen(fullfile(destfolder, sprintf('%s.csv', merged.Sensor_position{row})), 'w'); %no idea what the output file name should be. Just using sensor_position here
fwrite(fid, merged.Content{row});
fclose(fid);
end
Code is completely untested. There may be typos, bugs.
edit: fixed many typos