MATLAB: Merging specific Excel files according to a string in the filename

After searching through the forums I've found answers relating to how to merge Excel files together, but none that relate specifically to what I wish to do. I have thousands of Excel files, from a sample set of 120 samples, all in one folder. There are thus 120 sets of files which are separated by time and within each set they all have the same particular string in their filename which allows me to identify them, based on the time the measurements in each file start and end. I wish to merge each set of files together into one file but as the number of files in each set is variable, I wonder if there is a way to count the number of occurrences and only merge those occurrences together. hist and unique doesn't work because I have both alphabetical and numerical characters for the identifiers.

What I've done so far is first load all of the files, tabulate the filenames and split up the filenames so that the unique identifiers are isolated. I adapted a

Files=dir('\\fscbeng\Project\Lab_Measurements\Results\20190424-1\*.csv'); % Looks out for Excel spreadsheets specifically
num_files=1:length(Files);
Cell_logs = cell(1,num_files(end)); 
for r = num_files
    filename=([Files(r).folder '\' Files(r).name]);
    fileID = fopen(filename,'r');
    Cell_logs(r) = textscan(fileID, '%s', 'delimiter', '\r'); % Stores each Excel file in a separate cell
    fclose(fileID);
end
Filenames = table({Files.name}.');
for i = 1:size(Filenames,1)
    x = cell2mat(Filenames{i,1});
    Filenames{i,1} = {x(1:7)};
    Filenames{i,2} = {x(8:23)};
    Filenames{i,3} = {x(24:25)};
end
Filenames.Properties.VariableNames = {'Sensor_type' 'UID' 'Sensor_position'};

Sensor_position contains the identifier as follows:

Filenames.Sensor_position
ans =
  1320×1 cell array
    '80'
    '80'
    '80'
    '80'
    '80'
    '80'
    '80'
    '80'
    '80'
    '80'
    '80'
    '81'
    '81'
    '81'
    '81'
    '81'
    ...

So there are 11 files for each sensor position, so one way I've thought of (but don't know how to fully execute as I haven't done this before) is a for loop with a logical expression where:

v=Filenames.Sensor_position;
for i = 1:length(v)-1
    if isequal(v(i), v(i+1)) 
        ... % command to continue until the values differ
    else
        ... % the consecutive values have been evaluated to be different so the column of values would be cut here and the files merged up until the particular row number
    end
end

Idea being that when the for loop encounters the next value in the column which differs from the previous value it would split it off into a new variable (if I include an else condition) or else earmark each section of the Filenames.Sensor_position so that the file merger only happens for those earmarked files, if that makes any sense. The other option I've thought of is a find command which would process only those files with a particular string (e.g. '80', '81' etc.) but it would require me to specify each sensor position in turn (there are 120 of them) which seems quite clunky to me.

folder = '\\fscbeng\Project\Lab_Measurements\Results\20190424-1'; filelist = dir(fullfile(folder, '*.csv')); filedata = cell(numel(filelist), 2); for fileidx = 1:numel(filelist) filecontent = fileread(fullfile(folder, filelist(fileidx).name)); %read whole file at once [header, data] = regexp(filecontent, '^.*=+ DATA[^\n\r]+[\n\r]+[^\n\r+]+[\n\r]+', 'match', 'split', 'once'); %identify line with '==== DATA'. Keep everything up to that and the next line as header. After that it's data filedata{fileidx, 1} = header; filedata{fileidx, 2} = data{2}; %data{1} is text before the regexp match which will always be empty since we match from the start end filenames = char({filelist.name}); filenames = filenames(:, 1:25); filedata = cell2table([mat2cell(filenames, ones(1, numel(filelist)), [7, 16, 2]), filedata], ... 'VariableNames', {'Sensor_type', 'UID', 'Sensor_position', 'Header', 'Data'}); merged = rowfun(@(header, data) {[header{1}, data{:}]}, filedata, 'InputVariables', {'Header', 'Data'}, 'GroupingVariables', 'Sensor_position', 'OutputVariableNames', 'Content');

destfolder = '\\somewhere\somefolder' for row = 1:height(merged) fid = fopen(fullfile(destfolder, sprintf('%s.csv', merged.Sensor_position{row})), 'w'); %no idea what the output file name should be. Just using sensor_position here fwrite(fid, merged.Content{row}); fclose(fid); end

Best Answer

The above simply concatenates horizontally the header of the first file with the data of each file with the same Sensor_position. It doesn't add any newline between each, relying on the fact that your data already ends with a newline (at least in the sample file you shared). At the end you get a new table with the unique Sensor_position and the merged data. To actually write it to files you can use a loop (or another rowfun):

Code is completely untested. There may be typos, bugs.

edit: fixed many typos

Best Answer

Related Solutions

MATLAB: How to open a text file, and then read in the headers of that text file using textread? (Errors 166 and 67)

MATLAB: Exporting numerical and string values to *.txt file (in batch)

Related Question