MATLAB: How to read text files from each subfolder

data acquisition, pattern matching, regexp, statistics, Statistics and Machine Learning Toolbox

Hi,

I have a main folder that contains several subfolders. I want to read the text files in each subfolder and save the extracted data to an ".xlsx" file named after that subfolder. For example, data read from subfolder1 is saved as "subfolder1.xlsx", and data from subfolder2 as "subfolder2.xlsx".

For each text file, extract the data as follows:

1. Above the first dotted (-----) line, extract the fields RainFallID, IINT, Rain Result, and Start Time.

2. Between the two dotted (-----) lines, extract the first and third columns. If a value in the third column is mixed, keep only the first part (for example, 0.67 mm -> 0.67 and 60.67+34e % -> 60.67+34e); if it is text such as "False End", keep it as is (False End).

Could someone please help?

Best Answer

Try something along this line:

 % - Define output header.
 header = {'RainFallID', 'IINT', 'Rain Result', 'Start Time', 'Param1.pipe', ...
    '10 Un Para2.pipe', 'Verti 2 mixing.dis', 'Rate.alarm times'} ;
 nHeaderCols = numel( header ) ;
 % - Build listing sub-folders of main folder.
 D_main = dir( 'Mainfolder' ) ;
 % - Keep only real sub-folders (drop plain files, "." and "..").
 D_main = D_main([D_main.isdir] & ~ismember( {D_main.name}, {'.', '..'} )) ;
 % - Iterate through sub-folders and process.
 for dId = 1 : numel( D_main )
    % - Build listing files of sub-folder.
    D_sub = dir( fullfile( 'Mainfolder', D_main(dId).name, '*.txt' )) ;
    nFiles = numel( D_sub ) ;
    % - Prealloc output cell array.
    data = cell( nFiles, nHeaderCols ) ;
    % - Iterate through files and process.
    for fId = 1 : nFiles
        % - Read input text file.
        inLocator = fullfile( 'Mainfolder', D_main(dId).name, D_sub(fId).name ) ;
        content = fileread( inLocator ) ;
        % - Extract relevant data.
        rainfallId = str2double( regexp( content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once' )) ;
        iint       = regexp( content, '(?<=IINT\s+:\s*)\S+', 'match', 'once' ) ;
        rainResult = regexp( content, '(?<=Rain Result\s+:\s*)\S+', 'match', 'once' ) ;
        startTime  = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?= -)', 'match', 'once' )) ;
        param1Pipe = str2double( regexp( content, '(?<=Param1\.pipe\s+[\d\.]+\s+\w+\s+)[\d\.]+', 'match', 'once' )) ;
        tenUn      = str2double( regexp( content, '(?<=10 Un Para2\.pipe\s+[\d\.]+\s+\w+\s+)[\d\.]+', 'match', 'once' )) ;
        verti2     = regexp( content, '(?<=Verti 2 mixing\.dis\s+\S+\s%\s+)\S+', 'match', 'once' ) ;
        rateAlarm  = strtrim( regexp( content, '(?<=Rate\.alarm times\s+\S+\s+)[^\r\n]+', 'match', 'once' )) ;
        % - Populate data cell array.
        data(fId,:) = {rainfallId, iint, rainResult, startTime, ...
            param1Pipe, tenUn, verti2, rateAlarm} ;
    end
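    % - Output to XLSX ('OutputFolder' must already exist). XLSWRITE is not
    %   recommended since R2019a; there, writecell( [header; data], outLocator )
    %   is the replacement.
    outLocator = fullfile( 'OutputFolder', sprintf( '%s.xlsx', D_main(dId).name )) ;
    fprintf( 'Output XLSX: %s ..\n', outLocator ) ;
    xlswrite( outLocator, [header; data] ) ;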
 end

Note that if you have MATLAB R2016b or later, you can use the `folder` field of the struct array returned by DIR and simplify most FULLFILE calls.
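As a rough illustration of that simplification, here is a minimal sketch. It builds a throw-away folder so it can run on its own; the folder name, file name and file content are made up for the demo and are not part of the original data.

```matlab
% Build a temporary main folder with one sub-folder and one text file
% (hypothetical names and content, for the demo only).
root = tempname ;
mkdir( fullfile( root, 'subfolder1' )) ;
fid = fopen( fullfile( root, 'subfolder1', 'a.txt' ), 'w' ) ;
fprintf( fid, 'IINT : abc\n' ) ;
fclose( fid ) ;

% List the text files of the sub-folder.
D_sub = dir( fullfile( root, 'subfolder1', '*.txt' )) ;

% With the 'folder' field, the locator no longer needs the main folder
% and sub-folder names to be repeated.
inLocator = fullfile( D_sub(1).folder, D_sub(1).name ) ;
content = fileread( inLocator ) ;

% Clean up the temporary folder.
rmdir( root, 's' ) ;
```

In the loop above, this would replace the three-argument FULLFILE call that builds `inLocator`.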

EDIT 4:09pm

Just a few extra comments. While it may look complicated, you should be fine with most of the code here. The general approach is

 Iterate through sub folders of 'Mainfolder'
      Iterate through files of sub folder
           Extract data from file and store in data array
      Export data array to relevant Excel file

The part that will likely be the most complex for you is the data extraction. One quick option for this is pattern matching using regular expressions. You can see a series of calls to REGEXP:

 .. = regexp( content, pattern, option1, option2, .. )

This extracts from content the string that matches the pattern. When we need to export a number, we convert the match to double using STR2DOUBLE. When the match may include extra white space, we trim it using STRTRIM.
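To make this concrete, here is a small sketch run on a made-up header. The sample text is an assumption about the file layout (the real files are not shown in this thread); the patterns are the ones used in the code above.

```matlab
% Hypothetical header content (made-up values, assumed layout).
content = sprintf( 'RainFallID : 1234\nIINT : A12b\nStart Time : 2017-01-05 10:22:01 - local\n' ) ;

% Numeric field: match the digits, then convert the string to double.
id = str2double( regexp( content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once' )) ;

% Text field: the match is used as is.
iint = regexp( content, '(?<=IINT\s+:\s*)\S+', 'match', 'once' ) ;

% Field that may start with white space: trim the match.
rawTime   = regexp( content, '(?<=Start Time\s+:\s*).*?(?= -)', 'match', 'once' ) ;
startTime = strtrim( rawTime ) ;
```

With the 'once' option, REGEXP returns an empty char array when nothing matches, and STR2DOUBLE turns that into NaN, so a missing numeric field shows up as NaN rather than as an error.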

Regular expressions are a big topic, so it is normal if you don't really understand the patterns. In short,

 aAb,123 etc : are literals; they are simply matched and they don't have
               any special meaning
 \s, \S, \d  : match a single white-space, non white-space, and numeric digit
               character respectively
 *, +        : mean zero or more, and one or more, repetitions of the element
               that precedes, respectively
               \d+ hence means one or more numeric digits
 *?, +?      : are the lazy versions; they match as few characters as possible
               .*?(?= -) hence stops at the first ' -' that follows
 [..], [^..] : define a set of characters to match or not to match respectively
               [\d\s]+ hence means one or more characters that are either \d or \s
 (?<=..)     : defines a lookbehind
               (?<=hello )world matches 'world' when it is preceded by 'hello '
 (?=..)      : defines a lookahead
               hello(?= world) matches 'hello' when it is followed by ' world'
 .           : matches any character. To match the character '.' itself, it has
               to be escaped with \
               .[\d\.]+ matches any character followed by one or more characters
               that are either a numeric digit or a '.'
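If you want to experiment, the elements listed above can be tried one at a time at the command line. A few toy calls (the input strings are made up):

```matlab
% One or more numeric digits; 'once' returns the first match as a char array.
digits  = regexp( 'abc 123', '\d+', 'match', 'once' ) ;

% Without 'once', all matches are returned in a cell array.
allNums = regexp( 'x=12 y=7', '\d+', 'match' ) ;

% A set of characters: digits or a literal '.'.
number  = regexp( 'a1.5b', '[\d\.]+', 'match', 'once' ) ;

% Lookbehind: 'world' only when preceded by 'hello '.
behind  = regexp( 'hello world', '(?<=hello )world', 'match', 'once' ) ;

% Lookahead: 'hello' only when followed by ' world'.
ahead   = regexp( 'hello world', 'hello(?= world)', 'match', 'once' ) ;
```

Note that what the lookbehind and lookahead test is not part of the returned match; only the part outside the parentheses is.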

Given this information, you can understand the pattern for extracting e.g. the value of IINT:

 '(?<=IINT\s+:\s*)\S+'

which is, match

 (?<=..)\S+    : one or more non white-space preceded by something

and the something is

 IINT\s+:\s*   : the literal 'IINT' followed by one or more white-spaces,
                 followed by the literal ':', followed by zero or more white-spaces
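The longer patterns read the same way. Here is a minimal sketch for the Param1.pipe one, assuming a row laid out as a name, then a first value with its unit, then the value to extract with its unit. The sample row is made up for illustration, since the actual file layout is not shown in this thread.

```matlab
% Hypothetical row from between the two dotted lines (made-up values).
row = 'Param1.pipe     12.5 mm     0.67 mm' ;

% Lookbehind: literal 'Param1.pipe', white space, a number ([\d\.]+),
% white space, a unit (\w+), white space. The match itself is the next
% number, so the trailing unit ' mm' is left out.
pattern = '(?<=Param1\.pipe\s+[\d\.]+\s+\w+\s+)[\d\.]+' ;

param1Pipe = str2double( regexp( row, pattern, 'match', 'once' )) ;
```

Because the match stops at the first character that is neither a digit nor a '.', "0.67 mm" is reduced to 0.67, which is the "keep only the first part" behavior asked for in the question.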

Cheers,

Cedric
