MATLAB: Use of dir – too slow!

I have a main folder on a network drive containing a lot of subfolders (~1000), and each subfolder contains ~1000 DICOM files. My code needs to find a string in the DICOM header fields. All the files in a subfolder have the same value for that field, so I only need to check one file per subfolder. The problem is that I have to call the dir command for each subfolder, and that is time consuming.
My code is:
all_folders=dir(path_browse); %struct containing every folder
no_folders=length(all_folders)-2; %number of folders, excluding '.' and '..'
for i=1:no_folders
    name_folder=all_folders(i+2).name; %subfolder to find match
    aux_dir=dir(name_folder); %files in subfolder
    cd(name_folder) %moves to subfolder
    test_file=dicominfo(aux_dir(3,1).name); %DICOM header from first file in the folder
    search_field(i)=strcmp(lower(test_file.field),field_query); %compare fields
    cd(path_browse) %back to main folder
end
Then I would just need to find the 1s in search_field. Is there any option to open a file without using dir or ls? The code works but I want it to be more efficient.
Regards,
Sergio

Best Answer

Do you have any evidence that dir is the time-consuming command? This is unlikely, but it can happen if you work on a network drive over a slow connection. Even then the problem is the connection, not dir.
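One way to gather that evidence is to time the two calls separately with tic/toc. The following is only a rough sketch: folder and sample are hypothetical stand-ins (not names from the question), and the sample DICOM file is assumed to be the one that ships with the Image Processing Toolbox. Point both at a real subfolder and a real file on the network share to get meaningful numbers.

```matlab
% Assumption: 'folder' and 'sample' are placeholders for this demo only.
folder = tempdir;                         % any existing folder works here
sample = which('CT-MONO2-16-ankle.dcm');  % toolbox sample file; empty if not found

% Time a single directory listing.
t = tic;
listing = dir(folder);
t_dir = toc(t);
fprintf('dir:       %.4f s (%d entries)\n', t_dir, numel(listing));

% Time a single header read, but only if a sample file is available.
if ~isempty(sample)
    t = tic;
    info = dicominfo(sample);
    t_info = toc(t);
    fprintf('dicominfo: %.4f s\n', t_info);
end
```

If the dicominfo time dominates, optimising the dir calls will not help much. The built-in profiler (profile on, run the loop, profile viewer) gives the same breakdown for the whole script.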
It is not documented that '.' and '..' are always the first two entries returned by dir, so it is better to remove these special names explicitly.
% UNTESTED CODE!
all_folders = dir(path_browse);
all_folders(ismember({all_folders.name}, {'.', '..'})) = []; % exclude '.' and '..'
no_folders = numel(all_folders);
search_field = false(1, no_folders); % Pre-allocate!!!
for k = 1:no_folders
    name_folder = fullfile(path_browse, all_folders(k).name); % subfolder to find match
    aux_dir = dir(name_folder); % files in subfolder
    aux_dir(ismember({aux_dir.name}, {'.', '..'})) = [];
    test_file = dicominfo(fullfile(name_folder, aux_dir(1).name));
    search_field(k) = strcmpi(test_file.field, field_query); % compare fields
end
This version uses absolute paths instead of hopping around the disk with cd().
strcmpi(a,b) is faster and nicer than strcmp(lower(a), b).
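A small illustration of one more difference between the two forms, using made-up strings: strcmp(lower(a), b) only lowercases the first argument, so it silently fails when the query itself contains capitals, whereas strcmpi ignores case on both sides.

```matlab
% Made-up example values, not taken from any real DICOM header.
a = 'CT Ankle';
b_lower = 'ct ankle';
b_mixed = 'CT ANKLE';

same_1 = strcmp(lower(a), b_lower);  % true: query is already lowercase
same_2 = strcmp(lower(a), b_mixed);  % false: query still has capitals
same_3 = strcmpi(a, b_mixed);        % true: case ignored on both sides
```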
I assume that this is not much faster than your version, because most of the time is spent in dicominfo. But the code is safer.
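As for finding the 1s in search_field afterwards, logical indexing does this without an extra loop. A minimal sketch with made-up stand-in data (the folder names and flags below are invented for illustration; in the real script they come from the loop):

```matlab
% Stand-in data, assumed only for this demo.
all_folders = struct('name', {'patient_A', 'patient_B', 'patient_C'});
search_field = [true false true];

% Names of the subfolders whose header matched the query.
matching_names = {all_folders(search_field).name};

% Their positions in all_folders, if the indices are needed instead.
matching_idx = find(search_field);
```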