MATLAB: How to use an image datastore for an image regression with custom pre-processing steps

Tags: augmentation, custom, datastore, deep, efficient, image, learning, memory, regression

I want to train a deep learning network (CNN) to predict numeric arrays (i.e., the response variable) from input images, and I have followed this image-regression documentation example.

https://www.mathworks.com/help/releases/R2020a/deeplearning/ug/train-a-convolutional-neural-network-for-regression.html

The above example loads all images (after pre-processing) into a single 4-D array in memory before network training. Loading 50k images this way is not feasible for me.

I have also looked at "augmentedImageDatastore", which resolves the memory issue. However, during pre-processing I want to crop each image using its own crop parameters, which "augmentedImageDatastore" does not support.

https://www.mathworks.com/help/releases/R2020a/deeplearning/ref/augmentedimagedatastore.html

Please suggest a workaround to achieve this workflow.

Best Answer

This can be achieved by saving the metadata for each sample in a MAT-file and then creating a "fileDatastore" that uses a custom read function.

As a workaround, please try the following steps:

1) Save metadata such as "imgFilename", "cropData", and "responseData" related to each sample in an individual MAT-file. A script can be created to automate this process.
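The script mentioned in step 1 could look roughly like the sketch below. It is only an illustration: the folder names, the synthetic images, and the "cropRects" and "responses" arrays are hypothetical stand-ins for your own data. The variable names passed to "save" are the ones the read functions in the later steps expect.

```matlab
% Hypothetical demo folders; in practice, point these at your own data.
imgFolder = fullfile(tempdir,'demoImages');
matFolder = fullfile(tempdir,'demoMatFiles');
if ~exist(imgFolder,'dir'), mkdir(imgFolder); end
if ~exist(matFolder,'dir'), mkdir(matFolder); end

% Hypothetical per-image metadata, one row per sample.
cropRects = [5 5 39 39; 10 10 29 29; 1 1 49 49];   % [xmin ymin width height]
responses = [0.1 0.2; 0.3 0.4; 0.5 0.6];           % numeric response vectors
numSamples = size(cropRects,1);

for k = 1:numSamples
    % Write a synthetic 64x64 grayscale image so the sketch runs on its own.
    imgFilename = fullfile(imgFolder,sprintf('img%03d.png',k));
    imwrite(uint8(randi([0 255],64,64)),imgFilename);

    % Save one MAT-file per sample with the three metadata variables.
    cropData = cropRects(k,:);
    responseData = responses(k,:);
    save(fullfile(matFolder,sprintf('sample%03d.mat',k)), ...
        'imgFilename','cropData','responseData');
end
```

Each MAT-file stores only a file path, a crop rectangle, and the response, so the images themselves are not loaded into memory until the datastore reads them.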

2) Create a datastore for input images by providing the folder path that stores all MAT-files. Either "fileDatastore" or "imageDatastore" can be used with a custom read function "ReadFcn".

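imds = fileDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',@matImgRead);
%imds = imageDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',@matImgRead);

% In a script, local functions must be placed at the end of the file.
function img = matImgRead(filename)
    data = load(filename);              % metadata saved in step 1
    img = imread(data.imgFilename);     % read the image that the MAT-file points to
    img = imcrop(img,data.cropData);    % per-image crop rectangle [xmin ymin width height]
    img = imresize(img,[28 28]);        % resize to the network input size
end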

3) Create a datastore for the response variable as well.

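% load returns a struct, so extract the responseData field to get the numeric response itself
responseds = fileDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',@(filename) getfield(load(filename,'responseData'),'responseData'));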

4) The image and response datastores can then be combined. The "combine" function horizontally concatenates the outputs of the two datastores, so each read returns an image together with its response.

combinedds = combine(imds, responseds);
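Before training, it can help to read one sample from the combined datastore and confirm that it has the expected form. The sketch below is a self-contained check under a few assumptions: the demo folder, the synthetic image, and the helper handles "cropResize" and "getResponse" are hypothetical, and "imcrop" is assumed to be available (Image Processing Toolbox). Anonymous functions are used in place of the local read functions only to keep the example in one piece.

```matlab
% Build one hypothetical sample so the check runs on its own.
demoFolder = fullfile(tempdir,'combineDemo');
if ~exist(demoFolder,'dir'), mkdir(demoFolder); end
imgFilename = fullfile(demoFolder,'img001.png');
imwrite(uint8(randi([0 255],64,64)),imgFilename);
cropData = [5 5 39 39];        % [xmin ymin width height]
responseData = [0.1 0.2];
save(fullfile(demoFolder,'sample001.mat'),'imgFilename','cropData','responseData');

% Read functions written as anonymous functions operating on the loaded struct.
cropResize  = @(s) imresize(imcrop(imread(s.imgFilename),s.cropData),[28 28]);
getResponse = @(s) s.responseData;

imds = fileDatastore(demoFolder,'FileExtensions','.mat', ...
    'ReadFcn',@(f) cropResize(load(f)));
responseds = fileDatastore(demoFolder,'FileExtensions','.mat', ...
    'ReadFcn',@(f) getResponse(load(f,'responseData')));
combinedds = combine(imds,responseds);

% One read should give a 1-by-2 cell array: {image, response}.
sample = read(combinedds);
disp(size(sample{1}));
disp(sample{2});
reset(combinedds);             % rewind so training starts from the first sample
```

If the second cell holds a struct rather than a numeric array, the response read function is returning the raw output of "load" and needs to extract the "responseData" field.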

Related Solutions

MATLAB: How can I crop images from multiple folders located under one main folder to a specific size before feeding them to a CNN

Hi,

You can use augmentedImageDatastore to augment the images read by imageDatastore. Cropping can be done by setting OutputSizeMode to 'centercrop'. You can also use 'randcrop' if you want the crop location to be random instead of centered. The code below might help.

rootFolder = 'C:\Users\roro\Downloads\Data';
% Include the images from all subfolders and label them by folder name
imds = imageDatastore(rootFolder, ...
     'IncludeSubfolders', true, ...
     'LabelSource', 'foldernames');
% imageSize must be defined according to the final cropped image size
imageSize = [28 28 1];
augimds = augmentedImageDatastore(imageSize,imds,'OutputSizeMode','centercrop');

augimds can then be passed to the CNN for training instead of imds.

MATLAB: Why do I get the error “Invalid training data for multiple-input network” while training deep neural networks with multiple image inputs

The following documentation page states that the datastore should return its data as a cell array with (numInputs + 1) columns.

https://www.mathworks.com/help/deeplearning/ug/multiple-input-and-multiple-output-networks.html

In this case, with two inputs, the final datastore used for "trainNetwork" should output an M-by-3 cell array, as follows:

>> read(train_ds) 
ans = 1×3 cell array
{32×32×3 uint8} {32×32×3 uint8} {[Anomalies]}

The first two columns should correspond to the inputs, and the third column should correspond to the categorical response variable. Currently, the datastore outputs an M-by-2 cell array containing only the image inputs, which results in an error during network training.

To resolve this issue, the datastore should output data in the required format (i.e., an M-by-3 cell array).

This can be achieved by saving the metadata (i.e., file paths and class labels) for each sample in an individual MAT-file and then creating a "fileDatastore" from the MAT-files using a custom "ReadFcn". Make sure that the output of the "ReadFcn" is a cell array with 3 columns.

For example:

% save metadata for all samples in individual MAT-files
save('sample1.mat','filepath1','filepath2','label')
% create a fileDatastore from MAT-files using a custom "ReadFcn"
trainds = fileDatastore(matFileFolder,'FileExtensions','.mat','ReadFcn',@matRead);
function output = matRead(fn)
s = load(fn);
img1 = imread(s.filepath1);
img2 = imread(s.filepath2);
label = s.label;
output = {img1,img2,label};
end
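
To confirm that the datastore returns the required format before calling "trainNetwork", one sample can be read and inspected. The following is a minimal, self-contained sketch: the demo folder, the synthetic images, the "Anomalies" label value, and the helper handle "toCell" are hypothetical, and an anonymous function stands in for the local "matRead" function so the example fits in one piece.

```matlab
% Build one hypothetical two-image sample so the check runs on its own.
demoFolder = fullfile(tempdir,'multiInputDemo');
if ~exist(demoFolder,'dir'), mkdir(demoFolder); end
filepath1 = fullfile(demoFolder,'a.png');
filepath2 = fullfile(demoFolder,'b.png');
imwrite(uint8(randi([0 255],32,32,3)),filepath1);
imwrite(uint8(randi([0 255],32,32,3)),filepath2);
label = categorical({'Anomalies'});
save(fullfile(demoFolder,'sample1.mat'),'filepath1','filepath2','label');

% Same logic as matRead, written as an anonymous function on the loaded struct.
toCell = @(s) {imread(s.filepath1), imread(s.filepath2), s.label};
trainds = fileDatastore(demoFolder,'FileExtensions','.mat', ...
    'ReadFcn',@(fn) toCell(load(fn)));

% One read should give a 1-by-3 cell array: {input1, input2, label}.
sample = read(trainds);
disp(size(sample));
disp(class(sample{3}));
reset(trainds);                % rewind so training starts from the first sample
```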
