MATLAB: How to use an image datastore for an image regression with custom pre-processing steps

I want to train a deep learning network (CNN) to predict numeric arrays (i.e., the response variable) from input images, and I have followed the image-regression documentation example.
That example loads all images (after pre-processing) into a single 4-D array in memory before network training. With 50k images, this is not feasible for me.
I have also looked at "augmentedImageDatastore", which resolves the memory issue. However, during pre-processing I want to crop each image using its own crop parameters, which "augmentedImageDatastore" does not support.
Please suggest a workaround to achieve this workflow.

Best Answer

This can be achieved by saving the metadata for each sample in a MAT-file and then creating a "fileDatastore" with a custom read function.
As a workaround, please try the following steps:
1) Save metadata such as "imgFilename", "cropData", and "responseData" related to each sample in an individual MAT-file. A script can be created to automate this process.
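A minimal sketch of such a script follows. It assumes the crop rectangles and responses are already available as arrays. The folder names, file-name patterns, and synthetic images are hypothetical placeholders that only make the sketch run on its own. The variable names "imgFilename", "cropData", and "responseData" are the ones the read functions in the following steps expect.

```matlab
% Hypothetical locations; replace with your own image folder and output folder.
srcFolder = fullfile(tempdir,'demoImages');      % assumed image location
pathToFolder = fullfile(tempdir,'demoSamples');  % one MAT-file per sample goes here
if ~exist(srcFolder,'dir'), mkdir(srcFolder); end
if ~exist(pathToFolder,'dir'), mkdir(pathToFolder); end

numSamples = 3;
cropRects = [5 5 39 39; 10 10 29 29; 1 1 49 49];  % [xmin ymin width height] per image
responses = rand(numSamples,2);                   % placeholder numeric targets

for k = 1:numSamples
    % Synthetic grayscale image so the sketch is runnable without real data.
    imgFilename = fullfile(srcFolder,sprintf('img%05d.png',k));
    imwrite(uint8(randi(255,64,64)),imgFilename);

    cropData = cropRects(k,:);
    responseData = responses(k,:);

    % Store only the metadata; the image itself stays on disk.
    save(fullfile(pathToFolder,sprintf('sample%05d.mat',k)), ...
        'imgFilename','cropData','responseData');
end
```

Because each MAT-file stores only a file name, a crop rectangle, and the response, the files stay small even for tens of thousands of samples.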
2) Create a datastore for the input images by pointing it at the folder that contains the MAT-files. Either "fileDatastore" or "imageDatastore" can be used, with a custom read function supplied through the "ReadFcn" option.
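imds = fileDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',@matImgRead);
% Alternative: imageDatastore accepts the same custom read function.
%imds = imageDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',@matImgRead);

% In a script, local functions must be placed at the end of the file.
function img = matImgRead(filename)
    data = load(filename);              % per-sample metadata struct
    img = imread(data.imgFilename);     % read the image the MAT-file points to
    img = imcrop(img,data.cropData);    % cropData is [xmin ymin width height]
    img = imresize(img,[28 28]);        % resize to the network input size (28-by-28 here)
end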
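3) Create a datastore for the response variable as well. Note that "load" returns a struct, so the read function extracts the "responseData" field to return the numeric array itself.
responseds = fileDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',@(filename) getfield(load(filename,'responseData'),'responseData'));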
4) Combine the image and response datastores. The "combine" function horizontally concatenates the outputs read from each datastore, so every read returns an image together with its response.
combinedds = combine(imds, responseds);
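
Before training, it can help to read one sample and confirm that the combined datastore returns the image and the response in the expected form. The sketch below is self-contained and only illustrative: the folder, file names, and sample values are hypothetical, and anonymous functions stand in for the read functions so that it runs as a plain script.

```matlab
% Build one throwaway sample so the check runs on its own (hypothetical names).
pathToFolder = fullfile(tempdir,'combineCheck');
if ~exist(pathToFolder,'dir'), mkdir(pathToFolder); end
imgFilename = fullfile(pathToFolder,'img.png');
imwrite(uint8(randi(255,64,64)),imgFilename);
cropData = [5 5 39 39];        % [xmin ymin width height]
responseData = [0.25 0.75];
save(fullfile(pathToFolder,'sample.mat'),'imgFilename','cropData','responseData');

% Helper that loads a single variable from a MAT-file and returns its value.
getVar = @(f,name) getfield(load(f,name),name);

% Read functions that mirror the crop-and-resize and response steps.
readImg = @(f) imresize(imcrop(imread(getVar(f,'imgFilename')), ...
    getVar(f,'cropData')),[28 28]);
readResp = @(f) getVar(f,'responseData');

imds = fileDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',readImg);
responseds = fileDatastore(pathToFolder,'FileExtensions','.mat','ReadFcn',readResp);
combinedds = combine(imds,responseds);

sample = read(combinedds);     % one row: {image, response}
disp(size(sample{1}));         % image size after cropping and resizing
disp(sample{2});               % numeric response for the same sample
reset(combinedds);             % rewind before handing the datastore to training
```

If the check looks right, the combined datastore can be passed to training in place of the in-memory arrays, for example trainNetwork(combinedds,layers,options), where "layers" and "options" are assumed to be defined as in the documentation example. The response shape that "trainNetwork" expects from a datastore depends on the task, so it is worth checking its documentation for your release.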