MATLAB: Increase GPU Throughput During Training

Tags: Deep Learning Toolbox, gpu, machine learning, neural network, neural networks

I have a single Tesla GP100 GPU with 16 GB of memory. When I train my neural network, I run into two issues:
  1. Reading through an imageDatastore spends a huge amount of time in fread (I'm using a custom ReadFcn because my data is asymmetric, and that seemed easiest). I can work around this by reading all the data into memory before training, but that will not scale.
  2. During training, MATLAB uses only 2.2 GB of the 16 GB available on the GPU. With the exact same network and data in TensorFlow, all 16 GB are used. This happens even when I preload all the data into memory as described above. My guess is that TensorFlow is "queuing up" batches and MATLAB is not. Is there a way to increase GPU utilization in MATLAB?
Here is my minimal example code:
function net = run_training_public(dims, nbatch, lr, nepoch)
    % Load data; each .dat file is decoded by the custom ReadFcn below
    ds = imageDatastore('./data/set3', 'IncludeSubfolders', true, ...
        'ReadFcn', @(x) reader_public(x, dims), ...
        'LabelSource', 'foldernames', ...
        'FileExtensions', '.dat');
    % Load the neural network structure
    network = cnn1;
    % Set up training options and train
    options = trainingOptions('adam', 'MaxEpochs', nepoch, ...
        'MiniBatchSize', nbatch, 'Shuffle', 'every-epoch', ...
        'InitialLearnRate', lr, ...
        'ExecutionEnvironment', 'gpu', 'Verbose', true);
    net = trainNetwork(ds, network, options);
end

function data = reader_public(fileName, dims)
    % Read a dims(1)-by-dims(2) int16 matrix stored row by row:
    % fread fills a dims(2)-by-dims(1) array column-major, then transpose
    f = fopen(fileName, 'r');
    assert(f ~= -1, 'Could not open %s', fileName);
    data = fread(f, [dims(2) dims(1)], '*int16').';
    fclose(f);
end
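The in-memory workaround mentioned in issue 1 can be sketched as a small helper that turns the datastore built in run_training_public into plain arrays. This is an illustration, not code from the thread: preload_datastore and read_dat_file are hypothetical names, read_dat_file repeats the fread logic of reader_public with an open check, and the sketch assumes every file decodes to a matrix of the same size.

```matlab
function [X, Y] = preload_datastore(ds)
% PRELOAD_DATASTORE  Hypothetical helper: read every file in DS once and
% stack the results into an H-by-W-by-1-by-N array plus a label vector.
imgs = readall(ds);            % cell array, one matrix per file, in ds.Files order
X = single(cat(4, imgs{:}));   % whole data set now sits in host RAM; single as a precaution
Y = ds.Labels;                 % categorical labels taken from the folder names
end

function data = read_dat_file(fileName, dims)
% READ_DAT_FILE  Hypothetical reader with the same logic as reader_public:
% a dims(1)-by-dims(2) int16 matrix stored row by row.
f = fopen(fileName, 'r');
assert(f ~= -1, 'Could not open %s', fileName);
closer = onCleanup(@() fclose(f));  % closes the file when the function exits
data = fread(f, [dims(2) dims(1)], '*int16').';
end
```

Inside run_training_public, `[X, Y] = preload_datastore(ds);` followed by `net = trainNetwork(X, Y, network, options);` would replace the datastore-based call. The entire data set has to fit in host memory, which is why the question notes that this approach does not scale.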

Best Answer

I solved my problem with help from Joss. I had to create a custom image format with the imformats function.
I wrote my own binary file reader, modeled on the built-in PNG reader functions and my code above. It gives a huge speedup, and I can now drop the ReadFcn while still using my custom reader. The only issue I didn't solve was how to pass the dims variable into the reader instead of hard coding it.
function net = run_training_public(nbatch, lr, nepoch)
    % Add the custom image type to the imread registry
    create_custom_image_format();
    % Load data; no ReadFcn needed, imread now handles .dat files
    ds = imageDatastore('./data/set3', 'IncludeSubfolders', true, ...
        'LabelSource', 'foldernames', ...
        'FileExtensions', '.dat');
    % Load the neural network structure
    network = cnn1;
    % Set up training options and train
    options = trainingOptions('adam', 'MaxEpochs', nepoch, ...
        'MiniBatchSize', nbatch, 'Shuffle', 'every-epoch', ...
        'InitialLearnRate', lr, ...
        'ExecutionEnvironment', 'gpu', 'Verbose', true);
    net = trainNetwork(ds, network, options);
end

function create_custom_image_format()
    fmts = imformats;
    % Don't add the format if it is already in the registry
    % (strcmp gives an exact match, unlike contains)
    if ~any(strcmp([fmts.ext], 'dat'))
        out.ext = 'dat';
        out.isa = @isdat;
        out.info = [];
        out.read = @custom_image_reader;
        out.write = [];
        out.alpha = 0;
        out.description = 'Custom Data Format';
        imformats('add', out);
    end
end

function tf = isdat(filename)
    % Return true if the file has a .dat extension
    [~, ~, extn] = fileparts(filename);
    tf = strcmp(extn, '.dat');
end

function [X, map] = custom_image_reader(filename)
    dims = [$m $n]; % <- HARD-CODE [rows cols] OF YOUR DATA HERE (replace $m and $n)
    f = fopen(filename, 'r');
    assert(f ~= -1, 'Could not open %s', filename);
    % Data is stored row by row, so reshape column-major and transpose
    X = reshape(fread(f, '*int16'), dims(2), dims(1)).';
    fclose(f);
    map = []; % no colormap for this format
end
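The open question above, passing dims into the reader without hard coding it, could be handled with a closure: an anonymous function captures dims at registration time, and imformats('update', ...) overwrites an existing entry so a later call with different dims takes effect. This is a minimal sketch, not something confirmed in the thread; register_dat_format, is_dat_file and read_dat_image are hypothetical names, and it assumes imread passes the file name (plus possibly extra arguments, which are ignored) to the registered read function.

```matlab
function register_dat_format(dims)
% REGISTER_DAT_FORMAT  Hypothetical helper: register .dat with imread so the
% reader receives DIMS instead of a hard-coded value.
fmt.ext = {'dat'};
fmt.isa = @is_dat_file;
fmt.info = [];
% The anonymous function captures the current value of dims; any extra
% arguments imread might forward are accepted and ignored.
fmt.read = @(filename, varargin) read_dat_image(filename, dims);
fmt.write = [];
fmt.alpha = 0;
fmt.description = 'Custom Data Format';
registry = imformats;
% ext may be a char or a cell array of chars, so check each entry
alreadyThere = any(cellfun(@(e) any(strcmp(e, 'dat')), {registry.ext}));
if alreadyThere
    imformats('update', 'dat', fmt);  % replace the old entry so new dims take effect
else
    imformats('add', fmt);
end
end

function tf = is_dat_file(filename)
% True if the file has a .dat extension (case-insensitive)
[~, ~, extn] = fileparts(filename);
tf = strcmpi(extn, '.dat');
end

function [X, map] = read_dat_image(filename, dims)
% Read a dims(1)-by-dims(2) int16 matrix stored row by row
f = fopen(filename, 'r');
assert(f ~= -1, 'Could not open %s', filename);
closer = onCleanup(@() fclose(f));  % closes the file even if reshape errors
X = reshape(fread(f, '*int16'), dims(2), dims(1)).';
map = [];  % no colormap for this format
end
```

With this, dims would go back into the argument list of run_training_public, and `register_dat_format(dims)` would replace the `create_custom_image_format()` call; the imageDatastore and training code stay the same.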