MATLAB: Stochastic gradient descent neural network updating net in matlab

Deep Learning Toolbox, gradient descent, net, neural network, training

Is it possible to train net with stochastic gradient descent in MATLAB? If so, how?
I observe that train completely ignores the previously trained network's information and relearns everything from scratch. Incremental updating would be helpful for large-scale training: if I train on the complete data set, it takes a very long time.
For example, train iteratively on 100 parts of the data.
TF1 = 'tansig'; TF2 = 'tansig'; TF3 = 'tansig'; % transfer functions per layer; TF3 is for the output layer
net = newff(trainSamples.P, trainSamples.T, [NodeNum1, NodeNum2, NodeOutput], {TF1 TF2 TF3}, 'traingdx'); % network created
net.trainFcn = 'traingdm';
net.trainParam.epochs   = 1000;
net.trainParam.min_grad = 0;
net.trainParam.max_fail = 2000; % large value to stand in for infinity
while true % iteratively take 10 data points at a time
    p % <= gets updated with the next 10 new data points
    t % <= gets updated with the next 10 new data points
    [net, tr] = train(net, p, t);
end
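One way to get genuine incremental updating is the toolbox's adapt function, which updates the network's weights chunk by chunk instead of restarting a batch optimization each time. A minimal sketch, assuming the full data sits in matrices P and T (assumed names) and using the chunk size of 10 from the loop above:

```matlab
% Incremental updating with ADAPT instead of TRAIN (sketch, not a full design).
% P is an I-by-N input matrix, T an O-by-N target matrix (assumed names).
chunkSize = 10;
numChunks = floor(size(P, 2) / chunkSize);
for c = 1:numChunks
    k = (c - 1)*chunkSize + (1:chunkSize);      % columns for this chunk
    [net, y, e] = adapt(net, P(:, k), T(:, k)); % weights updated in place
end
```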

Best Answer

2. Use the largest nndatasets example in the NNTBX:
help nndatasets
doc nndatasets
3. It is worthwhile to look at static correlation coefficients (help/doc corrcoef) and plots to help find
a. inputs that are so weakly correlated with all of the targets that they can be omitted;
b. inputs that are so highly correlated with other inputs that they can be omitted.
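A sketch of that screening step, assuming x is an I-by-N input matrix and t an O-by-N target matrix; the 0.1 and 0.95 thresholds are illustrative choices, not toolbox defaults:

```matlab
% Correlation screening of inputs (sketch)
I   = size(x, 1);
R   = abs(corrcoef([x' t']));          % correlations among inputs and targets
Rxt = R(1:I, I+1:end);                 % |input-target| correlations
Rxx = R(1:I, 1:I);                     % |input-input| correlations
weakInputs = find(max(Rxt, [], 2) < 0.1)   % candidates for omission (a)
[~, c] = find(triu(Rxx, 1) > 0.95);
redundantInputs = unique(c)                % candidates for omission (b)
```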
4. It may be useful to look at the input dimensionality reduction obtained with linear models (help regress)
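For instance, a hedged sketch of that linear-model check, again assuming x (I-by-N) and t (O-by-N):

```matlab
% Linear baseline per target with REGRESS (sketch)
[I, N] = size(x);
X = [ones(N, 1) x'];                   % design matrix with intercept column
for j = 1:size(t, 1)
    [b, ~, ~, ~, stats] = regress(t(j, :)', X);
    fprintf('target %d: R^2 = %.3f\n', j, stats(1));  % stats(1) is R-squared
end
```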
5. Try to use as many defaults as possible when starting a NN design. Defaults that should be overridden should become evident during design trials.
6. What are the dimensions of your input and target matrices?
7. How many hidden nodes?
8. It is not necessary to use more than one hidden layer.
9. I used the largest nndataset
[ x,t] = building_dataset;
with size(x) = [14 4208], size(t) = [3 4208] and H = 70 hidden nodes. This yields about 10 times more training equations, 3*4208 = 12,624, than there are unknown weights, (14+1)*70 + (70+1)*3 = 1263.
Since the net was not close to being overfit, I used only a training set and obtained an adjusted R-squared of 0.99 in 72 seconds with a straightforward FITNET design.
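That straightforward design can be sketched as follows; the dividetrain split and the NMSE-based R-squared formula are my assumptions about the details, not quoted from the actual run:

```matlab
% Sketch of the straightforward FITNET design (H = 70 hidden nodes)
[x, t] = building_dataset;
net = fitnet(70);                        % default trainlm
net.divideFcn = 'dividetrain';           % all data used for training
[net, tr] = train(net, x, t);
y    = net(x);
NMSE = mse(t - y) / mean(var(t', 1));    % normalized mean squared error
R2   = 1 - NMSE                          % (unadjusted) R-squared
```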
However, a design looping over 10 randomly chosen subsets took 109 seconds. The syntax, after randomly shuffling the columns with randperm(4208), was
M    = 420   % floor(4208/10)
imax = 10
for i = 1:imax
    k = 1 + M*(i-1) : M*i;
    [net, tr, y(:, k)] = train(net, x(:, k), t(:, k));
end
This probably doesn't show a savings because 14*4208 is not too large for the default trainlm.
I think all you have to do is use a larger data set (large enough to choke trainlm) and a more appropriate training function, e.g., trainscg or trainrp.
Hope this helps.
Thank you for formally accepting my answer