User's Guide
Multilayer Networks and Backpropagation
Train the Network
Training Algorithms
The fastest training function is generally trainlm, and
it is the default training function for feedforwardnet. The quasi-Newton
method, trainbfg, is also quite fast. Both of these methods tend to be
less efficient for large networks (those with thousands of weights),
because their memory and computation requirements grow quickly with
network size. Also,
trainlm performs better on function fitting (nonlinear regression)
problems than on pattern recognition problems.
When training large networks, and when training pattern recognition
networks, trainscg and trainrp are good choices. Their memory
requirements are relatively small, and yet they are much faster than
standard gradient descent algorithms.
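For example, here is a minimal sketch of selecting a training function
(simplefit_dataset is a sample function-fitting problem that ships with
the toolbox; the hidden-layer size of 10 is an illustrative choice):

[x, t] = simplefit_dataset;     % sample nonlinear regression data
net = feedforwardnet(10);       % trainFcn defaults to 'trainlm'

% For a large network or a pattern recognition problem, a
% memory-efficient algorithm is often the better choice:
% net.trainFcn = 'trainscg';    % or 'trainrp'

[net, tr] = train(net, x, t);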
See "Speed and Memory Comparison for Training Multilayer Networks" for a
full comparison of the performances of the training algorithms shown in
the table above.
Efficiency and Memory Reduction
There are some network parameters that are helpful when training large
networks or using large data sets. For example, the parameter
net.efficiency.memoryReduction
can be used to reduce the amount of memory that you use while training or
simulating the network. If this parameter is set to 1 (the default), the
maximum amount of memory is used and training is fastest. If it is set
to 2, the data is divided into two parts: all calculations (such as
gradients and Jacobians) are done first on part one and then on part
two, and any intermediate variables used for part one are released
before the part-two calculations begin. This can save
significant memory, especially for the trainlm training function. If
memoryReduction is set to N, then the data is divided into N parts, which
are computed separately. The larger the value of N, the larger the
reduction in memory use, although the amount of reduction diminishes as N
is increased. There is a drawback to using memory reduction. A
computational overhead is associated with computing the Jacobian and
gradient in submatrices. If you have enough memory available, then it is
better to leave memoryReduction set to 1 and to compute the full Jacobian
or gradient in one step. If you have a large training set, and you are
running out of memory, then you should set memoryReduction to 2 and try
again. If you still run out of memory, continue to increase
memoryReduction.
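As a sketch of this workflow (the network size is illustrative, and x
and t stand for your own training inputs and targets):

net = feedforwardnet(20);               % trainFcn defaults to 'trainlm'
net.efficiency.memoryReduction = 2;     % compute Jacobian in 2 submatrices
[net, tr] = train(net, x, t);

% If training still runs out of memory, increase the number of parts:
net.efficiency.memoryReduction = 4;
[net, tr] = train(net, x, t);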
Limitations and Cautions
You would normally use Levenberg-Marquardt training (trainlm) for small
and medium-sized networks, if you have enough memory available. If
memory is a problem, there are a variety of other fast algorithms
available. For large networks, you will probably want to use trainscg or
trainrp, as sketched below.
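As a minimal sketch of that rule of thumb (the 1000-weight threshold is
an assumption chosen for illustration, not a toolbox rule; x and t are
your training data):

net = feedforwardnet(10);
net = configure(net, x, t);        % size the network so getwb(net) is valid
if numel(getwb(net)) < 1000        % illustrative small/medium cutoff
    net.trainFcn = 'trainlm';      % fast when memory allows
else
    net.trainFcn = 'trainscg';     % memory-efficient for large networks
end
[net, tr] = train(net, x, t);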
Advanced Topics
Speed and Memory Comparison for Training Multilayer Networks
It is very difficult to know which training algorithm will be the
fastest for a given problem. It depends on many factors, including the
complexity of the problem, the number of data points in the training
set, the number of weights and biases in the network, the error goal,
and whether the network is being used for pattern recognition
(discriminant analysis) or function approximation (regression). This
section compares the various training algorithms. Feedforward networks
are trained on six different problems. Three of the problems fall into
the pattern recognition category and the other three fall into the
function approximation category. Two of the problems are simple "toy"
problems,
while the other four are "real world" problems. Networks with a variety
of different architectures and complexities are used, and the networks
are trained to a variety of different accuracy levels.
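A minimal sketch of such a timing comparison on one sample problem
(simplefit_dataset ships with the toolbox; the set of algorithms and the
network size are illustrative choices):

[x, t] = simplefit_dataset;
trainFcns = {'trainlm', 'trainbfg', 'trainscg', 'trainrp'};
for i = 1:numel(trainFcns)
    net = feedforwardnet(10);
    net.trainFcn = trainFcns{i};
    net.trainParam.showWindow = false;   % run without the training GUI
    [net, tr] = train(net, x, t);
    fprintf('%-8s  %6.2f s  final perf %.4g\n', ...
            trainFcns{i}, tr.time(end), tr.perf(end));
end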