User's Guide
Multilayer Networks and Backpropagation
Train the Network
Training Algorithms
The fastest training function is generally trainlm, and
it is the default training function for feedforwardnet. The quasi-Newton
method, trainbfg, is also quite fast. Both of these methods tend to be
less efficient for large networks (those with thousands of weights),
because their memory and computation requirements grow quickly with
network size. Also,
trainlm performs better on function fitting (nonlinear regression)
problems than on pattern recognition problems.
When training large networks, and when training pattern recognition
networks, trainscg and trainrp are good choices. Their memory
requirements are relatively small, and yet they are much faster than
standard gradient descent algorithms.
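For example, here is a minimal sketch of selecting a training function
(simplefit_dataset is a sample function-fitting problem that ships with
the toolbox; the hidden-layer size of 10 is an illustrative choice):

[x, t] = simplefit_dataset;     % sample nonlinear regression data
net = feedforwardnet(10);       % trainFcn defaults to 'trainlm'

% For a large network or a pattern recognition problem, a
% memory-efficient algorithm is often the better choice:
% net.trainFcn = 'trainscg';    % or 'trainrp'

[net, tr] = train(net, x, t);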
See "Speed and Memory Comparison for Training Multilayer Networks" for a
full comparison of the performances of the training algorithms shown in
the table above.
Efficiency and Memory Reduction
There are some network parameters that are helpful when training large
networks or using large data sets. For example, the parameter
net.efficiency.memoryReduction
can be used to reduce the amount of memory that you use while training or
simulating the network. If this parameter is set to 1 (the default), the
maximum amount of memory is used and training is fastest. If it is set
to 2, the data is divided into two parts: all calculations (such as
gradients and Jacobians) are done first on part one and then on part
two, and any intermediate variables used for part one are released
before the part-two calculations begin. This can save
significant memory, especially for the trainlm training function. If
memoryReduction is set to N, then the data is divided into N parts, which
are computed separately. The larger the value of N, the larger the
reduction in memory use, although the amount of reduction diminishes as N
is increased. There is a drawback to using memory reduction. A
computational overhead is associated with computing the Jacobian and
gradient in submatrices. If you have enough memory available, then it is
better to leave memoryReduction set to 1 and to compute the full Jacobian
or gradient in one step. If you have a large training set, and you are
running out of memory, then you should set memoryReduction to 2 and try
again. If you still run out of memory, continue to increase
memoryReduction.
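As a sketch of this workflow (the network size is illustrative, and x
and t stand for your own training inputs and targets):

net = feedforwardnet(20);               % trainFcn defaults to 'trainlm'
net.efficiency.memoryReduction = 2;     % compute Jacobian in 2 submatrices
[net, tr] = train(net, x, t);

% If training still runs out of memory, increase the number of parts:
net.efficiency.memoryReduction = 4;
[net, tr] = train(net, x, t);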
Limitations and Cautions
You would normally use Levenberg-Marquardt training (trainlm) for small
and medium-sized networks, if you have enough memory available. If
memory is a problem, there are a variety of other fast algorithms
available. For large networks, you will probably want to use trainscg or
trainrp, as sketched below.
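As a minimal sketch of that rule of thumb (the 1000-weight threshold is
an assumption chosen for illustration, not a toolbox rule; x and t are
your training data):

net = feedforwardnet(10);
net = configure(net, x, t);        % size the network so getwb(net) is valid
if numel(getwb(net)) < 1000        % illustrative small/medium cutoff
    net.trainFcn = 'trainlm';      % fast when memory allows
else
    net.trainFcn = 'trainscg';     % memory-efficient for large networks
end
[net, tr] = train(net, x, t);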
Advanced Topics
Speed and Memory Comparison for Training Multilayer Networks
It is very difficult to know which training algorithm will be the
fastest for a given problem. It depends on many factors, including the
complexity of the problem, the number of data points in the training
set, the number of weights and biases in the network, the error goal,
and whether the network is being used for pattern recognition
(discriminant analysis) or function approximation (regression). This
section compares the various training algorithms. Feedforward networks
are trained on six different problems. Three of the problems fall into
the pattern recognition category and the other three fall into the
function approximation category. Two of the problems are simple "toy"
problems,
while the other four are "real world" problems. Networks with a variety
of different architectures and complexities are used, and the networks
are trained to a variety of different accuracy levels.
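A minimal sketch of such a timing comparison on one sample problem
(simplefit_dataset ships with the toolbox; the set of algorithms and the
network size are illustrative choices):

[x, t] = simplefit_dataset;
trainFcns = {'trainlm', 'trainbfg', 'trainscg', 'trainrp'};
for i = 1:numel(trainFcns)
    net = feedforwardnet(10);
    net.trainFcn = trainFcns{i};
    net.trainParam.showWindow = false;   % run without the training GUI
    [net, tr] = train(net, x, t);
    fprintf('%-8s  %6.2f s  final perf %.4g\n', ...
            trainFcns{i}, tr.time(end), tr.perf(end));
end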