MATLAB: How to find the right neural network architecture

bayesian regularisation, Deep Learning Toolbox, levenberg-marquardt algorithm, neural network

Good morning,
I am trying to learn how to use neural networks to fit functions. I have read a little about the subject, but I am still not sure how to find the right architecture (the number of neurons in the hidden layer). I use networks with 1 hidden layer, and my training algorithms are 'trainlm' and 'trainbr'. Currently I am aware of 4 problems that can occur:
+ Algorithm reaches a local minimum: the best training performance (tr.best_perf) is too large?
+ Overfitting: the best validation performance (tr.best_vperf) is much larger than the best training performance (tr.best_perf)?
+ Underfitting: the best validation performance (tr.best_vperf), the best training performance (tr.best_perf) and the best test performance (tr.best_tperf) are of similar size, but they are all still too large.
+ Extrapolating: the best test error (tr.best_tperf) is much larger than the two other ones.
Currently, I have a loop that examines networks with 1 to 50 neurons. Each network (e.g. a network with 20 neurons) is trained 10 times, and the run with the lowest training performance (tr.best_perf) is chosen in order to avoid local minima. Afterwards, I store tr.best_tperf, tr.best_vperf and tr.best_perf of that run in an array. Finally, I compare those 50 networks to each other and take the one with the lowest error, with error = max([tr.best_tperf, tr.best_vperf, tr.best_perf]).
The other way to go would be to train each network (e.g. a network with 20 neurons) 10 times and choose the run with the lowest error, with error = max([tr.best_tperf, tr.best_vperf, tr.best_perf]). Then I store this error for each network in a vector. Finally, I choose the network with the lowest element of that vector.
Can someone tell me which way is the correct way? I really appreciate any help you can provide.

Best Answer

Search the NEWSREADER and ANSWERS using
fitnet Hmin Hmax Ntrials
Minimize the number of hidden nodes subject to an upper bound on the training MSE:
MSEtrn <= 0.01*mean(var(targettrn',1))
MSEtrn <= 0.01*var(targettrn,1) for 1-dim targets
Since Rsquaretrn = 1 - MSEtrn/mean(var(targettrn',1)), this yields a training subset Rsquaretrn of at least 0.99.
Many of the posts don't have the training subset subscript trn and/or may have used t instead of target. So, there are probably a jillion variations posted including
MSEgoal = 0.01*vart1
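A minimal sketch of how that goal and the implied training R-squared might be computed. The target values are made-up illustration data, and defining vart1 as the mean biased target variance is an assumption chosen to match the bound above.

```matlab
% Hypothetical 1-D training targets (1 x Ntrn row vector); use your own data.
targettrn = [0.1 0.4 0.9 1.6 2.5 3.6];

% Biased (divide-by-N) variance of each target row, averaged over the outputs.
% For 1-dim targets this is the same as var(targettrn, 1).
vart1   = mean(var(targettrn', 1));
MSEgoal = 0.01 * vart1;

% Training R^2 implied by a training MSE: R^2 = 1 - MSE/variance.
% Here MSEtrn is set to the goal itself, i.e. the worst case that still meets the bound.
MSEtrn     = MSEgoal;
Rsquaretrn = 1 - MSEtrn / vart1;
```

Any net whose training MSE falls below MSEgoal therefore has a training R-squared above 0.99.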
The best way I have found to obtain relatively unbiased results is to use 2 loops.
1. An outer loop over the number of hidden nodes, Hmin:dH:Hmax, with Hmax <= Hub, the upper bound for not having more unknown weights, Nw, than training equations, Ntrneq.
2. An inner loop over Ntrials >= 10 different random distributions of initial weights.
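The bound Hub can be worked out by counting weights. The sketch below assumes a single-hidden-layer net with biases, so that Nw = (I+1)*H + (H+1)*O, and the default 70/15/15 random data split. The sizes I, O and N are made-up illustration values.

```matlab
% Hypothetical sizes; in practice take [I, N] = size(x) and [O, ~] = size(t).
I = 1;      % number of inputs
O = 1;      % number of outputs
N = 100;    % number of samples

% Approximate training subset size, assuming the default 70/15/15 split.
Ntrn   = N - 2*round(0.15*N);
Ntrneq = Ntrn * O;                 % number of training equations

% Number of unknown weights for H hidden nodes (assumed single hidden layer with biases).
Nw = @(H) (I+1)*H + (H+1)*O;

% Nw <= Ntrneq  <=>  H <= (Ntrneq - O)/(I + O + 1)
Hub = floor((Ntrneq - O) / (I + O + 1));
```

With these values Ntrneq is 70 and Hub is 23: a net with 23 hidden nodes has 70 weights, while one with 24 has 73.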
Nets are initially ranked by their validation subset performance. Then unbiased estimates of performance are obtained from the test subset performance.
However, I usually rank the nets by their combined nontraining validation AND test subset performance.
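A rough sketch of the double loop with validation-based ranking, not the exact code from the posts mentioned above. It requires the Deep Learning Toolbox. The toy data and the values of Hmin, dH, Hmax and Ntrials are illustrative assumptions.

```matlab
% Hypothetical toy regression problem; replace x and t with your own data.
rng(0);                                    % reproducible weights and data splits
x = linspace(-1, 1, 200);
t = sin(2*pi*x) + 0.05*randn(size(x));

Hmin = 1; dH = 2; Hmax = 9;                % keep Hmax <= Hub for your data
Ntrials = 10;
Hvec = Hmin:dH:Hmax;

MSEval = nan(Ntrials, numel(Hvec));        % validation MSE per trial and H
MSEtst = nan(Ntrials, numel(Hvec));        % test MSE per trial and H

for j = 1:numel(Hvec)                      % outer loop: number of hidden nodes
    for i = 1:Ntrials                      % inner loop: random initial weights
        net = fitnet(Hvec(j), 'trainlm');  % fresh net, so fresh random weights
        net.trainParam.showWindow = false; % suppress the training GUI
        [net, tr] = train(net, x, t);      % default random train/val/test split
        MSEval(i, j) = tr.best_vperf;
        MSEtst(i, j) = tr.best_tperf;
    end
end

% Rank by validation MSE only; the test MSE of the winner is then
% an estimate that played no part in the selection.
[~, idx] = min(MSEval(:));
[ibest, jbest] = ind2sub(size(MSEval), idx);
Hbest      = Hvec(jbest);
MSEtstbest = MSEtst(ibest, jbest);
```

To rank on the combined nontraining performance instead, one option is to take the minimum over a combination of MSEval and MSEtst, such as their average. In that case the test MSE is no longer an unbiased estimate.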
Again, I have jillions of examples posted in the NEWSREADER and ANSWERS. The best search words are probably
Hmin Hmax Ntrials
Hope this helps.
Thank you for formally accepting my answer
Greg