Normalization helps prevent large input signals from driving bounded activation functions in the first hidden layer (like tansig and logsig) into their asymptotic end regions, where the derivatives are so small that training may stall.
Keeping signals in the "active" nonasymptotic regime also helps numerical stability (see the comp.ai.neural-nets FAQ).
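For example, a minimal input-scaling sketch with the toolbox functions mapminmax or mapstd (a sketch, not your exact code; xtrn/xtst are the column-wise data matrices used below):

[xtrn, ps] = mapminmax(xtrn)            % map each input row into [-1,1]
xtst = mapminmax('apply', xtst, ps)     % apply the SAME mapping to the test set
% or zero-mean / unit-variance scaling:
% [xtrn, ps] = mapstd(xtrn)
% xtst = mapstd('apply', xtst, ps)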
What you have written in your post is very confusing.
How many classes, c, do you have?
How large are your training and test sets?
[I Ntrn] = size(xtrn) % I = 9?
[O Ntrn] = size(ttrn) % O = c
[I Ntst] = size(xtst) % same I as the training inputs
[O Ntst] = size(ttst) % same O as the training targets
Are the columns in your target matrices columns from the c-dimensional unit matrix eye(c)?
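If not, they can be built from integer class labels with ind2vec (a sketch; classtrn/classtst are assumed 1-by-N label vectors with values 1..c):

ttrn = full(ind2vec(classtrn))   % column i is column classtrn(i) of eye(c)
ttst = full(ind2vec(classtst))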
The number of training equations is Neq = Ntrn*O
For an I-H-O node topology the number of unknown weights is Nw = (I+1)*H+(H+1)*O.
If you are training to convergence, Neq >= Nw is required, but Neq >> Nw is desired to mitigate measurement error and noise and to obtain good generalization (i.e., good performance on nontraining data). The first requirement yields the following upper bound on the number of hidden nodes, H:
Hub = floor((Neq-O)/(I+O+1))
The second requirement yields H << Hub.
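For example, using the sizes obtained above (a sketch):

Neq = Ntrn*O                       % number of training equations
Hub = floor((Neq-O)/(I+O+1))       % largest H satisfying Neq >= Nw
% then choose H well below Hub, e.g. H ~ Hub/10 (illustrative choice, not a rule)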
If you need a larger value of H to obtain satisfactory performance, use "Early Stopping" with a validation set and/or use regularization with the objective function MSEREG.
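For example, a rough sketch (assumes a toolbox version with patternnet and dividerand; older releases use newff and pass a validation structure to train, so the exact syntax varies):

net = patternnet(H)                      % H hidden nodes, chosen as above
net.divideFcn = 'dividerand'             % random train/val/test split => early stopping
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
% or regularization via MSEREG (older releases):
% net.performFcn = 'msereg'
% net.performParam.ratio = 0.9           % assumed tradeoff between MSE and mean squared weights
[net, tr] = train(net, xtrn, ttrn);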
Hope this helps.
Greg