1. It is very seldom that you will need
a. That many inputs
b. More than 1 hidden layer
c. Anywhere near that many hidden nodes.
2. If you standardize your variables to zero mean and unit variance (via ZSCORE or MAPSTD), the coefficients of a linear model will typically indicate which variables can probably be ignored, either because they are weakly correlated with the target or because they are highly correlated with other variables.
Alternatives are
a. Add squares and/or cross-products to the linear (in coefficients) model
b. Use functions STEPWISE and/or STEPWISEFIT
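A minimal sketch of the screening idea in item 2, on synthetic data. The data, the variable names (X, y, Xz, yz, order) and the choice of 5 inputs are assumptions made purely for illustration. Note that ZSCORE standardizes columns (cases in rows), whereas MAPSTD standardizes rows (cases in columns, the neural network convention).

```matlab
% Synthetic example (assumption): 5 inputs, 200 cases,
% only inputs 1 and 3 actually drive the target.
rng(0)                                   % reproducible synthetic data
N = 200;
X = randn(N, 5);                         % rows = cases, columns = variables
y = 2*X(:,1) - 3*X(:,3) + 0.1*randn(N, 1);

Xz = zscore(X);                          % zero-mean/unit-variance inputs
yz = zscore(y);                          % standardized target

% Least-squares coefficients of the linear model. No intercept term is
% needed because all variables have zero mean.
b = Xz \ yz;
[~, order] = sort(abs(b), 'descend');    % largest |coefficient| first
disp([order, b(order)])                  % inputs ranked by coefficient size

% Cross-check with stepwise selection
% (STEPWISEFIT is in the Statistics and Machine Learning Toolbox).
[~, ~, ~, inmodel] = stepwisefit(Xz, yz, 'display', 'off');
disp(find(inmodel))                      % indices of the retained inputs
```

Because the inputs share a common scale after standardization, the coefficient magnitudes are directly comparable; with strongly correlated inputs the ranking becomes less reliable, which is where the stepwise functions help.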
3. PLEASE
a. Do not post commands that assign default values.
b. Include results of applying your code to an accessible data set so
that we know we are on the same page.
c. Instead of posting your huge dataset, just pick one of the MATLAB
example sets listed by
help nndatasets
doc nndatasets
4. For the purpose of reproducibility, initialize the RNG before obtaining the random initial weights and random trn/val/tst data division.
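A short sketch of item 4, assuming the toolbox example set simplefit_dataset and an arbitrary seed of 0. The hidden layer size H = 4 is also just an illustrative choice. Running the script twice should reproduce the same data division and the same trained net.

```matlab
[x, t] = simplefit_dataset;          % small example set shipped with the toolbox

rng(0)                               % seed chosen arbitrarily; set it BEFORE the
                                     % net is configured and trained
H = 4;                               % illustrative number of hidden nodes
net = fitnet(H);
net.trainParam.showWindow = false;   % suppress the training GUI
[net, tr] = train(net, x, t);        % random trn/val/tst division and random
                                     % initial weights are both drawn here
y = net(x);

% Normalized MSE: 0 is a perfect fit, 1 is no better than a constant output.
NMSE = mean((t - y).^2) / var(t, 1);
disp(NMSE)
```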
5. I have posted many tutorials that emphasize minimizing the number of hidden nodes, H, to obtain better performance on non-training (validation, test and unseen) data.
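6. Basically, for a net with I inputs, one hidden layer of H nodes and O outputs, you would like the number of unknown weights
Nw = (I+1)*H + (H+1)*O
to be much less than the number of training equations
Ntrneq = Ntrn*O
A necessary condition (Nw <= Ntrneq) is
H <= Hub = floor((Ntrneq-O)/(I + O +1))
However, H << Hub is preferable.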
With I = 20, O = 1, N = 1000 and the default 0.7/0.15/0.15 data division
Ntrneq = Ntrn = 700
Hub = floor(699/22) = 31
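The bound can be evaluated directly for the values quoted above. This is only a sketch: it assumes a single hidden layer with biases and the default 0.7/0.15/0.15 trn/val/tst split, and the weight-count expression Nw is the standard count for an I-H-O net rather than something stated in the post.

```matlab
I = 20;  O = 1;  N = 1000;

Ntrn   = N - 2*round(0.15*N);            % 700 training cases
Ntrneq = Ntrn*O;                         % 700 training equations

% Upper bound on H so that Nw <= Ntrneq
Hub = floor((Ntrneq - O)/(I + O + 1));   % floor(699/22) = 31

% Number of unknown weights (including biases) at the bound
Nw = (I + 1)*Hub + (Hub + 1)*O;          % 683 <= 700

fprintf('Ntrneq = %d, Hub = %d, Nw(Hub) = %d\n', Ntrneq, Hub, Nw)
```

One more hidden node (H = 32) would give Nw = 705 > 700, which confirms the bound.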
7. My tutorials explain how to perform a double-loop search over
a. No. of hidden nodes
b. Initial RNG state (reproducible initial weights & datadivision).
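The double loop in item 7 might look roughly like the following. This is a hedged sketch, not the code from the tutorials: the example set, the cap of 10 hidden nodes, Ntrials = 5, and the use of the normalized validation MSE as the selection criterion are all assumptions.

```matlab
[x, t] = simplefit_dataset;                  % example set; swap in your own data
[I, N] = size(x);
[O, ~] = size(t);

Ntrn = N - 2*round(0.15*N);                  % default 0.7/0.15/0.15 division
Hub  = floor((Ntrn*O - O)/(I + O + 1));      % upper bound on hidden nodes
Hvec = 1:min(Hub, 10);                       % candidate H values (cap is arbitrary)
Ntrials = 5;                                 % random initializations per H

NMSEval = zeros(Ntrials, numel(Hvec));       % normalized validation MSE
rng(0)                                       % one seed makes the whole search repeatable
for j = 1:numel(Hvec)                        % outer loop: number of hidden nodes
    for i = 1:Ntrials                        % inner loop: initial RNG state
        s(i, j) = rng;                       % save the state to recreate this design
        net = fitnet(Hvec(j));
        net.trainParam.showWindow = false;   % suppress the training GUI
        [net, tr] = train(net, x, t);
        % Normalize by the target variance of the validation subset
        vartval = mean(var(t(:, tr.valInd), 1, 2));
        NMSEval(i, j) = tr.best_vperf / vartval;
    end
end

% Pick the design with the lowest validation error
[bestNMSE, idx] = min(NMSEval(:));
[ibest, jbest]  = ind2sub(size(NMSEval), idx);
fprintf('Best: H = %d, trial = %d, NMSEval = %g\n', Hvec(jbest), ibest, bestNMSE)
```

To rebuild the winning net, restore its saved state with rng(s(ibest, jbest)) and repeat the fitnet/train calls for H = Hvec(jbest). In practice, prefer the smallest H whose validation error is acceptably close to the minimum.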
8. For regression examples, search using subsets of the keywords
greg fitnet tutorial Ntrials
Hope this helps.
Greg