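Nets with a single hidden layer are universal approximators: given enough hidden nodes, they can approximate any continuous function on a bounded domain to arbitrary accuracy.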
From 1979 to 2003 I used FORTRAN to design two-class classifiers. They had a single RBF hidden layer, with each hidden node connected to only one (not both) of the two outputs.
The number of hidden nodes depended on how many types of radar missile target data were available.
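To make the "each hidden node feeds only one output" idea concrete, here is a minimal Python sketch (not the original FORTRAN). It assumes Gaussian RBF nodes with a common width `sigma`, unit output weights, and hidden-node centers already chosen per class; `rbf` and `classify` are hypothetical names for illustration only.

```python
import math

def rbf(x, c, sigma):
    # Gaussian radial basis function centered at c.
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def classify(x, centers_by_class, sigma=1.0):
    # Each hidden node (center) is wired to the output of its own class only,
    # so each output is the sum of its own class's RBF activations.
    scores = [sum(rbf(x, c, sigma) for c in centers) for centers in centers_by_class]
    # Assign the input to the output with the largest score.
    return max(range(len(scores)), key=lambda k: scores[k])

centers = [[(0.0, 0.0), (0.0, 1.0)],  # hidden nodes wired to output 0
           [(5.0, 5.0)]]              # hidden nodes wired to output 1
print(classify((0.2, 0.3), centers), classify((4.8, 5.1), centers))
```

In this wiring, adding another target type simply means adding more centers to that class's list, which is one way the hidden-layer size can track the available data types.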
Over the past 12 years of retirement, 99.9% of my designs (regression, classification and time-series) have been fully connected MATLAB functions with a single hidden layer.
Typically, my designs automatically search for the minimum number of hidden nodes that will yield the desired error rate.
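The search can be sketched as a double loop: increase the number of hidden nodes H, try several random weight initializations for each H, and stop at the first H whose normalized MSE, MSE/var(t), meets the goal (0.01 corresponds to R^2 = 0.99). This is a Python sketch, not my MATLAB code: the stand-in trainer `train_one_hidden` (random tanh hidden weights, least-squares output weights) and the defaults `Hmax = 20`, `ntrials = 10` are assumptions chosen only to keep it short.

```python
import numpy as np

def train_one_hidden(x, t, H, rng):
    # Stand-in trainer (assumption): random tanh hidden layer,
    # output weights fit by linear least squares.
    N, I = x.shape
    W = rng.normal(size=(I, H))
    b = rng.normal(size=H)
    Z = np.tanh(x @ W + b)
    Z1 = np.hstack([Z, np.ones((N, 1))])  # bias column for the output node
    v, *_ = np.linalg.lstsq(Z1, t, rcond=None)
    return Z1 @ v

def min_hidden_nodes(x, t, nmse_goal=0.01, Hmax=20, ntrials=10, seed=0):
    # Return the smallest H (and its NMSE) that reaches the goal, else (None, None).
    rng = np.random.default_rng(seed)
    var_t = np.var(t)
    for H in range(1, Hmax + 1):
        for _ in range(ntrials):  # several random initializations per H
            y = train_one_hidden(x, t, H, rng)
            nmse = np.mean((t - y) ** 2) / var_t
            if nmse <= nmse_goal:
                return H, nmse
    return None, None

x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
t = np.sin(2.0 * x).ravel()
print(min_hidden_nodes(x, t))
```

Trying several initializations per H matters because a single unlucky set of initial weights can make a sufficient H look insufficient.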
I have posted zillions of examples in both the NEWSGROUP and ANSWERS. Just search including the phrase
As for choosing inputs: if there aren't too many, I can tolerate a few ineffective ones. Otherwise, I just use STEPWISEFIT on a linear model for input variable selection.
I have also used more sophisticated techniques such as PCA, PLS, forward search and backward search. In general, however, linear-model feature selection has been fast and satisfactory.
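For readers without MATLAB, the idea behind linear-model input selection can be sketched as forward stepwise regression: repeatedly add the input whose partial F statistic is largest, and stop when no candidate exceeds an F-to-enter threshold. This is a simplified Python sketch, not STEPWISEFIT itself: STEPWISEFIT uses p-value entry/removal criteria and can also remove terms, whereas this sketch only adds terms and uses a fixed threshold `f_enter = 4.0` (an assumption). `rss` and `forward_stepwise` are hypothetical names.

```python
import numpy as np

def rss(X, y, cols):
    # Residual sum of squares of a least-squares fit of y on an intercept plus X[:, cols].
    A = np.hstack([np.ones((X.shape[0], 1)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def forward_stepwise(X, y, f_enter=4.0):
    # Forward-only selection with a fixed F-to-enter threshold (simplification).
    N, P = X.shape
    selected = []
    current = rss(X, y, selected)
    while len(selected) < P:
        best = None
        for j in range(P):
            if j in selected:
                continue
            new = rss(X, y, selected + [j])
            dof = N - (len(selected) + 2)  # intercept + selected inputs + candidate
            F = np.inf if new <= 1e-12 else (current - new) / (new / dof)
            if best is None or F > best[0]:
                best = (F, j, new)
        if best[0] < f_enter:
            break  # no remaining input improves the fit enough
        selected.append(best[1])
        current = best[2]
    return sorted(selected)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
print(forward_stepwise(X, y))
```

The selected columns then become the inputs to the neural net; the linear model is only used as a fast screen.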
Hope this helps.
Thank you for formally accepting my answer
Greg