I have a binary classification problem on financial ratios and variables. When I use newff (with trainlm, mse, and an output threshold of 0.5) I get high classification accuracy (5-fold cross-validation, roughly 89-92%), but when I use patternnet (trainscg with crossentropy) my accuracy is 10% lower than with newff. (I normalized the data before feeding it to the network, using mapminmax or mapstd.)
When I apply these models to out-of-sample data (current-year data, with the models built on data sets from previous years), patternnet gives better classification accuracy, with better sensitivity and specificity. For example, I have these results in my problem:
newff:
Accuracy: 92.8%, sensitivity: 94.08%, specificity: 91.62%
Out-of-sample results: accuracy: 60%, sensitivity: 48%, specificity: 65.57%
patternnet:
Accuracy: 73.31%, sensitivity: 69.85%, specificity: 76.77%
Out-of-sample results: accuracy: 70%, sensitivity: 62.79%, specificity: 73.77%
Why are there such differences between newff and patternnet? Which model should I use?
Thanks.
Best Answer
At face value, I would recommend using patternnet, as it gives you better out-of-sample performance; the results from newff seem suspiciously good, which leads me to believe some overfitting is occurring. On that matter, check the following link: Improve Neural Network Generalization and Avoid Overfitting.
To comment on the different results: newff uses Levenberg-Marquardt backpropagation (trainlm), while patternnet uses scaled conjugate gradient backpropagation (trainscg). In general, different optimization procedures are not guaranteed to arrive at the same result even if they had the same target function to optimize against. In your case, though, you are also using different target functions (mse and crossentropy respectively). It would probably be more alarming if you did get the same results, as you are fitting different criteria. :)
Having said that, using newff seems a bit odd. It has been considered obsolete since R2010b, and the docs recommend using feedforwardnet instead. Try feedforwardnet first and then decide which procedure you will ultimately use. As it stands, you are comparing the performance of a function (newff) that people have not worked on for at least 4 years (if not more) against that of a function (patternnet) that is actively developed. It is not really surprising that the latter does a better job.
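
As a rough illustration of the comparison suggested above, the following sketch trains feedforwardnet (trainlm, mse, output thresholded at 0.5) and patternnet (trainscg, crossentropy) on the same data and scores both on held-out samples. The data here is random placeholder data, and the hidden layer size of 10 and the 200/100 split are arbitrary assumptions, so the printed numbers mean nothing in themselves; substitute your own ratios, labels, and previous-year/current-year split.

```matlab
% Hypothetical placeholder data; substitute your own ratios and labels.
rng(0);                                   % reproducible data and initial weights
x = rand(5, 300);                         % features-by-samples matrix
t = double(sum(x, 1) > 2.5);              % 1-by-samples labels in {0, 1}

% Hold out the last 100 samples as a stand-in for "current year" data.
trainIdx = 1:200;
newIdx   = 201:300;

% feedforwardnet with trainlm and mse, mirroring the newff setup.
netFF = feedforwardnet(10, 'trainlm');
netFF.performFcn = 'mse';
netFF.trainParam.showWindow = false;      % suppress the training GUI
netFF = train(netFF, x(:, trainIdx), t(trainIdx));
predFF = double(netFF(x(:, newIdx)) > 0.5);   % threshold the output at 0.5

% patternnet with trainscg and crossentropy; one target row per class.
tOneHot = [1 - t; t];                     % row 1 = class 0, row 2 = class 1
netPN = patternnet(10, 'trainscg');
netPN.performFcn = 'crossentropy';
netPN.trainParam.showWindow = false;
netPN = train(netPN, x(:, trainIdx), tOneHot(:, trainIdx));
[~, cls] = max(netPN(x(:, newIdx)), [], 1);   % pick the larger class output
predPN = cls - 1;                         % map row index back to {0, 1}

% Accuracy, sensitivity and specificity on the held-out samples.
tNew = t(newIdx);
stats = @(p) [mean(p == tNew), ...
              sum(p == 1 & tNew == 1) / sum(tNew == 1), ...
              sum(p == 0 & tNew == 0) / sum(tNew == 0)];
resFF = stats(predFF);
resPN = stats(predPN);
fprintf('feedforwardnet: acc %.3f, sens %.3f, spec %.3f\n', resFF);
fprintf('patternnet:     acc %.3f, sens %.3f, spec %.3f\n', resPN);
```

Note that train also splits the data it is given into training, validation, and test subsets by default, so the held-out block here is an additional check on top of that, in the spirit of testing on a later year.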