Two possibilities:
1. The training data does not adequately characterize the total data set.
2. The net is overfit (too many weights) AND the net is overtrained past the point where it
trades performance on nontraining data for a further decrease in training error.
Are you using validation stopping?
Are your training, validation and test sets randomly chosen?
What are the data division ratios?
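To illustrate the two checks above, here is a minimal sketch of a random train/validation/test split and validation-based early stopping. The 70/15/15 ratios, the `patience` parameter, and the helper names are my assumptions for illustration, not anything prescribed above; in a real loop you would compute the validation error each epoch rather than pass it in as a list.

```python
import random

def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition n sample indices into train/val/test sets.
    Ratios are illustrative; adjust to your data division."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def validation_stopping_epoch(val_errors, patience=5):
    """Return the epoch with the lowest validation error, stopping once
    the error has failed to improve for `patience` consecutive epochs.
    `val_errors` stands in for the per-epoch validation error a real
    training loop would compute."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, wait = err, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch

# A typical overtraining curve: validation error falls, then rises
# again while training error keeps decreasing.
val = [1.0, 0.6, 0.4, 0.35, 0.34, 0.36, 0.39, 0.45, 0.52, 0.60, 0.70]
tr, va, te = split_indices(100)
print(len(tr), len(va), len(te))        # 70 15 15
print(validation_stopping_epoch(val))   # 4 (the minimum of the curve)
```

Stopping at the validation minimum is exactly the guard against possibility 2: beyond that epoch, further reduction in training error only degrades performance on nontraining data.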
Hope this helps.
Thank you for formally accepting my answer
Greg