Solved – What are the ways to calculate the error rate of a deep Convolutional Neural Network, when the network produces different results using the same data

conv-neural-network, deep-learning, keras, object-detection, python

I am new to the object recognition community, and I am asking about the broadly accepted ways to calculate the error rate of a deep CNN when the network produces different results on the same data.

1. Problem introduction

Recently I have been trying to replicate some classic deep CNNs for object recognition tasks. The inputs are 2D images containing objects, and the outputs are the identification/classification results for those objects. The implementation uses Python and Keras.

The problem I am facing is that I get different validation results across multiple training runs, even when using the same training/validation data sets. That makes it hard to report the error rate of the model, since the validation result may differ every time.

I think this difference comes from the randomness involved in several aspects of a deep CNN, such as random weight initialization, the random ‘dropout’ used for regularization, the shuffling of the training data between epochs, etc. But I do not yet know the “right” way to deal with this variation when calculating the error rate in the object recognition field.

2. My exploration – online search

I found some answers online here. The author proposed two ways and recommended the first one, quoted below:

The traditional and practical way to address this problem is to run your network many times (30+) and use statistics to summarize the performance of your model, and compare your model to other models.
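A minimal sketch of that first approach, assuming a hypothetical build_model() that returns a freshly compiled Keras CNN (with metrics=['accuracy']) and that the training/validation arrays are already loaded:

```python
import numpy as np

# Hypothetical: build_model() returns a freshly compiled Keras CNN, and
# (x_train, y_train), (x_val, y_val) are already loaded as NumPy arrays.
n_runs = 30
val_accuracies = []

for run in range(n_runs):
    model = build_model()                      # new random initialization each run
    model.fit(x_train, y_train,
              epochs=20, batch_size=64,
              validation_data=(x_val, y_val),
              verbose=0)
    _, val_acc = model.evaluate(x_val, y_val, verbose=0)
    val_accuracies.append(val_acc)

# Summarize the runs: report mean error rate and its spread
val_accuracies = np.array(val_accuracies)
print(f"validation error: {1 - val_accuracies.mean():.4f} "
      f"+/- {val_accuracies.std():.4f} over {n_runs} runs")
```

The exact number of epochs, batch size, and number of runs are placeholders; the point is simply to report a mean and a spread rather than a single run's number.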

The second way is to go through every relevant aspect of the deep CNN and deliberately "freeze" its randomness by fixing the random seeds. This kind of approach is also described in the Keras Q&A here, where the issue is called obtaining "reproducible results".
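A sketch of that seeding approach for a TensorFlow 2.x backend (the details vary by Keras/TensorFlow version, and GPU operations may still be non-deterministic without extra configuration) might look like this:

```python
import os
import random

import numpy as np
import tensorflow as tf

# Seed every source of randomness before any layers are created.
SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)  # normally must be set before Python starts
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
```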

3. My exploration – in academia community (no result yet, need your help!)

Since I was not sure whether the two ways mentioned above are the broadly accepted “right” ones, I went on to explore further in the object recognition research community.

I have just started reading the ImageNet website, but I have not found the answer yet. Perhaps you can help me find it more easily. Thanks!

Daqi

Best Answer

The short answer is cross-validation, when it is practical. Most of the time it is not, due to model size and training time, which is why many public datasets provide a standardized validation set on which all models are evaluated. At least that way, two different models can still be compared on data that neither has seen. You can also get a sense of your model's variance by selecting large, equally sized subsets from your test and validation data and seeing how much the accuracy changes between them.
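A sketch of the k-fold cross-validation idea using scikit-learn's KFold and the same hypothetical build_model() as above, assuming the full labelled dataset (x, y) fits in memory:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical: x, y hold the full labelled dataset and build_model()
# returns a freshly compiled Keras CNN with metrics=['accuracy'].
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_accuracies = []

for train_idx, val_idx in kfold.split(x):
    model = build_model()
    model.fit(x[train_idx], y[train_idx],
              epochs=20, batch_size=64, verbose=0)
    _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
    fold_accuracies.append(acc)

print(f"cross-validated error: {1 - np.mean(fold_accuracies):.4f} "
      f"+/- {np.std(fold_accuracies):.4f}")
```

For large models this full retraining per fold is often too expensive, which is exactly why the answer falls back to a fixed standardized validation set or subset-based variance checks.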