Solved – What are the current state-of-the-art convolutional neural networks

Tags: conv-neural-network, neural-networks, references

I'm interested in understanding which neural network architecture is currently the state of the art (sometimes abbreviated "SOTA") with respect to standard image classification tasks such as MNIST, STL-10 and CIFAR. This is challenging because new results are published frequently, and it can be hard to keep up. Is there a resource or website that tracks the best results for these tasks?

Best Answer

The best suggestion is from shimao:

typically any new paper which claims good or state-of-the-art performance on any task will have a fairly comprehensive results table comparing with previous results, which can be a good way to keep track.

Any leaderboard will soon become useless, because they're basically always maintained by (undergrad/grad) students, who stop updating them as soon as they get their degree/land a job. Anyway, if CIFAR-10 and CIFAR-100 are good enough for you, this is pretty good:

https://github.com/arunpatala/cifarSOTA

This one is more general (includes ImageNet) and it has more recent results:

https://github.com/Lextal/SotA-CV

This is the one I used to use, but the owner has stopped updating it, as often happens:

https://github.com/RedditSota/state-of-the-art-result-for-machine-learning-problems/

Finally, you may be interested in this Jupyter notebook released just today by Ali Rahimi, based on data scraped from ILSVRC and the COCO website.

One last note: if you’re looking for the latest results because you want to compare your results to SotA, great. However, if your goal is applying the “best” architecture on ImageNet to an industrial application using transfer learning, you should know (if you don’t already) that the latest architectures are worse, in terms of translation invariance, than the older ones. This is a risk if your dataset doesn’t have photographer bias, or if you don’t have enough compute and data to retrain the architecture on a more useful image distribution. See the excellent preprint:

Azulay & Weiss, 2018, “Why do deep convolutional networks generalize so poorly to small image transformations?”
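As Azulay & Weiss discuss, a large part of the problem is strided downsampling: almost every modern CNN subsamples its feature maps, and subsampling is not shift-equivariant. A minimal NumPy sketch (a toy 1-D signal standing in for a feature map, not any particular network) shows how a stride-2 subsampling step reacts to a one-pixel shift of its input:

```python
import numpy as np

# Toy 1-D "feature map": a pattern that alternates every pixel.
signal = np.array([0, 1, 0, 1, 0, 1, 0, 1])

def downsample(x, stride=2):
    # Keep every `stride`-th sample, as a strided conv/pool layer would.
    return x[::stride]

# Translate the input by a single pixel.
shifted = np.roll(signal, 1)

print(downsample(signal))   # -> [0 0 0 0]
print(downsample(shifted))  # -> [1 1 1 1]
```

A one-pixel shift of the input flips every value the subsampling keeps, so the output changes completely rather than shifting along with the input. Stacking several strided layers, as deep architectures do, compounds this effect, which is why a network can change its prediction when an object moves by a few pixels.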