Solved – One-shot object detection with Deep Learning

computer visiondeep learningimage processingneural networksobject detection

In the recent years, the field of object detection has experienced a major breakthrough after the popularization of the Deep Learning paradigm. Approaches such as YOLO, SSD or FasterRCNN hold the state of the art in the general task of object detection [1].

However, in the specific application scenario where we are given only one reference image for the object/logo we want to detect, deep learning-based methods seem to be less applicable and local feature descriptors such as SIFT and SURF appear as more suitable alternatives, with a near-zero deployment cost.

My question is, can you point out some application strategies (preferably with implementations available rather than just research papers describing them) where Deep Learning is successfully used for object detection with just one training image per object class?

Example application scenario:

Single training/reference image
Sample image where we have to detect the logo

In this case, SIFT successfully detects the logo in the image:

Successful detection with SIFT

Best Answer

As it turns out, just training a ordinary object detection network with a bunch of data augmentation will get you some decent results.

I took the "coca cola" logo from your post, and performed some random augmentations on it. Then I downloaded 10000 random images from flickr and randomly pasted the logo onto these images. I also added random red regions to the images so the network wouldn't learn that any red blob was a valid object. Some samples from my training data:

enter image description here enter image description here enter image description here enter image description here

I then trained an RCNN model on this dataset. Here are some test-set images I found on google images, and the model seems to do pretty ok.

enter image description here enter image description here enter image description here

The results aren't perfect, but I slapped this together in about 2 hours. I expect with a bit more care spent with data generation and with training the model, you could get far better results.

I think ideas from papers such as Learning to Model the Tail could be used to allow learning of new object categories with just one or a few examples, instead of needing to generate a bunch of data like I did, but I'm not aware of them doing any experiments with object detection.