What are the best methods for reducing false positives in the TensorFlow Mask R-CNN object detection framework using transfer learning?

I am training a single-class object detector with Mask R-CNN, and I have tried several methods for reducing false positives. I started with a few thousand images of the object with bounding boxes and trained on those, which gave decent results. However, when running on images that don't contain the object, I would often get false matches with high confidence (sometimes 0.99).
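One way to put a number on this symptom is an image-level false positive rate: run the detector on images known to contain no object and count how many of them produce at least one detection above a score threshold. This is a minimal sketch, assuming the detector's scores for each image have already been collected into a dict; the function name and the sample data are hypothetical.

```python
from typing import Dict, List


def background_false_positive_rate(
    scores_per_image: Dict[str, List[float]], threshold: float = 0.5
) -> float:
    """Fraction of object-free images with at least one detection >= threshold.

    scores_per_image maps an image id to the detection scores the model
    produced on that image; every image here is assumed to contain no object.
    """
    if not scores_per_image:
        return 0.0
    flagged = sum(
        1 for scores in scores_per_image.values()
        if any(s >= threshold for s in scores)
    )
    return flagged / len(scores_per_image)


# Hypothetical detector outputs on three images that contain no object.
background_scores = {
    "window.jpg": [0.99, 0.12],
    "desk.jpg": [0.40],
    "keyboard.jpg": [],
}
for t in (0.5, 0.3):
    print(t, background_false_positive_rate(background_scores, t))
```

Tracking this number before and after each change gives a direct comparison between the approaches tried below.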

The first thing I tried was adding the hard example miner to the config file. I believe I did this correctly, because I added a print statement to confirm that the miner object gets created. However, none of the provided Faster R-CNN configs include hard example mining, so I suspect the miner only works correctly for SSD. I would expect a noticeable improvement with a hard example miner, but I did not see one.
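The core idea of online hard example mining is to keep only the negatives with the highest loss instead of all of them. The sketch below is a simplified, framework-free illustration. Its parameter names mirror fields of the Object Detection API's `HardExampleMiner` config (`num_hard_examples`, `max_negatives_per_positive`, `min_negatives_per_image`), but the real implementation also runs non-maximum suppression over the candidate boxes (controlled by `iou_threshold`) and works on tensors; both are omitted here, and `Candidate` and `mine_hard_examples` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    loss: float        # training loss of this anchor/proposal
    is_positive: bool  # True if matched to a ground-truth object


def mine_hard_examples(
    candidates: List[Candidate],
    num_hard_examples: int = 64,
    max_negatives_per_positive: int = 3,
    min_negatives_per_image: int = 0,
) -> List[int]:
    """Return the indices of candidates kept for the loss (NMS step omitted)."""
    # Rank all candidates by loss, hardest first; keep the top-k (0 = keep all).
    order = sorted(range(len(candidates)),
                   key=lambda i: candidates[i].loss, reverse=True)
    if num_hard_examples > 0:
        order = order[:num_hard_examples]
    positives = [i for i in order if candidates[i].is_positive]
    negatives = [i for i in order if not candidates[i].is_positive]
    # Cap negatives relative to the number of positives (0 = no cap), but
    # never below the per-image floor, so an image with zero positives can
    # still contribute its hardest negatives.
    if max_negatives_per_positive > 0:
        limit = max(min_negatives_per_image,
                    max_negatives_per_positive * len(positives))
        negatives = negatives[:limit]
    return sorted(positives + negatives)


# An object-free image: no positives, four negatives with different losses.
background = [Candidate(loss, False) for loss in (0.9, 0.1, 2.5, 0.4)]
print(mine_hard_examples(background))                             # []
print(mine_hard_examples(background, min_negatives_per_image=2))  # [0, 2]
```

In this sketch, with a floor of 0 an image that contains no object contributes nothing to the mined loss, which is why the minimum-negatives setting matters when training on background images.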

The second thing I tried was adding "background" images. I set the minimum number of negatives in the hard example miner config to a non-zero value and added tons of background images that had previously produced false detections to the training data. I even added these images to the TFRecord file so that they would be balanced evenly with images that do contain the object. This approach actually made things worse and gave me more false detections.
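In anchor-based detectors such as Faster R-CNN and Mask R-CNN, negatives are not a separate input: an anchor whose overlap with every ground-truth box falls below a threshold is labeled background. The simplified sketch below uses the 0.7 / 0.3 IoU thresholds from the original Faster R-CNN region proposal network as assumed defaults (the Object Detection API sets its own matcher thresholds in the config) and omits the rule that also marks the best-matching anchor for each box as positive. It shows that an image with no boxes contributes only background anchors, which is what training on background images relies on. The helper names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors: List[Box], gt_boxes: List[Box],
                  pos_thresh: float = 0.7,
                  neg_thresh: float = 0.3) -> List[int]:
    """Label each anchor: 1 = object, 0 = background, -1 = ignored."""
    labels = []
    for anchor in anchors:
        # Best overlap with any ground-truth box; 0.0 if the image has none.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # between the thresholds: excluded from the loss
    return labels


anchors = [(0, 0, 0.5, 0.5), (0.5, 0.5, 1, 1), (0, 0, 1, 1), (0, 0, 0.6, 0.6)]
gt = (0, 0, 0.5, 0.5)
print(label_anchors(anchors, [gt]))  # image with the object
print(label_anchors(anchors, []))    # background image: every anchor is 0
```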

The last thing I tried was creating another category, called "object-background", and assigning all the false matches to this new category. This approach worked pretty well, but I view it as a hack.
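The extra-category workaround amounts to two small steps: relabel the boxes the model wrongly fired on as a second class for training, then discard that class at inference time. This is a minimal sketch, assuming false detections are available as per-image box lists; the class ids, dict layout, and function names are hypothetical, and a Mask R-CNN would additionally need instance masks for the new boxes, which are omitted here.

```python
from typing import Dict, List

OBJECT_ID = 1  # the real class
DECOY_ID = 2   # hypothetical id for the extra "object-background" class


def false_positives_to_annotations(
    false_detections: Dict[str, List[List[float]]]
) -> List[dict]:
    """Turn boxes the model wrongly fired on into decoy-class training labels."""
    annotations = []
    for image_id, boxes in false_detections.items():
        for box in boxes:
            annotations.append(
                {"image_id": image_id, "bbox": box, "class_id": DECOY_ID})
    return annotations


def keep_real_objects(detections: List[dict],
                      min_score: float = 0.5) -> List[dict]:
    """At inference, report only confident detections of the real class."""
    return [d for d in detections
            if d["class_id"] == OBJECT_ID and d["score"] >= min_score]


fps = {"window.jpg": [[0.1, 0.2, 0.4, 0.6]],
       "desk.jpg": [[0.5, 0.5, 0.9, 0.8], [0.0, 0.0, 0.2, 0.3]]}
ann = false_positives_to_annotations(fps)
dets = [{"class_id": 1, "score": 0.9},
        {"class_id": 2, "score": 0.99},
        {"class_id": 1, "score": 0.3}]
print(len(ann))                 # 3 decoy annotations
print(keep_real_objects(dets))  # [{'class_id': 1, 'score': 0.9}]
```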

To summarize, my main question is: what is the best method for reducing false positives within the current TensorFlow Object Detection API? Would SSD be a better approach, since it seems to have a hard example miner built into its configs by default?

thanks

Best Answer

A lot of people I see online have been running into the same issue using the TensorFlow Object Detection API. I think there are some inherent problems with the idea/process of using the pretrained models with custom classes at home. For example, people want to use SSD MobileNet or Faster R-CNN Inception to detect objects like "person with helmet," "pistol," or "tool box." The general process is to feed in images of that object, but most of the time, no matter how many images (200 to 2000), you still end up with false positives when you actually run it at your desk.

The object classifier works great when you show it the object in its own context, but you end up getting 99% matches on everyday items like your bedroom window, your desk, your computer monitor, your keyboard, etc. People have mentioned the strategy of introducing negative images or soft images. I think the problem has to do with the limited context in the images that most people use.

The pretrained models were trained on over a dozen classes in a wide variety of environments. One example could be a car on the street: the CNN sees the car, and everything in that image that is not a car (the street, buildings, sky, etc.) acts as a negative example. In another image it sees a bottle, and everything else in that image (desks, tables, windows, etc.) is negative. I think the problem with training custom classifiers is that it is a negative-image problem. Even if you have enough images of the object itself, there isn't enough data of that same object in different contexts and backgrounds. So in a sense there are not enough negative images, even if conceptually you shouldn't need separate negative images. When you run the model at home, you get false positives all over the place on objects around your own room.

I think the idea of transfer learning used this way is flawed. We see a lot of great tutorials online of people identifying playing cards, Millennium Falcons, etc., but none of those models are deployable in the real world, as they would all generate a bunch of false positives when they see anything outside of their image pool. The best strategy would be to retrain the CNN from scratch with multiple classes and add the desired ones in there as well. I suggest re-introducing a previous dataset from ImageNet or Pascal VOC with 10-20 pre-existing classes, adding your own, and retraining.
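Training on an existing dataset's classes plus your own involves some label bookkeeping before any training happens. This is a minimal sketch of that step, assuming annotations are available as simple (image id, class name) pairs. It builds one contiguous label map, reserving id 0 for background as the TensorFlow Object Detection API's label maps do, and remaps annotations from both sources onto it. The five Pascal VOC class names, the "tool box" class, and the helper names are illustrative choices, and the sketch does not cover the retraining itself.

```python
from typing import Dict, List, Tuple

# A few of the 20 Pascal VOC class names, standing in for "pre-existing" classes.
VOC_SUBSET = ["person", "chair", "diningtable", "tvmonitor", "bottle"]


def build_label_map(existing: List[str], custom: List[str]) -> Dict[str, int]:
    """Assign contiguous ids starting at 1 (0 is reserved for background)."""
    names = existing + [c for c in custom if c not in existing]
    return {name: i + 1 for i, name in enumerate(names)}


def merge_annotations(
    datasets: List[List[Tuple[str, str]]], label_map: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Map (image_id, class_name) pairs from several sources to shared ids.

    Annotations whose class is not in the label map are dropped; if their
    images are still used for training, those regions count as background.
    """
    merged = []
    for dataset in datasets:
        for image_id, name in dataset:
            if name in label_map:
                merged.append((image_id, label_map[name]))
    return merged


label_map = build_label_map(VOC_SUBSET, ["tool box"])
voc = [("000001.jpg", "person"), ("000002.jpg", "horse")]
custom = [("mine_001.jpg", "tool box")]
print(label_map)
print(merge_annotations([voc, custom], label_map))
```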
