Solved – Object Localisation without Classification

image processing, machine learning

I have a data set of photos, each containing one object. I want to find the coordinates of the rectangle enclosing the object.

Note that each photo contains exactly 1 object (for example, if there is a pair of shoes in the photo, it is to be treated as one object), and the photos are taken against a simple white background. However, the images are not limited to one class of object; the object can be anything.

I have a training set consisting of photos and, for each photo, the coordinates of the rectangle enclosing the object. I want to find the coordinates of the enclosing rectangle for a new photo (exactly 1 object, taken against a simple white background).

I searched a lot for a method to do so, and found resources for achieving localization with classification, but neither do I want to classify the objects nor do I have class labels in my training set.

I also thought edge detection and object segmentation methods could be useful.

However, I feel that my task is much simpler since I know that I have to localize only 1 object in an image and the background is also simple, so there must be some simple methods I am overlooking.

Any guidance is much appreciated, and I am relatively new to machine learning so I would be grateful for guidance to implement the appropriate technique.

Best Answer

If the photos are taken against a simple white background and the object is clearly distinguishable from that background, you do not really need anything as heavy as a deep learning method.

The task touches several areas of computer vision; one example is foreground/background segmentation using a Markov random field, a conditional random field, or graph cuts.
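For a plain white background, even a simple intensity threshold may already give a usable box. The sketch below is not MRF, CRF or graph-cut segmentation; it is a much cruder stand-in, and the function name `bounding_box` and the threshold of 240 are assumptions chosen for illustration. It marks every pixel that is darker than the threshold in at least one channel as foreground and returns the extent of those pixels.

```python
import numpy as np

def bounding_box(image, white_threshold=240):
    """Return (x_min, y_min, x_max, y_max) of the non-white pixels, or None.

    `white_threshold` is an assumed cut-off: a pixel counts as foreground
    if any of its channels is darker than this value.
    """
    img = np.asarray(image)
    if img.ndim == 3:
        mask = (img < white_threshold).any(axis=2)
    else:
        mask = img < white_threshold
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing foreground
    if rows.size == 0:
        return None  # image is entirely background
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])

# Synthetic photo: white background with one reddish rectangle.
photo = np.full((100, 120, 3), 255, dtype=np.uint8)
photo[30:70, 40:90] = (200, 30, 30)
box = bounding_box(photo)
print(box)  # (40, 30, 89, 69)
```

On real photos, shadows and sensor noise would probably require some smoothing or morphological clean-up of the mask before taking its extent.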

If you insist on using a deep learning method, looking into saliency detection might be helpful. It is a widely studied area with both traditional and deep learning approaches.

Related Solutions

Solved – Use Edge detection in Image classification

Your approach is along the lines of the popular histogram of oriented gradients (HOG) approach. See here and the corresponding Wikipedia entry. Unless you already have some labelled data, training such a system is quite laborious. If possible, I would start by experimenting with an available implementation, like the one offered by scikit-image.

There are some other features, like local binary patterns (LBP), but they're not as powerful as HOG. See the corresponding scikit-image module for a list of features and their implementations.

As for CNNs, you should not need to extract any features by hand. The network learns the features automatically, which is one of the nice properties of deep architectures. A large number of papers show that these systems learn edge-oriented filters (along the same lines as the idea you are considering).

Note that these features do not consider color. That may be an interesting feature for you to consider. Or extract the features for each of the color channels.
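As a rough illustration of the HOG suggestion and of the per-channel idea, the sketch below uses `skimage.feature.hog` on a synthetic image. The helper `hog_per_channel` is a hypothetical name, and the cell and block sizes are common defaults picked as assumptions, not tuned values.

```python
import numpy as np
from skimage.feature import hog

# Assumed descriptor settings, used for every call below.
HOG_PARAMS = dict(orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def hog_per_channel(rgb_image):
    """Concatenate HOG descriptors computed separately on each color channel."""
    descriptors = [
        hog(rgb_image[:, :, c], **HOG_PARAMS)
        for c in range(rgb_image.shape[2])
    ]
    return np.concatenate(descriptors)

# Synthetic 64x64 RGB image: a colored square on a black background.
image = np.zeros((64, 64, 3), dtype=float)
image[16:48, 16:48, 0] = 1.0
image[16:48, 16:48, 1] = 0.5

# Color-blind descriptor: HOG on the channel average.
gray_features = hog(image.mean(axis=2), **HOG_PARAMS)
# Color-aware descriptor: one HOG per channel, concatenated.
color_features = hog_per_channel(image)
print(gray_features.shape, color_features.shape)
```

With these settings a 64x64 image gives 7 x 7 blocks of 2 x 2 cells with 9 orientations each, so 1764 values per channel; the per-channel variant is three times as long. Either vector could then be fed to an ordinary classifier.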

Hope this helps.

Solved – Detecting manipulation (e.g, photo copy-pasting) in images

In general, it's hard to detect tampering and it's a whole field of research in digital image forensics. I'll try to summarise some of the key approaches to this problem. What you're talking about is sometimes called image forgery or image tampering. And the copy-paste operation is called image composition or image splicing.

From a practical perspective there are a number of different variants of this problem:

adding something to the image
removing something from the image
changing global properties of the image
using one image vs. multiple images, e.g. using the clone tool to copy regions within a single image
detecting whether an image has been tampered with vs. localising the tampering
determining the type of tampering

How you solve the problem is going to be very different depending on whether you are reviewing video surveillance footage, examining a single photo in a court case or running a photo sharing site. The problem is substantially harder if the setting is adversarial and the image manipulation may have been deliberately hidden.

Another point is that a lot of legitimate postprocessing happens in images. To take an extreme example, newer digital cameras introduce bokeh and blurring effects that were not present in the captured scene. So if you are interested in detecting more general types of image manipulation beyond image splicing, it's helpful to be aware of what's happening in cameras and apps.

A digital image is acquired on a camera as follows:

scene $\rightarrow$ imaging sensor $\rightarrow$ on camera postprocessing $\rightarrow$ storage

where

the scene is the external geometry captured in the image
the imaging sensor is a CCD or CMOS photodetector which converts light into electrical charge
postprocessing is where the electrical charge is converted into a digital signal and several corrective steps are taken to account for camera geometry, colour correction, etc.
storage is where the finished image is written to memory. Often it's converted into a compressed format such as JPEG and stored along with relevant metadata.

By considering the acquisition process you can see several possible points where tampering will result in inconsistencies in the image:

physical scene geometry
sensor and acquisition noise
postprocessing and compression artifacts
metadata

Metadata. An obvious thing to look at is the metadata associated with the image; it often contains camera information, time information and possibly location information. All of these can reveal inconsistencies. If you have the Statue of Liberty in your image but the GPS coordinates say you are at McMurdo Station in Antarctica, then the image is probably a forgery. But the metadata itself is easily altered or stripped, so this is not reliable.
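The location check in that example can be sketched as a great-circle distance test. Everything here is an assumption for illustration: the `metadata` dictionary stands in for GPS tags already parsed from the file, the landmark position is taken as known from some other source, and the 50 km tolerance is arbitrary. The coordinates are approximate.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def location_is_consistent(claimed, expected, tolerance_km=50.0):
    """True if the GPS position in the metadata is near where the content says it should be."""
    return haversine_km(*claimed, *expected) <= tolerance_km

# Hypothetical parsed metadata: GPS tag pointing at McMurdo Station.
metadata = {"gps": (-77.85, 166.67)}
# Approximate position of the landmark recognised in the picture.
statue_of_liberty = (40.6892, -74.0445)

distance = haversine_km(*metadata["gps"], *statue_of_liberty)
consistent = location_is_consistent(metadata["gps"], statue_of_liberty)
print(round(distance), consistent)
```

A failed check only says the tag and the content disagree; as noted, a forger can simply rewrite or strip the tag.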

Sensor noise. Sensor noise can be quite distinctive for digital cameras, so much so that it can be used to fingerprint the sensors in different camera models. There are several distinct types of noise introduced by sensors in digital cameras, but a very useful kind is photo-response nonuniformity (PRNU). This is a fingerprint associated with sensor noise and postprocessing, and it is robust to several image processing transformations, including lossy compression and downsampling. You can calculate the PRNU across blocks in the image, and introducing a new element from a different camera will introduce an inconsistency in the image. This seems to work pretty well, but it works best if you know the camera type. It's still possible to estimate PRNU from a single image. Color filter array interpolation should also be consistent across the image, and will be disrupted by splicing.
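The block-wise PRNU check can be sketched on synthetic data. This is a toy under loud assumptions: the camera fingerprint is taken as already known, a 3x3 mean filter stands in for the wavelet denoiser usually used to extract the noise residual, and the image, fingerprint strength and block size are all made up. The function names are hypothetical.

```python
import numpy as np

def box_blur(img, k=3):
    """Crude denoiser (assumption): k x k mean filter with reflected borders."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def block_correlations(image, fingerprint, block=32):
    """Correlate the noise residual with image * fingerprint in each block."""
    residual = image - box_blur(image)
    expected = image * fingerprint
    rows, cols = image.shape[0] // block, image.shape[1] // block
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            window = (slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block))
            r = residual[window].ravel() - residual[window].mean()
            e = expected[window].ravel() - expected[window].mean()
            scores[i, j] = (r @ e) / (np.linalg.norm(r) * np.linalg.norm(e) + 1e-12)
    return scores

rng = np.random.default_rng(0)
h = w = 128
fingerprint = 0.02 * rng.standard_normal((h, w))       # assumed known PRNU pattern
scene = np.tile(np.linspace(100.0, 150.0, w), (h, 1))  # smooth synthetic scene
image = scene * (1.0 + fingerprint) + 0.5 * rng.standard_normal((h, w))
# Splice in a patch that does not carry this camera's fingerprint.
image[32:64, 64:96] = scene[32:64, 64:96] + 0.5 * rng.standard_normal((32, 32))

scores = block_correlations(image, fingerprint)
suspect = np.unravel_index(np.argmin(scores), scores.shape)
print(suspect)  # block (row, column) with the weakest fingerprint
```

Blocks that still carry the fingerprint correlate strongly with it, while the spliced block correlates at roughly chance level. Real images are far less forgiving, because texture leaks into the residual and the fingerprint has to be estimated first.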

Compression and processing artifacts. All image processing techniques will leave a trace in the image statistics. Digital images are very commonly compressed via JPEG, which is based on the discrete cosine transform (DCT). This process leaves traces in the image statistics. One interesting technique is to detect JPEG ghosts, that is, parts of an image which have been compressed twice via DCT. As you mention, I believe that downsampling will remove some of these artifacts, although the downsampling itself will be detectable.
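The effect behind JPEG ghosts can be shown with plain quantization, without a JPEG codec. The sketch below is a simplification under stated assumptions: random integers stand in for the DCT coefficients of a region, and the step sizes 17 and 5 stand in for a lower-quality first save and a higher-quality second save. Requantizing the result with a range of trial steps gives zero error at the last step used and a second dip, the ghost, at the earlier one.

```python
import numpy as np

def quantize(coeffs, step):
    """Round coefficients to the nearest multiple of `step`, as JPEG does per DCT frequency."""
    return step * np.rint(coeffs / step).astype(np.int64)

def requantization_error(coeffs, steps):
    """Mean squared change caused by requantizing with each trial step."""
    return {q: float(np.mean((coeffs - quantize(coeffs, q)) ** 2)) for q in steps}

rng = np.random.default_rng(0)
original = rng.integers(-200, 201, size=10_000)  # stand-in for DCT coefficients

first_step, second_step = 17, 5          # assumed: coarse save, then finer save
once = quantize(original, first_step)    # region's earlier compression
twice = quantize(once, second_step)      # compression of the final composite

errors = requantization_error(twice, range(2, 31))
for q in (5, 16, 17, 18):
    print(q, round(errors[q], 2))
```

In a real detector the same scan is done by resaving the whole image at a range of JPEG qualities and mapping the per-block difference, so that a region with a different compression history stands out.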

Scene consistency. An image acquired from a single source should have consistent perspective (vanishing points) and illumination. Moreover, it's hard to fake these in a composite image. I recommend looking through (Redi et al., 2011) for more details here.
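The perspective part of this check can be sketched with homogeneous coordinates: image lines that come from parallel scene edges should all meet at one vanishing point. In this minimal example the line segments are hand-made assumptions rather than detected edges, and the function names are hypothetical.

```python
import itertools
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection point of two homogeneous lines, or None if they are parallel."""
    x, y, w = np.cross(l1, l2)
    if abs(w) < 1e-9:
        return None
    return np.array([x / w, y / w])

def vanishing_point_spread(segments):
    """Largest distance between pairwise intersections; near 0 means one shared vanishing point."""
    lines = [line_through(p, q) for p, q in segments]
    points = [intersection(a, b) for a, b in itertools.combinations(lines, 2)]
    points = [p for p in points if p is not None]
    return max(
        (float(np.linalg.norm(a - b)) for a, b in itertools.combinations(points, 2)),
        default=0.0,
    )

# Three segments that all point towards (500, 100).
consistent = [((0, 0), (250, 50)), ((0, 200), (250, 150)), ((0, 400), (250, 250))]
# Same scene plus a horizontal edge from a pasted object.
spliced = consistent + [((0, 300), (250, 300))]

print(vanishing_point_spread(consistent))
print(vanishing_point_spread(spliced))
```

The first spread is essentially zero and the second is large. With detected edges one would expect noise, so a tolerance and a robust fit would be needed.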

Finally, if you say "Okay, I give up. There are too many possible methods, I just want a detector", you can look at this recent ICCV paper where they train a detector to find where an image has been manipulated. This may give you some more insight into training a black-box model.

Bappy, Jawadul H., et al. "Exploiting Spatial Structure for Localizing Manipulated Image Regions." Proceedings of the IEEE International Conference on Computer Vision. 2017.

Datasets/Contests:

CASIA V1.0 and V2.0 (image splicing) http://forensics.idealtest.org/

COVERAGE (copy-move manipulations) https://github.com/wenbihan/coverage

Media Forensics Challenge 2018 (various manipulations, requires registration) https://www.nist.gov/itl/iad/mig/media-forensics-challenge-2018

IEEE IFS-TC Image Forensics Challenge Dataset. (website currently unavailable)

RAISE (raw, unprocessed images along with camera metadata) http://mmlab.science.unitn.it/RAISE/index.php

Surveys:

Redi, Judith A., Wiem Taktak, and Jean-Luc Dugelay. "Digital image forensics: a booklet for beginners." Multimedia Tools and Applications 51.1 (2011): 133-162. https://pdfs.semanticscholar.org/8e85/c7ad6cd0986225e63dc1b4264b3e084b3f9b.pdf

Fridrich, Jessica. "Digital image forensics." IEEE Signal Processing Magazine 26.2 (2009). http://ws.binghamton.edu/fridrich/Research/full_paper_02.pdf

Farid, Hany. Digital Image Forensics: lecture notes, exercises, and matlab code for a survey course in digital image and video forensics. http://www.cs.dartmouth.edu/farid/downloads/tutorials/digitalimageforensics.pdf

Kirchner, Matthias. Notes on digital image forensics and counter-forensics. Diss. Dartmouth College, 2012. http://ws.binghamton.edu/kirchner/papers/image_forensics_and_counter_forensics.pdf

Memon, Nasir. "Photo Forensics–There Is More to a Picture than Meets the Eye." International Workshop on Digital Watermarking. Springer, Berlin, Heidelberg, 2011.

Mahdian, Babak, and Stanislav Saic. "A bibliography on blind methods for identifying image forgery." Signal Processing: Image Communication 25.6 (2010): 389-399.

Image Tampering Detection and Localization (includes recent deep learning references) https://github.com/yannadani/image_tampering_detection_references
