Solved – Detecting manipulation (e.g., photo copy-pasting) in images

Tags: image processing, machine learning, manipulation-detection, neural networks, supervised learning

I am looking for a solution to detect photos that are manipulated with tools such as Photoshop.
For a start, I want to detect copy-pasted images.

Any idea how to detect photos that are manipulated by pasting another photo on top of the original photo?

For example, detecting a photo of an ID card with a photo of a face pasted in place of the original face.

To make it even more difficult, let's assume we downsample the image after pasting the face in place. This will smooth the sharp edges of the pasted image.

Update 1:

1) It seems that compression techniques as well as straightforward CNN training don't work.

2) This is a relevant post

3) This is a summary of photo forensic methods.

Update 2:

Since there was no real progress in here, I am starting a bounty.

Update 3:

Thanks to the bounty and @machine-epsilon, we have a valid answer!

Update 4:

Since this paper came out at ICCV 2019, I am adding it here.

Best Answer

In general, tampering is hard to detect, and digital image forensics is a whole field of research devoted to it. I'll try to summarise some of the key approaches to this problem. What you're talking about is sometimes called image forgery or image tampering, and the copy-paste operation is called image composition or image splicing.

From a practical perspective, there are a number of different variants of this problem:

adding something to the image
removing something from the image
changing global properties of the image
using one image vs. multiple images, e.g. use of the clone tool within a single image
detecting whether an image has been tampered with vs. localising the tampering
determining the type of tampering

How you solve the problem is going to be very different depending on whether you are reviewing video surveillance footage, examining a single photo in a court case, or running a photo-sharing site. The problem is substantially harder in an adversarial setting, where the image manipulation may have been deliberately hidden.

Another point is that a lot of legitimate postprocessing happens to images. To take an extreme example, new digital cameras introduce bokeh and blurring effects even though these are not present in the image as captured by the sensor. So if you are interested in detecting more general types of image manipulation beyond image splicing, it's helpful to be aware of what's happening in cameras and apps.

A digital image is acquired on a camera as follows:

scene $\rightarrow$ imaging sensor $\rightarrow$ on camera postprocessing $\rightarrow$ storage

where

the scene is the external geometry captured in the image
the image sensor is a CCD or CMOS photodetector which converts light into electrical charge
postprocessing is where the electrical charge is converted into a digital signal and several corrective steps are applied to account for camera geometry, colour correction, etc.
storage is where the finished image is written to memory. Often it's converted into a compressed format such as JPEG and stored along with relevant metadata.

By considering the acquisition process you can see several possible points where tampering will result in inconsistencies in the image:

physical scene geometry
sensor and acquisition noise
postprocessing and compression artifacts
metadata

Metadata. An obvious thing to look at is the metadata associated with the image, which often includes camera information, time information and possibly location information. Any of these can reveal an inconsistency. If you have the Statue of Liberty in your image but the GPS coordinates say you are at McMurdo Station in Antarctica, then the image is probably a forgery. But the metadata itself is easily altered or stripped, so this is not reliable.
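As a rough illustration of the GPS example, here is a minimal Python sketch. It assumes the coordinates have already been read out of the EXIF block and that you somehow know which landmark is depicted; the function names and the 50 km threshold are made up for this example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def gps_is_plausible(exif_latlon, landmark_latlon, max_km=50.0):
    """True if the GPS tag lies within max_km of the depicted landmark (threshold is arbitrary)."""
    return haversine_km(*exif_latlon, *landmark_latlon) <= max_km

# Approximate coordinates, used only for illustration.
STATUE_OF_LIBERTY = (40.6892, -74.0445)
MCMURDO_STATION = (-77.8463, 166.6683)

# A photo showing the statue but tagged at McMurdo fails the check.
print(gps_is_plausible(MCMURDO_STATION, STATUE_OF_LIBERTY))
```

A passing check proves nothing, since the tag can be edited to match; only a failing check is informative.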

Sensor noise. Sensor noise can be quite distinctive for a digital camera, so much so that it can be used to fingerprint the sensors in different camera models. There are several distinct types of noise introduced by sensors in digital cameras, but a very useful kind is photo-response nonuniformity (PRNU). This is a fingerprint associated with sensor noise and postprocessing, and it is robust to several image processing transformations, including lossy compression and downsampling. You can calculate the PRNU across blocks in the image, and an element introduced from a different camera will show up as an inconsistency. This seems to work pretty well, but it works best if you know the camera type. It's still possible to estimate PRNU from a single image. Colour filter array interpolation should also be consistent across the image, and will be disrupted by splicing.
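To make the block-wise idea concrete, here is a toy sketch in pure Python on synthetic data. It assumes the camera fingerprint K is already known, uses a 3x3 mean filter as a crude stand-in for the denoisers used in the PRNU literature, and models the image as scene * (1 + K). All names are hypothetical, and a real pipeline would be considerably more careful.

```python
import random

def box_blur(img):
    """3x3 mean filter; a crude stand-in for a proper denoising filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def block_prnu_scores(img, fingerprint, block=16):
    """Correlate each block's noise residual with the expected PRNU term K * I."""
    h, w = len(img), len(img[0])
    blurred = box_blur(img)
    scores = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            coords = [(y, x) for y in range(by, min(by + block, h))
                      for x in range(bx, min(bx + block, w))]
            residual = [img[y][x] - blurred[y][x] for y, x in coords]
            expected = [fingerprint[y][x] * img[y][x] for y, x in coords]
            scores[(by, bx)] = pearson(residual, expected)
    return scores

def synthetic_demo(size=64, block=16, seed=0):
    """Toy image from 'camera A' with one block pasted in from 'camera B'."""
    rng = random.Random(seed)
    k_a = [[rng.gauss(0, 0.02) for _ in range(size)] for _ in range(size)]
    k_b = [[rng.gauss(0, 0.02) for _ in range(size)] for _ in range(size)]
    scene = [[100.0 + x + y for x in range(size)] for y in range(size)]
    img = [[scene[y][x] * (1 + k_a[y][x]) for x in range(size)] for y in range(size)]
    for y in range(block, 2 * block):      # spliced region: rows and columns 16..31
        for x in range(block, 2 * block):
            img[y][x] = scene[y][x] * (1 + k_b[y][x])
    return img, k_a

img, k_a = synthetic_demo()
scores = block_prnu_scores(img, k_a)
suspect = min(scores, key=scores.get)   # block whose residual matches the fingerprint least
print(suspect)
```

On this synthetic input the pasted block should stand out with a correlation near zero while the untouched blocks stay high. Real photos have textured content and extra noise sources, so the separation is much less clean.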

Compression and processing artifacts. Every image processing operation leaves a trace in the image statistics. Digital images are very commonly stored as JPEG, which compresses them using the discrete cosine transform (DCT) and leaves characteristic traces of its own. One interesting technique is to detect JPEG ghosts, that is, parts of an image which have been through JPEG compression twice, typically at a different quality from the rest of the image. As you mention, I believe that downsampling will remove some of these artifacts, although the downsampling itself will be detectable.
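The intuition behind JPEG ghosts can be shown at the level of quantized coefficients. The sketch below is an assumption-laden toy: it uses random numbers as stand-ins for DCT coefficients instead of decoding a real JPEG, and the step sizes 12 and 5 are arbitrary. The pasted patch was quantized coarsely before the composite was saved with a finer step, so re-quantizing at the patch's old step changes it far less than it changes the host.

```python
import random

def quantize(values, step):
    """Round every value to the nearest multiple of `step`, as JPEG does to DCT coefficients."""
    return [step * round(v / step) for v in values]

def requantization_error(values, step):
    """Mean squared change caused by quantizing `values` once more with `step`."""
    requantized = quantize(values, step)
    return sum((v - q) ** 2 for v, q in zip(values, requantized)) / len(values)

rng = random.Random(0)
# Random stand-ins for the DCT coefficients of the host image and of the pasted patch.
host = [rng.uniform(-200, 200) for _ in range(5000)]
patch = [rng.uniform(-200, 200) for _ in range(5000)]

patch = quantize(patch, 12)   # the patch was saved earlier with a coarse step of 12
host = quantize(host, 5)      # the composite is then saved with a finer step of 5
patch = quantize(patch, 5)

trial_steps = (8, 10, 12, 14, 16)
for step in trial_steps:
    # Columns: trial step, error on host coefficients, error on patch coefficients.
    print(step, round(requantization_error(host, step), 2),
          round(requantization_error(patch, step), 2))
```

In a real detector you would recompress the whole image at a range of qualities and look for regions where the difference image dips at a quality the rest of the image does not respond to.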

Scene consistency. An image acquired from a single source should have consistent perspective (vanishing points) and illumination. Moreover, it's hard to fake these in a composite image. I recommend looking through (Redi et al., 2011) for more details here.
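As a small illustration of the perspective check, the following sketch tests whether line segments that should be parallel in the scene meet at a common vanishing point. It assumes the segments have already been extracted and grouped by hand (the hard part in practice); the helper names are hypothetical, and this is not the method from the survey.

```python
import math
from itertools import combinations

def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 passing through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersection(l1, l2):
    """Intersection point of two homogeneous lines, or None if they are parallel in the image."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)

def vanishing_point_spread(segments):
    """Largest distance of any pairwise intersection from the mean intersection point."""
    lines = [line_through(p, q) for p, q in segments]
    points = []
    for l1, l2 in combinations(lines, 2):
        pt = intersection(l1, l2)
        if pt is not None:
            points.append(pt)
    if len(points) < 2:
        return 0.0
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return max(math.hypot(x - cx, y - cy) for x, y in points)

# Three edges that all converge on the point (500, 100) ...
consistent = [((0, 300), (250, 200)), ((0, 500), (250, 300)), ((0, 100), (250, 100))]
# ... and one edge, e.g. from a pasted object, that points somewhere else.
pasted_edge = ((0, 400), (250, 390))

print(vanishing_point_spread(consistent))
print(vanishing_point_spread(consistent + [pasted_edge]))
```

A spread near zero means the edges agree on one vanishing point; a large spread suggests that at least one edge comes from a different viewpoint, though measurement noise on real segments means any threshold has to be chosen empirically.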

Finally, if you say "Okay, I give up. There are too many possible methods; I just want a detector", you can look at this recent ICCV paper, in which a detector is trained to find where an image has been manipulated. This may give you some more insight into training a black-box model.

Bappy, Jawadul H., et al. "Exploiting Spatial Structure for Localizing Manipulated Image Regions." Proceedings of the IEEE International Conference on Computer Vision. 2017.

Datasets/Contests:

CASIA V1.0 and V2.0 (image splicing) http://forensics.idealtest.org/

COVERAGE (copy-move manipulations) https://github.com/wenbihan/coverage

Media Forensics Challenge 2018 (various manipulations, requires registration) https://www.nist.gov/itl/iad/mig/media-forensics-challenge-2018

IEEE IFS-TC Image Forensics Challenge Dataset (website currently unavailable)

RAISE (raw, unprocessed images along with camera metadata) http://mmlab.science.unitn.it/RAISE/index.php

Surveys:

Redi, Judith A., Wiem Taktak, and Jean-Luc Dugelay. "Digital image forensics: a booklet for beginners." Multimedia Tools and Applications 51.1 (2011): 133-162. https://pdfs.semanticscholar.org/8e85/c7ad6cd0986225e63dc1b4264b3e084b3f9b.pdf

Fridrich, Jessica. "Digital image forensics." IEEE Signal Processing Magazine 26.2 (2009). http://ws.binghamton.edu/fridrich/Research/full_paper_02.pdf

Farid, Hany. Digital Image Forensics: lecture notes, exercises, and matlab code for a survey course in digital image and video forensics. http://www.cs.dartmouth.edu/farid/downloads/tutorials/digitalimageforensics.pdf

Kirchner, Matthias. Notes on digital image forensics and counter-forensics. Diss. Dartmouth College, 2012. http://ws.binghamton.edu/kirchner/papers/image_forensics_and_counter_forensics.pdf

Memon, Nasir. "Photo Forensics–There Is More to a Picture than Meets the Eye." International Workshop on Digital Watermarking. Springer, Berlin, Heidelberg, 2011.

Mahdian, Babak, and Stanislav Saic. "A bibliography on blind methods for identifying image forgery." Signal Processing: Image Communication 25.6 (2010): 389-399.

Image Tampering Detection and Localization (includes recent deep learning references) https://github.com/yannadani/image_tampering_detection_references
