I'm trying to train a neural network with images. Since I'm extracting the images from a video feed, I can convert them to either .png or .jpg. Which format is preferred for machine learning and deep learning? My neural network model contains convolutional layers, max-pooling layers and image resizing.
Solved – Which image format is better for machine learning: .png, .jpg, or other?
computer vision, image processing, machine learning, neural networks, python
Related Solutions
The convolution operation has a close relationship to the frequency domain. See the Convolution Theorem for details.
What makes an edge an edge? Sudden changes, i.e. high-frequency changes, in the value. Intuitively, this is why convolution can detect edges.
For example, think about the following 1D toy data:
000000000000111111111111
For the homogeneous parts the frequency content is 0; the only high-frequency content is at the step in the middle, which is exactly where the edge is.
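As a quick illustration (a minimal sketch using NumPy; the difference kernel [1, -1] is just the simplest possible edge detector, not something prescribed by the answer above):

```python
import numpy as np

# The 1D toy signal from above: a flat region followed by a step (the "edge").
signal = np.array([0] * 12 + [1] * 12, dtype=float)

# The simplest possible difference kernel: it responds to changes in value,
# i.e. to high-frequency content, and ignores constant regions.
kernel = np.array([1.0, -1.0])

response = np.convolve(signal, kernel, mode="valid")
print(response)
# All zeros, except a single 1.0 at the position of the step:
# the convolution "fires" exactly where the edge is.
```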
You should really ask in the course forum :) or contact Jeremy on Twitter; he's a great guy. Having said that, the idea is this: subsampling, a.k.a. pooling (max pooling, mean pooling, etc.; currently max pooling is the most common choice in CNNs), has three main advantages:
- it makes your net more robust to noise: if you slightly alter each neighborhood in your input layer, the mean of each neighborhood won't change much (the smoothing effect of the sample mean). The max doesn't have this smoothing effect; however, since it's the largest activation value, its relative variation due to noise is (on average) smaller than for other pixels.
- it introduces some level of translation invariance. By reducing the number of features in the output layer, if you shift the input image slightly, chances are the output of the subsampling layer won't change, or it will change less (see the small sketch after this list for a concrete example). See here for a nice picture.
- also, by reducing the number of features, the computational effort in training and prediction is reduced, and overfitting becomes less likely.
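Here is the small sketch mentioned in the second point (assuming PyTorch; the toy image and the one-pixel shift are made up purely for illustration):

```python
import torch
import torch.nn.functional as F

# An 8x8 toy image that is zero except for one bright pixel...
img = torch.zeros(1, 1, 8, 8)
img[0, 0, 3, 4] = 1.0

# ...and the same image with that pixel shifted one position to the right
# (still inside the same 2x2 pooling window).
shifted = torch.zeros(1, 1, 8, 8)
shifted[0, 0, 3, 5] = 1.0

pooled = F.max_pool2d(img, kernel_size=2, stride=2)
pooled_shifted = F.max_pool2d(shifted, kernel_size=2, stride=2)

# The pooled outputs are identical: the one-pixel shift is absorbed
# by the pooling window.
print(torch.equal(pooled, pooled_shifted))  # True
```

Shifts that cross a window boundary do change the output, but usually by less than the input changed; that is why the invariance is only approximate.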
However, not everyone agrees with point 3. In the famous AlexNet paper, which can be considered the "rebirth" of CNNs, the authors used overlapping neighborhoods (i.e., strides along x and y smaller than the extent of the subsampling neighborhood along x and y respectively), so that adjacent pooling windows share pixels and less information is discarded at each subsampling step. This makes the model more flexible, which is what Jeremy was hinting at. You get a more flexible model, at the risk of more overfitting, but you can use other deep learning tools to fight overfitting. It's really a design choice: you'll typically need validation data sets to try different architectures and see what works best.
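For concreteness, this is roughly what the two pooling styles look like (a sketch assuming PyTorch; the 224x224 feature-map size is only an example):

```python
import torch
import torch.nn as nn

x = torch.rand(1, 64, 224, 224)  # an example feature map

# Non-overlapping pooling: kernel 2, stride 2, so each pixel is pooled exactly once.
non_overlapping = nn.MaxPool2d(kernel_size=2, stride=2)
print(non_overlapping(x).shape)  # torch.Size([1, 64, 112, 112])

# Overlapping pooling (the AlexNet choice, window 3 with stride 2):
# adjacent windows share a row/column of pixels.
overlapping = nn.MaxPool2d(kernel_size=3, stride=2)
print(overlapping(x).shape)  # torch.Size([1, 64, 111, 111])
```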
EDIT: It just occurred to me that I had misunderstood what you were asking for. VGG16, unlike AlexNet, uses non-overlapping max pooling (see section 2.1 of the paper I linked, right at the end of the first paragraph). Thus the spatial size (width and height) of each channel is halved after each pooling layer. This is compensated for by doubling the width of the convolutional layers.
Actually, this doesn't always happen: after the penultimate max-pool, the width remains 512, the same as before pooling. Again, this is a design choice; it's not set in stone, as confirmed by the fact that they don't follow this rule for the last convolutional layer. However, it's by far the most common design choice: for example, both LeNet and AlexNet follow this rule, even though LeNet uses non-overlapping pooling (the size of each channel is halved, as for VGG16), while AlexNet uses overlapping pooling.

The idea is simple: you introduce max pooling to add robustness to noise and to help make the CNN translation invariant, as I said before. However, you also don't want to throw away information contained in the image together with the noise. To do that, for each convolutional layer you double the number of channels. This means you have twice as many "high-level features", so to speak, even if each of them is computed at half the spatial resolution. If your input image activates one of these high-level features, its activation will be passed on to the following layers.
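As a rough sketch of that pattern (assuming PyTorch; the channel counts and sizes are illustrative, not the exact VGG16 configuration):

```python
import torch
import torch.nn as nn

# VGG-style pattern: max pooling halves the spatial size,
# the next convolutional layer doubles the number of channels.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),   # 64 x 112 x 112
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),          # 64 x 56 x 56
    nn.Conv2d(64, 128, kernel_size=3, padding=1),   # 128 x 56 x 56 (width doubled)
    nn.ReLU(inplace=True),
)

x = torch.rand(1, 64, 112, 112)
print(block(x).shape)  # torch.Size([1, 128, 56, 56])
```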
Granted, this added flexibility brings a risk of overfitting, which they combat with the usual techniques (see section 3.1): $L_2$ regularization, dropout for the last two fully connected layers, and learning-rate decay for the whole net.
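In code, those three ingredients typically look something like this (a sketch assuming PyTorch; the layer sizes and hyperparameter values are common defaults, not necessarily the exact ones from the paper):

```python
import torch
import torch.nn as nn

# Dropout on the last two fully connected layers of the classifier head.
classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
    nn.Linear(4096, 1000),
)

# L2 regularization is the weight_decay term of the optimizer.
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# Learning-rate decay: shrink the learning rate when the validation metric plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3)
```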
Best Answer
Here is a real-life case: an accurate segmentation pipeline for a 4K video stream (here are some examples). I rely on conventional computer vision as well as on neural nets, so there is a need to prepare high-quality training sets. Also, it is often practically impossible to find ready-made training sets for some specific objects:
(See in action)
Long story short, it takes about 1 TB of data to create a training set and do the additional post-processing. I use ffmpeg and store the extracted frames as JPG (sketched further below). There is no reason to use PNG, because of the following:
Let's do a quick test (a really quick one): same 4K stream, same settings, extracting the same frame as PNG and as JPG. If you can see any difference, good for you :) Any real-life problem will most likely involve a compressed video stream anyway, because bandwidth is critical.
(Comparison screenshots: PNG frame vs. JPG frame.)
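For reference, the extraction step itself can look roughly like this (a sketch only; the file name and output directory are hypothetical, and -qscale:v 2 is simply a high JPEG quality setting):

```python
import subprocess
from pathlib import Path

# Hypothetical input file and output directory; adjust to your own pipeline.
video = "stream_4k.mp4"
out_dir = Path("frames")
out_dir.mkdir(exist_ok=True)

# Extract every frame as a high-quality JPG (-qscale:v 2 is close to the best JPEG quality).
subprocess.run([
    "ffmpeg", "-i", video,
    "-qscale:v", "2",
    str(out_dir / "frame_%06d.jpg"),
], check=True)
```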
Finally
If you need more detail, use 4K (or 8K if you need even finer detail). Pretty much all the examples I have are based on 4K input. FPS is what actually matters when you deal with real-life scenes and fast-moving objects.
(see in action)
It goes without saying that camera and lighting conditions are the most critical preconditions for capturing the proper level of detail.