Solved – What might a correct approach to image preprocessing for a CNN be, for these specialised images?

image processing · medicine · neural networks

I am trying to understand best practices for preprocessing images prior to neural network training, and I have some uncertainties despite reading extensively about it online. I am also unsure whether what I am reading (which is about natural colour images with RGB channels) applies to my images, which are greyscale medical images like this one:

[example greyscale medical image]

First, with an image such as this, the R, G and B values in each channel are identical. Many of the CNNs I have been experimenting with were developed on natural colour images (such as AlexNet). Does this impact how such a CNN might handle a greyscale image? If the input to the network is meant to be (3, 256, 256), then for an image like this, should that be changed to (1, 256, 256), or do the identical R, G and B values not matter?

Also, regarding normalisation: if I were to normalise the images with the mean pixel value and standard deviation, are those values computed across the whole dataset or just across that one image?

Does image size make a difference? For a human, a higher-resolution image may be easier to interpret, but does this also apply to a neural network?

I apologise for these simple questions. There are "half" answers to many of them (and I have followed many tutorials, including Andrew Ng's), but my problem (these unusual images) is quite specialised, and I have a feeling my image preprocessing is incorrect. I also know that if I get this wrong, I could waste weeks on data that's incorrectly prepared, so some expert opinion would be great.

Best Answer

1) The input should be $(1,256,256)$. You should read about convolutional neural nets to understand better how images are processed. Your initial convolutional-layer filters will have dimensions $(1,H,W)$: since you have a single channel, there is no colour depth to consider.
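As a minimal sketch (using NumPy, with made-up pixel data), this is what collapsing the three identical channels to one looks like, and conversely what replicating the single channel looks like if you must feed a pretrained RGB network such as AlexNet:

```python
import numpy as np

# Hypothetical greyscale image stored as RGB: three identical
# channels in channels-first layout, shape (3, 256, 256).
grey = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)
img = np.stack([grey, grey, grey], axis=0)  # (3, 256, 256)

# The channels are identical, so keeping one loses no information.
single = img[:1]  # shape (1, 256, 256)
assert np.array_equal(single[0], img[0])

# If a pretrained network demands three channels, replicate the
# single channel back out; this reproduces the original array.
tripled = np.repeat(single, 3, axis=0)  # (3, 256, 256)
assert np.array_equal(tripled, img)

print(single.shape, tripled.shape)
```

Replicating the channel wastes a little compute but lets you reuse RGB-pretrained weights; collapsing to one channel is the cleaner choice when training from scratch.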

2) Normalization has no single right answer; it depends on context. Images can be normalized per image, per pixel, or not at all. For example, if you're dealing with medical CT scans, where images are standardized to Hounsfield units, normalizing them would make little sense: each pixel value is already on an absolute, physically meaningful scale, and there is no concept of external contrast (unlike lighting conditions in photographs). Sometimes whitening is also applied to further normalize images.
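To make the per-dataset vs. per-image distinction concrete, here is a small sketch with randomly generated stand-in images (shapes and values are illustrative, not from the question's data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 10 single-channel images, shape (10, 1, 64, 64).
dataset = rng.integers(0, 256, size=(10, 1, 64, 64)).astype(np.float32)

# Per-dataset normalization: one mean/std over ALL training pixels,
# applied identically to every image (the usual "ImageNet stats" style).
mean, std = dataset.mean(), dataset.std()
norm_dataset = (dataset - mean) / std  # whole set now has mean ~0, std ~1

# Per-image normalization: each image gets its own statistics,
# which removes brightness/contrast differences between images.
img = dataset[0]
norm_img = (img - img.mean()) / img.std()

print(norm_dataset.shape, norm_img.shape)
```

Per-dataset statistics preserve relative intensity differences between images (important when, as with Hounsfield units, absolute values carry meaning); per-image statistics discard them.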

3) Of course image size makes a difference. You need some minimal amount of resolution to identify the features you are interested in. For example, the image you posted has a complex structure of white and dark areas, along with detailed boundaries. If the details of those features are important, then you need to maintain enough resolution to detect them. In the medical setting, a benign vs. malignant growth can sometimes be distinguished by the regularity (waviness) of its boundary.

However, image size drastically increases the time it takes to process each image. You'll only find out what a good resolution is by gradually scaling up your images until you achieve the accuracy you want. It also helps to make educated guesses about the minimal resolution: for example, if the features you are interested in are on the order of 1/100 the scale of the image, then you need a width and height of at least 100 pixels.
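The rule of thumb above can be written as a back-of-the-envelope helper (the function name and parameters are my own, purely illustrative):

```python
def min_resolution(feature_fraction: float, pixels_per_feature: int = 1) -> int:
    """Minimum image width/height (in pixels) so that a feature
    occupying `feature_fraction` of the image spans at least
    `pixels_per_feature` pixels."""
    return int(round(pixels_per_feature / feature_fraction))

# Feature ~1/100 of the image width, 1 px per feature -> 100 px minimum.
print(min_resolution(1 / 100))
# In practice you want several pixels across a feature to resolve
# its shape (e.g. boundary waviness); asking for 4 px gives 400 px.
print(min_resolution(1 / 100, pixels_per_feature=4))
```

One pixel per feature is only enough to detect presence; resolving shape detail such as boundary regularity typically needs several pixels across the feature, hence the second call.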