Perhaps a simpler case will make things clearer. Let's say we choose a 1x2 sample of pixels instead of 100x100.
Sample Pixels From the Image
+----+----+
| x1 | x2 |
+----+----+
Imagine that when plotting our training set, we notice it can't be separated easily by a linear model, so we choose to add polynomial terms to fit the data better.
Let's say we decide to construct our polynomial features from all of the pixel intensities, plus all possible products that can be formed from them.
Since our matrix is small, let's enumerate them:
$$x_1,\ x_2,\ x_1^2,\ x_2^2,\ x_1 \times x_2,\ x_2 \times x_1 $$
Interpreting the above sequence of features, we can see a pattern. The first two terms, group 1, are the raw pixel intensities. The next two terms, group 2, are the squares of those intensities. The last two terms, group 3, are the products of all pairwise combinations of pixel intensities.
group 1: $x_1,\ x_2$
group 2: $x_1^2,\ x_2^2$
group 3: $x_1 \times x_2,\ x_2 \times x_1$
But wait, there is a problem. If you look at the group 3 terms ($x_1 \times x_2$ and $x_2 \times x_1$), you'll notice that they are equal. Remember our housing example: imagine having two features, x1 = square footage and x2 = square footage, for the same house... That doesn't make any sense! So we need to get rid of the duplicate feature; let's say, arbitrarily, $x_2 \times x_1$. Now we can rewrite the list of group 3 features as:
group 3: $x_1 \times x_2$
We count the features in all three groups and get 5.
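As a quick sanity check (a sketch assuming scikit-learn is available; the pixel values are made up for illustration), `PolynomialFeatures` with degree 2 enumerates exactly these five terms:

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# One training sample with two pixel intensities x1 and x2 (hypothetical values).
X = np.array([[0.5, 0.8]])

# degree=2, include_bias=False yields: x1, x2, x1^2, x1*x2, x2^2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
print(X_poly.shape[1])               # 5 features, matching our count
```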
But this is a toy example. Let's derive a generic formula for calculating the number of features, using our original groups of features as a starting point.
$$\text{size of group 1} + \text{size of group 2} + \text{size of group 3} = m \times n + m \times n + m \times n = 3 \times m \times n$$
Ah! But we had to get rid of the duplicate product in group 3.
So to properly count the features in group 3, we need a way to count all unique pairwise products in the matrix. This is exactly what the binomial coefficient gives us: it counts all possible unique subgroups of size $k$ drawn from a larger group of size $n$. So the correct count for group 3 is $C(m \times n, 2)$.
So our generic formula would be:
$$m \times n + m \times n + C(m \times n, 2) = 2 \times m \times n + C(m \times n, 2)$$
Let's use it to calculate the number of features in our toy example:
$$2 \times 1 \times 2 + C(1 \times 2, 2) = 4 + 1 = 5$$
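Here is a minimal sketch of the formula in Python (the helper name `num_poly_features` is just illustrative), using `math.comb` for the binomial coefficient; plugging in the original 100x100 sample shows why raw degree-2 features blow up:

```python
from math import comb

def num_poly_features(m, n):
    """Count m*n intensities + m*n squares + C(m*n, 2) unique pairwise products."""
    p = m * n  # total number of pixels in the sample
    return 2 * p + comb(p, 2)

print(num_poly_features(1, 2))      # 5 -- our toy example
print(num_poly_features(100, 100))  # 50015000 -- infeasibly many features
```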
That's it!
1) The input should be $(1, 256, 256)$. You should read about convolutional neural networks to understand better how images are processed. Your initial convolutional layer's filters will have dimensions $(1, H, W)$; there is no need to consider the color depth of the image, since you have only one channel. See the sketch below.
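For concreteness, here is a minimal sketch assuming PyTorch (the framework choice and layer sizes are mine, not prescribed):

```python
import torch
import torch.nn as nn

# A batch of one grayscale image: (batch, channels, height, width) = (1, 1, 256, 256)
x = torch.randn(1, 1, 256, 256)

# in_channels=1 because there is a single channel; each filter then has shape (1, H, W)
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)

print(conv(x).shape)  # torch.Size([1, 16, 256, 256])
```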
2) There is no single right answer for normalization; it depends on context. Images can be normalized per image, per pixel, or not at all (see the sketch below). For example, if you're dealing with medical CT scans, the images are already standardized to Hounsfield units, so normalizing them would make little sense: each pixel is already the ground truth of your scan, and there is no external contrast to correct for (unlike lighting conditions in photos). Sometimes whitening is also applied to further normalize images.
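A rough sketch of per-image versus per-pixel normalization with NumPy (the arrays are random placeholders standing in for your data):

```python
import numpy as np

img = np.random.rand(256, 256).astype(np.float32)           # one image
dataset = np.random.rand(100, 256, 256).astype(np.float32)  # hypothetical training set

# Per-image: zero mean, unit variance computed within each image.
per_image = (img - img.mean()) / (img.std() + 1e-8)

# Per-pixel: statistics for each pixel position across the whole training set.
pixel_mean = dataset.mean(axis=0)
pixel_std = dataset.std(axis=0)
per_pixel = (img - pixel_mean) / (pixel_std + 1e-8)

# For CT scans in Hounsfield units you might skip normalization entirely,
# or just clip to a physically meaningful window.
```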
3) Of course image size makes a difference. You need some minimal amount of resolution to identify the features you are interested in. For example, the images you posted possess a complex structure of white and dark areas, along with detailed boundaries. If the details of those features are important, then you need to maintain enough resolution to detect them. In the medical example, a benign vs. malignant cancerous growth can sometimes be distinguished by the regularity (waviness) of its boundary.
However, image size drastically increases processing time. You'll only find out what a good resolution is by gradually scaling your images up until you achieve the accuracy you want (see the sketch below). It also helps to make educated guesses about the minimal resolution: for example, if the features you are interested in are on the order of 1/100 the scale of the image, then you need a width and height of at least 100 pixels.
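One way to run that scaling experiment, sketched with Pillow (the path and sizes are illustrative):

```python
from PIL import Image

def make_resolution_variants(path, sizes=(32, 64, 128, 256)):
    """Downscale an image to several resolutions to test accuracy vs. size."""
    img = Image.open(path).convert("L")  # grayscale, matching the single channel
    return {s: img.resize((s, s), Image.BILINEAR) for s in sizes}

# variants = make_resolution_variants("scan.png")
# Train and evaluate at each size; keep the smallest that reaches your target accuracy.
```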
Best Answer
I would try LBP (local binary patterns), or any other descriptor suited to what your images and task (i.e. classes) actually are. To deal with different image sizes, you can use a Bag of Words encoding; a sketch follows.
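As a rough sketch with scikit-image (the parameters `P=8, R=1` are common choices, not prescriptions), a uniform LBP histogram already gives a fixed-length descriptor regardless of image size:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(img, P=8, R=1):
    """Fixed-length LBP histogram, comparable across images of any size."""
    lbp = local_binary_pattern(img, P, R, method="uniform")
    n_bins = P + 2  # 'uniform' LBP produces exactly P + 2 distinct codes
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist

# img = ...  # any H x W grayscale array; the output length stays P + 2
```

For a full Bag of Words encoding you would instead compute descriptors over local patches, cluster them (e.g. with k-means), and represent each image as a histogram of cluster assignments.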