I'm trying to understand logistic regression by training a classifier on the MNIST dataset (a collection of handwritten digits, each represented as a list of pixel intensities).

I read about feature normalization (https://en.m.wikipedia.org/wiki/Feature_scaling) but I'm not sure how to apply it to the problem at hand. The training data looks like this:

P1, P2, P3,  ... P748
0,  0,  180, ... 240
0,  50, 150, ... 0
0,  0,  0,   ... 108

So each row describes a separate image, and each column represents the same pixel (P1 is the pixel in the upper left corner of the image, P2 is the next pixel to the right, etc.)

Question 1

When normalizing the data, do I normalize each instance (where min and max refer to the values within that row), or do I normalize each feature across the entire training dataset (where min and max of P1 refer to the values of P1 in every training example, potentially tens of thousands of values)?

Question 2

After the classifier is trained on normalized data, what do I do with a new data sample that I want to run through the classifier? Do I normalize its features against each other (where min and max refer to values across P1 – P748 within that single instance)?

Question 1

The general answer is to normalize each feature (i.e. each column) based on the values within that column in the training data set. Think of the case where variables are very different in scale, e.g. age and income. Normalization can be done by min-max scaling, which uses the minimum and maximum value of the column, or by the standard (z) score, which subtracts the column mean and divides by the column standard deviation.
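As a rough illustration of column-wise scaling, here is a minimal NumPy sketch. The array `X_train` is a made-up four-pixel stand-in for the real training matrix, and the helper names `fit_min_max` and `fit_z_score` are hypothetical, not part of any library. The guard for constant columns is an assumption added here so that a pixel with the same value in every image does not cause a division by zero.

```python
import numpy as np

# Toy stand-in for the training matrix: rows are images, columns are pixels.
X_train = np.array([
    [0.0,  0.0, 180.0, 240.0],
    [0.0, 50.0, 150.0,   0.0],
    [0.0,  0.0,   0.0, 108.0],
])

def fit_min_max(X):
    """Return the per-column minimum and range (hypothetical helper)."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # A constant column has range 0; use 1 so it maps to 0 instead of NaN.
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def fit_z_score(X):
    """Return the per-column mean and standard deviation (hypothetical helper)."""
    col_mean = X.mean(axis=0)
    col_std = X.std(axis=0)
    # Same guard as above for columns that never vary.
    col_std[col_std == 0] = 1.0
    return col_mean, col_std

# Min-max scaling: every non-constant column ends up in [0, 1].
col_min, col_range = fit_min_max(X_train)
X_minmax = (X_train - col_min) / col_range

# Z-score: every non-constant column ends up with mean 0 and std 1.
col_mean, col_std = fit_z_score(X_train)
X_z = (X_train - col_mean) / col_std
```

Both variants compute their statistics down the rows (`axis=0`), so each pixel position gets its own minimum, range, mean, and standard deviation.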

For images, there is an additional consideration, because every column measures pixel intensity on the same scale. In order to improve contrast within the image and normalize it over the set of images, you can normalize the pixel intensity for each image, i.e. for each row. This will smooth over variation in the lighting of the images. You can also normalize using the maximum and minimum value of the full data set, as you suggest. This approach will preserve differences between images.
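The difference between the two options can be seen on a tiny made-up example. In this sketch the two rows of `X` are invented: they have the same shape, but the second is twice as dark. It assumes no image is completely uniform, since a constant row would make the per-image range zero.

```python
import numpy as np

X = np.array([
    [0.0,  60.0, 120.0],  # faint (grey) strokes
    [0.0, 120.0, 240.0],  # same shape, darker strokes
])

# Global scaling: one min and one max for the entire data set.
# The two images stay different after scaling.
X_global = (X - X.min()) / (X.max() - X.min())

# Per-image scaling: each row uses its own min and max.
# The two images become identical after scaling.
row_min = X.min(axis=1, keepdims=True)
row_max = X.max(axis=1, keepdims=True)
X_per_image = (X - row_min) / (row_max - row_min)
```

With global scaling the faint image tops out at 0.5 while the dark one reaches 1.0. With per-image scaling both rows come out as 0, 0.5, 1.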

What does that mean in your case? You want to smooth over differences in the intensity of the pen strokes, so that digits written in grey and in black look the same after normalization. That is achieved by normalizing over each image (row). After that you could follow the standard approach and normalize the values for each pixel position (column). For a long explanation see this wiki.
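The two-step procedure described above could be sketched as follows. The data is a made-up four-pixel example, `normalize_rows` is a hypothetical helper name, and the choice of z-scoring for the column step is an assumption (min-max would work the same way). The zero guards are also an addition for blank images and for pixel positions that never vary.

```python
import numpy as np

def normalize_rows(X):
    """Min-max scale each image (row) using only that row's own values."""
    row_min = X.min(axis=1, keepdims=True)
    row_range = X.max(axis=1, keepdims=True) - row_min
    # A blank image has range 0; use 1 so it maps to all zeros.
    row_range[row_range == 0] = 1.0
    return (X - row_min) / row_range

X_train = np.array([
    [0.0,   0.0, 120.0,  60.0],  # digit written in grey
    [0.0,   0.0, 240.0, 120.0],  # same strokes written in black
    [0.0, 200.0,   0.0, 100.0],  # a different shape
])

# Step 1: row-wise normalization removes the grey/black difference.
X_rows = normalize_rows(X_train)

# Step 2: column-wise z-score, with statistics from the training set.
col_mean = X_rows.mean(axis=0)
col_std = X_rows.std(axis=0)
col_std[col_std == 0] = 1.0  # pixel positions that never vary
X_ready = (X_rows - col_mean) / col_std
```

After step 1 the first two rows are identical, which is the intended effect. Step 2 then centers each pixel position across the training images.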

Question 2

Generally, the feature (column) values of a new observation have to be normalized using the statistics calculated from the training set. This also applies to data in the test set, to avoid mixing information from the training and test sets. For row-wise (within-image) normalization, you normalize based on the values of the new image itself.
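A minimal sketch of both cases for a single new sample, assuming min-max scaling and a made-up four-pixel training matrix. The names `X_train` and `x_new` are illustrative only.

```python
import numpy as np

X_train = np.array([
    [0.0,  0.0, 180.0, 240.0],
    [0.0, 50.0, 150.0,   0.0],
    [0.0,  0.0,   0.0, 108.0],
])

# Column statistics are computed once, from the training set only.
col_min = X_train.min(axis=0)
col_range = X_train.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # guard for pixels that never vary

# A new image, not seen during training.
x_new = np.array([0.0, 25.0, 90.0, 255.0])

# Column-wise: reuse the stored training statistics, never recompute them.
x_new_by_column = (x_new - col_min) / col_range

# Row-wise: use the new image's own min and max
# (assumes the image is not completely uniform).
x_new_by_row = (x_new - x_new.min()) / (x_new.max() - x_new.min())
```

Note that the column-wise result for the last pixel is slightly above 1, because 255 exceeds the largest value (240) seen for that pixel in training. That is expected when new data falls outside the training range.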
