Solved – does it make sense for non-negative data to subtract the mean and divide by the std dev

image processing, mean, normalization, standard deviation

It is very common to subtract the mean and divide by the standard deviation of a data set. If we deal with non-negative data, e.g. an image (with values in [0,1] or [0,255]), does this procedure make sense? What happens when the non-negativity constraint is violated?

I add some further considerations.

Suppose you have an image and you decompose it into a set of overlapping patches. Why should you subtract the mean and divide by the standard deviation for each patch (violating the non-negativity prior)?

This procedure is also used in dictionary learning and sparse coding. In dictionary learning, given an image ($y$), a standard approach is to divide it into a set of patches ($p$), then subtract the patch mean ($p_m$) and divide by the patch standard deviation ($p_s$).
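To make the per-patch step concrete, here is a minimal sketch in plain Python. The function names, the flattened 3x3 example patch, and the small `eps` guard against constant patches are assumptions made for illustration; they are not part of any particular dictionary-learning library.

```python
from statistics import fmean, pstdev


def standardize_patch(patch, eps=1e-8):
    """Return (z, p_m, p_s) for a flattened patch of non-negative intensities.

    eps is an illustrative guard so a constant patch (p_s == 0) does not
    cause a division by zero.
    """
    p_m = fmean(patch)   # patch mean
    p_s = pstdev(patch)  # patch (population) standard deviation
    z = [(v - p_m) / (p_s + eps) for v in patch]
    return z, p_m, p_s


def restore_patch(z, p_m, p_s, eps=1e-8):
    """Invert standardize_patch using the stored mean and standard deviation."""
    return [v * (p_s + eps) + p_m for v in z]


# Hypothetical 3x3 patch with intensities in [0, 1], flattened row by row.
patch = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.2, 0.4, 0.6]

z, p_m, p_s = standardize_patch(patch)
restored = restore_patch(z, p_m, p_s)
```

The standardized patch `z` contains negative values, so it no longer satisfies the non-negativity constraint. As long as $p_m$ and $p_s$ are kept, however, the original non-negative patch can be recovered exactly (up to rounding).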

Is it a crucial step if data are non-negative?

Best Answer

First of all, there have already been several questions on standardization.

Subtracting the mean is one way of centering your data: the average becomes the new origin in the "point cloud description" of the data (each case is a point in $p$ dimensions; for RGB images, $p = 3$). Properly centered data can lead to numerically more stable models, and centering may also help in interpreting data and models: it sets a "baseline", and the centered data records deviations from it.
Whether this is a sensible idea depends on your data: for some data it makes sense, for other data another center may be more appropriate, and yet other data sets already have a useful center. E.g. for photographs of stars, you may want to estimate the average background color and subtract that.
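As a small illustration of how the choice of center matters, the sketch below compares centering on the mean with centering on a background estimate. The pixel values are made up, and using the median as the background estimate is an assumption chosen for this example, not a recommendation from the answer.

```python
from statistics import fmean, median

# Hypothetical gray values of a star photograph: mostly dark background
# with two bright star pixels.
pixels = [0.05, 0.06, 0.05, 0.04, 0.95, 0.05, 0.06, 0.90, 0.05]

mean_center = fmean(pixels)   # pulled upwards by the two bright pixels
background = median(pixels)   # assumed stand-in for the background level

centered_on_mean = [v - mean_center for v in pixels]
centered_on_background = [v - background for v in pixels]
```

Centered on the mean, every background pixel becomes clearly negative. Centered on the background estimate, background pixels sit near zero and the stars stand out as positive deviations from the baseline.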

Dividing by the standard deviation standardizes the data, so that each variate has unit variance. This can be useful to give all input channels equal weight in the subsequent data analysis. In other cases, it is not sensible. The latter may very well be the case for your data: your variates already share their physical unit. However, you may want to calibrate them to correct for the wavelength dependence of the camera's sensitivity (white-light correction).
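The difference between the two options can be sketched as follows. The RGB values and the `white_reference` response are hypothetical numbers, and dividing each channel by its response to a white reference is only one simple way such a calibration might be done.

```python
from statistics import fmean, pstdev

# Hypothetical RGB pixels with values in [0, 1].
pixels = [(0.9, 0.5, 0.1), (0.8, 0.4, 0.2), (0.7, 0.6, 0.1), (0.6, 0.5, 0.2)]
channels = list(zip(*pixels))  # one tuple of values per channel (R, G, B)


def standardize(values):
    """Center on the mean and scale to unit (population) standard deviation."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Option 1: standardize each channel separately, so all channels get equal weight.
standardized = [standardize(c) for c in channels]

# Option 2: keep the shared physical unit and only correct the sensor response.
# The reference values are assumed: the camera's reading of a white target.
white_reference = (0.95, 0.90, 0.80)
calibrated = [tuple(v / w for v, w in zip(px, white_reference)) for px in pixels]
```

After option 1 every channel has mean 0 and standard deviation 1, so information about which channel varied most is gone. Option 2 keeps the data non-negative and preserves the relative spread of the channels.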

You may also want to adjust all channels together: that amounts to adjusting brightness (an offset) and contrast (a scale factor), which is also a way to center and standardize.