Solved – How to normalise data properly without a maximum / minimum constant

normalization

I use a stock screener for investing purposes. When filtering stocks, I can use a factor such as Price-to-Book (share price divided by book value per share), Price-to-Sales, etc., to rank stocks and select the top decile.

I want to combine several factors with equal weight, and for this I need to normalise each factor to a value between 0 and 1. Why? Without normalisation, averaging a Price-to-Book value of 12 (in a sample ranging from 12 to 16) with a Price-to-Sales value of 3 (in a sample ranging from 1 to 4) would badly skew the result: the Price-to-Book value would pull the average up despite sitting at the bottom of its sample, while the Price-to-Sales value would pull it down despite sitting near the top of its range.

How do I currently normalise? If I want to normalise Price-to-Book (PB), I calculate, for every stock:

(PB of current stock - min PB of sample) / (max PB of sample - min PB of sample)
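As a minimal sketch of the formula above, here it is applied to the figures mentioned earlier (a PB of 12 in a 12 to 16 sample, and a Price-to-Sales of 3 in a 1 to 4 sample). The function name `min_max_normalise` and the variable names are hypothetical, chosen only for illustration; the screener itself may express this differently.

```python
def min_max_normalise(value, sample_min, sample_max):
    """Scale value to the range [0, 1] using the sample's min and max."""
    if sample_max == sample_min:
        # All values identical: the formula would divide by zero.
        raise ValueError("sample_max must differ from sample_min")
    return (value - sample_min) / (sample_max - sample_min)


# Figures taken from the question text.
pb_score = min_max_normalise(12, 12, 16)  # bottom of its sample
ps_score = min_max_normalise(3, 1, 4)     # upper part of its sample

# Averaging the raw values is dominated by the larger-scaled factor,
# whereas averaging the normalised scores treats both factors equally.
raw_average = (12 + 3) / 2
combined_score = (pb_score + ps_score) / 2

print(pb_score, ps_score, raw_average, combined_score)
```

The drawback described next is visible in the signature: `sample_min` and `sample_max` have to be supplied from outside for every factor.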

I have to look up the sample's minimum and maximum values manually in the dataset right before normalising, because they will be different in a month (stocks' fundamentals change all the time). For every calculation I can only access the characteristics of the stock currently being evaluated; I cannot query the dataset's average, minimum, or maximum programmatically. So to normalise the data properly, I have to look up the minimum and maximum values every time.

How can I set up a future-proof normalisation process in which I do not need to look up the max and min constants manually for every factor each time? Is there a mathematical way to do so? Or is there a smarter way of normalising the data that does not need min / max constants at all?

Best Answer

An even smarter approach would be not to normalize at all, at least for combinations among companies. The issues you raise are inevitable when you deal with ratios. Also, you lose any information about scale when you take ratios (e.g., total sales is lost in price-to-sales). Instead of combining price-to-sales with book-to-sales for aggregating, find a way to use all of price, book, and sales as variables in your combinations among companies. For a simple example, combine among companies first, take ratios at the end.
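One possible reading of "combine among companies first, take ratios at the end" is sketched below: sum the underlying quantities across a group of companies, then form the ratios once on the totals. The company figures and field names (`market_cap`, `book_value`, `sales`) are made up purely for illustration and are not from the question.

```python
# Hypothetical per-company totals (illustrative numbers only).
companies = [
    {"name": "A", "market_cap": 120.0, "book_value": 10.0, "sales": 40.0},
    {"name": "B", "market_cap": 900.0, "book_value": 300.0, "sales": 600.0},
]

# Step 1: combine the raw quantities across companies.
total_cap = sum(c["market_cap"] for c in companies)
total_book = sum(c["book_value"] for c in companies)
total_sales = sum(c["sales"] for c in companies)

# Step 2: take the ratios at the end, on the aggregated totals.
aggregate_pb = total_cap / total_book
aggregate_ps = total_cap / total_sales

# For contrast: averaging per-company ratios discards scale, so the
# small company A counts as much as the much larger company B.
mean_of_pb_ratios = sum(
    c["market_cap"] / c["book_value"] for c in companies
) / len(companies)

print(aggregate_pb, aggregate_ps, mean_of_pb_ratios)
```

Here the aggregate price-to-book stays close to that of the larger company, while the plain mean of the two ratios is pulled up by the small one, which is the loss of scale information the answer warns about.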

If your data are necessarily positive, working on the logarithms of the values may help, as ratios in the original scale become differences in the log scale, and the results of additive regression models can then be interpreted as ratios at the end when you back-transform.
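The log-scale point can be checked with a short sketch using only Python's standard `math` module. The price and book values are hypothetical; the only assumption that matters is that they are strictly positive.

```python
import math

# Hypothetical, strictly positive values for three stocks.
prices = [24.0, 50.0, 9.0]
books = [8.0, 25.0, 3.0]

# A ratio on the original scale is a difference on the log scale:
# log(price / book) == log(price) - log(book)
log_pb = [math.log(p) - math.log(b) for p, b in zip(prices, books)]

# Additive operations (here a plain mean) are done on the log scale...
mean_log_pb = sum(log_pb) / len(log_pb)

# ...and back-transformed at the end, so the result reads as a ratio.
# The back-transformed mean is the geometric mean of the PB ratios.
typical_pb = math.exp(mean_log_pb)

print(typical_pb)
```

With these numbers the three PB ratios are 3, 2, and 3, so the back-transformed value is their geometric mean, the cube root of 18.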
