Solved – Normalization across columns in linear regression

multiple regression, normalization, regression

I have a data set I would like to normalize in two different ways before building the multiple linear regression model. My data set looks as follows:

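$$
\begin{array}{cccccc}
x_{1} & y_{1,1} & y_{1,2} & \dots & y_{1,n-1} & y_{1,n} \\
x_{2} & y_{2,1} & y_{2,2} & \dots & y_{2,n-1} & y_{2,n} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x_{m} & y_{m,1} & y_{m,2} & \dots & y_{m,n-1} & y_{m,n}
\end{array}
$$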

…where each $x_{i}, y_{i,j}$ is a count, and each row $i$ represents a data set collected from a video with a variable length $k$.

To give all the rows values with equivalent meanings, I normalize each row by dividing all of its counts by $k$, the length of the video. Now, instead of counts, I have counts per minute. I also want to normalize each column (variable) to the range 0 to 1, with the idea that I can then compare the relative importance of each variable's coefficient to the other variables' coefficients.
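For concreteness, here is a minimal sketch of the two normalization steps, assuming NumPy. The array names `counts` and `k` and all of the numbers are made up for illustration; they are not my real data.

```python
import numpy as np

# Hypothetical data: 4 videos (rows), 3 count variables (columns).
counts = np.array([
    [12.0, 30.0, 4.0],
    [6.0, 10.0, 1.0],
    [20.0, 45.0, 9.0],
    [9.0, 24.0, 2.0],
])

# Hypothetical video lengths in minutes, one per row.
k = np.array([4.0, 2.0, 5.0, 3.0])

# Row normalization: divide every count in a row by that video's length,
# turning counts into counts per minute.
rates = counts / k[:, None]

# Column normalization: min-max scale each column to the range [0, 1].
# Each column gets its own shift (col_min) and its own factor (col_range).
col_min = rates.min(axis=0)
col_range = rates.max(axis=0) - col_min
scaled = (rates - col_min) / col_range

print(rates)
print(scaled)
```

Note that the column step uses a different factor for every column, which is exactly the part I am unsure about.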

I am wondering if this is even a valid normalization. Normalizing across each row is fine, but I'm having trouble figuring out whether normalizing each column with a different normalization factor is valid. My instinct is that it isn't. If it is not valid, is there another way to compare the relative importance of the variables?

Best Answer

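Generally, a linear transformation of a column does not affect the fit of a linear regression. A linear model is itself a linear combination of the columns, chosen so that the result is as close as possible to the response. For example, suppose we have the ordinary regression $y = a + b x$. Min-max normalization of $x$ gives $x' = \frac{x - \min(x)}{\max(x) - \min(x)} = \frac{1}{\max(x) - \min(x)}\,x - \frac{\min(x)}{\max(x) - \min(x)}$, which is a linear function of $x$. Substituting the normalized value for $x$ in the regression gives $y = a + b x' = a - \frac{b \min(x)}{\max(x) - \min(x)} + \frac{b}{\max(x) - \min(x)}\,x$, which is again a straight line in $x$, only with a different intercept and slope. The same holds for multiple linear regression, where each column has its own shift and scale. So the fitted values, the residuals, and $R^2$ do not change; only the regression coefficients take different values, determined by the shift and scale that were applied. You can therefore normalize the columns without harming the model. The only point is to keep the normalization in mind when interpreting the coefficients.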
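The invariance can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic data; the helper `fit_ols`, the random data, and the coefficients used to generate it are all hypothetical and chosen only for illustration. It fits the same least-squares model on raw and on min-max-scaled columns and compares the results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 50 rows, 3 predictors
# on deliberately different scales.
X = rng.uniform(0.0, 10.0, size=(50, 3)) * np.array([1.0, 5.0, 20.0])
y = 2.0 + X @ np.array([0.5, -0.2, 0.05]) + rng.normal(0.0, 1.0, size=50)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, fitted values, R^2)."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return beta, fitted, r2


# Min-max scale each column to [0, 1] with its own shift and factor.
col_min = X.min(axis=0)
col_range = X.max(axis=0) - col_min
X_scaled = (X - col_min) / col_range

beta_raw, fitted_raw, r2_raw = fit_ols(X, y)
beta_scaled, fitted_scaled, r2_scaled = fit_ols(X_scaled, y)

# The fit itself is unchanged by the scaling.
print("same fitted values:", np.allclose(fitted_raw, fitted_scaled))
print("same R^2:", np.isclose(r2_raw, r2_scaled))

# Only the coefficients change, in a predictable way:
# each slope is multiplied by its column's range, and the intercept
# absorbs the shift.
print("slopes rescaled:", np.allclose(beta_scaled[1:], beta_raw[1:] * col_range))
print("intercept shifted:", np.isclose(beta_scaled[0], beta_raw[0] + beta_raw[1:] @ col_min))
```

Under these assumptions all four checks should agree. The slope relation also shows what a coefficient means after scaling: it is the change in the response when that variable moves across its whole observed range, rather than by one original unit.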