Regression – Implementing the Two Step Approach in Linear Regression Analysis

linear, mean, regression, spss, two-way

I have a continuous variable that is not normally distributed. I want to transform it to be normally distributed using the two-step approach described in the link below:

Abstract This article describes and demonstrates a two-step approach
for transforming non-normally distributed continuous variables to
become normally distributed. Step 1 involves transforming the variable
into a percentile rank, which will result in uniformly distributed
probabilities. The second step applies the inverse-normal
transformation to the results of Step 1 to form a variable consisting
of normally distributed z-scores. The approach is little-known outside
the statistics literature, has been scarcely used in the social
sciences, and has not been used in any IS study. The article
illustrates how to implement the approach in Excel, SPSS, and SAS and
explains implications and recommendations for IS research.

https://aisel.aisnet.org/cais/vol28/iss1/4/

Is this method going to reorder the observations?

Best Answer

First, as Noah noted in the comments, you almost never need to "transform" data to be normally distributed in statistics. Linear regression does not require it, and neither do most other statistical methods. In the comments, you say you are doing this so that the residuals will be normally distributed. This approach would not achieve that, because it changes the marginal distribution of the variable, whereas the normality assumption in linear regression concerns the conditional distribution (the distribution of the response given the predictors, i.e. of the errors). There is no simple, general way to "transform" the data to meet this assumption; any solution would be problem-specific.
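The marginal-versus-conditional point can be illustrated with a small simulation. This is only a sketch under made-up assumptions: a binary predictor, a group difference of 6, standard normal errors, a fixed seed, and excess kurtosis as a crude indicator of non-normality (it is roughly 0 for normal data). The helper names (`excess_kurtosis`, `ols_residuals`, `rank_to_normal`) are hypothetical, and the `(rank - 0.5) / n` offset is one common choice for keeping the quantiles strictly inside (0, 1).

```python
import random
from statistics import NormalDist, fmean

def excess_kurtosis(v):
    """Sample excess kurtosis: roughly 0 for normally distributed data."""
    m = fmean(v)
    m2 = fmean([(a - m) ** 2 for a in v])
    m4 = fmean([(a - m) ** 4 for a in v])
    return m4 / m2 ** 2 - 3.0

def ols_residuals(x, y):
    """Residuals from a simple least-squares fit of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def rank_to_normal(v):
    """Rank-based transformation of v to normal scores (assumes no ties)."""
    n = len(v)
    std_normal = NormalDist()
    z = [0.0] * n
    order = sorted(range(n), key=lambda i: v[i])
    for rank, i in enumerate(order, start=1):
        z[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return z

rng = random.Random(0)                           # fixed seed, illustrative only
x = [float(i % 2) for i in range(4000)]          # binary predictor (two groups)
y = [6.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # normal errors, bimodal marginal
z = rank_to_normal(y)                            # marginal of z is made normal

kurt_y = excess_kurtosis(y)                      # marginal of raw response
kurt_z = excess_kurtosis(z)                      # marginal of transformed response
kurt_resid_y = excess_kurtosis(ols_residuals(x, y))
kurt_resid_z = excess_kurtosis(ols_residuals(x, z))

print("marginal:  raw", round(kurt_y, 2), " transformed", round(kurt_z, 2))
print("residuals: raw", round(kurt_resid_y, 2), " transformed", round(kurt_resid_z, 2))
```

In this setup the raw response is clearly non-normal marginally, yet its residuals are close to normal. After the transformation the marginal looks normal, but the residuals no longer do, because each group ends up holding roughly one half of a normal distribution.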

To answer your main question: the method is called equipercentile equating, and it uses the standard normal quantile function to transform the empirical quantiles. It does not change the ordering of the observations; it only transforms their values. The transformation is as follows:

  • for each value $x_i$, find its rank $r_i$, i.e. the position of this value if the $x_i$ values were sorted in increasing order,
  • transform the ranks to percentile ranks by dividing them by the sample size, $q_i = r_i / N$ (in practice a small offset such as $q_i = (r_i - 0.5) / N$ is used, because $r_i / N = 1$ for the largest value and $\Phi^{-1}(1)$ is infinite),
  • use the standard normal quantile function $\Phi^{-1}$ to calculate the $z$-scores $z_i = \Phi^{-1}(q_i)$.

As you can see, this doesn't change anything about the ordering of the observations. The order would only change if you actually sorted the $x_i$ values, but this is not needed.
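The three steps above can be sketched in Python using only the standard library. This is an illustrative sketch rather than the article's implementation. The function names (`average_ranks`, `two_step_normalize`) and the sample data are hypothetical, ties are given their average rank, and the quantiles use $(r_i - 0.5) / N$ instead of $r_i / N$ as an assumption, so that the largest value does not map to $\Phi^{-1}(1) = \infty$.

```python
from statistics import NormalDist

def average_ranks(values):
    """Return 1-based ranks in the original order; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def two_step_normalize(values):
    """Step 1: ranks -> quantiles in (0, 1). Step 2: quantiles -> z-scores."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumption: the -0.5 offset keeps every quantile strictly between 0 and 1.
    quantiles = [(r - 0.5) / n for r in average_ranks(values)]
    return [std_normal.inv_cdf(q) for q in quantiles]

x = [3.2, 150.0, 0.7, 12.5, 3.2, 48.0]  # made-up, right-skewed sample
z = two_step_normalize(x)

# The observations are never moved: z[i] still belongs to x[i].
for xi, zi in zip(x, z):
    print(f"{xi:8.1f} -> {zi:+.3f}")
```

The output list is in the same order as the input, and sorting by `z` gives the same ordering as sorting by `x`. Sorting is used only internally to work out the ranks.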