There are two parts to this answer. I will consider the utility of transformations for these data. Then I will suggest a quite different analysis.
Transformation needed and useful here? No
I see no reason whatsoever to transform distance or indeed noise either.
There is no requirement that responses or predictors in regression follow a normal (Gaussian) distribution. As a thought experiment, imagine $x$ is uniform on the integers and $y$ is $a + bx$. Then $y$ is also uniform; any regression program will retrieve the linear relation and produce the best possible figures of merit. Is it a problem in any sense that neither variable is normally distributed? No.
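To make the thought experiment concrete, here is a minimal sketch (Python with NumPy, purely illustrative; the intercept and slope are made up): neither variable is remotely normal, yet ordinary least squares recovers the linear relation exactly.

```python
import numpy as np

# x uniform on the integers 0..9; y an exact linear function of x
x = np.arange(10, dtype=float)
a, b = 3.0, 2.0          # hypothetical intercept and slope
y = a + b * x            # y is then also uniform, not normal

# ordinary least squares via the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coef)  # recovers [3. 2.] despite the non-normal variables
```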
Looking more closely at the data, here are some normal quantile plots of the original variables and of Senun's transform $\log(\text{dist} + 0.1)$. I find these immensely more useful than (e.g.) Kolmogorov-Smirnov or Shapiro-Wilk tests: they show not only how well data fit a normal but also in what ways they fall short.
The labelled values on the vertical axes are those of the five-number summary, minimum, lower quartile, median, upper quartile and maximum. In the case of the distances, there are five distinct values with equal frequency, so they are reported as just those distinct values.
Note. The quantile plots here include only minor variations on conventional axis labelling and titling, but anyone interested in the details, or in a Stata implementation, may consult this paper.
The distances thus follow a distribution with 5 spikes and cannot get close to normal; any one-to-one transformation must yield another distribution with 5 spikes. If there were a problem with mild skewness, the chosen transformation would make it worse, flipping the skewness from positive to negative and increasing its magnitude. This is shown by calculation of both moments-based and L-moments-based measures. If there were a problem with mildly non-normal kurtosis (there isn't), the transformation would leave it a little closer to the normal state.
Those unfamiliar with, but interested by, L-moments should start with the Wikipedia entry and might like to know that the L-skewness $\tau_3$ is 0 for every symmetric distribution, including the normal, while the L-kurtosis $\tau_4 \approx 0.123$ for the normal. Below is Stata output using moments and lmoments (both from SSC). Stata uses the definition of kurtosis for which the normal yields 3. The first L-moment measures location and is identical to the mean; the second is a measure of scale. Location and scale details are naturally just context here and not otherwise germane to discussing transformations.
----------------------------------------------------------------
n = 60 | mean SD skewness kurtosis
----------------+-----------------------------------------------
dist | 15.000 14.261 0.795 2.263
log(dist + 0.1) | 1.666 2.118 -1.113 2.758
leq | 62.494 3.261 -0.192 1.979
----------------------------------------------------------------
----------------------------------------------------------------
n = 60 | l_1 l_2 t_3 t_4
----------------+-----------------------------------------------
dist | 15.000 7.729 0.229 0.012
log(dist + 0.1) | 1.666 1.087 -0.301 0.084
leq | 62.494 1.887 -0.057 0.027
----------------------------------------------------------------
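The summary statistics above are consistent with distances of 0, 5, 10, 20 and 40, each observed 12 times; taking that as a working assumption (the raw data are not reproduced here), the sign flip in skewness and the L-moment results can be checked with a short Python sketch rather than the original Stata commands.

```python
import numpy as np

def skew_kurt(x):
    """Moments-based skewness and kurtosis (kurtosis = 3 for a normal),
    using population-moment definitions as in Stata's summarize."""
    d = x - x.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return m3 / m2**1.5, m4 / m2**2

def lmoments(x):
    """First two sample L-moments plus L-skewness and L-kurtosis,
    via probability-weighted moments b_r of the order statistics."""
    x = np.sort(x)
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2

# distances inferred from the five spikes reported above, 12 each
dist = np.repeat([0.0, 5.0, 10.0, 20.0, 40.0], 12)
logdist = np.log(dist + 0.1)

print(skew_kurt(dist))     # skewness ~ 0.795, as in the table
print(skew_kurt(logdist))  # skewness ~ -1.113: the flip in sign
print(lmoments(dist))      # l_1 = 15, l_2 ~ 7.729, t_3 ~ 0.229, t_4 ~ 0.012
```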
Noise is close to normally distributed, so even those worried about non-normality should leave it alone.
That deals with the mistaken stance that the transformation here is a good idea because the marginal distribution of distance is not normal. There is no such problem; if there were, the fact that the distribution is a set of spikes would make it at least hard to solve; and in practice the chosen transformation makes the situation worse even on its own criteria.
I'll flag a further detail. The ad hoc constant $0.1$ added before taking logarithms minimally needs a rationale: the absence of a rationale makes the transformation even more unsatisfactory.
That still leaves scope for a transformation to make sense because the relationship on new scale(s) would be closer to linear (or, a much smaller deal, because scatter around the relationship would then be closer to equal).
Here the main evidence lies in the first instance in scatter plots. The plot in the question shows that the transformation just splits the data into two groups, which doesn't seem physically or statistically sensible. The scatter plot below doesn't indicate to me that transformation would help, but it's more crucial to think about what kind of model makes sense anyway.
A different analysis
We need more physical thinking. There is no doubt a substantial literature here which is being ignored. As amateur arm-waving, I postulate that noise here is a background level locally raised by road noise, and that it should diminish more rapidly at first and then more slowly with distance from the road. In fact, some such thought may lie behind the unequal spacing in the sample design. So one model matching those ideas is $\text{noise} = \alpha + \beta \exp(\gamma\ \text{distance})$, where we expect $\alpha, \beta > 0$ and $\gamma < 0$. Such models are a little tricky to fit, as nonlinear least squares is implied, but I'd assert that they make more sense than any linear model implying constant slope.
I get $\alpha = 58.75$, $\beta = 7.3565$, $\gamma = -0.06770$ using nl in Stata.
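The same kind of model can also be fitted without special software by noting that, for fixed $\gamma$, the model is linear in $\alpha$ and $\beta$. Here is a sketch in Python with NumPy (a stand-in for Stata's nl, demonstrated on synthetic data since the original measurements aren't reproduced here): profile over a grid of $\gamma$ values, solving the conditionally linear part by least squares each time.

```python
import numpy as np

def fit_exp_decay(d, y, gammas=np.linspace(-0.5, -0.001, 500)):
    """Fit y = alpha + beta * exp(gamma * d) by profiling over gamma:
    for each candidate gamma the model is linear in (alpha, beta),
    so those come from ordinary least squares; keep the best gamma."""
    best = None
    for g in gammas:
        X = np.column_stack([np.ones_like(d), np.exp(g * d)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef[0], coef[1], g)
    _, alpha, beta, gamma = best
    return alpha, beta, gamma

# synthetic check: generate data from known parameters and refit
d = np.array([0.0, 10.0, 20.0, 40.0, 80.0, 160.0])   # hypothetical distances
y = 58.75 + 7.3565 * np.exp(-0.05 * d)
print(fit_exp_decay(d, y))  # recovers roughly (58.75, 7.3565, -0.05)
```

In practice one would refine the grid around the best $\gamma$, or hand the grid winner to a proper nonlinear optimizer as a starting value.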
The larger variability of lower noise levels needs some discussion, but presumably quite different conditions may be found at equal distances from the road. Clearly, what is important is not the distance but what else is in the gap (e.g. buildings and other structures, uneven topography).
Best Answer
First, the process you have outlined is the Baron and Kenny approach. It isn't wrong, but it is quite old-fashioned, and more modern approaches are available; see e.g. McGill University
Second, I agree with Rhys. If you are going to transform, you have to do it in all steps. Otherwise, you have a mess. Maybe it won't violate any statistical "rules" but it will be very hard to interpret.
Finally, why transform to get to normal residuals? Instead, use a different kind of regression that does not assume normal residuals. Two of these are robust regression and quantile regression. I would only transform if it made substantive sense (this is often the case with monetary variables). When possible, don't fit the data to the model, fit the model to the data.
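As a concrete illustration of the robust-regression option, here is a minimal Theil–Sen estimator in Python (one robust method among several; the data are made up to show the contrast with ordinary least squares, which a single gross outlier can drag far off).

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Theil-Sen robust line: slope is the median of all pairwise slopes,
    intercept the median of y - slope * x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = float(np.median(slopes))
    intercept = float(np.median(y - slope * x))
    return slope, intercept

# a clean line y = 1 + 2x with one gross outlier at the end
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[9] += 100.0

print(theil_sen(x, y))      # stays at (2.0, 1.0) despite the outlier

# ordinary least squares is dragged far from the true slope of 2
ols_slope = np.polyfit(x, y, 1)[0]
print(ols_slope)            # roughly 7.45
```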