Solved – Need to transform data before running mediation/model with bootstrapping (PROCESS)

assumptions, bootstrap, mediation, normality-assumption, regression

I am reading through Hayes' book on mediation and moderation analysis (2013), which describes the PROCESS macro he created. The macro uses bootstrapping to arrive at confidence intervals for checking the model.

I am not clear on one point: does the data need to be normally distributed? More generally, does it need to satisfy the "classic" regression assumptions?

Hayes describes the assumptions (linearity, normality, homoscedasticity, independence), explains that some are more important than others, and discusses what the consequences might be if they are not met.
In particular, about normality he writes in bold: "Throughout this book, I assume you have contemplated the appropriateness of OLS regressions for your problem and have decided you are comfortable and want to forge ahead." (p. 55). This more or less means "there are different routes that you can follow and you have to decide, good luck!".

Then, in chapter 6 he says that if the indirect effect is significant (X predicts Y through M), we don't really need the single regressions to be significant (X predicts M, M predicts Y and X predicts Y) to validate the model. However, if they're not significant it might mean that there are some confounds, spurious associations or epiphenomenal associations.

In conclusion, my question is: since bootstrapping doesn't assume a normal distribution, and the indirect effect (tested with bootstrapping) is the main thing to look at to see whether the model holds, can the data entered into this type of analysis be non-normally distributed?
And if one then wants to check the individual regressions to get a more complete idea of what's going on, does one have to re-run the model with transformed data?

(Of course, this question matters most if significance changes between transformed and untransformed data.)

Thanks!
Rebecca

Best Answer

The bootstrap shouldn't care whether the data is normally distributed. What bootstrapping basically does is resample your cases with replacement many times, re-estimate the statistic (here, the indirect effect) in each resample, and use the resulting empirical distribution to judge the significance of your result, rather than relying on a theoretical normal distribution. If you are concerned about the assumptions behind the test of the indirect effect, bootstrapping shouldn't be terribly affected by non-normality. If your concern is OLS itself, you might want to try RMA regression rather than OLS regression. Bootstrapping is just as effective in both cases.
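To make the resampling idea concrete, here is a minimal sketch of a percentile bootstrap for the indirect effect in a simple X → M → Y model. It is not PROCESS's actual implementation; the function names (`indirect_effect`, `bootstrap_indirect_ci`), the number of resamples, and the simulated skewed data are all assumptions made for illustration.

```python
import numpy as np

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b from two OLS fits."""
    ones = np.ones(len(x))
    # Path a: slope of X when M is regressed on X (with intercept).
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    # Path b: slope of M when Y is regressed on X and M (with intercept).
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b

def bootstrap_indirect_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a*b; returns (estimate, lower, upper)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        # Resample whole cases (rows) with replacement.
        idx = rng.integers(0, n, size=n)
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    lower, upper = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return indirect_effect(x, m, y), lower, upper

if __name__ == "__main__":
    # Hypothetical data with skewed (exponential) errors, not normal ones.
    rng = np.random.default_rng(1)
    x = rng.exponential(1.0, size=300)
    m = 0.5 * x + rng.exponential(1.0, size=300)
    y = 0.7 * m + 0.1 * x + rng.exponential(1.0, size=300)
    est, lower, upper = bootstrap_indirect_ci(x, m, y)
    print(f"indirect effect = {est:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

If the interval excludes zero, the indirect effect is taken to be significant at the chosen level. Nothing in the procedure uses a normal approximation for the sampling distribution of a*b, which is why non-normal data is not a problem for this particular test.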
