Solved – Cohen’s d from a linear regression model

Problem:

I have a multiple regression model with categorical/binary ($c_i$) and continuous ($x_i$) variables:

$ v = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \gamma_1 c_1 + \gamma_2 c_2 + \dots $

I am interested in the binary categorical variable $c_1$. I want to compute Cohen's $d$ for the two groups defined by $c_1$, controlling for the other variables.

What I know so far:

There are formulas for converting F to d (for example https://www.campbellcollaboration.org/media/k2/attachments/converting_between_effect_sizes.pdf, p. 13), but they assume an ANCOVA model, that is, a linear model with one continuous and one categorical variable. The formula uses the $r^2$ between the covariate (the continuous variable) and the dependent variable $v$.

The formula is:

$ d = \sqrt{\frac{(n_1+n_2)(1-r^2) F}{n_1 n_2}} $

This is also the formula used by the compute.es R package to convert effect sizes, but the reference it cites to justify the formula, Borenstein (2009), Effect sizes for continuous data, in H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 279-293), is also explicit about the ANCOVA setting (one categorical and one continuous variable).
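For concreteness, the ANCOVA conversion quoted above can be written as a short Python function. This is only a sketch of the formula as stated: the function name `d_from_ancova_f` and the example numbers are made up for illustration, and it says nothing about whether the formula carries over to a regression with several covariates.

```python
import math


def d_from_ancova_f(f_stat, n1, n2, r2):
    """Convert an ANCOVA F statistic to Cohen's d.

    Implements d = sqrt((n1 + n2) * (1 - r^2) * F / (n1 * n2)),
    where r2 is the squared correlation between the covariate and
    the dependent variable. The name and signature are hypothetical.
    """
    if f_stat < 0:
        raise ValueError("F must be non-negative")
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    return math.sqrt((n1 + n2) * (1.0 - r2) * f_stat / (n1 * n2))


# Illustrative numbers only (not taken from any real study).
d = d_from_ancova_f(f_stat=4.0, n1=50, n2=50, r2=0.25)
print(round(d, 4))  # 0.3464
```

Because of the square root, the result is always non-negative, so the direction of the group difference has to be taken from the sign of the coefficient.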

I am not sure that the formula is valid for a multiple variable regression.

Furthermore, I assume that the $r^2$ would be that of the whole regression fitted without the categorical variable of interest ($c_1$).

Finally, there is the issue of how the F is computed. I assume that it comes from a Type II ANOVA, but again I am not 100% sure.

There were at least two similar questions on CV: "Can I calculate Cohen's $d$ from multiple regression coefficient?", which is unanswered, and "Cohen's d from regression coefficient?", whose answer refers to the residual standard error, which I am reasonably sure is not the correct answer in my case.

Thus my questions:

1) Does the formula hold for a regression with multiple variables?

2) Does the $r^2$ refer to the regression without the categorical variable of interest?

3) Is the F calculated using a Type II ANOVA?

Best Answer

In your regression model, the unstandardized coefficient on $c_1$ (that is, $\gamma_1$, the coefficient of a 0/1 indicator variable) is an adjusted mean difference: the difference between the two groups after adjusting for the other variables in the model. As such, you can use it as the numerator in the Cohen's $d$ computation:

$$ d = \frac{B}{s_{pooled}} ~, $$

where $B$ is the coefficient on $c_1$ and $s_{pooled}$ is the pooled within-group standard deviation:

$$ s_{pooled} = \sqrt{ \frac{ s_1^2 \left( n_1-1 \right) + s_2^2 \left( n_2-1 \right) } { n_1 + n_2 - 2 } } ~. $$

The subscripts refer to the two groups defined by $c_1$.

The challenge is that, unless you have the raw data (which you might, but someone conducting a meta-analysis might not), these values are typically not reported for a complex regression model. However, $s_{pooled}^2$ is simply the overall variance of $y$ (the dependent variable, $v$ in your notation) minus the variance attributed to the treatment effect (here, $c_1$). Thus, assuming you have both the overall standard deviation of $y$, $s_y$, and the sample sizes $n_1$ and $n_2$ of the two groups created by the indicator variable (with $N = n_1 + n_2$), $s_{pooled}$ can be computed as follows:

$$ s_{pooled} = \sqrt{ \frac{s_y^2(N-1) - B^2\left(\frac{n_1n_2}{n_1+n_2}\right) } {N-2}} ~. $$
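A minimal Python sketch of the two formulas above, using only summary statistics. The function names and the toy data are hypothetical. The toy check uses the raw (unadjusted) mean difference as $B$, for which the identity with the direct pooled standard deviation is exact; with a covariate-adjusted coefficient the result should be read as an approximation.

```python
import math
import statistics


def pooled_sd_from_summary(s_y, n1, n2, b):
    """Pooled within-group SD from the overall SD of y, the two group
    sizes, and the coefficient b of the 0/1 indicator (hypothetical helper)."""
    n = n1 + n2
    ss_total = s_y ** 2 * (n - 1)              # total sum of squares of y
    ss_between = b ** 2 * (n1 * n2) / n        # part attributed to the group effect
    ss_within = ss_total - ss_between
    if ss_within <= 0:
        raise ValueError("inputs imply a non-positive within-group variance")
    return math.sqrt(ss_within / (n - 2))


def cohens_d_from_coefficient(b, s_y, n1, n2):
    """Cohen's d = b / s_pooled, with s_pooled rebuilt from summary data."""
    return b / pooled_sd_from_summary(s_y, n1, n2, b)


# Toy data, invented for illustration only.
group0 = [1.0, 2.0, 3.0]
group1 = [4.0, 5.0, 6.0, 7.0]
b_raw = statistics.mean(group1) - statistics.mean(group0)  # raw mean difference
s_y = statistics.stdev(group0 + group1)                    # overall SD of y

s_pooled = pooled_sd_from_summary(s_y, len(group0), len(group1), b_raw)
d = cohens_d_from_coefficient(b_raw, s_y, len(group0), len(group1))

# Direct pooled SD from the raw data, for comparison.
s_direct = math.sqrt(
    (statistics.variance(group0) * (len(group0) - 1)
     + statistics.variance(group1) * (len(group1) - 1))
    / (len(group0) + len(group1) - 2)
)
print(s_pooled, s_direct, d)
```

In practice `b` would be the regression coefficient of the indicator and `s_y` the reported standard deviation of the outcome, which is what makes the shortcut usable when the raw data are unavailable.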
