From what I understand, standardized coefficients can be used as indices of effect size (with the possibility of applying rules of thumb such as Cohen's, 1988). I also understand that standardized coefficients are expressed in standard-deviation units, which makes them fairly close to a Cohen's d.
I also understand that one way of obtaining standardized coefficients is to standardize the data beforehand. Another is to use the std.coef function from the MuMIn package.
These two methods are equivalent when using a linear predictor:
library(tidyverse)
library(MuMIn) # for standardized coefficients
df <- iris %>%
select(Sepal.Length, Sepal.Width) %>%
scale() %>%
as.data.frame() %>%
mutate(Species = iris$Species)
fit <- lm(Sepal.Length ~ Sepal.Width, data=df)
round(coef(fit), 2)
round(MuMIn::std.coef(fit, partial.sd = TRUE), 2)
In both cases, the coefficient is -0.12. I interpret it as follows: for each increase of 1 standard deviation in Sepal.Width, Sepal.Length decreases by 0.12 of its SD.
And yet, these two methods give different results with a categorical predictor:
fit <- lm(Sepal.Length ~ Species, data=df)
round(coef(fit), 2)
round(MuMIn::std.coef(fit, partial.sd = TRUE), 2)
This gives, for the effect of versicolor compared to setosa (the intercept), 1.12 and 0.46, respectively.
Which one should I trust if I want to say "the difference between versicolor and setosa is … of Sepal.Length's SD"? Thanks a lot.
Best Answer
First, recall that the lm function silently recodes categorical variables into 0/1 dummy variables. The MuMIn package standardizes these dummy variables (which is straightforward, since they contain only 0s and 1s). On the other hand, you did not standardize them when creating your df object. This is why the two methods give different results.
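A quick way to see this dummy coding, and why the dummies have a non-unit SD that std.coef can rescale by (an illustrative sketch, using the same iris data as in the question):

```r
# lm() expands Species into 0/1 dummy columns behind the scenes:
head(model.matrix(~ Species, data = iris))

# Each dummy has its own SD (about 0.47 here, since a third of the rows
# are versicolor). std.coef() rescales by such SDs (partial SDs when
# partial.sd = TRUE), whereas the scale() call that built df never
# touched these columns.
round(sd(as.numeric(iris$Species == "versicolor")), 2)  # 0.47
```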
To get the difference between setosa and versicolor in units of Sepal.Length's SD, you need to standardize only Sepal.Length, which is exactly what you did when creating your df object.
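In other words (a minimal sketch on the raw iris data, without building a separate data frame), you can standardize the outcome inline and read the contrast directly:

```r
# Standardize only the outcome; leave Species as an ordinary factor
fit_z <- lm(scale(Sepal.Length) ~ Species, data = iris)
round(coef(fit_z)["Speciesversicolor"], 2)  # 1.12
```

So the 1.12 from coef() is the number to report: versicolor's Sepal.Length is about 1.12 SDs of Sepal.Length above setosa's.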