If data are MCAR, one would like to find an unbiased estimated of alpha. This could possibly be done via multiple imputation or listwise deletion. However, the latter might lead to severe loss of data. A third way is something like pairwise deletion which is implemented via an na.rm
option in alpha()
of the ltm
package and in cronbach.alpha()
of the psych
package.
At least IMHO, the former estimate of unstandardized alpha with missing data is biased (see below). This is due to the calculation of the total variance $\sigma^2_x$ via var(rowSums(dat, na.rm = TRUE))
. If the data are centered around 0, positive and negative values cancel each other out in the calculation of rowSums
. With missing data, this leads to a bias of rowSums
towards 0 and therefore to an underestimation of $\sigma^2_x$ (and alpha, in turn). Contrarily, if the data are mostly positive (or negative), missings will lead to a bias of rowSums
towards zero this time resulting in an overestimation of $\sigma^2_x$ (and alpha, in turn).
require("MASS"); require("ltm"); require("psych")
n <- 10000
it <- 20
V <- matrix(.4, ncol = it, nrow = it)
diag(V) <- 1
dat <- mvrnorm(n, rep(0, it), V) # mean of 0!!!
p <- c(0, .1, .2, .3)
names(p) <- paste("% miss=", p, sep="")
cols <- c("alpha.ltm", "var.tot.ltm", "alpha.psych", "var.tot.psych")
names(cols) <- cols
res <- matrix(nrow = length(p), ncol = length(cols), dimnames = list(names(p), names(cols)))
for(i in 1:length(p)){
m1 <- matrix(rbinom(n * it, 1, p[i]), nrow = n, ncol = it)
dat1 <- dat
dat1[m1 == 1] <- NA
res[i, 1] <- cronbach.alpha(dat1, standardized = FALSE, na.rm = TRUE)$alpha
res[i, 2] <- var(rowSums(dat1, na.rm = TRUE))
res[i, 3] <- alpha(as.data.frame(dat1), na.rm = TRUE)$total[[1]]
res[i, 4] <- sum(cov(dat1, use = "pairwise"))
}
round(res, 2)
## alpha.ltm var.tot.ltm alpha.psych var.tot.psych
## % miss=0 0.93 168.35 0.93 168.35
## % miss=0.1 0.90 138.21 0.93 168.32
## % miss=0.2 0.86 110.34 0.93 167.88
## % miss=0.3 0.81 86.26 0.93 167.41
dat <- mvrnorm(n, rep(10, it), V) # this time, mean of 10!!!
for(i in 1:length(p)){
m1 <- matrix(rbinom(n * it, 1, p[i]), nrow = n, ncol = it)
dat1 <- dat
dat1[m1 == 1] <- NA
res[i, 1] <- cronbach.alpha(dat1, standardized = FALSE, na.rm = TRUE)$alpha
res[i, 2] <- var(rowSums(dat1, na.rm = TRUE))
res[i, 3] <- alpha(as.data.frame(dat1), na.rm = TRUE)$total[[1]]
res[i, 4] <- sum(cov(dat1, use = "pairwise"))
}
round(res, 2)
## alpha.ltm var.tot.ltm alpha.psych var.tot.psych
## % miss=0 0.93 168.31 0.93 168.31
## % miss=0.1 0.99 316.27 0.93 168.60
## % miss=0.2 1.00 430.78 0.93 167.61
## % miss=0.3 1.01 511.30 0.93 167.43
Best Answer
Listwise deletion deletes cases when any variable is missing.
Pairwise deletion only deletes cases when one of the variables in the particular model you are evaluating is missing.
One way to compare is with a correlation matrix of a set of variables that have different missing patterns. With listwise deletion, N will be the same for every correlation - it will be the number of complete cases. With pairwise deletion, N will vary. It will be the number of cases where both variables in the correlation are present (but other variables may be missing).