Solved – How to apply pairwise deletion for missing values

correlationfeature selectionmissing data

This method is always explained with listwise deletion which is very easy to understand. But I can't understand pairwise. Google didn't help as everywhere there is only definiton of it.

Please explain this by taking any toy example.

Best Answer

Listwise deletion deletes cases when any variable is missing.

Pairwise deletion only deletes cases when one of the variables in the particular model you are evaluating is missing.

One way to compare is with a correlation matrix of a set of variables that have different missing patterns. With listwise deletion, N will be the same for every correlation - it will be the number of complete cases. With pairwise deletion, N will vary. It will be the number of cases where both variables in the correlation are present (but other variables may be missing).

Related Solutions

Solved – Cronbach’s Alpha with missing data

If data are MCAR, one would like to find an unbiased estimated of alpha. This could possibly be done via multiple imputation or listwise deletion. However, the latter might lead to severe loss of data. A third way is something like pairwise deletion which is implemented via an na.rm option in alpha() of the ltm package and in cronbach.alpha() of the psych package.

At least IMHO, the former estimate of unstandardized alpha with missing data is biased (see below). This is due to the calculation of the total variance $\sigma^2_x$ via var(rowSums(dat, na.rm = TRUE)). If the data are centered around 0, positive and negative values cancel each other out in the calculation of rowSums. With missing data, this leads to a bias of rowSums towards 0 and therefore to an underestimation of $\sigma^2_x$ (and alpha, in turn). Contrarily, if the data are mostly positive (or negative), missings will lead to a bias of rowSums towards zero this time resulting in an overestimation of $\sigma^2_x$ (and alpha, in turn).

require("MASS"); require("ltm"); require("psych")
n <- 10000
it <- 20
V <- matrix(.4, ncol = it, nrow = it)
diag(V) <- 1
dat <- mvrnorm(n, rep(0, it), V)  # mean of 0!!!
p <- c(0, .1, .2, .3)
names(p) <- paste("% miss=", p, sep="")
cols <- c("alpha.ltm", "var.tot.ltm", "alpha.psych", "var.tot.psych")
names(cols) <- cols
res <- matrix(nrow = length(p), ncol = length(cols), dimnames = list(names(p), names(cols)))
for(i in 1:length(p)){
  m1 <- matrix(rbinom(n * it, 1, p[i]), nrow = n, ncol = it)
  dat1 <- dat
  dat1[m1 == 1] <- NA
  res[i, 1] <- cronbach.alpha(dat1, standardized = FALSE, na.rm = TRUE)$alpha
      res[i, 2] <- var(rowSums(dat1, na.rm = TRUE))
      res[i, 3] <- alpha(as.data.frame(dat1), na.rm = TRUE)$total[[1]]
  res[i, 4] <- sum(cov(dat1, use = "pairwise"))
}
round(res, 2)
##            alpha.ltm var.tot.ltm alpha.psych var.tot.psych
## % miss=0        0.93      168.35        0.93        168.35
## % miss=0.1      0.90      138.21        0.93        168.32
## % miss=0.2      0.86      110.34        0.93        167.88
## % miss=0.3      0.81       86.26        0.93        167.41
dat <- mvrnorm(n, rep(10, it), V)  # this time, mean of 10!!!
for(i in 1:length(p)){
  m1 <- matrix(rbinom(n * it, 1, p[i]), nrow = n, ncol = it)
  dat1 <- dat
  dat1[m1 == 1] <- NA
  res[i, 1] <- cronbach.alpha(dat1, standardized = FALSE, na.rm = TRUE)$alpha
      res[i, 2] <- var(rowSums(dat1, na.rm = TRUE))
      res[i, 3] <- alpha(as.data.frame(dat1), na.rm = TRUE)$total[[1]]
  res[i, 4] <- sum(cov(dat1, use = "pairwise"))
}
round(res, 2)
##            alpha.ltm var.tot.ltm alpha.psych var.tot.psych
## % miss=0        0.93      168.31        0.93        168.31
## % miss=0.1      0.99      316.27        0.93        168.60
## % miss=0.2      1.00      430.78        0.93        167.61
## % miss=0.3      1.01      511.30        0.93        167.43

Solved – Fit multiple regression model with pairwise deletion (or on a correlation/covariance matrix) in R

The regtools package was just released and allows pairwise deletion in multiple regression and principal components analysis. The package is introduced in this post, and is accompanied by a JSM article that compares pairwise deletion (aka available cases), with listwise deletion and multiple imputation.

Best Answer

Related Solutions

Solved – Cronbach’s Alpha with missing data

Solved – Fit multiple regression model with pairwise deletion (or on a correlation/covariance matrix) in R

Related Question