Tal,
My understanding, based on this article, is that you cannot obtain variable importance broken down by class when the response is a factor. However, using rpart.control, you can identify the variables that are important at each node.
http://cran.r-project.org/web/packages/caret/vignettes/caretVarImp.pdf
Recursive Partitioning: The reduction in the loss function (e.g. mean squared error)
attributed to each variable at each split is tabulated and the sum is returned. Also, since there may be candidate variables that are important but are not used in a split, the top competing variables are also tabulated at each split. This can be turned on using the maxcompete argument in rpart.control. This method does not currently provide class specific measures of importance when the response is a factor.
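For illustration, here is a minimal sketch (my own example on the built-in iris data, not from the thread) of setting maxcompete and inspecting the per-node competing splits:
> library(rpart)
> fit <- rpart(Species ~ ., data=iris,
+              control=rpart.control(maxcompete=4))
> summary(fit)              # per node: the primary split plus the competing splits
> fit$variable.importance   # overall importance, summed over all splits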
Edit: I misunderstood your question. There are two aspects:
a) na.omit and na.exclude both do casewise deletion with respect to both predictors and criteria. They only differ in that extractor functions like residuals() or fitted() will pad their output with NAs for the omitted cases under na.exclude, thus having an output of the same length as the input variables.
> N <- 20 # generate some data
> y1 <- rnorm(N, 175, 7) # criterion 1
> y2 <- rnorm(N, 30, 8) # criterion 2
> x <- 0.5*y1 - 0.3*y2 + rnorm(N, 0, 3) # predictor
> y1[c(1, 3, 5)] <- NA # some NA values
> y2[c(7, 9, 11)] <- NA # some other NA values
> Y <- cbind(y1, y2) # matrix for multivariate regression
> fitO <- lm(Y ~ x, na.action=na.omit) # fit with na.omit
> dim(residuals(fitO)) # use extractor function
[1] 14 2
> fitE <- lm(Y ~ x, na.action=na.exclude) # fit with na.exclude
> dim(residuals(fitE)) # use extractor function -> = N
[1] 20 2
> dim(fitE$residuals) # access residuals directly
[1] 14 2
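As a quick check (not part of the original transcript), the padded rows in the na.exclude fit are exactly the six incomplete cases:
> residuals(fitE)[c(1, 3, 5, 7, 9, 11), ]   # all NA: padding for the omitted cases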
b) The real issue is not this difference between na.omit and na.exclude. You don't seem to want casewise deletion that takes the criterion variables into account, which both do.
> X <- model.matrix(fitE) # design matrix
> dim(X) # casewise deletion -> only 14 complete cases
[1] 14 2
The regression results depend on the matrices $X^{+} = (X' X)^{-1} X'$ (the pseudoinverse of the design matrix $X$; the coefficients are $\hat{\beta} = X^{+} Y$) and the hat matrix $H = X X^{+}$ (fitted values $\hat{Y} = H Y$). If you don't want casewise deletion, you need a different design matrix $X$ for each column of $Y$, so there's no way around fitting separate regressions for each criterion. You can try to avoid the overhead of lm() by doing something along the lines of the following:
> Xf <- model.matrix(~ x) # full design matrix (all cases)
> # function: manually calculate coefficients and fitted values for a single criterion y
> getFit <- function(y) {
+ idx <- !is.na(y) # throw away NAs
+ Xsvd <- svd(Xf[idx, ]) # SVD of the reduced design matrix
+ # get X+ but note: there might be better ways
+ Xplus <- tcrossprod(Xsvd$v %*% diag(Xsvd$d^(-2)) %*% t(Xsvd$v), Xf[idx, ])
+ list(coefs=(Xplus %*% y[idx]), yhat=(Xf[idx, ] %*% Xplus %*% y[idx]))
+ }
> res <- apply(Y, 2, getFit) # get fits for each column of Y
> res$y1$coefs
[,1]
(Intercept) 113.9398761
x 0.7601234
> res$y2$coefs
[,1]
(Intercept) 91.580505
x -0.805897
> coefficients(lm(y1 ~ x)) # compare with separate results from lm()
(Intercept) x
113.9398761 0.7601234
> coefficients(lm(y2 ~ x))
(Intercept) x
91.580505 -0.805897
Note that there might be numerically better ways to calculate $X^{+}$ and $H$; you could check a $QR$ decomposition instead. The SVD approach is explained here on SE. I have not timed the above approach with big matrices $Y$ against actually using lm().
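For reference, here is a minimal sketch of the QR-based variant (same Xf and Y as above; getFitQR is my own name, using base R's qr(), qr.coef() and qr.fitted()):
> getFitQR <- function(y) {
+ idx <- !is.na(y) # throw away NAs
+ qrX <- qr(Xf[idx, ]) # QR decomposition of the reduced design matrix
+ list(coefs=qr.coef(qrX, y[idx]), yhat=qr.fitted(qrX, y[idx]))
+ }
> resQR <- apply(Y, 2, getFitQR) # should reproduce the SVD-based results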
Best Answer
This is where the surrogate variables come in: for each split, observations where the split variable is missing are split based on the best surrogate variable; if that is also missing, by the next best, and so on. This is detailed in the documentation accessible through the rpart help (PDF).
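A minimal sketch of inspecting those surrogate splits (my own example on the built-in airquality data, which has NAs in Ozone and Solar.R; the rpart.control arguments maxsurrogate and usesurrogate steer the behaviour):
> library(rpart)
> fit <- rpart(Temp ~ ., data=airquality,
+              control=rpart.control(maxsurrogate=5, usesurrogate=2))
> summary(fit) # lists the surrogate splits chosen at each node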