Solved – Why I got different variance-covariance matrices for different subjects from getVarCov function from R nlme package

covariance-matrixgeneralized-least-squareslme4-nlmer

I fit a linear model using generalized least squares with gls {nlme} function in R. Then I use a getVarCov {nlme} function to extract the variance-covariance matrix from a fitted model. To do so, I define an individual for which to retrieve this variance-covariance matrix (thus the function does work).

I am now confused why these matrices are different from subject to subject (please see the reproducible example below).

  • My understanding is that here: weights = varIdent(form = ~ 1 | factor(time)) I allow for errors different from homoscedastic and thus variance-covariance matrix will possibly have different values on diagonal (in other words: possibly different estimated variance for each factor(time) level), but I wouldn't expect different matrices for each individual. Could somebody clarify to me why does it work like this?

Reproducible example:

Step 1.

# Build data frame
ozone.ID <- c(11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,
              17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20)
ozone.time <- c("1","1","1","1","1","1","1","1","1","1","2","2","2","2","2","2","2","2","2","2","3","3","3","3","3","3","3",
                "3","3","3","4","4","4","4","4","4","4","4","4","4","5","5","5","5","5","5","5","5","5","5","6","6","6","6",
                "6","6","6","6","6","6","7","7","7","7","7","7","7","7","7","7")
ozone.FEV1 <- c(4132,4458,3730,4365,4599,4611,3853,4355,4339,4175,4124,4370,3645,4397,4599,4685,3636,4120,4155,3834,4027,
                4090,3784,4338,4533,4685,3636,4039,4134,3734,3678,4328,3829,4412,4715,4647,3540,3928,4092,3362,3690,4283,
                3664,4483,4647,4647,2644,4042,3939,3561,3417,4276,3573,4393,4707,4472,2547,3923,3578,3056,3590,4322,3483,
                4516,4641,4342,2874,3935,3319,2189)
ozone.long <- data.frame(ID = ozone.ID, time = ozone.time, FEV1 = ozone.FEV1)

Step 2.

# Construct a model 
library(nlme)
ozone.fit.nostruct <- 
  gls(FEV1 ~ 0 + factor(time), 
      correlation = corSymm(form = ~ 1 | ID),
      weights = varIdent(form = ~ 1 | factor(time)),
      data = ozone.long)

Step 3.

Individual with ID "11":

# Extract the variance-covariance matrix from a fitted model
> getVarCov(ozone.fit.nostruct, individual = "11")
Marginal variance covariance matrix
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,]  85972 128330 197160  77457 169200  85555 115870
[2,] 128330 216020 350520 141390 288130 151810 230620
[3,] 197160 350520 618990 247310 483790 255730 390870
[4,]  77457 141390 247310 116200 205010 116650 189280
[5,] 169200 288130 483790 205010 470920 245370 341750
[6,]  85555 151810 255730 116650 245370 135280 205510
[7,] 115870 230620 390870 189280 341750 205510 375620
  Standard Deviations: 293.21 464.78 786.76 340.88 686.24 367.8 612.87

Individual with ID "12":

# Extract the variance-covariance matrix from a fitted model
> getVarCov(ozone.fit.nostruct, individual = "12")
Marginal variance covariance matrix
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] 135280 212270  92170 132480 243330  99464 162750
[2,] 212270 375620 172250 254220 435590 185530 340510
[3,]  92170 172250  85972 125670 206710  88329 163110
[4,] 132480 254220 125670 216020 320470 147400 288980
[5,] 243330 435590 206710 320470 618990 260720 438710
[6,]  99464 185530  88329 147400 260720 116200 213270
[7,] 162750 340510 163110 288980 438710 213270 470920
  Standard Deviations: 367.8 612.87 293.21 464.78 786.76 340.88 686.24 

Update 2.:

> devtools::session_info()
Session info ------------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.3 (2015-12-10)
 system   x86_64, darwin13.4.0        
 ui       RStudio (0.99.491)          
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Indiana/Indianapolis
 date     2016-02-16                  

Packages ----------------------------------------------------------------------------
 package    * version date       source        
 assertthat   0.1     2013-12-06 CRAN (R 3.2.0)
 colorspace   1.2-6   2015-03-11 CRAN (R 3.2.0)
 corrplot   * 0.73    2013-10-15 CRAN (R 3.2.0)
 curl       * 0.9.5   2016-01-23 CRAN (R 3.2.3)
 cvTools      0.3.2   2012-05-14 CRAN (R 3.2.0)
 DBI          0.3.1   2014-09-24 CRAN (R 3.2.0)
 DEoptimR     1.0-4   2015-10-23 CRAN (R 3.2.0)
 devtools   * 1.10.0  2016-01-23 CRAN (R 3.2.3)
 digest       0.6.9   2016-01-08 CRAN (R 3.2.3)
 dplyr      * 0.4.3   2015-09-01 CRAN (R 3.2.0)
 gdata      * 2.17.0  2015-07-04 CRAN (R 3.2.0)
 ggplot2    * 2.0.0   2015-12-18 CRAN (R 3.2.3)
 gtable       0.1.2   2012-12-05 CRAN (R 3.2.0)
 gtools       3.5.0   2015-05-29 CRAN (R 3.2.0)
 htmltools    0.3     2015-12-29 CRAN (R 3.2.3)
 knitr      * 1.12    2016-01-07 CRAN (R 3.2.3)
 labeling     0.3     2014-08-23 CRAN (R 3.2.0)
 lattice    * 0.20-33 2015-07-14 CRAN (R 3.2.3)
 lazyeval     0.1.10  2015-01-02 CRAN (R 3.2.0)
 magrittr     1.5     2014-11-22 CRAN (R 3.2.0)
 memoise      1.0.0   2016-01-29 CRAN (R 3.2.3)
 munsell      0.4.2   2013-07-11 CRAN (R 3.2.0)
 nlme       * 3.1-122 2015-08-19 CRAN (R 3.2.3)
 plyr       * 1.8.3   2015-06-12 CRAN (R 3.2.0)
 R6           2.1.1   2015-08-19 CRAN (R 3.2.0)
 Rcpp         0.12.3  2016-01-10 CRAN (R 3.2.3)
 reshape2   * 1.4.1   2014-12-06 CRAN (R 3.2.0)
 rmarkdown    0.9.2   2016-01-01 CRAN (R 3.2.3)
 Rmisc      * 1.5     2013-10-22 CRAN (R 3.2.0)
 robustbase   0.92-5  2015-07-22 CRAN (R 3.2.0)
 scales       0.3.0   2015-08-25 CRAN (R 3.2.0)
 stringi      1.0-1   2015-10-22 CRAN (R 3.2.0)
 stringr    * 1.0.0   2015-04-30 CRAN (R 3.2.0)
 yaml         2.1.13  2014-06-12 CRAN (R 3.2.0)

Best Answer

I have searched for code of getVarCov().gls and there it is:

getS3method("getVarCov","gls")

function (obj, individual = 1, ...) {
S <- corMatrix(obj$modelStruct$corStruct)[[individual]]
if (!is.null(obj$modelStruct$varStruct)) {
    ind <- obj$groups == individual
        vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
    }
    else vw <- rep(1, nrow(S))
    vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)
class(result) <- c("marginal", "VarCov")
attr(result, "group.levels") <- names(obj$groups)
result
}

It seems that getVarCov() for gls has already "individual" set to 1, so in if we will get that "ind" is a vector with all false, so "vw" is empty and it causes problems ahead.

When I set ozone.ID to c(1,2,3,4,5,6,7,8,9,0, ...) and do then getVarCov(ozone.fit.nostruct) get me what you have for getVarCov(ozone.fit.nostruct, individual = "11")

UPDATE:

I backtracked some more and I finally know where is your problem.

Above I wrote "individual" is already "hardcoded" so it's needed for getVarCov().

So let individual be "11" or "12" and do this loops

a <- c()
for(i in 1:length(ozone.ID)){
if(ozone.ID[i] == "11"){a <- c(a,i)}
}

b <- c()
for(i in 1:length(ozone.ID)){
if(ozone.ID[i] == "12"){b <- c(b,i)}
}

Now we can compare this:

ozone.time[a] == ozone.time[b]

and it's false, so for each individual we get a different order

What it does with function:

Take a look at:

ind <- obj$groups == individual
vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)

ind is an boolean vector and it has "TRUE" values when function get the object with the same individual.

Next is vw:

vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
a <- 1/varWeights(obj$modelStruct$varStruct)

1/varWeights(...) function output is modified var weights, so it gives a vector with 7 values repeated 10 times

b <- a[ind]

ind have one true value in k place where k is from 1 to 10 and then for every ten values six times, but vw is repeating 7 values, so we get:

k + 10a (mod 7) ≡ k + 3a (mod 7)

Let k be equal to 1 or 2, then:

1 + 3a (mod 7) => (1,4,7,3,6,2,5) order of var weights
2 + 3a (mod 7) => (2,5,1,4,7,3,6) order of var weights

It indicates that "vw" for each individual will have the same values but in different orders, but R doesn't see it and do the rest of calculations, so:

vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)

"vw" have different orders from each individual, so it indicates for "vars" too and then in "result" we are multiplying the matrix "S" with different vectors, because of it possible orders

I think that problem will disappear when you sort "ID" differently

Related Question