Solved – Why I got different variance-covariance matrices for different subjects from getVarCov function from R nlme package

covariance-matrixgeneralized-least-squareslme4-nlmer

I fit a linear model using generalized least squares with gls {nlme} function in R. Then I use a getVarCov {nlme} function to extract the variance-covariance matrix from a fitted model. To do so, I define an individual for which to retrieve this variance-covariance matrix (thus the function does work).

I am now confused why these matrices are different from subject to subject (please see the reproducible example below).

My understanding is that here: weights = varIdent(form = ~ 1 | factor(time)) I allow for errors different from homoscedastic and thus variance-covariance matrix will possibly have different values on diagonal (in other words: possibly different estimated variance for each factor(time) level), but I wouldn't expect different matrices for each individual. Could somebody clarify to me why does it work like this?

Reproducible example:

Step 1.

# Build data frame
ozone.ID <- c(11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,
              17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20,11,12,13,14,15,16,17,18,19,20)
ozone.time <- c("1","1","1","1","1","1","1","1","1","1","2","2","2","2","2","2","2","2","2","2","3","3","3","3","3","3","3",
                "3","3","3","4","4","4","4","4","4","4","4","4","4","5","5","5","5","5","5","5","5","5","5","6","6","6","6",
                "6","6","6","6","6","6","7","7","7","7","7","7","7","7","7","7")
ozone.FEV1 <- c(4132,4458,3730,4365,4599,4611,3853,4355,4339,4175,4124,4370,3645,4397,4599,4685,3636,4120,4155,3834,4027,
                4090,3784,4338,4533,4685,3636,4039,4134,3734,3678,4328,3829,4412,4715,4647,3540,3928,4092,3362,3690,4283,
                3664,4483,4647,4647,2644,4042,3939,3561,3417,4276,3573,4393,4707,4472,2547,3923,3578,3056,3590,4322,3483,
                4516,4641,4342,2874,3935,3319,2189)
ozone.long <- data.frame(ID = ozone.ID, time = ozone.time, FEV1 = ozone.FEV1)

Step 2.

# Construct a model 
library(nlme)
ozone.fit.nostruct <- 
  gls(FEV1 ~ 0 + factor(time), 
      correlation = corSymm(form = ~ 1 | ID),
      weights = varIdent(form = ~ 1 | factor(time)),
      data = ozone.long)

Step 3.

Individual with ID "11":

# Extract the variance-covariance matrix from a fitted model
> getVarCov(ozone.fit.nostruct, individual = "11")
Marginal variance covariance matrix
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,]  85972 128330 197160  77457 169200  85555 115870
[2,] 128330 216020 350520 141390 288130 151810 230620
[3,] 197160 350520 618990 247310 483790 255730 390870
[4,]  77457 141390 247310 116200 205010 116650 189280
[5,] 169200 288130 483790 205010 470920 245370 341750
[6,]  85555 151810 255730 116650 245370 135280 205510
[7,] 115870 230620 390870 189280 341750 205510 375620
  Standard Deviations: 293.21 464.78 786.76 340.88 686.24 367.8 612.87

Individual with ID "12":

# Extract the variance-covariance matrix from a fitted model
> getVarCov(ozone.fit.nostruct, individual = "12")
Marginal variance covariance matrix
       [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]
[1,] 135280 212270  92170 132480 243330  99464 162750
[2,] 212270 375620 172250 254220 435590 185530 340510
[3,]  92170 172250  85972 125670 206710  88329 163110
[4,] 132480 254220 125670 216020 320470 147400 288980
[5,] 243330 435590 206710 320470 618990 260720 438710
[6,]  99464 185530  88329 147400 260720 116200 213270
[7,] 162750 340510 163110 288980 438710 213270 470920
  Standard Deviations: 367.8 612.87 293.21 464.78 786.76 340.88 686.24

Update 2.:

> devtools::session_info()
Session info ------------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.3 (2015-12-10)
 system   x86_64, darwin13.4.0        
 ui       RStudio (0.99.491)          
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Indiana/Indianapolis
 date     2016-02-16                  

Packages ----------------------------------------------------------------------------
 package    * version date       source        
 assertthat   0.1     2013-12-06 CRAN (R 3.2.0)
 colorspace   1.2-6   2015-03-11 CRAN (R 3.2.0)
 corrplot   * 0.73    2013-10-15 CRAN (R 3.2.0)
 curl       * 0.9.5   2016-01-23 CRAN (R 3.2.3)
 cvTools      0.3.2   2012-05-14 CRAN (R 3.2.0)
 DBI          0.3.1   2014-09-24 CRAN (R 3.2.0)
 DEoptimR     1.0-4   2015-10-23 CRAN (R 3.2.0)
 devtools   * 1.10.0  2016-01-23 CRAN (R 3.2.3)
 digest       0.6.9   2016-01-08 CRAN (R 3.2.3)
 dplyr      * 0.4.3   2015-09-01 CRAN (R 3.2.0)
 gdata      * 2.17.0  2015-07-04 CRAN (R 3.2.0)
 ggplot2    * 2.0.0   2015-12-18 CRAN (R 3.2.3)
 gtable       0.1.2   2012-12-05 CRAN (R 3.2.0)
 gtools       3.5.0   2015-05-29 CRAN (R 3.2.0)
 htmltools    0.3     2015-12-29 CRAN (R 3.2.3)
 knitr      * 1.12    2016-01-07 CRAN (R 3.2.3)
 labeling     0.3     2014-08-23 CRAN (R 3.2.0)
 lattice    * 0.20-33 2015-07-14 CRAN (R 3.2.3)
 lazyeval     0.1.10  2015-01-02 CRAN (R 3.2.0)
 magrittr     1.5     2014-11-22 CRAN (R 3.2.0)
 memoise      1.0.0   2016-01-29 CRAN (R 3.2.3)
 munsell      0.4.2   2013-07-11 CRAN (R 3.2.0)
 nlme       * 3.1-122 2015-08-19 CRAN (R 3.2.3)
 plyr       * 1.8.3   2015-06-12 CRAN (R 3.2.0)
 R6           2.1.1   2015-08-19 CRAN (R 3.2.0)
 Rcpp         0.12.3  2016-01-10 CRAN (R 3.2.3)
 reshape2   * 1.4.1   2014-12-06 CRAN (R 3.2.0)
 rmarkdown    0.9.2   2016-01-01 CRAN (R 3.2.3)
 Rmisc      * 1.5     2013-10-22 CRAN (R 3.2.0)
 robustbase   0.92-5  2015-07-22 CRAN (R 3.2.0)
 scales       0.3.0   2015-08-25 CRAN (R 3.2.0)
 stringi      1.0-1   2015-10-22 CRAN (R 3.2.0)
 stringr    * 1.0.0   2015-04-30 CRAN (R 3.2.0)
 yaml         2.1.13  2014-06-12 CRAN (R 3.2.0)

Best Answer

I have searched for code of getVarCov().gls and there it is:

getS3method("getVarCov","gls")

function (obj, individual = 1, ...) {
S <- corMatrix(obj$modelStruct$corStruct)[[individual]]
if (!is.null(obj$modelStruct$varStruct)) {
    ind <- obj$groups == individual
        vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
    }
    else vw <- rep(1, nrow(S))
    vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)
class(result) <- c("marginal", "VarCov")
attr(result, "group.levels") <- names(obj$groups)
result
}

It seems that getVarCov() for gls has already "individual" set to 1, so in if we will get that "ind" is a vector with all false, so "vw" is empty and it causes problems ahead.

When I set ozone.ID to c(1,2,3,4,5,6,7,8,9,0, ...) and do then getVarCov(ozone.fit.nostruct) get me what you have for getVarCov(ozone.fit.nostruct, individual = "11")

UPDATE:

I backtracked some more and I finally know where is your problem.

Above I wrote "individual" is already "hardcoded" so it's needed for getVarCov().

So let individual be "11" or "12" and do this loops

a <- c()
for(i in 1:length(ozone.ID)){
if(ozone.ID[i] == "11"){a <- c(a,i)}
}

b <- c()
for(i in 1:length(ozone.ID)){
if(ozone.ID[i] == "12"){b <- c(b,i)}
}

Now we can compare this:

ozone.time[a] == ozone.time[b]

and it's false, so for each individual we get a different order

What it does with function:

Take a look at:

ind <- obj$groups == individual
vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)

ind is an boolean vector and it has "TRUE" values when function get the object with the same individual.

Next is vw:

vw <- 1/varWeights(obj$modelStruct$varStruct)[ind]
a <- 1/varWeights(obj$modelStruct$varStruct)

1/varWeights(...) function output is modified var weights, so it gives a vector with 7 values repeated 10 times

b <- a[ind]

ind have one true value in k place where k is from 1 to 10 and then for every ten values six times, but vw is repeating 7 values, so we get:

k + 10a (mod 7) ≡ k + 3a (mod 7)

Let k be equal to 1 or 2, then:

1 + 3a (mod 7) => (1,4,7,3,6,2,5) order of var weights
2 + 3a (mod 7) => (2,5,1,4,7,3,6) order of var weights

It indicates that "vw" for each individual will have the same values but in different orders, but R doesn't see it and do the rest of calculations, so:

vars <- (obj$sigma * vw)^2
result <- t(S * sqrt(vars)) * sqrt(vars)

"vw" have different orders from each individual, so it indicates for "vars" too and then in "result" we are multiplying the matrix "S" with different vectors, because of it possible orders

I think that problem will disappear when you sort "ID" differently

Related Solutions

Solved – Different variance-covariance matrices of random effects per fixed-effect group in lme4

Thanks to @amoeba and using @BenBolker's brief remark here

Extensions such as allowing different residual variances or different variance-covariance matrices of random effects per (fixed-effect) group can be achieved, somewhat clunkily, by using the dummy() helper function to construct an indicator variable to multiply by individual levels of interest.

we got to the bottom of the problem.

The solution is following:

Y ~ X*Condition + 
    (X*Condition | Subject) + 
    (0 + dummy(Condition, "A") + X:dummy(Condition, "A") | Trial) + 
    (0 + dummy(Condition, "B") + X:dummy(Condition, "B") | Trial)

The summary(model) in random effects yields:

Groups  Name                        Variance Std.Dev. Corr             
 subject (Intercept)                 0.89343  0.9452                    
         X                           0.11695  0.3420   -0.85            
         ConditionB                  0.66731  0.8169   -0.33  0.06      
         X:ConditionB                0.07391  0.2719    0.34 -0.05 -0.47
 Trial   dummy(Condition, "A")       0.63854  0.7991                    
         dummy(Condition, "A"):X     0.09372  0.3061   -0.76            
 Trial.1 dummy(Condition, "B")       0.88833  0.9425                    
         dummy(Condition, "B"):X     0.12175  0.3489   -0.60

which now makes perfect sense, because only correlations that can be calculated are between intercepts and slopes for a certain condition (because they are estimated for the same trials). Correlations between Conditions are senseless.

Furthermore, coef(model)$Trial now shows logical values:

      dummy(Condition, "A")   dummy(Condition, "A"):X    dummy(Condition, "B")   dummy(Condition, "B"):X      (Intercept)          X           ConditionB    X:ConditionB
A1                0.9198822                 0.0209849                        0                         0       2.703544   -0.9929765          -0.07102448        0.2415836
A2               -1.3029020                 0.3894812                        0                         0       2.703544   -0.9929765          -0.07102448        0.2415836
A3                1.1294702                -0.2475288                        0                         0       2.703544   -0.9929765          -0.07102448        0.2415836
B1              0.000000000              0.0000000000               1.21725268              -0.305314643       2.703544   -0.9929765          -0.07102448        0.2415836
B2              0.000000000              0.0000000000               0.88317976              -0.209529267       2.703544   -0.9929765          -0.07102448        0.2415836
B3              0.000000000              0.0000000000               0.27859781              -0.065708851       2.703544   -0.9929765          -0.07102448        0.2415836

Fixed effects are the same for all trials
dummy(Condition, "A") intercepts and dummy(Condition, "A"):X slopes are calculated only for Condition A, in Condition B trials they are estimated to be 0, and vice versa.

N.B. When specifying random effects for this purpose, it is important to:

specify random effects for different groups of trials independently, not under the same |Trial. If you don't do that, lme4 will estimate random effects for all trials, not just for the given condition.
include 0, so as to prevent lme4 from including a general random intercept across all trials.

Best Answer

Related Solutions

Solved – Different variance-covariance matrices of random effects per fixed-effect group in lme4

Related Question