Solved – How to get correct Covariance components in SAS Proc mixed? Struggling with infinite likelihoods and negative variance components

covariancecovariance-matrixgeneticsmixed modelsas

I am trying to estimate covariance components from PROC MIXED and having trouble with convergence problems in the REML approach when I use the NOBOUNDS option. Let me explain why I am using NOBOUNDS in case this is the problem.

I am trying to estimate genetic covariances among 12 traits in a group of self-fertilizing plants. There are ~400 plants and 160 genotypes. The easy approach to estimating covariances is to calculate the covariance among genotype means. This approach introduces some bias into covariance estimates, so I want to compare estimates of variance and covariance components. I’ll refer to these approaches as “Among-genotype covariance” and “covariance components” approaches.

To calculate between trait covariance components, I organized my data (for 160 genotypes) like this:

Genotype Trait  Value
1       Biomass 1.28
1       Biomass 1.07
1       Biomass 1.97
1       TLength 1070.7
1       TLength 875.14
1       TLength 1547.93

To get the (genetic) covariance component between Biomass and TLength, I used this code:

PROC MIXED DATA=multi56 covtest cl;
WHERE trait IN ("Biomass","TLength")
CLASS genotype trait;
MODEL value=trait;
RANDOM trait/SUBJECT=genotype TYPE=un;
/* The RANDOM statement tells SAS to estimate the 2 x 2 
among genotype variance component matrix for the two traits 
listed in the where statement */ 
RUN;

The resulting covariance parameter output was:

Covariance Parameter Estimates
Cov Parm    Subject Estimate    Standard Error  Z Value Pr Z    Alpha   Low Upper
UN(1,1) Genotype    0   .   .   .   .   .   .
UN(2,1) Genotype    50450   19630   2.57    0.0102  0.05    11975   88924
UN(2,2) Genotype    316010  90306   3.50    0.0002  0.05    193511  606860
Residual        55618   3437.23 16.18   <.0001  0.05    49451   63021

The covariance estimate of 50450 is waaay bigger than the estimate I got from the covariance among the genotype means for Biomass and TLength (the Among genotype approach) which equals 257.1. SO I figured the 0 variance component above could be the problem. I ran the same PROC MIXED code above only adding the NOBOUND option:

PROC MIXED DATA=multi56 covtest cl NOBOUND ;

Now my covariance component estimates nearly match (as they should):

Iteration History
Iteration   Evaluations -2 Res Log Like Criterion
0            1                  10109.62259781  

WARNING: Stopped because of infinite likelihood.

Covariance Parameter Values
At Last Iteration
Cov Parm    Subject    Estimate
UN(1,1)    genotype   -34470
UN(2,1)    genotype    249.06
UN(2,2)    genotype    188855
Residual               83814

NOTE: 125 observations are not included because of missing values.
WARNING: Stopped because of infinite likelihood.

The covariance component estimate of 249 I got by allowing negative variances is much more reasonable than 50450, considering the estimate from the among genotype covariance was 257. The problem is that the REML approach stopped and gave me warnings. Does this mean I cannot trust this covariance component output?
I thougth about running PROC MIXED with METHOD=TYPE3, but I can’t do that because I have SUBJECT=genotype TYPE=un as part of my random statement. What is the best way to get the correct between trait covariance components?

Thank You!

Tarek

Best Answer

The SAS manual gives extensive advice on convergence issues that you should check out. One easy and obvious appraoch is to use the parms statement to input the starting values for the unstructured G matrix. Please be aware that the covariance between genotype means is likely quite differnt than the covariance between genotypes, so it is not clear how useful the means covariance is. Most importantly, you have defined trait as both a fixed (model statement) and a random (random statement) effect. You may be getting the 0 variance because the fixed effect portion of your mixed model accounts for all the variation in that trait. A better appraoch may be an hierarchical variance coponents model

Related Question