Solved – Providing starting values for a Generalized Linear Mixed Model with glmmPQL

modelingpcar

I am trying to run a Generalized Linear Mixed Model on some data. What I am trying to do is use distances from habitat features to predict a distance between 2 animal locations. I ran a PCA on the habitat variables, and selected 5 Principal Components to include. The data is badly uneven and has pseudoreplication as I sampled animals over multiple years (Random factor of Animal and Year). I was told to use the glmmPQL code from the MASS package and have had generally good luck early on with it, until however, I tried to run this code.

Recruitment<-glmmPQL(Distance~PC1+PC2+PC3+PC4+PC5,random=list(~1|Year,~1|Animal),data=RecruitmentScores,family=Gamma)

I get the error:

Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In log(ifelse(y == 0, 1, y/mu)) : NaNs produced

After quite a lot of looking I found a helpful answer here ->
https://stackoverflow.com/questions/6043841/glm-nb-with-sqrt-link-using-r

I tried running it, beginning with a simplified model

Recruitmentsimple<-glmmPQL(Distance~PC1,random=list(~1|Year,~1|Animal),data=RecruitmentScores,family=Gamma)

But when I tried to use the coefficients in the form given as starting values

start=c(coef(Recruitmentsimple),0,0,0,0)

I realized that for glmmPQL you get multiple different coefficients for the intercept, as far as I can tell as a result of using Random factors. So essentially my question at long last is, can I use the intercept coefficient given when I run

summary(Recruitmentsimple)

And not worry about the multiple different values for the intercept coefficient, or is there another way that I just haven't managed to track down.

I have attached the dataset that is giving me problems below. I've tried my best to include everything that one might need to understand what I am asking, but I never seem to get it all down right, so if you need anymore info or clarification, please let me know.

Also, as a side, if anyone knows what the warning "In log(ifelse(y == 0, 1, y/mu)) : NaNs produced" means, I would love a quick lesson. I haven't really looked that one up yet though, so I imagine it has likely been answered before elsewhere.

Thanks in advance,

Ayden

EDIT: I just now realized that this may be better placed in Cross Validated. Not sure if it should be moved or not. If anyone thinks this isn't the appropriate place for it please let me know and I'll move it over. Thanks.

structure(list(Distance = c(8652.28218489429, 34515.0499360429, 
13422.36479805, 2198.723668712, 2466.26534136143, 1172.74821765, 
21201.8807833714, 20849.9242741857, 8380.37620707143, 9084.62233994286, 
76662.0061293571, 7440.67952042857, 5056.96455740143, 10929.25085166, 
8070.531642568, 4498.00208371571, 4619.498988172, 1764.5402885695, 
13130.1924201333, 4744.33628052857, 8697.47577671429, 79927.6021957714, 
30080.6011465286, 21873.5380787143, 566.245715504857, 15138.1604619429, 
1859.50570863329, 14039.9461915, 5022.93125377857, 7728.33268236286
), PC1 = c(-9.20961054440026, -8.22881214940447, -1.26320485068402, 
-1.88583141566964, 2.99834208412174, 3.04215174230419, -0.766695340745148, 
0.735229906834534, 2.32686247490184, 2.31780475706044, 0.308929257579307, 
-2.20775667813299, -1.87371154127968, 1.38637016290993, 2.1448692184077, 
2.66036523350783, 2.29882505245925, -1.15403039166013, -0.940347362165179, 
-1.53660098100142, -1.91793301909288, -2.59815155458387, 1.56595604420367, 
1.23556277091171, 0.96452943223471, 0.969865750008103, 2.31141646195859, 
2.18867723740005, 2.10330882943435, 2.02361941258173), PC2 = c(2.513457458921, 
1.39006836165313, 0.976211435585022, 1.98022059968544, -0.930280794223418, 
-0.908480231794788, -1.87943285539628, -0.221753725212705, -2.09976115719695, 
-1.73429789916951, 1.53155977977225, -2.03200946293869, -2.19468937121067, 
2.71201044916189, 2.31809496737017, -0.561189572576716, -0.914371814710762, 
-3.09088401547117, -3.2367657712996, -1.47899640213998, -2.27291457674218, 
-3.06844033447764, 0.361357083034496, 2.87493176893121, 2.50774284831908, 
4.48307464510535, 0.719590157678573, 0.950207337579048, 0.519935894659001, 
0.785805197105419), PC3 = c(-0.593605656941536, -2.41169312700627, 
3.42245477165316, 2.32103325600927, -1.20421752405871, -2.12543152051203, 
1.50297591603228, 0.907752402890159, -0.715719837216544, -0.176667860796774, 
0.0977877384294157, -1.09054058162888, 2.09460400960772, -0.351521263276547, 
-2.03216436108758, -0.320967045260432, 0.396002884511102, -1.98885693487832, 
-1.44797012685519, 1.66954160608131, 2.59674418325189, -1.61006342676716, 
1.96179779588314, -0.447937419038047, 0.356740333334006, -0.990664949264204, 
-0.877768580933614, -0.495455616180557, 0.925094811092282, 0.628716122926644
), PC4 = c(1.78952902698518, 1.37340407841083, 1.6005908999068, 
0.732968194465993, 0.653824753615599, 0.665077696404537, -1.77358082516927, 
-0.795990779624489, 1.76153535832313, 1.75203314538194, -0.420521200059536, 
-1.4582340543487, -2.3603861882907, -1.02663076885461, -3.05114382636744, 
2.38437628800937, 2.50094107639545, -0.800037012551982, 0.10243484375005, 
-0.701971577780459, -1.18504466447402, -0.841894048083002, 1.33130718114457, 
-0.710519955969351, -1.06462398825632, -0.810160377589491, 0.368973290550095, 
0.457777535462269, -0.303437881177987, -0.170596220208466), PC5 = c(0.555128333959788, 
0.0882078004966836, -1.01890769816862, -2.29901235898111, 0.587097003218695, 
0.582234518107283, 1.40833083334084, 0.186737986394422, -0.257648867957099, 
-0.372611594774928, -0.535146836463824, 0.248564759734062, 0.304219930919185, 
-0.112531496920674, -3.16536688393648, 0.0979783253130196, -0.541422716558147, 
-1.18312027810175, -0.400586796060056, 1.18985680973737, 0.482520572601662, 
-0.494206152873624, -0.986186427260261, 0.766947964203955, -0.24527554959318, 
1.87753157251745, 0.976222715049242, 1.07223174760341, 0.445515513095953, 
0.742697271356733), Animal = structure(c(1L, 1L, 2L, 3L, 4L, 
4L, 5L, 5L, 6L, 6L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 
12L, 13L, 13L, 14L, 15L, 16L, 17L, 17L, 18L, 18L), .Label = c("107", 
"108", "109", "111", "112", "170", "172", "173", "175", "176", 
"177", "179", "180", "181", "182", "183", "184", "188"), class = "factor"), 
    Year = structure(c(1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 
    1L, 1L, 2L, 1L, 2L), .Label = c("2010", "2011"), class = "factor")), .Names = c("Distance", 
"PC1", "PC2", "PC3", "PC4", "PC5", "Animal", "Year"), row.names = c(NA, 
-30L), class = "data.frame")

Best Answer

I'm sorry to be negative, but I think there are a number of reasons that what you're doing won't work very well.

  • As a rule of thumb (see e.g. Harrell 2001 Regression Modeling Strategies), you need at least 10 data points per parameter estimated to expect a reliable answer: you have 30 data points for 5 principal component slope parameters, and that's not even counting the random effects variances (which are generally even harder to estimate than fixed effect parameters).
  • it doesn't work very well to estimate random effects from fewer than approx. 5 levels, so a random effect of year (where you've only measured 2 values) is probably not a good idea.
  • I don't think lme (on which glmmPQL is built) can handle crossed effects, so without knowing it I think you're probably fitting Year nested within Individual.

Let's look at the data:

source("heidel.dat")  ## get data
library(ggplot2)
theme_set(theme_bw())  ## cosmetic

ggplot(dat,aes(x=PC1,y=Distance))+geom_point(aes(colour=Year))+
    geom_line(aes(group=Animal),alpha=0.3)+scale_y_log10()

enter image description here

There doesn't seem to be much going on here, but if I did want to try I would probably forego the Gamma model and just look at logged data:

library(lme4)
lmer(log(Distance)~PC1+Year+(1|Animal),data=dat)

At the risk of snooping, let's look at the marginal effects of all 5 PC variables to see if any of them looks like it's doing anything:

library(reshape2)
mdat <- melt(dat,id.var=c("Distance","Animal","Year"))

library(grid)
zmargin <- theme(panel.margin=unit(0,"lines"))  ## 
ggplot(mdat,aes(x=value,y=Distance))+geom_point(aes(colour=Year))+
    geom_line(aes(group=Animal),alpha=0.3)+scale_y_log10()+
    facet_grid(.~variable,scale="free_x")+geom_smooth(method="lm")+zmargin

enter image description here

Related Question