I'm sorry to be negative, but I think there are a number of reasons that what you're doing won't work very well.
- As a rule of thumb (see e.g. Harrell 2001 Regression Modeling Strategies), you need at least 10 data points per parameter estimated to expect a reliable answer: you have 30 data points for 5 principal component slope parameters, and that's not even counting the random effects variances (which are generally even harder to estimate than fixed effect parameters).
- It doesn't work very well to estimate random-effects variances from fewer than about 5 levels, so a random effect of year (where you've only measured 2 values) is probably not a good idea.
- I don't think lme (on which glmmPQL is built) can handle crossed random effects, so without seeing your call I suspect you're actually fitting Year nested within Individual.
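For what it's worth, lme4 can fit crossed random effects directly. A minimal sketch on simulated data (the variable names Animal, Year, and Distance are my assumptions about your setup):

```r
## crossed vs. nested random effects in lme4, on made-up data
library(lme4)
set.seed(1)
dat <- expand.grid(rep = 1:3, Animal = factor(1:15), Year = factor(2000:2001))
dat$Distance <- exp(rnorm(nrow(dat)))

## crossed: independent random intercepts for Animal and Year
m_crossed <- lmer(log(Distance) ~ 1 + (1 | Animal) + (1 | Year), data = dat)

## nested (roughly what lme/glmmPQL would give you): Year within Animal
m_nested <- lmer(log(Distance) ~ 1 + (1 | Animal/Year), data = dat)
```

With only two Year levels you should expect a singular-fit message here, which is exactly the too-few-levels problem described above.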
Let's look at the data:
source("heidel.dat") ## get data
library(ggplot2)
theme_set(theme_bw()) ## cosmetic
ggplot(dat,aes(x=PC1,y=Distance))+geom_point(aes(colour=Year))+
geom_line(aes(group=Animal),alpha=0.3)+scale_y_log10()
There doesn't seem to be much going on here, but if I did want to try, I would probably forgo the Gamma model and just look at the logged data:
library(lme4)
lmer(log(Distance)~PC1+Year+(1|Animal),data=dat)
At the risk of snooping, let's look at the marginal effects of all 5 PC variables to see if any of them looks like it's doing anything:
library(reshape2)
mdat <- melt(dat, id.vars=c("Distance","Animal","Year"))
library(grid)
zmargin <- theme(panel.spacing=unit(0,"lines")) ## squash facet panels together (panel.margin in older ggplot2)
ggplot(mdat,aes(x=value,y=Distance))+geom_point(aes(colour=Year))+
geom_line(aes(group=Animal),alpha=0.3)+scale_y_log10()+
facet_grid(.~variable,scales="free_x")+geom_smooth(method="lm")+zmargin
Using my magic crystal ball to see the output of str(DF):
...
Aha! SAFE_DRVR_PLEDGE_FLG has another level, "", and in that same row one of the other variables is missing. So that row isn't used in the fit, but when predict tries to construct the model matrix for the predictions, it fails: the level "" exists in the data set but not in the model.
You should probably fix your data set before fitting the model, but you can also get the predictions to work by not telling predict to use the DF data frame; it will then succeed, because it uses the model frame stored in the fitted model. However, this gives fewer predictions than there are rows in DF, because the rows that were causing trouble were automatically discarded.
Example, based on @smillig's:
d <- data.frame(w=rnorm(100),
                x=rnorm(100),
                y=factor(sample(LETTERS[1:2], 100, replace=TRUE)), ## factor response for the binomial fit
                z=sample(LETTERS[3:4], 100, replace=TRUE))
d2 <- rbind(d, data.frame(w=1, x=1, y=NA, z=""))
fm2 <- glm(y ~ w + x + z, data=d2, family=binomial)
predict.glm(fm2, d2, type="response", se.fit=FALSE)
# Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
# factor 'z' has new level(s)
predict(fm2, type="response", se.fit=FALSE)
# No error; this works. However, it has length 100 even though d2 has 101 rows.
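For completeness, here are two sketches of the "fix the data first" route, on a fresh copy of the simulated example (everything here is illustrative, not your actual DF): either recode the empty string as a real NA before fitting, or fit with na.action=na.exclude so the predictions come back NA-padded to the full row count.

```r
set.seed(1)
## rebuild the example data directly (y as a factor so the binomial fit works)
d2 <- data.frame(w = c(rnorm(100), 1),
                 x = c(rnorm(100), 1),
                 y = factor(c(sample(LETTERS[1:2], 100, replace = TRUE), NA)),
                 z = c(sample(LETTERS[3:4], 100, replace = TRUE), ""))

## Option 1: recode "" as a genuine missing value before fitting
d3 <- d2
d3$z[d3$z == ""] <- NA
fm3 <- glm(y ~ w + x + z, data = d3, family = binomial)
p3 <- predict(fm3, d3, type = "response")  ## length 101, NA for the bad row

## Option 2: keep d2, but make the fit remember which rows it dropped
fm4 <- glm(y ~ w + x + z, data = d2, family = binomial,
           na.action = na.exclude)
p4 <- predict(fm4, type = "response")      ## also length 101, NA-padded
```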
Best Answer
This often happens when some intermediate result is invalid, e.g. the logarithm of a non-positive number. In my experience, fitdistr is quite "fragile": sometimes you will still get results, other times (often with Gammas) it just fails. You are almost always better off writing a small function that returns the negative log-likelihood and passing it directly to optim, using Nelder-Mead (to dispense with calculating derivatives, although deriv3 often works).
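As a sketch of what that looks like for a Gamma (simulated data; the log-scale parameterization is my choice, and it is what keeps the optimizer out of the invalid parameter region that trips fitdistr up):

```r
set.seed(101)
x <- rgamma(500, shape = 2, rate = 0.5)  ## made-up data

## negative log-likelihood, with shape and rate on the log scale so
## Nelder-Mead can never propose a negative (invalid) parameter value
negll <- function(p, x) {
  -sum(dgamma(x, shape = exp(p[1]), rate = exp(p[2]), log = TRUE))
}

fit <- optim(c(0, 0), negll, x = x, method = "Nelder-Mead")
exp(fit$par)  ## back-transformed (shape, rate) estimates
```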