Mixed Model – How to Calculate Estimated Proportions and Their Confidence Intervals

glmmlme4-nlmemixed modelproportion;standard error

I have an experiment with two treatments. It is a split plot experiment, with the structure Block/Treatment1/Treatment2. Each treatment has 2 levels. The dependent variable is presence/absence data for functional speceis groups. I am analysing the data using mixed models via lme4 and afex.

The model structure is as follows:

m <- lmer (DV ~ Treatment 1 * Treatment 2 + (1|Block/Treatment1),
           family = binomial) 

For graphical purposes however, I want to plot the treatment means for each of the 4 treatment combinations (Treatment 1 level A and Treatment 2 level A; Treatment 1 level B Treatment 2 level A; Treatment 1 level B and Treatment 2 level A; Treatment 1 level B and Treatment 2 level B). I couldn't find an intuitive way to extract these mean and standard error values in R for data with a binomial error distribution, thus I am calculating these values by hand. I appreciate that the standard errors I calculate will not take into account the random effect structure in my dataset, but I cannot find a way to calculate them to include this random effect structure (using a mcmc simulation in language R to create HPD confidence intervals only works for Gaussian data).

As such, I want to calculate treatment means for the proportion of a particular functional group in a particular treatment combination, and then standard errors of these 4 treatment means.

Consider that the proportions I have for a particular treatment are:

Plot 1: 24/65
Plot 2 26/64
Plot 3: 25/65
Plot 4: 22/62
Plot 5: 30/66
Plot 6: 29/65

I understand that to calculate the average proportion I need to sum the total number of hits (sum of all the numerators above) and divide by the total sum of hits and misses (sum of all the denominators above). This gives me 0.40.

My question then is, how does one calculate the standard error of the resulting proportion?

Calling the proportion of hits p and misses q, I have the following formula for the standard error of a proportion:

sqrt((p*q)/n)

Is this correct? In addition, what is n? Is n the number of proportions that make up the mean proportion (in this case 6) or the total number of hits and misses (i.e. sum of all the denominators above, which here is 387)?

I'm sorry that this is a simple question, I just cannot find an answer anywhere! Also i'm sorry if this is not the correct forum to post this in…

Many thanks.

Best Answer

First a note: you can't calculate a decent standard error on the probabilities, you have to do so on a logit scale and use those to construct your confidence intervals. Intervals around probabilities are hardly symmetrical, and definitely not when using a mixed model.

You can easily plot the effects using the package effects:

With the function Effect() you can specify the effects you want to plot and plot immediately, or extract the information you want.

Some random fake data:

ndata <- data.frame(
  DV = sample(0:1,200,TRUE),
  Treatment1 = rep(rep(c('A1','B1'),25),4),
  Treatment2 = rep(rep(c('A2','B2'),each=25),4),
  Block = rep(c("Block1","Block2"),each=100)
  )

m <- glmer (DV ~ Treatment1 * Treatment2 + (1|Block/Treatment1),
           data=ndata, family = binomial) 

To get a plot of the effects, you can simply do:

plot(allEffects(m))

To get:

enter image description here

The same you get with plot(Effect(c("Treatment1","Treatment2"),m)

If you want to get the actual data out, you can save the result of a call to Effect() in an object, and extract the necessary data:

est <- Effect(c("Treatment1","Treatment2"),m)
cbind(est$x,est$fit,est$se,est$lower,est$upper)

to get:

  Treatment1 Treatment2       est$fit    est$se  est$lower est$upper
1         A1         A2 -7.696104e-02 0.2775554 -0.6209597 0.4670376
2         B1         A2 -1.670541e-01 0.2896820 -0.7348204 0.4007122
3         A1         B2 -3.364722e-01 0.2927591 -0.9102694 0.2373250
4         B1         B2  1.110223e-16 0.2773501 -0.5435962 0.5435962

Note that these are on the original (logit) scale. Calculating a confidence interval would involve transforming this to the original scale, using eg. plogis() :

> cbind(est$x,plogis(est$fit),plogis(est$lower),plogis(est$upper))
  Treatment1 Treatment2 plogis(est$fit) plogis(est$lower) plogis(est$upper)
1         A1         A2       0.4807692         0.3495632         0.6146824
2         B1         A2       0.4583333         0.3241378         0.5988588
3         A1         B2       0.4166667         0.2869447         0.5590543
4         B1         B2       0.5000000         0.3673514         0.6326486

PS : this is not the cleanest code, it's just for illustrative purposes.

Related Question