Solved – Help Deriving Variance Function – Binomial GLM

exponential-family, generalized linear model

I'm having difficulty deriving a result for GLMs with binomial data. Namely, if $Y \sim Bin(n, \mu)$ and we put the distribution of $Y/n$ into exponential family form (with a dispersion parameter), then the variance function is supposedly given by:

$$V(\mu) = \frac{\mu(1-\mu)}{n}$$

This can be found on page six of these lecture slides and on page 116 of Faraway's "Extending the Linear Model with R".

I just don't see that. Here's my approach. Let's start by writing the response variable's pmf in exponential family form. If we write $X = Y/n$ as our response variable, $nX$ is binomial so that

$f(x) = \binom{n}{nx} \mu^{nx}(1-\mu)^{n-nx}$

$\implies f(x) = \exp\bigg[ \log\binom{n}{nx} + nx\log(\mu) + (n-nx)\log(1-\mu) \bigg]$

$\implies f(x) = \exp\bigg[ nx\log\Big(\frac{\mu}{1-\mu}\Big) + n\log(1-\mu) + \log\binom{n}{nx}\bigg]$

$\implies f(x) = \exp\bigg[ \frac{x\log\big(\frac{\mu}{1-\mu}\big) + \log(1-\mu)}{1/n} + \log\binom{n}{nx}\bigg]$

$\implies f(x) = \exp\bigg[ \frac{x \theta -b(\theta)}{a(\phi)} + c(x,\phi) \bigg]$,

with $\theta = \log\big(\frac{\mu}{1-\mu}\big)$, $b(\theta) = -\log(1-\mu) = \log(1+e^\theta)$, $\phi = 1$, $a(\phi) = 1/n$, and $c(x,\phi) = \log\binom{n}{nx}$.
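As a sanity check on this factorisation, here is a small Python sketch (standard library only). The function names and the values $n = 12$, $\mu = 0.3$ are illustrative choices of mine, not from the slides or from Faraway. It evaluates the binomial pmf and the exponential-family expression at every attainable $x = k/n$ and reports the largest discrepancy, which should be at rounding-error level.

```python
import math

def binom_pmf(k, n, mu):
    """Binomial pmf P(Y = k) for Y ~ Bin(n, mu)."""
    return math.comb(n, k) * mu**k * (1 - mu)**(n - k)

def expfam_pmf(k, n, mu):
    """Same pmf written as exp[(x*theta - b(theta)) / a(phi) + c(x, phi)], x = k/n."""
    x = k / n
    theta = math.log(mu / (1 - mu))      # canonical parameter
    b = math.log(1 + math.exp(theta))    # cumulant function b(theta)
    a = 1 / n                            # a(phi), with phi = 1
    c = math.log(math.comb(n, k))        # c(x, phi)
    return math.exp((x * theta - b) / a + c)

n, mu = 12, 0.3  # illustrative values only
max_gap = max(abs(binom_pmf(k, n, mu) - expfam_pmf(k, n, mu))
              for k in range(n + 1))
print(max_gap)
```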

Now, to get the variance function, we begin by:

$b'(\theta) = \frac{1}{1+e^\theta} e^\theta $

$b''(\theta) = \frac{e^\theta}{(1+e^\theta)^2}$

We need this in terms of $\mu$, so we plug in $\theta = \log(\frac{\mu}{1-\mu})$, i.e. $e^\theta = \frac{\mu}{1-\mu}$ and $1+e^\theta = \frac{1}{1-\mu}$, and get

$V(\mu) = b''(\theta) = \frac{\frac{\mu}{1-\mu}}{(\frac{1}{1-\mu})^2} = \mu(1-\mu)$
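The two derivatives can be checked numerically. The sketch below (standard library only; the helper `central_diffs`, the step size, and the values of $\mu$ and $n$ are my own assumptions for illustration) approximates $b'(\theta)$ and $b''(\theta)$ by central differences and prints them next to $\mu$ and $\mu(1-\mu)$. The last line multiplies $b''(\theta)$ by $a(\phi) = 1/n$.

```python
import math

def b(theta):
    """Cumulant function b(theta) = log(1 + e^theta)."""
    return math.log(1 + math.exp(theta))

def central_diffs(f, t, h=1e-3):
    """Central-difference approximations to f'(t) and f''(t)."""
    d1 = (f(t + h) - f(t - h)) / (2 * h)
    d2 = (f(t + h) - 2 * f(t) + f(t - h)) / h**2
    return d1, d2

mu, n = 0.3, 12                      # illustrative values only
theta = math.log(mu / (1 - mu))      # canonical parameter for this mu
b1, b2 = central_diffs(b, theta)

print(b1, mu)                        # b'(theta) next to mu
print(b2, mu * (1 - mu))             # b''(theta) next to mu(1 - mu)
print(b2 / n, mu * (1 - mu) / n)     # b''(theta) * a(phi) next to mu(1 - mu)/n
```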

This is conspicuously missing a $1/n$. What am I missing?

Best Answer

I think I figured it out. It's important to mention that the discussion in Faraway was in the context of iteratively reweighted least squares (IRWLS).

First of all, we can use either the variance of the response or the variance function in our IRWLS implementation. The two differ only by a scale factor: the variance of the response equals $V(\mu)a(\phi)$, where $a(\phi)$ is just some constant (here $1/n$). So I think Faraway is actually using the variance of the response.
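To illustrate the scale-change point, here is a minimal pure-Python IRWLS sketch for a logit-link binomial model with one covariate. It is not Faraway's or R's code: the data, the function name `irwls_logit`, and the `scale` argument are made up for illustration. The weights are the inverse of the response variance times $g'(\mu)^2$, which for the logit link works out to $n_i\mu_i(1-\mu_i)$. Multiplying every weight by the same constant should leave the fitted coefficients unchanged.

```python
import math

def irwls_logit(x, y, n, scale=1.0, iters=25):
    """IRWLS for a binomial GLM with logit link, intercept and one slope.

    y holds observed proportions, n the binomial totals. `scale` is a
    hypothetical knob that multiplies every weight by the same constant.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            v = mu * (1.0 - mu)          # b''(theta) evaluated at mu
            z = eta + (yi - mu) / v      # working response, g'(mu) = 1/v
            w = scale * ni * v           # 1 / (Var(y_i) * g'(mu)^2), rescaled
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Solve the 2x2 weighted normal equations in closed form.
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

# Made-up grouped data: covariate, binomial totals, success counts.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [10, 20, 15, 30, 25]
successes = [2, 6, 7, 20, 21]
y = [s / m for s, m in zip(successes, n)]   # proportions as the response

fit_a = irwls_logit(x, y, n)                # weights n_i * mu_i * (1 - mu_i)
fit_b = irwls_logit(x, y, n, scale=7.5)     # same weights times a constant
print(fit_a)
print(fit_b)
```

Note that the invariance concerns a constant common to all observations. The per-observation totals $n_i$ stay inside the weights in both fits.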

Second, Faraway was using R, which actually takes the sample proportion, $\bar{Y}=Y/n$, as the response when it fits the model. From the R documentation:

"If a binomial glm model was specified by giving a two-column response, the weights returned by prior.weights are the total numbers of cases (factored by the supplied case weights) and the component y of the result is the proportion of successes."

So even though $Y\sim Bin(n,\mu)$, the response $\bar{Y}$ has variance $Var(\bar{Y})=\frac{1}{n^2}Var(Y)=\frac{n\mu(1-\mu)}{n^2}=\frac{\mu(1-\mu)}{n}$.
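This is easy to confirm exactly, without simulation. The sketch below (standard library only; the function name and the values of $n$ and $\mu$ are illustrative assumptions) sums over the binomial pmf to get the mean and variance of the proportion $Y/n$ and prints them next to the closed-form expressions.

```python
import math

def proportion_moments(n, mu):
    """Exact mean and variance of Y/n for Y ~ Bin(n, mu), summing over the pmf."""
    pmf = [math.comb(n, k) * mu**k * (1 - mu)**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * p for k, p in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

n, mu = 12, 0.3  # illustrative values only
mean, var = proportion_moments(n, mu)

print(mean, mu)                        # mean of the proportion next to mu
print(var, mu * (1 - mu) / n)          # variance of Y/n next to mu(1 - mu)/n
print(var * n**2, n * mu * (1 - mu))   # variance of the count Y next to n*mu(1 - mu)
```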

So until I'm told otherwise, I will assume that his writing $V(\mu)$ above was either an error or done to gloss over something. The quantity $\frac{\mu(1-\mu)}{n}$ is the variance of the response, not the variance function.
