Solved – “weight” input in glm and lm functions in R

generalized-linear-model, likelihood, lm, r, weighted-regression

I am confused with the definition of the weights in glm and lm.

Using the notation of McCullagh and Nelder (1989), if the random variable $y_i$ follows a generalized linear model (GLM), its density is modelled in the form:

\begin{equation}
f(y_i) = \exp\Big(\frac{m_i}{\phi} [\theta_i y_i - b(\theta_i) ] + c(y_i;\phi)\Big)
\end{equation}

where $\theta_i$ is the canonical parameter, $\phi$ is the dispersion parameter and $m_i$ is the known prior "weight". I would like to know whether or not this prior "weight" is the weight specified in glm. help(glm) says:

Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations. For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes: they would rarely be used for a Poisson GLM.
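The "mean of w_i unit-weight observations" reading can be checked numerically. The following is only a sketch on simulated data (the group sizes, covariate values and variable names are made up for illustration): it fits lm to the individual observations, then to the group means with the group sizes as weights.

```r
# Hypothetical data: 3 groups with 2, 3 and 5 unit-weight observations each.
set.seed(1)
n_per_group <- c(2L, 3L, 5L)
x_group <- c(1, 2, 3)
x_full <- rep(x_group, times = n_per_group)
y_full <- 1 + 0.5 * x_full + rnorm(length(x_full))

# Fit on the individual observations (unit weights).
fit_full <- lm(y_full ~ x_full)

# Collapse to group means and pass the group sizes as weights.
y_mean <- as.numeric(tapply(y_full, x_full, mean))
fit_mean <- lm(y_mean ~ x_group, weights = n_per_group)

# Compare the coefficient estimates (names differ, so compare values only).
all.equal(unname(coef(fit_full)), unname(coef(fit_mean)))
```

Only the coefficient estimates are expected to agree here. The residual degrees of freedom differ between the two fits, so the reported standard errors need not match.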

Therefore, in my understanding, what the "weight" $w_i$ does is re-parameterize the dispersion parameter as

$$\phi=\frac{\phi^*}{w_i},$$

where $\phi^*$ is the redefined dispersion parameter.
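To make the substitution explicit, here is a sketch that assumes the unweighted density has prior weight 1 and that $w_i$ enters only through the dispersion. Plugging $\phi^*/w_i$ in for the dispersion gives

$$
f(y_i) = \exp\Big(\frac{w_i}{\phi^*} [\theta_i y_i - b(\theta_i)] + c(y_i;\phi^*/w_i)\Big),
\qquad
\operatorname{Var}(y_i) = \frac{\phi^*}{w_i}\, b''(\theta_i),
$$

so under this reading $w_i$ sits exactly where a prior weight would, and $\phi^*$ is shared by all observations.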
This means that, for example, when $y_i$ is modelled with only an intercept term $\beta_0$,
the lm function with a non-NULL "weights" specification maximizes the weighted log-likelihood (the sum over $i$ of $\log f(y_i)$) with respect to $\phi^*$ and $\beta_0$, where:

$$
f(y_i)=\sqrt{ \frac{w_i}{2\pi \phi^*} } \exp\Big(-\frac{1}{2}\frac{w_i (y_i-\beta_0)^2}{\phi^*}\Big),
$$
where the identity link is used, so $\beta_0=\theta_i$.
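For the intercept-only Gaussian case, the maximizer in $\beta_0$ has a closed form (the weighted mean), which can be compared with what lm returns. This is a minimal sketch on simulated data; the data, weights and variable names are assumptions for illustration only.

```r
# Hypothetical data and weights, chosen only for illustration.
set.seed(42)
y <- rnorm(20, mean = 3, sd = 2)
w <- runif(20, min = 0.5, max = 4)

fit <- lm(y ~ 1, weights = w)

# Closed-form maximizer of the weighted Gaussian log-likelihood in beta_0.
beta0_hat <- sum(w * y) / sum(w)          # weighted mean
rss_w <- sum(w * (y - beta0_hat)^2)       # weighted residual sum of squares

# Intercept from lm() next to the closed form.
c(lm = unname(coef(fit)), closed_form = beta0_hat)

# lm() divides the weighted RSS by the residual degrees of freedom (n - 1).
c(lm = summary(fit)$sigma^2, closed_form = rss_w / (length(y) - 1))
```

One caveat: the estimate of $\phi^*$ reported by summary() uses the residual degrees of freedom, whereas the strict maximum likelihood estimate would divide the weighted residual sum of squares by $n$.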

Similarly, the glm function with family = "poisson" and a non-NULL "weights" specification maximizes the weighted log-likelihood (the sum over $i$ of $\log f(y_i)$) with respect to $\beta_0$, where:

$$
f(y_i)=\frac{\beta_0^{w_i y_i}}{y_{i}!} \exp(-w_i \beta_0),
$$

where the log link is used, so $\beta_0=\exp(\theta_i)$.
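The Poisson case can be checked the same way. Setting the derivative of $\sum_i w_i (y_i \log\beta_0 - \beta_0)$ to zero gives $\hat\beta_0 = \sum_i w_i y_i / \sum_i w_i$; terms that do not involve $\beta_0$, such as $y_i!$, do not affect the maximizer. The sketch below uses simulated counts and made-up integer weights.

```r
# Hypothetical counts and weights, for illustration only.
set.seed(7)
y <- rpois(15, lambda = 4)
w <- sample(1:5, size = 15, replace = TRUE)

fit <- glm(y ~ 1, family = poisson, weights = w)

# Closed-form maximizer of the weighted Poisson log-likelihood.
beta0_hat <- sum(w * y) / sum(w)

# exp() undoes the log link so both values are on the mean scale.
c(glm = unname(exp(coef(fit))), closed_form = beta0_hat)
```

The two values should agree up to the convergence tolerance of the iterative fit.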

Similarly, the glm function with family = "binomial" and a non-NULL "weights" specification maximizes the weighted log-likelihood (the sum over $i$ of $\log f(y_i)$) with respect to $\beta_0$ (the binomial family has no free dispersion parameter), where:

$$
f(y_i)=
\begin{pmatrix}
m\\
y_i
\end{pmatrix}
\beta_0^{w_iy_i}(1-\beta_0)^{w_i(m-y_i)}
$$

where the logit link is used, so $\beta_0 = \operatorname{logit}^{-1}(\theta_i)$.
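For the binomial family, the case described in help(glm) is the one where the response is a proportion and the weights are the numbers of trials. The sketch below (simulated data, made-up trial counts) compares that fit with the equivalent successes/failures form and with the pooled proportion $\sum_i s_i / \sum_i m_i$. It illustrates the documented usage and is not a check of the exact formula written above.

```r
# Hypothetical trial counts and successes, for illustration only.
set.seed(3)
m <- c(10L, 20L, 5L, 40L)                       # trials per observation
s <- rbinom(length(m), size = m, prob = 0.3)    # successes
p <- s / m                                      # observed proportions

# Response as a proportion, number of trials passed as weights.
fit_prop <- glm(p ~ 1, family = binomial, weights = m)

# Equivalent two-column (successes, failures) specification.
fit_counts <- glm(cbind(s, m - s) ~ 1, family = binomial)

# Pooled proportion of successes.
beta0_hat <- sum(s) / sum(m)

# plogis() is the inverse logit, so all values are on the probability scale.
c(prop = unname(plogis(coef(fit_prop))),
  counts = unname(plogis(coef(fit_counts))),
  closed_form = beta0_hat)
```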

Is my understanding correct?

Reference:

P. McCullagh and J.A. Nelder. Generalized Linear Models. Chapman and Hall, London, 1989.

Best Answer

I found a reference supporting my understanding of the weight in glm.

The book "Modern Applied Statistics with S" by W.N. Venables and B.D. Ripley (fourth edition) defines the GLM for $y_i$ as:

$$ f(y_i;\theta_i, \phi)=\exp \Big( \frac{A_i (y_i\theta_i-b(\theta_i))}{\phi}+c(y_i,\phi/A_i)\Big) $$

(page 183, equation 7.1). Page 188 then says:

"Prior weights $A_i$ may be specified using weight argument."
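As a quick check in R itself, the fitted glm object keeps the supplied weights as its prior weights, and for the Gaussian family with identity link the weighted glm and weighted lm fits coincide. This is a minimal sketch on simulated data; the data and variable names are assumptions for illustration.

```r
# Hypothetical data and weights, for illustration only.
set.seed(11)
x <- rnorm(12)
y <- 2 + x + rnorm(12)
w <- runif(12, min = 1, max = 3)

fit_glm <- glm(y ~ x, family = gaussian, weights = w)
fit_lm <- lm(y ~ x, weights = w)

# The supplied weights are kept on the fitted object as prior weights.
all.equal(unname(weights(fit_glm, type = "prior")), w)
all.equal(unname(fit_glm$prior.weights), w)

# Gaussian family with identity link: compare glm() and lm() coefficients.
all.equal(coef(fit_glm), coef(fit_lm))
```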