Solved – GLM analogue of weighted least squares

generalized linear model, quasi-likelihood

The short version:

I can fit a model using Weighted Least Squares, given a diagonal matrix of weights $W$, by solving $(X^TWX)\hat{\beta}=X^TWy$ for $\hat{\beta}$.
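To make the WLS normal equations concrete, here is a minimal sketch in plain Python. The toy data and the helper names `solve` and `wls` are made up for illustration and are not taken from any library. It also checks that integer weights give the same estimate as repeating each row $w_i$ times with unit weights.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def wls(X, y, w):
    """Solve (X^T W X) beta = X^T W y, with diagonal W given as the list w."""
    p = len(X[0])
    XtWX = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
            for j in range(p)]
    XtWy = [sum(wi * xi[j] * yi for xi, yi, wi in zip(X, y, w))
            for j in range(p)]
    return solve(XtWX, XtWy)


# Toy data (hypothetical): an intercept column plus one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0, 5.0]
w = [1.0, 2.0, 1.0, 3.0]

beta_weighted = wls(X, y, w)

# Integer weights behave like row replication: repeat row i w_i times
# and fit with unit weights.
X_rep = [xi for xi, wi in zip(X, w) for _ in range(int(wi))]
y_rep = [yi for yi, wi in zip(y, w) for _ in range(int(wi))]
beta_replicated = wls(X_rep, y_rep, [1.0] * len(y_rep))
```

Here `beta_weighted` and `beta_replicated` agree up to floating-point error, which is the sense in which the weights act like observation counts.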

Is there a GLM analogue? If so, what is it?

There seems to be a GLM analogue, e.g. with the weights argument in R's glm function. How is R using these weights?


The long version:

the situation

As a follow-up to my IPTW question, I just want to double check that I understand how to fit a parametric model using inverse probability(-of-treatment) weights (IPTW). The idea with IPTW is to simulate a dataset in which the relationship between my independent variables $(a^1,a^2,a^3)$ and dependent variable $y$ is unconfounded and therefore causal. For argument's sake let's say I already estimated an IPT weight $\hat{w}_i$ for each observation. These weights are hypothetical probability weights from the simulated dataset.

the question

I now want to fit a GLM. I'd just use WLS, but I'm working with a binary outcome and an outcome truncated at zero. So I have a linear predictor $\eta_i=a_i^T\beta$, a link function $g$ with $g(\mu_i)=\eta_i$, and a variance $V(y_i)$ derived from my likelihood for $y$. Writing $x_{ij}$ for the $j$th covariate of observation $i$ (the $j$th component of $a_i$), the likelihood equations are
$$
\sum_{i=1}^N \frac{y_i-\mu_i}{V(y_i)}\frac{\partial\mu_i}{\partial\beta_j}=\sum_{i=1}^N \frac{y_i-\mu_i}{V(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}x_{ij}\right)=0,~\forall j
$$ as per Categorical Data Analysis, Agresti, 2013, section 4.4.5.

So all I have to do is multiply $var(\mu_i)$ by the weight $\hat{w}_i$, right? The same way I might if I wanted to incorporate an overdispersion parameter? If so, is this because the variance of, say, 5 independent observations is 5 times the variance of one independent observation?

Follow-up idea: since the likelihood is the product of the likelihood for each observation, is there some weighting procedure I can use to just weight the likelihoods?

Best Answer

An unweighted MLE is obtained by maximizing $$ l(\mathbf{\theta};\mathbf{y})=\sum_{i=1}^Nl{\left(\theta;y_i\right)} $$

where $l$ is the log-likelihood. Fitting an MLE with inverse-probability weights (which enter the estimating equations the same way frequency weights do) entails modifying the log-likelihood to:

$$ l(\mathbf{\theta};\mathbf{y})=\sum_{i=1}^Nw_i~l{\left(\theta;y_i\right)}. $$

In the GLM case, this reduces to solving $$ \sum_{i=1}^N w_i\frac{y_i-\mu_i}{V(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}x_{ij}\right)=0,~\forall j $$
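As a check that this really is the analogue of the WLS equations in the question, consider the Gaussian special case. This is a sketch under two assumptions: an identity link, so $\mu_i=\eta_i=x_i^T\beta$ and $\partial\mu_i/\partial\eta_i=1$, and a constant variance $V(y_i)=\sigma^2$. Then the weighted score equations become

$$
\sum_{i=1}^N w_i\frac{y_i-x_i^T\beta}{\sigma^2}x_{ij}=0,~\forall j
\quad\Longleftrightarrow\quad
X^TW(y-X\beta)=0
\quad\Longleftrightarrow\quad
(X^TWX)\hat{\beta}=X^TWy,
$$

with $W=\mathrm{diag}(w_1,\dots,w_N)$. The factor $1/\sigma^2$ cancels, so WLS is recovered exactly.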

Source: page 119 of http://www.ssicentral.com/lisrel/techdocs/sglim.pdf, linked at http://www.ssicentral.com/lisrel/resources.html#t. It's the "Generalized Linear Modeling" chapter (chapter 3) of the LISREL "technical documents."
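The weighted score equations can also be solved numerically. Below is a minimal sketch for a binary outcome, assuming a logit link (so $\partial\mu_i/\partial\eta_i=\mu_i(1-\mu_i)=V(y_i)$ and the two factors cancel) and using Newton's method. The toy data and the names `solve`, `weighted_logit` and `weighted_score` are hypothetical, and this is not a description of how R's glm or any other package implements its weights.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fitted_means(X, beta):
    """Inverse logit of the linear predictor x_i^T beta."""
    return [1.0 / (1.0 + math.exp(-sum(b * xj for b, xj in zip(beta, xi))))
            for xi in X]


def weighted_score(X, y, w, beta):
    """sum_i w_i (y_i - mu_i) x_ij for each j (logit link)."""
    mu = fitted_means(X, beta)
    return [sum(wi * (yi - mi) * xi[j] for xi, yi, mi, wi in zip(X, y, mu, w))
            for j in range(len(beta))]


def weighted_logit(X, y, w, iters=25):
    """Newton iterations on the weighted score equations, starting at zero."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        mu = fitted_means(X, beta)
        # Weighted information matrix: sum_i w_i mu_i (1 - mu_i) x_i x_i^T.
        info = [[sum(wi * mi * (1.0 - mi) * xi[j] * xi[k]
                     for xi, mi, wi in zip(X, mu, w)) for k in range(p)]
                for j in range(p)]
        step = solve(info, weighted_score(X, y, w, beta))
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Toy data (hypothetical, not separable): intercept plus one covariate.
X = [[1.0, float(x)] for x in range(6)]
y = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
w = [1.0, 2.0, 1.0, 3.0, 1.0, 2.0]

beta_weighted = weighted_logit(X, y, w)

# Same fit from an unweighted model on a dataset with row i repeated w_i times.
X_rep = [xi for xi, wi in zip(X, w) for _ in range(int(wi))]
y_rep = [yi for yi, wi in zip(y, w) for _ in range(int(wi))]
beta_replicated = weighted_logit(X_rep, y_rep, [1.0] * len(y_rep))
```

At `beta_weighted` the weighted score is zero up to numerical error, and the estimate matches `beta_replicated`. This only covers the point estimate: with estimated IPT weights, the model-based standard errors are generally not appropriate, and a robust (sandwich) variance estimator is commonly used instead.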
