I do not understand the role of weights in "weighted Poisson regression". What exactly is being weighted? Is it the contribution of the observation to the log-likelihood of the model, or something else?
In the following two popular threads,
Where does the offset go in Poisson/negative binomial regression?
When to use an offset in a Poisson regression?
commentators establish the equivalence between Poisson regression with an explicit offset $t_i$ (for exposure time, for example) in the equation:
$$\ln(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_N x_{iN} + \ln(t_i)$$
and weighted Poisson regression with weights $t_i$ (at least in R):
$$\ln\!\bigg(\frac{\lambda_i}{t_i}\bigg) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_N x_{iN}$$
By "equivalent", one of the threads means that the estimated coefficients come out the same, which it demonstrates with an example.
However, I don't understand what the weighting in the second regression means. What are the objective functions being optimised in the two cases? In the first, is it the usual Poisson log-likelihood, $-\lambda + k \ln(\lambda) - \ln(k!)$?
Best Answer
This also confused me. I thought, "what is the point of explicitly including an offset instead of just pretending that the response divided by the offset / exposure is the $y$ value?".
You actually get two different loss functions if you do so.
The correct way (use an exposure/offset $s_i$)
Model $\log \lambda_i = \log s_i + \theta^T x_i$ so that $\lambda_i = s_i e^{\theta^T x_i}$. This makes complete sense: the exposure $s_i$ just multiplies the $\hat{\lambda}_i = e^{\hat{\theta}^T x_i}$ of a Poisson regression model without different exposures.
We model the response $Y_i$ at $x_i$ as a Poisson random variable with parameter $\lambda_i$.
Then the likelihood for $N$ data points is:
$$\prod_{i=1}^N \dfrac{(s_i e^{\theta^T x_i})^{y_i}}{y_i!}\,e^{-s_i e^{\theta^T x_i}}$$
The log likelihood $\ell$, keeping only terms that depend on $\theta$ since others will drop out after differentiation:
$$\ell = \displaystyle \sum_{i=1}^N\big[\, y_i\,\theta^T x_i - s_i e^{\theta^T x_i}\big]$$
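As a sanity check, this objective is easy to compute directly. Here is a minimal pure-Python sketch (the data and the helper name `loglik_offset` are made up for illustration):

```python
import math

def loglik_offset(theta, X, y, s):
    """Offset Poisson log-likelihood, up to terms constant in theta:
    sum_i [ y_i * theta^T x_i  -  s_i * exp(theta^T x_i) ]."""
    total = 0.0
    for x_i, y_i, s_i in zip(X, y, s):
        eta = sum(t * v for t, v in zip(theta, x_i))  # theta^T x_i
        total += y_i * eta - s_i * math.exp(eta)
    return total

# tiny synthetic example: intercept-only model (x_i = [1]) with exposures s_i
X = [[1.0], [1.0], [1.0]]
y = [2, 0, 5]
s = [1.0, 2.0, 4.0]
print(loglik_offset([0.5], X, y, s))  # equals sum(y)*0.5 - sum(s)*exp(0.5)
```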
The incorrect way (using $y_i/s_i$ as the y-values)
Now we still model:
$$\log \lambda_i = \log s_i + \theta^T x_i$$
The difference is that we now assume $y_i/s_i$ has a Poisson distribution. This is essentially what makes the model incorrect: it violates the assumption that $y_i$ itself has a Poisson distribution, and instead models the rate as Poisson. So the likelihood is now:
$$\prod_{i=1}^N \dfrac{(e^{\theta^T x_i})^{y_i/s_i}}{(y_i/s_i)!}\,e^{-e^{\theta^T x_i}}$$
[It's awkward to have $y_i/s_i$ in the factorial term, but it doesn't depend on $\theta$ and drops out after differentiation of the log-likelihood, so let's carry on.]
The log likelihood $\hat{\ell}$, keeping only terms that depend on $\theta$ since others will drop out after differentiation:
$$\hat{\ell} = \displaystyle \sum_{i=1}^N\bigg[\frac{y_i}{s_i}\,\theta^T x_i - e^{\theta^T x_i}\bigg]$$
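On toy data, one can check numerically that $\hat{\ell}$ and $\ell$ really are different objectives. A pure-Python sketch (the data and function names are illustrative, not from any library):

```python
import math

def dot(theta, x):
    """theta^T x for plain Python lists."""
    return sum(t * v for t, v in zip(theta, x))

def loglik_offset(theta, X, y, s):
    # correct objective: sum_i [ y_i * theta^T x_i - s_i * exp(theta^T x_i) ]
    return sum(y_i * dot(theta, x_i) - s_i * math.exp(dot(theta, x_i))
               for x_i, y_i, s_i in zip(X, y, s))

def loglik_rate(theta, X, y, s):
    # incorrect objective: sum_i [ (y_i/s_i) * theta^T x_i - exp(theta^T x_i) ]
    return sum((y_i / s_i) * dot(theta, x_i) - math.exp(dot(theta, x_i))
               for x_i, y_i, s_i in zip(X, y, s))

X = [[1.0], [1.0], [1.0]]
y = [2, 0, 5]
s = [1.0, 2.0, 4.0]
print(loglik_offset([0.5], X, y, s))
print(loglik_rate([0.5], X, y, s))  # different value: the s_i do not factor out
```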
Conclusion
$\ell$ and $\hat{\ell}$ look strikingly similar, and you might think they are the same, but they are not: you can't just pull a factor of $s_i$ out of $\ell$, because it is different for every term!
However, consider a weighted Poisson regression in which we still model $y_i/s_i$ as Poisson-distributed (is "Poissonian" a word?) but give each data point's contribution to the log-likelihood a weight of $s_i$. Then:
$$\hat{\ell}_{{\rm weighted}} = \displaystyle \sum_{i=1}^N s_i\bigg[\frac{y_i}{s_i}\,\theta^T x_i - e^{\theta^T x_i}\bigg] = \sum_{i=1}^N \big[\,y_i\,\theta^T x_i - s_i e^{\theta^T x_i}\big]$$
is equivalent to $\ell$!
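To see the equivalence concretely, here is a small pure-Python check (synthetic data, illustrative names) that the weighted rate log-likelihood matches the offset log-likelihood at every $\theta$:

```python
import math

def dot(theta, x):
    """theta^T x for plain Python lists."""
    return sum(t * v for t, v in zip(theta, x))

def loglik_offset(theta, X, y, s):
    # sum_i [ y_i * theta^T x_i - s_i * exp(theta^T x_i) ]
    return sum(y_i * dot(theta, x_i) - s_i * math.exp(dot(theta, x_i))
               for x_i, y_i, s_i in zip(X, y, s))

def loglik_rate_weighted(theta, X, y, s):
    # sum_i s_i * [ (y_i/s_i) * theta^T x_i - exp(theta^T x_i) ]
    return sum(s_i * ((y_i / s_i) * dot(theta, x_i) - math.exp(dot(theta, x_i)))
               for x_i, y_i, s_i in zip(X, y, s))

X = [[1.0, 0.3], [1.0, -1.2], [1.0, 0.8], [1.0, 2.1]]
y = [3, 0, 4, 9]
s = [0.5, 2.0, 1.5, 4.0]
for theta in ([0.0, 0.0], [0.4, -0.7], [-1.0, 0.25]):
    a = loglik_offset(theta, X, y, s)
    b = loglik_rate_weighted(theta, X, y, s)
    print(theta, abs(a - b) < 1e-9)  # prints True: identical objectives
```

Since the two objectives agree for every $\theta$, they are maximised by the same $\hat{\theta}$, which is why R reports identical coefficients for the two specifications.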