Poisson Regression – How is a Poisson Rate Regression Equal to a Poisson Regression with Corresponding Offset Term?

generalized-linear-model, offset, poisson-distribution, r, weighted-data

I do not understand the role of weights in "weighted Poisson regression". What exactly is being weighted? Is it the contribution of the observation to the log-likelihood of the model, or something else?

In the following two popular threads,

Where does the offset go in Poisson/negative binomial regression?

When to use an offset in a Poisson regression?

commentators establish the equivalence between a Poisson regression with an explicit offset $\ln(t_i)$ (where $t_i$ is, for example, exposure time):

$$\ln(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_N x_{iN} + \ln(t_i)$$

and a weighted Poisson regression of the rate with weights $t_i$ (at least in R):

$$\ln\!\bigg(\frac{\lambda_i}{t_i}\bigg) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_N x_{iN}$$

By "equivalent" I mean that one of the threads demonstrates, with an example, that the estimated coefficients come out the same.

However, I don't understand what the weighting in the second regression means. What are the objective functions being optimised in the two cases? In the first, is it the usual Poisson log-likelihood $-\lambda + k \ln(\lambda) - \ln(k!)$?

Best Answer

This also confused me. I thought, "what is the point of explicitly including an offset instead of just pretending that the response divided by the offset / exposure is the $y$ value?".

You actually get two different loss functions if you do so.

The correct way (use an exposure/offset $s_i$)

Model $\log \lambda_i = \log s_i + \theta^T x_i$ so that $\lambda_i = s_i e^{\theta^Tx_i}$. This makes complete sense: the exposure $s_i$ simply multiplies the fitted mean $e^{\hat{\theta}^Tx_i}$ that a Poisson regression without differing exposures would produce.

We model the random variable $Y_i$, the response at $x_i$, with a Poisson distribution with parameter $\lambda_i$.

Then the likelihood for $N$ data points is:

$$\prod_{i=1}^N \dfrac{(s_ie^{\theta^Tx_i})^{y_i}}{y_i!}e^{-s_i e^{\theta^Tx_i}}$$

The log likelihood $\ell$, keeping only terms that depend on $\theta$ since others will drop out after differentiation:

$$\ell = \displaystyle \sum_{i=1}^N\big[ y_i\theta^T x_i -s_i e^{\theta^Tx_i}\big]$$
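As a quick numerical check (my own sketch, not part of the original answer; the simulated data, variable names, and choice of `scipy` optimizer are all assumptions), maximizing this log likelihood recovers $\theta$ from data generated with known exposures:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 5000
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 1))])  # intercept + one covariate
s = rng.uniform(0.5, 5.0, size=N)                          # exposures s_i
theta_true = np.array([0.3, 0.7])
# y_i ~ Poisson(s_i * exp(theta^T x_i)), matching the offset model
y = rng.poisson(s * np.exp(X @ theta_true))

def neg_ell(theta):
    # negative of ell = sum_i [ y_i theta^T x_i - s_i exp(theta^T x_i) ]
    eta = X @ theta
    return -(y @ eta - s @ np.exp(eta))

theta_hat = minimize(neg_ell, np.zeros(2), method="BFGS").x
print(theta_hat)  # should be close to theta_true
```

This is exactly what `glm(..., family = poisson, offset = log(s))` maximizes in R, just written out by hand.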

The incorrect way (using $y_i/s_i$ as the y-values)

Now we still model:

$$\log \lambda_i = \log s_i + \theta^T x_i$$

The difference is that we now assume $y_i/s_i$ has a Poisson distribution, and this is precisely what makes the model incorrect: it violates the assumption that $y_i$ itself is Poisson distributed, and instead models the rate as Poisson. The likelihood is now:

$$\prod_{i=1}^N \dfrac{(e^{\theta^Tx_i})^{y_i/s_i}}{(y_i/s_i)!}e^{- e^{\theta^Tx_i}}$$

[Awkward to have $y_i/s_i$ in the factorial term but it drops out anyway after differentiation of the log likelihood so let's carry on.]

The log likelihood $\hat{\ell}$, keeping only terms that depend on $\theta$ since others will drop out after differentiation:

$$\hat{\ell} = \displaystyle \sum_{i=1}^N\bigg[ \frac{y_i}{s_i}\theta^T x_i - e^{\theta^Tx_i}\bigg]$$
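On the same simulated data, the two objective functions produce different estimates. A minimal sketch (my own; the data-generating setup and optimizer choice are assumptions, and a small $N$ is used so the gap is visible):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 300
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 1))])
s = rng.uniform(0.5, 5.0, size=N)
theta_true = np.array([0.3, 0.7])
y = rng.poisson(s * np.exp(X @ theta_true))

def neg_ell(theta):
    # correct offset form: sum_i [ y_i theta^T x_i - s_i exp(theta^T x_i) ]
    eta = X @ theta
    return -(y @ eta - s @ np.exp(eta))

def neg_ell_hat(theta):
    # incorrect form: treats y_i/s_i as the response, no weights
    eta = X @ theta
    return -((y / s) @ eta - np.sum(np.exp(eta)))

t_offset = minimize(neg_ell, np.zeros(2), method="BFGS").x
t_rate = minimize(neg_ell_hat, np.zeros(2), method="BFGS").x
print(t_offset, t_rate)  # similar, but not identical
```

The two fits do not coincide: the unweighted rate likelihood effectively gives every observation equal weight regardless of its exposure.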

Conclusion

$\ell$ and $\hat{\ell}$ look strikingly similar, and you might think they are the same, but they are not: you cannot simply divide $\ell$ through by $s_i$, because $s_i$ differs from term to term!

However, if we consider a weighted Poisson regression, in which we still model $y_i/s_i$ as Poisson distributed but give each data point's contribution to the log likelihood a weight of $s_i$, then:

$$\hat{\ell}_{{\rm weighted}}=\displaystyle \sum_{i=1}^N s_i\bigg[ \frac{y_i}{s_i}\theta^T x_i - e^{\theta^Tx_i}\bigg]$$

is equivalent to $\ell$: the weight cancels in the first term, $s_i \cdot \frac{y_i}{s_i}\theta^T x_i = y_i\theta^T x_i$, and restores the factor $s_i$ on the exponential term, so $\hat{\ell}_{{\rm weighted}}$ is term-by-term identical to $\ell$.
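Since the identity is purely algebraic, it holds at every $\theta$, not just at the maximum. A few lines (my own sketch, with made-up numbers) confirm the two objective functions coincide exactly:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 1))])
s = rng.uniform(0.5, 5.0, size=N)   # exposures s_i
y = rng.poisson(2.0, size=N).astype(float)

def ell(theta):
    # offset form: sum_i [ y_i theta^T x_i - s_i exp(theta^T x_i) ]
    eta = X @ theta
    return y @ eta - s @ np.exp(eta)

def ell_weighted(theta):
    # weighted rate form: sum_i s_i [ (y_i/s_i) theta^T x_i - exp(theta^T x_i) ]
    eta = X @ theta
    return np.sum(s * ((y / s) * eta - np.exp(eta)))

theta = np.array([0.2, -0.5])
print(ell(theta), ell_weighted(theta))  # identical up to floating point
```

This is why R's `glm` with `weights = s` on the rate response reproduces the offset fit: the two functions being maximized are the same function.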
