Solved – R fitting Poisson distribution with weighting

distributionspoisson distributionr

I am wanting to fit a Poisson distribution but I am wanting to weight some of the values as more important. Is there a way to do this in R with fitdistr() or some equivalent function? Here is what I have at the moment:

randoms <- rpois(15,10)
weighting <- seq(1, 100, by=1)
fit <- fitdistr(randoms, "poisson")

I would like to use the 'weighting' data as a way to emphasize the observation at the end of the data set.

Best Answer

Note that 'fitdistr does nothing but maximum likelihood estimation. That is to say, you can do it by yourself by writing down the likelihood. Below is an example for the poisson distribution in R. It can be adapted to upweight/downweight the contribution to the likelihood of each data.

density (as in R) $$f(x; \lambda) = \lambda^x \frac{\exp(-\lambda)}{x!}, \qquad \lambda > 0$$

likelihood $$L(\lambda; \mathbf{x}) = \prod_{i=1}^n \left\{ \lambda^{x_i} \frac{\exp(-\lambda)}{x_i!} \right\} $$

log-likelihood

$$\ell(\lambda; \mathbf{x}) = \sum_{i=1}^n \left\{ x_i \log(\lambda) - \lambda - \log(x_i!) \right\}$$

2nd derivatie of $\ell$:

$$ \frac{d^2 \ell}{d\lambda^2}(\lambda; \mathbf{x}) = \sum_{i=1}^n - \frac{x_i}{\lambda^2} = -\frac{1}{\lambda^2} n\bar{x}$$

#------data------
set.seed(730)
sample <- rpois(1000, 10)
#----------------

 ################################################################################
 # Using 'fitdistr'                                                             #
 ################################################################################

 library(MASS)
 fitdistr(x=sample, densfun="Poisson")
 lambda  
   10.1240000 
   ( 0.1006181)

 ################################################################################
 # writing down the log-likelihood explicitly                                   #
 ################################################################################

 #------minus log-likelihood------
 mloglik <- function(lambda2, sample) #lambda2 = log(lambda) in (-\infty, \infty)
 {
  - sum(sample * lambda2 - exp(lambda2) - log(factorial(sample)))
 }
 #--------------------------------

 #------optimisation------
 res <- nlm(f=mloglik, p=1, sample=sample)
 #------------------------

 #------recover lambda------
 lambda <- exp(res$estimate)
 round(lambda, 7)
 [1] 10.12399
 #--------------------------

 #------standard error------
 #square root of negative inverse second derivative of the log-likelihood
 se <- lambda / sqrt(length(sample) * mean(sample))
 round(se, 7)
 [1] 0.100618
 #--------------------------

Related Solutions

Poisson Distribution – Modeling a Poisson Distribution with Overdispersion

for overdispersed poisson, use the negative binomial, which allows you to parameterize the variance as a function of the mean precisely. rnbinom(), etc. in R.

Poisson GLM in R – Issues with Rates vs. Counts

When you add the offset you don't need to (and shouldn't) also compute the rate and include the exposure.

I don't know whether this is the cause of the errors, but if the exposure per case is person days pd, then the dependent variable should be counts and the offset should be log(pd), like this:

cdi_model <- glm(counts ~ covariate + months + offset(log(pd)), 
                 data=test, family = poisson)

Best Answer

Related Solutions

Poisson Distribution – Modeling a Poisson Distribution with Overdispersion

Poisson GLM in R – Issues with Rates vs. Counts

Related Question