I have a Poisson regression model, and I would like to measure the discrepancy between actual counts and predicted counts. For a binary classification model, the log-loss metric fits this purpose. Is there a similar metric for Poisson models? And is there a package, or is it possible to code a function in R, for it? Thanks in advance.
Solved – log-loss equivalent for Poisson regression
accuracy, calibration, log-loss, poisson-distribution, r
Related Solutions
This sounds like a hierarchical model (or multi-level model), though it is not described in those terms. Firstly, note that a Poisson "residual" is not really the same thing as you would find in the case of a normal error term. The reason is that a Poisson random variable cannot be separated out easily into a "systematic part" and a "random part". One standard approach is to use the "deviance residuals", defined by:
$$d_{i}=\operatorname{sgn}(Y_{i}-\hat{Y}_{i})\sqrt{2Y_{i}\log\left(\frac{Y_{i}}{\hat{Y}_{i}}\right)-2(Y_{i}-\hat{Y}_{i})}$$
These quantities are analogous to the residuals in OLS regression, and can be easily obtained via residuals(glm.object, "deviance") in R. You can show that the sum of squared deviance residuals has an approximate chi-squared distribution, although I wouldn't know how to figure out the exact distribution.
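As a quick base-R illustration (using the built-in warpbreaks data purely as an example), the deviance residuals above can be extracted directly, and their sum of squares reproduces the residual deviance reported by glm():

```r
# Fit a Poisson GLM on a built-in dataset (warpbreaks is illustrative only)
fit  <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Deviance residuals d_i as defined above
dres <- residuals(fit, type = "deviance")

# Their sum of squares equals the residual deviance reported by glm()
sum(dres^2)
deviance(fit)
```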
An alternative is to add further structure to the Poisson GLM by introducing "random effects" into the model equation. For the Poisson model with the canonical log-link function we have $\log(\mu_i)=x_{i}^{T}\beta$. In your case $x_{i}^{T}$ is a row of a "design matrix" (all zeros and ones) which picks out the relevant beta coefficients for your site effect and year effect. The addition of random effects basically says that the above relationship only holds "on average" rather than deterministically (you can think of this as a "softer" link function), so that we actually have $E\left[\log(\mu_i)\right]=x_{i}^{T}\beta$. The "actual link" is given by:
$$\log(\mu_i)=x_{i}^{T}\beta+z_{i}^{T}u_{i}$$
where $z_{i}^{T}$ is a known set of covariates, and $u_i$ is a random effect for the $i$th observation with $E[u_{i}]=0$. Now if you set $z_{i}^{T}=1$, then you have a good definition of the residuals, given by the random effects $u_i$. All that remains is to assign a probability distribution to the random effects and do the maths/coding work to obtain either point estimates or posterior distributions for the $u_{i}$ to use in your subsequent model. These models are often called generalised linear mixed models, and $u_i$ is often chosen to have a normal distribution.
The lme4 package (via glmer() with family = poisson) can fit this kind of model.
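To make the random-effects formulation concrete, here is a minimal base-R simulation of the model above; all names and parameter values are illustrative, and a package such as lme4 (via glmer()) could then be used to fit such data:

```r
# Simulate the "softer link" Poisson model: log(mu_i) = x_i' beta + u_i
set.seed(1)
n    <- 500
x    <- cbind(1, rbinom(n, 1, 0.5))   # design matrix: intercept + a 0/1 site dummy
beta <- c(1.0, 0.5)                   # fixed effects
u    <- rnorm(n, mean = 0, sd = 0.3)  # observation-level random effects, E[u_i] = 0
mu   <- exp(drop(x %*% beta) + u)     # the "actual link"
y    <- rpois(n, lambda = mu)         # observed counts
```

With z_i = 1 as in the text, the simulated u here play exactly the role of the model's "residuals".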
There are a couple of proper and strictly proper scoring rules for count data that you can use. Scoring rules are penalties $s(y,P)$, where $P$ is the predictive distribution and $y$ the observed value. They have a number of desirable properties; first and foremost, a forecast closer to the true probability always receives a smaller penalty, and there is a unique best forecast, namely the one whose predicted probabilities coincide with the true probabilities. Thus minimizing the expectation of $s(y,P)$ means reporting the true probabilities. See also Wikipedia.
Often one takes an average of those over all predicted values as
$S=\frac{1}{n}\sum_{i=1}^n s(y^{(i)},P^{(i)})$
Which rule to take depends on your objective, but I'll give a rough characterization of when each one is a good choice.
In what follows I use $f(y)$ for the predictive probability mass function $\Pr(Y=y)$ and $F(y)$ for the predictive cumulative distribution function. A $\sum_k$ runs over the whole support of the count distribution (i.e., $0, 1, 2, \dots$). $I$ denotes an indicator function. $\mu$ and $\sigma$ are the mean and standard deviation of the predictive distribution (which are usually directly estimated quantities in count data models).
Strictly proper scoring rules
- Brier Score: $s(y,P)=-2 f(y) + \sum_k f^2(k)$ (stable for size imbalance in categorical predictors)
- Dawid-Sebastiani score: $s(y,P)=(\frac{y-\mu}{\sigma})^2+2\log\sigma$ (good for general predictive model choice; stable for size imbalance in categorical predictors)
- Deviance score: $s(y,P)=-2\log f(y) + g_y$ ($g_y$ is a normalization term that only depends on $y$, in Poisson models it is usually taken as the saturated deviance; good for use with estimates from an ML framework)
- Logarithmic score: $s(y,P)=-\log f(y)$ (very easily calculated; stable for size imbalance in categorical predictors)
- Ranked probability score: $s(y,P)=\sum_k \{F(k)-I(y\leq k)\}^2$ (good for contrasting different predictions of very high counts; susceptible to size imbalance in categorical predictors)
- Spherical score: $s(y,P)=-\frac{f(y)}{\sqrt{\sum_k f^2(k)}}$ (stable for size imbalance in categorical predictors)
Other scoring rules (not so proper but often used)
- Absolute error score: $s(y,P)=|y-\mu|$ (not proper)
- Squared error score: $s(y,P)=(y-\mu)^2$ (not strictly proper; susceptible to outliers; susceptible to size imbalance in categorical predictors)
- Pearson normalized squared error score: $s(y,P)=(\frac{y-\mu}{\sigma})^2$ (not strictly proper; susceptible to outliers; can be used for model checking: if the averaged score is very different from 1, the model may be misspecified; stable for size imbalance in categorical predictors)
Example R code for the strictly proper rules:
library(vcdExtra)
m1 <- glm(Freq ~ mental, family=poisson, data=Mental)
# scores for the first observation
mu <- predict(m1, type="response")[1]
x <- Mental$Freq[1]
# logarithmic (equivalent to deviance score up to a constant)
-log(dpois(x, lambda=mu))
# quadratic (brier)
-2*dpois(x, lambda=mu) + sum(dpois(0:1000, lambda=mu)^2)  # support starts at 0
# spherical
- dpois(x, mu) / sqrt(sum(dpois(0:1000, lambda=mu)^2))
# ranked probability score
sum(ppois((-1):(x-1), mu)^2) + sum((ppois(x:10000,mu)-1)^2)
# Dawid Sebastiani
(x-mu)^2/mu + log(mu)
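Per the averaging formula above, in practice one reports the mean score over all observations rather than a single one. A base-R sketch of the mean logarithmic score, using the built-in warpbreaks data purely for illustration:

```r
# Fit a Poisson GLM on a built-in dataset (illustrative only)
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
mu  <- fitted(fit)            # predictive means, one per observation
y   <- warpbreaks$breaks      # observed counts

# Mean logarithmic score: S = (1/n) * sum_i -log f(y_i)
mean(-dpois(y, lambda = mu, log = TRUE))
```

Lower values indicate a better predictive fit, and the same averaging applies to any of the rules above.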
Best Answer
When fitting a GLM, the deviance is something you'd like to see as low as possible. I believe that for a binomial GLM, the binomial deviance is already (up to a constant) the log loss. If you run a Poisson GLM, the Poisson deviance should be the number you're looking for; it is reported by default by glm() in R.
A bonus benefit of using the deviance is that you can use it to compare nested models via a $\chi^2$ test.
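For example, two nested Poisson GLMs can be compared with a chi-squared test on their deviance difference; a base-R sketch, with the built-in warpbreaks data used purely for illustration:

```r
# Fit two nested Poisson GLMs on a built-in dataset (illustrative only)
m0 <- glm(breaks ~ wool,           family = poisson, data = warpbreaks)
m1 <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

deviance(m0); deviance(m1)        # residual deviances (lower is better)
anova(m0, m1, test = "Chisq")     # chi-squared test of the added term
```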