Tweedie and Poisson Loss Functions – Their Use in XGBoost and Deep Learning Models

Tags: forecasting, loss-functions, machine learning, poisson distribution, tweedie-distribution

I am looking at a few competitions on Kaggle where people used Tweedie loss or Poisson loss as the objective function for forecasting sales or predicting insurance claims.

  1. Can someone please explain the use of, or need for, Tweedie or Poisson loss instead of the regular mean squared loss as the objective?
  2. Is it because of the distribution of the response variable?
  3. If the response variable is positive and right skewed, should we always use Tweedie or Poisson instead of mean squared loss?

Best Answer

I used to develop these models professionally for a major casualty insurer, and probably had a part in developing the data for one of the Kaggle competitions you're referencing. So I'm relatively well positioned for this question.

Can someone please explain the use of, or need for, Tweedie or Poisson loss instead of the regular mean squared loss as the objective?

The goal of these models is to price insurance contracts. That is, for a customer who has purchased an insurance contract, we want to know how much our company will pay out in total claim costs for that customer. So let $X$ denote all the measurements we have for a single customer we've insured.

There are two possibilities for what happens over the life of the contract:

  1. The insured files no claims. In this case the company pays out nothing. Let's call $F$ the random variable counting the number of claims filed by the insured over the contract period. This is often assumed to be Poisson distributed, as a decent approximation. In the jargon of the industry, this random variable is called the frequency.

  2. The insured files at least one claim. Then, for each claim, a random amount is paid out by our company. Let's denote the amount paid out for the $i$'th claim $S_i$. This is a continuous random variable with a heavy right tail. It is often assumed these are gamma distributed, because the shape is intuitively reasonable. In the jargon of the industry, these are called the severity.

Putting that all together, the amount paid out over the insurance contract is a random variable:

$$Y \mid X = \sum_{i \sim F} S_i $$

This is a funny little equation, but basically there is a random number of summands, determined by the frequency $F$, and each summand $S_i$ is a random claim amount (for a single claim). When $F = 0$ the sum is empty and $Y = 0$.
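To make the random sum concrete, here is a minimal simulation sketch using only the Python standard library. It assumes Poisson frequency and gamma severity as described above; the parameter values (a mean of 0.1 claims per policy, gamma shape 2 and scale 500) are made up for illustration, and `sample_poisson` and `simulate_total_claims` are hypothetical helper names.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_total_claims(n_policies, freq_mean=0.1, sev_shape=2.0, sev_scale=500.0, seed=0):
    """Simulate (number of claims, total payout) for each policy."""
    rng = random.Random(seed)
    n_claims, totals = [], []
    for _ in range(n_policies):
        f = sample_poisson(rng, freq_mean)  # frequency F
        # Total payout Y: sum of F gamma severities; an empty sum gives 0.
        y = sum(rng.gammavariate(sev_shape, sev_scale) for _ in range(f))
        n_claims.append(f)
        totals.append(float(y))
    return n_claims, totals


n_claims, totals = simulate_total_claims(100_000)
share_zero = sum(1 for y in totals if y == 0.0) / len(totals)
mean_payout = sum(totals) / len(totals)
print(f"share of policies with zero payout: {share_zero:.3f}")
print(f"mean payout per policy: {mean_payout:.1f}")
```

With these illustrative settings most simulated policies pay out exactly zero, and the rest have a continuous, right-skewed payout. That mix of a point mass at zero and a skewed positive part is the shape a squared-error loss handles poorly.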

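If $F$ is Poisson distributed and each $S_i$ is gamma distributed, then $Y \mid X$ follows a Tweedie distribution (the compound Poisson-gamma case). So reasonable assumptions about frequency and severity lead to the parametric assumption that $Y \mid X$ is Tweedie distributed, and the Tweedie loss is the corresponding negative log-likelihood, up to terms that don't depend on the prediction.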
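For a rough idea of what a loss from this family looks like, here is a sketch of the standard Tweedie unit deviance for a variance power strictly between 1 and 2. The power of 1.5 and the toy payouts are assumptions for illustration, and the function names are hypothetical. XGBoost exposes this family through its `reg:tweedie` objective and `tweedie_variance_power` parameter; the code below does not call XGBoost and only shows the kind of quantity being minimized.

```python
def tweedie_unit_deviance(y, mu, power=1.5):
    """Tweedie unit deviance for 1 < power < 2, with y >= 0 and mu > 0."""
    p = power
    return 2.0 * (
        y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )


def mean_tweedie_deviance(ys, mu, power=1.5):
    """Average unit deviance of observations ys against one constant prediction mu."""
    return sum(tweedie_unit_deviance(y, mu, power) for y in ys) / len(ys)


# Toy payouts: mostly zeros with a couple of positive claims (made-up numbers).
ys = [0.0, 0.0, 0.0, 120.0, 0.0, 950.0, 0.0, 0.0]

# Grid search for the constant prediction with the smallest mean deviance.
candidates = [1.0 + 0.25 * i for i in range(2000)]
best = min(candidates, key=lambda m: mean_tweedie_deviance(ys, m))
print(f"best constant prediction: {best}, sample mean: {sum(ys) / len(ys)}")
```

The deviance is zero when the prediction equals the observation, stays finite when the observation is exactly zero, and for a constant prediction is minimized at the sample mean. So, like squared error, it targets the conditional mean, but it weights errors in a way that matches the zero-inflated, right-skewed shape of $Y \mid X$.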

Is it because of the distribution of the response variable?

As noted above, sort of. It's actually the conditional distribution of the response variable (so $Y \mid X$, not the marginal $Y$), which we never really observe. Some features of the conditional distributions manifest in the marginal, like the large point mass at zero.

If the response variable is positive and right skewed, should we always use Tweedie or Poisson instead of mean squared loss?

Nope. It's the conditional distribution $Y \mid X$ that guides the choice of loss function, which often comes from thought and imagination like the above. The (marginal) distribution of $Y$ can be skewed even if the conditional distribution $Y \mid X$ is symmetric. For example:

$$ X \sim \text{Poisson}(\lambda = 1.0) $$ $$ Y \mid X \sim \text{Normal}(\mu = X, \sigma = 1.0) $$

This leads to a right-skewed marginal distribution of $Y$, yet the least squares loss is exactly the correct one to use.
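A quick simulation sketch of this example, again with only the standard library. The sample size, the seed, and the helper names `sample_poisson` and `skewness` are my own choices for illustration. It fits an ordinary least squares line by hand and compares the skewness of $Y$ with the skewness of the residuals.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
xs = [sample_poisson(rng, 1.0) for _ in range(100_000)]  # X ~ Poisson(1)
ys = [rng.gauss(x, 1.0) for x in xs]                     # Y | X ~ Normal(X, 1)

# Ordinary least squares fit of y on x, computed from the closed-form solution.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

skew_marginal = skewness(ys)
skew_residual = skewness(residuals)
print(f"slope: {slope:.3f}, intercept: {intercept:.3f}")
print(f"skewness of Y: {skew_marginal:.3f}, skewness of residuals: {skew_residual:.3f}")
```

The marginal $Y$ comes out right skewed, while the residuals around the fitted line are roughly symmetric and the fit recovers a slope near 1 and an intercept near 0. A skewed histogram of the response alone is not evidence against squared error.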

Is sales forecasting the same as the claims example, where each sale is Poisson and the sale amount is gamma distributed?

I haven't done any projects in this area, but that sounds like a reasonable approach.

Can you please explain how/why the claim amount follows a gamma distribution?

There's no magic here, and no principled theory about claims distributions. Roughly, the gamma has the correct shape: it's positively supported (i.e. $P(G \leq 0) = 0$ for a gamma random variable $G$), it's unimodal, and it has a positive skew; it also leads to mathematically tractable models. That's about it: it's just a reasonable choice that has worked well for a long time.
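Those shape properties are easy to check on simulated draws. The sketch below uses the standard library's `random.gammavariate` with a shape of 2 and a scale of 500, both made-up values for illustration, and compares the sample skewness with the known gamma skewness of $2 / \sqrt{k}$ for shape $k$.

```python
import math
import random

rng = random.Random(0)
shape, scale = 2.0, 500.0  # illustrative severity parameters, not fitted to any data
claims = [rng.gammavariate(shape, scale) for _ in range(100_000)]

n = len(claims)
mean = sum(claims) / n
median = sorted(claims)[n // 2]
m2 = sum((c - mean) ** 2 for c in claims) / n
m3 = sum((c - mean) ** 3 for c in claims) / n
sample_skew = m3 / m2 ** 1.5
theory_skew = 2.0 / math.sqrt(shape)

print(f"smallest draw: {min(claims):.4f}")
print(f"mean: {mean:.1f}, median: {median:.1f}")
print(f"sample skewness: {sample_skew:.3f}, theoretical: {theory_skew:.3f}")
```

Every draw is positive, and the mean sits above the median, which is what a long right tail does to a distribution.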