The intuition behind the expected transaction value for a customer in the gamma-gamma model


Background and Motivation: I was reading the paper "RFM and CLV: Using Iso-Value Curves for Customer Base Analysis" by Peter S. Fader, Bruce G. S. Hardie and Ka Lok Lee, to gain some intuition for the methods available in software libraries such as lifetimes and BTYD. The paper presents a way to incorporate RFM models in customer lifetime value calculations by using the gamma-gamma model for spend per transaction.

Problem: Eq. 4 of the paper (p. 12) gives the following formula for the expected average transaction value of a customer with an average spend of $m_x$ across $x$ transactions:

$$
\begin{aligned}
\mathbb{E}(M\mid p, q, \gamma, m_x, x) & = \frac{(\gamma + m_xx)p}{px+q-1}\\
& = \bigg(\frac{q-1}{px+q-1}\bigg)\frac{\gamma p}{q-1}+\bigg(\frac{px}{px+q-1}\bigg)m_x
\end{aligned}
$$

I understand that the $q$ parameter is used as the shape parameter in the underlying gamma distribution that models the heterogeneity in mean transaction values across customers, but I'm unable to grasp its effect on the expected average transaction value for a given customer.

Namely, I was able to fit a gamma-gamma model with $q < 1$ and the expected average transaction value for certain customers evaluates to a negative value.

Question: Does the fact that I get negative values for expected transaction values expose some issues with the underlying dataset used to fit the model?

Do models with $q < 1$ make sense, and if so, what should I interpret from a negative expected average transaction value?

Best Answer

This is a (super) late answer, but I was looking for information on gamma-gamma models for monetary value myself and came across this. The short answer is yes: negative expected transaction values expose issues with the underlying dataset used to fit the model.

In case it is helpful for you or others with similar questions, I'll try to illustrate why it's concerning to have $q<1$. The purpose of these spend models is to understand observed spend per transaction, with the goal of predicting future spend per transaction at the individual level. The use of a gamma distribution was first proposed by Colombo and Jiang (1999). It was motivated by the observation that if transaction values are assumed to be normally distributed, then 1) spend is not bounded below by $0$ for any choice of mean and variance parameters, and 2) the spend distribution is symmetric, whereas observed data consistently appear right-skewed.

Following the paper you refer to, a customer with $x$ transaction values $z_1,\dots,z_x$ is modeled such that $z_i \sim \text{Gamma}(p,\nu)$, and we allow for heterogeneity across customers by also having $\nu \sim \text{Gamma}(q,\gamma)$. A key observation is that, conditional on $p$ and $\nu$, a customer's mean transaction value is $\delta = p/\nu$. Since $\nu$ varies across customers, $\delta$ does too, so you may want to know its mean across all individuals. Denote the corresponding random variable by $D$. It can be shown that

$$E[D|p,q,\gamma] = \frac{p\gamma}{q-1}$$

which says that the mean transaction value across customers is $\frac{p\gamma}{q-1}$. (Showing this is a bit involved: derive the distribution of $D$, show that it is an inverse-gamma distribution with specific parameters, and take its expected value.) The parameters of any gamma distribution are strictly positive, so $p>0$ and $\gamma>0$, and the formula gives a positive mean only when $q>1$. If you have $q<1$, the formula returns a negative expected transaction value across individuals. Strictly speaking, the mean of an inverse-gamma distribution is not finite when $q\le 1$, so that negative number is the formula applied outside its range of validity rather than a meaningful expectation.
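For readers who want the step spelled out, here is a sketch of the same result obtained by direct integration rather than via the inverse-gamma distribution. It assumes the rate parameterization for $\nu \sim \text{Gamma}(q,\gamma)$, which is what $\delta = p/\nu$ implies:

$$
\begin{aligned}
E[D\mid p,q,\gamma] &= \int_0^\infty \frac{p}{\nu}\,\frac{\gamma^{q}\,\nu^{q-1}e^{-\gamma\nu}}{\Gamma(q)}\,d\nu \\
&= \frac{p\,\gamma^{q}}{\Gamma(q)}\int_0^\infty \nu^{(q-1)-1}e^{-\gamma\nu}\,d\nu \\
&= \frac{p\,\gamma^{q}}{\Gamma(q)}\cdot\frac{\Gamma(q-1)}{\gamma^{q-1}} = \frac{p\gamma}{q-1}
\end{aligned}
$$

The last step uses $\Gamma(q) = (q-1)\,\Gamma(q-1)$. The integral in the second line converges only for $q>1$, which is the range in which the closed form is a genuine expectation.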

This should be cause for concern: why is the expected transaction value negative? You could try to rationalize it by imagining that customers are compensated for each transaction, but that is quite odd, and there are other models for that kind of situation. So the fact that your model finds $q<1$ should immediately raise serious concerns for this reason alone.
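To see how a negative individual-level value can arise, here is a minimal sketch that evaluates the closed form from the question. The function names and parameter values are hypothetical and chosen only for illustration; they do not come from any fitted dataset. The validity check assumes the standard conjugate result that, after observing $x$ transactions, the posterior of $\nu$ is a gamma distribution with shape $px+q$, so the mean of $p/\nu$ is finite only when $px+q>1$.

```python
def expected_avg_value(p, q, gamma, m_x, x):
    """Closed-form E(M | p, q, gamma, m_x, x) as given in the question."""
    return p * (gamma + m_x * x) / (p * x + q - 1)


def formula_is_valid(p, q, x):
    """True when p*x + q > 1, i.e. when the closed form is a finite mean."""
    return p * x + q > 1


# Hypothetical parameters with q < 1 (illustrative only).
p, q, gamma = 0.3, 0.5, 10.0
m_x = 20.0  # observed average spend per transaction

for x in (1, 5):
    value = expected_avg_value(p, q, gamma, m_x, x)
    print(f"x={x}  valid={formula_is_valid(p, q, x)}  E(M)={value:.2f}")
```

With these numbers the customer with a single transaction has $px+q<1$ and gets a negative value, while the customer with five transactions gets a positive one. So with $q<1$ it is the customers with few transactions who are affected.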

As a final point, it helps to look more closely at

$$
\begin{aligned}
\mathbb{E}(M\mid p, q, \gamma, m_x, x) & = \frac{(\gamma + m_xx)p}{px+q-1}\\
& = \bigg(\frac{q-1}{px+q-1}\bigg)\frac{\gamma p}{q-1}+\bigg(\frac{px}{px+q-1}\bigg)m_x
\end{aligned}
$$

and note that it is simply a weighted average of the population mean transaction value $E[D|p,q,\gamma] = \frac{p\gamma}{q-1}$ and the observed average transaction value $m_x = \frac{1}{x}\sum_{i=1}^x z_i$ of a given customer. The weights can be fully understood from a Bayesian framework: the population mean acts as the prior, and the weight you place on it goes down as you observe more transactions $x$ for a given individual! Note that both weights lie between $0$ and $1$ only when $q>1$; with $q<1$ the weight on the prior is either negative or greater than $1$, which is another sign that something is off.
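The shrinkage behaviour can be checked numerically. The sketch below uses hypothetical helper names and made-up parameter values with $q>1$. It computes the two weights from the formula and shows the weight on the population mean falling as $x$ grows.

```python
def shrinkage_weights(p, q, x):
    """Return (weight on population mean, weight on observed mean m_x)."""
    denom = p * x + q - 1
    return (q - 1) / denom, (p * x) / denom


def expected_avg_value_weighted(p, q, gamma, m_x, x):
    """E(M | ...) written as a weighted average of prior mean and m_x."""
    w_prior, w_data = shrinkage_weights(p, q, x)
    population_mean = p * gamma / (q - 1)
    return w_prior * population_mean + w_data * m_x


# Hypothetical parameters (illustrative only); population mean = 6 * 15 / 3 = 30.
p, q, gamma = 6.0, 4.0, 15.0
m_x = 50.0  # this customer's observed average spend

for x in (1, 5, 50):
    w_prior, w_data = shrinkage_weights(p, q, x)
    estimate = expected_avg_value_weighted(p, q, gamma, m_x, x)
    print(f"x={x:>2}  w_prior={w_prior:.3f}  w_data={w_data:.3f}  E(M)={estimate:.2f}")
```

As $x$ increases the estimate moves from the population mean of 30 toward the customer's own average of 50.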