Pareto/NBD Model – Is it Possible to Understand the Pareto/NBD Model Conceptually?

Tags: distributions, gamma-distribution, marketing, pareto-distribution

I am learning to use the BTYD package, which uses the Pareto/NBD model to predict when a customer is expected to come back. However, all the literature on this model is full of mathematics, and there does not appear to be a simple, conceptual explanation of how the model works.
Is it possible to understand the Pareto/NBD model without being a mathematician?
I have gone through this famous paper by Fader. The Pareto/NBD model makes the following assumptions:

i. While active, the number of transactions made by a customer in a time period of length t is distributed Poisson with transaction rate λ.

ii. Heterogeneity in transaction rates across customers follows a gamma distribution with shape parameter r and scale parameter α.

iii. Each customer has an unobserved “lifetime” of length τ. This point at which the customer becomes inactive is distributed exponential with dropout rate µ.

iv. Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter s and scale parameter β.

v. The transaction rate λ and the dropout rate µ vary independently across customers.
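To make the five assumptions concrete, here is a minimal simulation sketch (Python standard library only) that generates customers exactly as the list describes. The function name `simulate_pareto_nbd` and all parameter values are hypothetical, not part of BTYD. One assumption to flag: I take the parametrization in which E[λ] = r/α and E[µ] = s/β (α and β behave as rates in the density, even though they are called scale parameters), and since `random.gammavariate` expects a scale, the code passes `1.0 / alpha`.

```python
import random

def simulate_pareto_nbd(n_customers, T, r, alpha, s, beta, seed=0):
    """Simulate purchase times in [0, T] under Pareto/NBD assumptions (i)-(v).

    Assumed parametrization: lambda ~ Gamma(shape=r, rate=alpha) and
    mu ~ Gamma(shape=s, rate=beta). random.gammavariate takes a scale,
    so we pass 1 / rate.
    """
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        lam = rng.gammavariate(r, 1.0 / alpha)  # (ii) this customer's purchase rate
        mu = rng.gammavariate(s, 1.0 / beta)    # (iv) dropout rate, drawn independently of lam (v)
        tau = rng.expovariate(mu)               # (iii) unobserved lifetime
        end = min(tau, T)                       # purchases happen only while active and observed
        purchases = []
        t = rng.expovariate(lam)                # (i) Poisson process = exponential gaps
        while t < end:
            purchases.append(t)
            t += rng.expovariate(lam)
        customers.append({"lambda": lam, "mu": mu, "tau": tau, "purchases": purchases})
    return customers

# Illustrative values only (time unit: weeks, one-year observation window).
customers = simulate_pareto_nbd(1000, T=52.0, r=2.0, alpha=4.0, s=1.0, beta=10.0)
frequency = [len(c["purchases"]) for c in customers]
recency = [c["purchases"][-1] if c["purchases"] else 0.0 for c in customers]
```

The model only ever sees `frequency` and `recency`; `lambda`, `mu` and `tau` are kept in the output so you can compare what the model would have to infer against what actually happened.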

I do not understand the intuition behind assumptions (ii), (iii) and (iv). Why these particular distributions and not others?

The assumptions of the BG/NBD model are:

i. While active, the number of transactions made by a customer follows a Poisson process with transaction rate λ. This is equivalent to assuming that the time between transactions is distributed exponential with transaction rate λ.

ii. Heterogeneity in λ follows a gamma distribution.

iii. After any transaction, a customer becomes inactive with probability p. Therefore the point at which the customer “drops out” is distributed across transactions according to a (shifted) geometric distribution with pmf $P(\text{inactive immediately after the } j\text{th transaction}) = p(1-p)^{j-1}, \; j = 1, 2, 3, \ldots$

iv. Heterogeneity in p follows a beta distribution.
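Here is a minimal simulation sketch of these four assumptions. The structural difference from the Pareto/NBD is that dropout is a coin flip after each purchase rather than a clock ticking in continuous time. `simulate_bg_nbd` and the parameter values are hypothetical; λ is drawn from a gamma with shape r and rate α (so E[λ] = r/α), and because `random.gammavariate` expects a scale the code passes `1.0 / alpha`.

```python
import random

def simulate_bg_nbd(n_customers, T, r, alpha, a, b, seed=0):
    """Simulate purchase times in [0, T] under BG/NBD assumptions (i)-(iv).

    Assumed parametrization: lambda ~ Gamma(shape=r, rate=alpha), p ~ Beta(a, b).
    random.gammavariate takes a scale, so we pass 1 / alpha.
    """
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        lam = rng.gammavariate(r, 1.0 / alpha)  # (ii) this customer's purchase rate
        p = rng.betavariate(a, b)               # (iv) this customer's dropout probability
        purchases = []
        t = rng.expovariate(lam)                # (i) exponential time to the next purchase
        while t < T:
            purchases.append(t)
            if rng.random() < p:                # (iii) coin flip after each purchase
                break                           # inactive from now on
            t += rng.expovariate(lam)
        customers.append({"lambda": lam, "p": p, "purchases": purchases})
    return customers

# Illustrative values only (time unit: weeks, one-year observation window).
customers = simulate_bg_nbd(1000, T=52.0, r=2.0, alpha=4.0, a=1.0, b=3.0)
frequency = [len(c["purchases"]) for c in customers]
```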

The intuitive rationale for assumptions (ii), (iii) and (iv) is also not at all obvious to me.

I shall be grateful for any help. Thanks.

Best Answer

Imagine you're the newly appointed manager of a flower shop. You've got a record of last year's customers: how often each of them shops and how long it has been since their last visit. You want to know how much business the listed customers are likely to bring in this year. There are a few things to consider:

[assumption (ii)] Customers have different shopping habits.

Some people like having fresh flowers all the time, while others only buy them on special occasions. It makes more sense to have a distribution for the transaction rate $\lambda$, rather than assuming that a single $\lambda$ explains everyone’s behaviour.

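The distribution needs to have few parameters (you don’t necessarily have a lot of data), to be fairly flexible (you’re presumably not a mind-reading entrepreneurial guru and don’t know all about shopping habits), and to take values in the positive real numbers. The Gamma distribution ticks all of those boxes, and is well-studied and relatively easy to work with. It’s often used as a prior for positive parameters in different settings. It also pairs neatly with the Poisson: averaging Poisson purchase counts over a Gamma-distributed $\lambda$ gives a negative binomial distribution, which is where the “NBD” in both model names comes from, and it keeps the formulas in closed form.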
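A quick numerical check of why the Gamma is so convenient: numerically averaging the Poisson count distribution over a Gamma-distributed $\lambda$ reproduces the closed-form pmf $\frac{\Gamma(r+x)}{\Gamma(r)\,x!}\left(\frac{\alpha}{\alpha+t}\right)^r\left(\frac{t}{\alpha+t}\right)^x$ for $x$ purchases in a window of length $t$. This sketch assumes $\lambda \sim$ Gamma(shape $r$, rate $\alpha$); the helper names and parameter values are made up for illustration, and the integration cut-off `upper` only suits these particular values.

```python
import math

def poisson_pmf(x, mean):
    return math.exp(-mean) * mean ** x / math.factorial(x)

def gamma_pdf(lam, r, alpha):
    # Gamma density with shape r and rate alpha, so E[lambda] = r / alpha
    return alpha ** r * lam ** (r - 1) * math.exp(-alpha * lam) / math.gamma(r)

def mixed_pmf(x, r, alpha, t, upper=10.0, steps=20_000):
    # Integral of Poisson(x; lambda * t) * gamma_pdf(lambda) over lambda, by the midpoint rule.
    # `upper` must be large enough that the Gamma density is negligible beyond it.
    h = upper / steps
    return sum(
        poisson_pmf(x, (i + 0.5) * h * t) * gamma_pdf((i + 0.5) * h, r, alpha) * h
        for i in range(steps)
    )

def nbd_pmf(x, r, alpha, t):
    # Closed-form negative binomial, computed on the log scale for stability
    log_p = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(alpha / (alpha + t)) + x * math.log(t / (alpha + t)))
    return math.exp(log_p)

r, alpha, t = 2.0, 4.0, 3.0   # illustrative: mean purchases in the window = r * t / alpha = 1.5
for x in range(5):
    print(x, round(mixed_pmf(x, r, alpha, t), 6), round(nbd_pmf(x, r, alpha, t), 6))
```

The two columns agree, and the mixed distribution's variance, $rt/\alpha + (rt/\alpha)^2/r$, exceeds its mean: that extra spread is how differences between customers show up in the purchase counts.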

[assumption (iii)] You might have already lost some of the customers on the list.

If Andrea has bought flowers about once a month every month in the last year, it’s a fairly safe bet she’ll be returning this year. If Ben used to buy flowers weekly, but he hasn’t been around for months, then maybe he’s found a different flower shop. In making future business plans, you might want to count on Andrea but not on Ben.
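Here is a hedged sketch of that intuition in numbers. It assumes, unrealistically, that we *know* each customer's purchase rate $\lambda$ and dropout rate $\mu$, with Poisson purchases and an exponential lifetime as in the Pareto/NBD; the full model instead averages over the Gamma distributions of both. Given the last purchase at $t_x$ and the end of the record at $T$, the probability that the customer is still active is $1 / \left(1 + \frac{\mu}{\lambda+\mu}\left(e^{(\lambda+\mu)(T-t_x)} - 1\right)\right)$. The function name and all rates below are hypothetical.

```python
import math

def p_alive_known_rates(lam, mu, t_x, T):
    """P(still active at T | last purchase at t_x), assuming lam and mu are KNOWN.

    Numerator: alive and silent since t_x, probability exp(-(lam + mu) * d).
    Denominator adds: left at some time after t_x without buying first,
    probability mu / (lam + mu) * (1 - exp(-(lam + mu) * d)).
    Dividing through by the numerator gives the expression returned below.
    """
    k = lam + mu
    d = T - t_x                      # time since the last purchase
    return 1.0 / (1.0 + (mu / k) * math.expm1(k * d))

# Hypothetical rates per month; the record covers months 0 to 12.
mu = 0.05
andrea = p_alive_known_rates(lam=1.0, mu=mu, t_x=11.5, T=12.0)        # monthly buyer, seen recently
ben = p_alive_known_rates(lam=4.0, mu=mu, t_x=8.0, T=12.0)            # weekly buyer, silent for 4 months
occasional = p_alive_known_rates(lam=1 / 12, mu=mu, t_x=8.0, T=12.0)  # once-a-year buyer, same gap
print(f"Andrea {andrea:.3f}, Ben {ben:.2e}, occasional buyer {occasional:.3f}")
```

Ben's four silent months are almost impossible for a weekly buyer who is still around, so the model all but writes him off, while the same gap barely dents the estimate for a customer who only buys once a year.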

Customers won’t tell you when they’ve moved on, which is where the “unobserved lifetime” assumption kicks in for both models. Imagine a third customer, Cary. The Pareto/NBD and BG/NBD models give you two different ways to think about Cary dropping out of the shop for good.

For the Pareto/NBD case, imagine that at any point in time, there is a small chance that Cary might come across a better shop than yours. This constant infinitesimal risk gives you the exponential lifetime – and the longer it’s been since Cary’s last visit, the longer he’s been exposed to other (potentially better) flower shops.
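A minimal sketch of the “constant infinitesimal risk” idea: chop time into short steps of length `dt` and give Cary the same small chance $\mu \cdot dt$ of leaving in each step. As `dt` shrinks, the probability of still being around at time $t$ approaches the exponential survival function $e^{-\mu t}$. The helper names and the values of $\mu$, `dt` and the horizon are illustrative assumptions.

```python
import math
import random

def steps_survived(hazard_per_step, rng):
    """Count time steps a customer stays; every step carries the same small risk of leaving."""
    steps = 0
    while rng.random() >= hazard_per_step:
        steps += 1
    return steps

def survival_past(horizon, mu, dt=0.05, n=5_000, seed=42):
    """Return (simulated, exponential) probabilities that the customer is still around at `horizon`."""
    rng = random.Random(seed)
    needed = round(horizon / dt)     # number of steps that cover the horizon
    stayed = sum(steps_survived(mu * dt, rng) >= needed for _ in range(n))
    return stayed / n, math.exp(-mu * horizon)

# Illustrative rate: a shop-changing encounter about once every 10 weeks on average.
simulated, exponential = survival_past(horizon=10.0, mu=0.1)
print(f"simulated {simulated:.3f} vs exp(-mu*t) {exponential:.3f}")
```

Because the risk never changes, the chance of surviving the next stretch of time doesn't depend on how long Cary has already stayed (the exponential distribution is memoryless); what accumulates is the total exposure since his last visit.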

The BG/NBD case is a little more contrived. Every time Cary arrives in your shop, he’s committed to buying some flowers. While browsing, he’ll consider the changes in price, quality and variety since his last visit, and that will ultimately make him decide whether to come back again next time, or look for another shop. So rather than being constantly at risk, Cary has some probability p of just deciding to leave after each purchase.
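The BG/NBD story as code: after each purchase Cary leaves with probability p, so the purchase after which he becomes inactive follows the shifted geometric pmf $p(1-p)^{j-1}$. This sketch counts purchases only and ignores when they happen; the function names and the value of p are illustrative.

```python
import random

def purchase_of_dropout(p, rng):
    """Return j, the purchase after which the customer becomes inactive.

    After every purchase the customer leaves with probability p (a coin flip),
    independently of how many purchases came before.
    """
    j = 1
    while rng.random() >= p:   # stays after purchase j, so there will be a purchase j + 1
        j += 1
    return j

def shifted_geometric_pmf(j, p):
    return p * (1 - p) ** (j - 1)

rng = random.Random(0)
p = 0.3                        # illustrative dropout probability
draws = [purchase_of_dropout(p, rng) for _ in range(20_000)]
for j in range(1, 6):
    empirical = draws.count(j) / len(draws)
    print(j, round(empirical, 3), round(shifted_geometric_pmf(j, p), 3))
```

Note the contrast with the continuous-time version: a BG/NBD customer who stops buying can't "use up" his chances of leaving, because the coin is only flipped at purchases.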

[assumption (iv)] Not all customers are equally committed to your shop.

Some customers are regulars, and only death – or a sharp price increase – will force them to leave. Others might like to explore, and would happily leave you for the sake of the new hipster flower shop across the street. Rather than a single drop-out rate for all customers, it makes more sense to have a distribution of drop-out rates (or probabilities in the BG/NBD case).

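This works very much in the same vein as the shopping habits. We’re after a flexible, well-established distribution with few parameters. In the Pareto/NBD case we use a Gamma, since the rate $\mu$ is in the positive real numbers. In the BG/NBD case we use a Beta, which is the standard prior for parameters in $(0, 1)$. These choices also give the models their names: mixing exponential lifetimes over a Gamma-distributed $\mu$ yields a Pareto distribution (of the second kind) for lifetimes, and mixing the geometric dropout over a Beta-distributed $p$ yields the beta-geometric, or “BG”, distribution.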
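A hedged Monte Carlo check of what the two mixing distributions do to dropout across the whole customer base. Assuming $\mu \sim$ Gamma(shape $s$, rate $\beta$) with exponential lifetimes, the share of customers still active at time $t$ is $(\beta/(\beta+t))^s$; assuming $p \sim$ Beta($a$, $b$), the share still active after $j$ purchases is $E[(1-p)^j] = B(a, b+j)/B(a, b)$. The function names and all parameter values are illustrative.

```python
import math
import random

def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def gamma_exponential_survival(t, s, beta):
    # P(lifetime > t) when mu ~ Gamma(shape s, rate beta) and lifetime | mu ~ Exponential(mu)
    return (beta / (beta + t)) ** s

def beta_geometric_survival(j, a, b):
    # P(still active after j purchases) = E[(1 - p)^j] = B(a, b + j) / B(a, b)
    return math.exp(log_beta(a, b + j) - log_beta(a, b))

def monte_carlo(s, beta, t, a, b, j, n=20_000, seed=7):
    rng = random.Random(seed)
    lasted_t = 0
    lasted_j = 0
    for _ in range(n):
        mu = rng.gammavariate(s, 1.0 / beta)    # customer-specific dropout rate (scale = 1 / rate)
        lasted_t += rng.expovariate(mu) > t
        p = rng.betavariate(a, b)               # customer-specific dropout probability
        lasted_j += all(rng.random() >= p for _ in range(j))
    return lasted_t / n, lasted_j / n

s, beta, t = 1.5, 10.0, 12.0     # illustrative Pareto/NBD dropout parameters
a, b, j = 1.0, 3.0, 4            # illustrative BG/NBD dropout parameters
sim_t, sim_j = monte_carlo(s, beta, t, a, b, j)
print(f"lifetime > {t}: simulated {sim_t:.3f}, closed form {gamma_exponential_survival(t, s, beta):.3f}")
print(f"active after {j} purchases: simulated {sim_j:.3f}, closed form {beta_geometric_survival(j, a, b):.3f}")
print(f"same mu = s/beta for everyone: {math.exp(-s / beta * t):.3f}")
```

The last line shows why heterogeneity matters: giving everyone the average rate $s/\beta = 0.15$ predicts about 0.17 of customers left at $t = 12$, against roughly 0.31 with the Gamma mixture, because the committed regulars keep the curve up long after the explorers have gone.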

I hope this helps. Have a look at the original paper (Schmittlein et al., 1987) if you haven't already; they go through some of the intuition there.