Solved – Checking if data come from a Poisson Process

distributions, fitting, hypothesis testing

In marketing, people often assume that an individual customer's purchase counts follow a Poisson distribution. The assumption may be incorrect; then again, all models are wrong, but some are useful.

I've managed to find some customer transaction data in which each row contains the customer ID and date of purchase, for several thousand customers (available here). Here is a sample of the data:

CustomerID  InvoiceDate
17381.0 2011-10-04 08:30:00
15749.0 2011-04-18 13:08:00
14371.0 2011-02-17 16:35:00
18226.0 2011-07-03 10:47:00
14012.0 2011-03-07 12:26:00
13144.0 2011-01-11 11:15:00
12852.0 2011-02-18 08:47:00
12901.0 2011-06-23 16:00:00
13854.0 2011-03-21 11:20:00
13158.0 2011-01-25 09:58:00

The data is publicly available, so I don't think there are any issues posting it here.

I can group by customer and count transactions over any period of time. I'd like to check how well the Poisson assumption holds.

Could I just count a given customer's transactions per week, and then use a chi-square goodness-of-fit test to determine whether the counts could have come from a Poisson distribution? Am I missing something crucial in my assumptions?
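For concreteness, the procedure asked about above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `poisson_chisq_gof`, the `min_expected=5` pooling rule, and the simulated weekly counts are all hypothetical, and the weekly counts are assumed to be already computed for one customer, including the weeks with zero purchases. One degree of freedom is subtracted because the rate is estimated from the same data.

```python
import numpy as np
from scipy import stats

def poisson_chisq_gof(counts, min_expected=5.0):
    """Chi-square goodness-of-fit test of counts against a fitted Poisson."""
    counts = np.asarray(counts, dtype=int)
    n = counts.size
    lam = counts.mean()  # maximum-likelihood estimate of the Poisson rate

    # Pool the upper tail: grow k_max while the cell ">= k_max + 1" would
    # still have an expected frequency of at least min_expected.
    # Only the upper tail is pooled; low cells are left as they are.
    k_max = 1
    while n * stats.poisson.sf(k_max, lam) >= min_expected:
        k_max += 1

    # Cells are 0, 1, ..., k_max - 1 and a final pooled cell ">= k_max".
    observed = np.bincount(np.minimum(counts, k_max), minlength=k_max + 1)
    probs = np.append(stats.poisson.pmf(np.arange(k_max), lam),
                      stats.poisson.sf(k_max - 1, lam))
    expected = n * probs

    # cells - 1, minus one more for the estimated rate
    df = observed.size - 2
    if df < 1:
        raise ValueError("too few cells to test; use more data or a lower min_expected")
    stat = np.sum((observed - expected) ** 2 / expected)
    return stat, df, stats.chi2.sf(stat, df)

# Hypothetical example: ten years of simulated weekly counts for one customer.
rng = np.random.default_rng(0)
weekly_counts = rng.poisson(1.5, size=520)
stat, df, p_value = poisson_chisq_gof(weekly_counts)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

A small p-value would argue against the Poisson assumption for the weekly counts; a large one only means this particular test did not detect a departure.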

Best Answer

If you have data from a Poisson process with rate $\lambda$, then the inter-arrival times satisfy $$ X_1, \ldots, X_n \overset{iid}{\sim} \text{Exponential}(\lambda). $$ You get these data points by sorting the time stamps of the process you are testing (for example, one customer's purchases) and taking the differences between consecutive entries. Then you can look at a Q-Q plot of those differences against the exponential distribution (check this thread for R code).
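In Python rather than R, the inter-arrival check might look something like the sketch below. The helper names `interarrival_days` and `exponential_qq` and the simulated purchase times are hypothetical; `scipy.stats.probplot` is used only to compute the Q-Q points and a least-squares line, without drawing anything.

```python
import numpy as np
from scipy import stats

def interarrival_days(timestamps):
    """Gaps between consecutive events in days, after sorting the time stamps."""
    ts = np.sort(np.asarray(timestamps, dtype="datetime64[s]"))
    return np.diff(ts) / np.timedelta64(1, "D")

def exponential_qq(gaps):
    """Q-Q points against a standard exponential, plus the fitted line."""
    (theoretical, ordered), (slope, intercept, r) = stats.probplot(gaps, dist="expon")
    return theoretical, ordered, slope, intercept, r

# Hypothetical example: simulate one customer's purchases with a mean gap of 7 days.
rng = np.random.default_rng(0)
true_gaps = rng.exponential(scale=7.0, size=200)
seconds = np.cumsum(true_gaps * 86400).astype(np.int64)
times = np.datetime64("2011-01-01T00:00:00") + seconds.astype("timedelta64[s]")

gaps = interarrival_days(times)
theoretical, ordered, slope, intercept, r = exponential_qq(gaps)
print(f"slope = {slope:.2f} days, intercept = {intercept:.2f}, r = {r:.3f}")
```

If the gaps are exponential, plotting `ordered` against `theoretical` should give roughly a straight line through the origin, with the slope estimating the mean gap $1/\lambda$. Curvature or a clearly nonzero intercept suggests the exponential assumption is off.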

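Another idea: bin the data into time buckets and count the events in each bucket. Then run a Poisson regression of those counts on a few polynomial terms in time. Under a homogeneous Poisson process the rate is constant, so the coefficients on the time terms should all be zero. If any of the significance tests rejects that null, you can't assume that it's a (homogeneous) Poisson process.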
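A rough sketch of the regression idea is below, with several assumptions worth flagging. The names `fit_poisson_glm`, `poisson_loglik`, and `trend_test` are hypothetical. The model is fitted with a small hand-written iteratively reweighted least squares loop instead of a library GLM routine, just to keep the example self-contained. A single likelihood-ratio test of all polynomial terms jointly stands in for the per-coefficient significance tests mentioned above.

```python
import numpy as np
from scipy import stats

def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit log E[y] = X @ beta by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start at the constant-rate fit (column 0 is the intercept)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response for the log link
        XtW = X.T * mu                     # IRLS weights equal the fitted means
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

def poisson_loglik(X, y, beta):
    """Poisson log-likelihood without the log(y!) term, which cancels in the test."""
    eta = X @ beta
    return np.sum(y * eta - np.exp(eta))

def trend_test(counts, degree=2):
    """Likelihood-ratio test of polynomial time terms against a constant rate."""
    y = np.asarray(counts, dtype=float)    # needs at least one nonzero count
    t = np.linspace(-1.0, 1.0, y.size)     # rescaled bin index keeps powers well conditioned
    X_full = np.vander(t, degree + 1, increasing=True)  # columns 1, t, t^2, ...
    X_null = X_full[:, :1]                 # intercept only: homogeneous process
    ll_full = poisson_loglik(X_full, y, fit_poisson_glm(X_full, y))
    ll_null = poisson_loglik(X_null, y, fit_poisson_glm(X_null, y))
    lr = max(2.0 * (ll_full - ll_null), 0.0)
    return lr, stats.chi2.sf(lr, df=degree)

# Hypothetical example: two years of simulated weekly counts at a constant rate.
rng = np.random.default_rng(0)
binned_counts = rng.poisson(3.0, size=104)
lr, p_value = trend_test(binned_counts)
print(f"LR = {lr:.2f}, p = {p_value:.3f}")
```

A small p-value indicates that the rate changes over time, so the homogeneous model is doubtful. Note that this only looks for smooth trends in the rate; it says nothing about overdispersion within bins.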