Solved – Tricky Question re Customer Purchase Probabilities

distributionsprobability

I would like to calculate some parameters relating to customer purchasing in a retail situation.

I have some basic information which I can use:

Customer visit frequency in the form of probability distribution (I can generate Excel poisson tables using average visit frequency and these work well) for number of customer visits in a given period (1 month), e.g.:

0 Customers: 14%
1 Customer: 27%
2 Customers: 27%
3 Customers: 18%
4 Customers: 9%
5 Customers: 4%
6 Customers: 1%
…. etc

Customer purchase quantity per visit (based on observation), e.g.:

1 unit: probability = 60%
2 units: probability = 25%
3 units: probability = 10%
4 units: probability = 4%
5 units: probability = 1%
(max 5 units)

The data points above are provided by way of example but will differ from case to case.

I would like to calculate the probability of a total of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…. etc, units being purchased in the given time period. I think I can do this if the max number of customers is very small (2 or less!) but am struggling to see how to achieve this in a more general case where the numbers of potential customers n are larger! Once I know how to do the calculation, I would like to implement this in Excel.

To make this clearer: Case for 1 customer is simple, being 5 options (using my data: probability of 1 customer is 27% who is likely to purchase 1, 2, 3, 4 or 5 units with probabilities 60%, 25%, 10%, 4% ,1%). I can then calculate probability of selling 1, 2, 3, 4, or 5 units by multiplying.

The case for n = 2 customers has 25 options (I think) (customer 1 with 5 options x customer 2 with 5 options). For this case I may sell 2,3,4,5,6,7,8,9, or 10 units and I can multiply and sum the probabilities manually. This result would then be added to the result for 1 customer.

However, with the case for n = 3 customers there would be 125 options and the adding and multiplying already starts to become hairy. In practice, my customer visit table may extend to many more than 3 customers so the problem quickly becomes difficult to manage without having some kind of general formula.

Best Answer

what you are looking for is the sum of independent random variables. This can be calculated either by convolution or by using moment generating functions.

discrete convolution formula so in excel you should be able to do it in a spreadsheet or in vba: basically you split the problem up- calculate sum of 2 variables-> then you repeat with pdf of sum of 2 variables and next variable

or you calculate the mgf of #objects purchased with 1 customer( see calculation for discrete probability mass in wikipedia article) then you use the 'sum of independent RVs is product of mgfs' to get the mgf for n customers. finally you get the distribution of n customers purchases by inverting the mgf (which will again be a discrete distribution). wiki:mgf

Updated. So what I am suggesting is to use the discrete convolution formula for $Z=X+Y$. You have worked out how to calculate it for two customers - and all I am suggesting is just do that repeatedly. Take two customers, then $X$ is customer 1 and $Y$ is customer 2, and use the formula to calculate the pdf Z for 2 customers. Now to calculate the pdf for 3 customers you use the pdf for 2 customers which you just calculated ( call that now X) and the pdf of 1 customer purchase (Y) in the same formula. so in terms of VBA you might write a function with the following declaration :conv(pdf_x, pdf_y) which given 2 2-D arrays (1 column for quantity and 1 for probability) would produce a new 2-d array pdf_z. then you would call it repeatedly to get the pdfs for 3,4,5,6,... etc

$P(Z=z)=\sum_{x=0}^z f_X(x)f_Y(z-x)$

Related Question