How to calculate the margin of error in an NPS (Net Promoter Score) result

Tags: hypothesis testing, multinomial-distribution, net-promoter-score, standard error, statistical significance

I'll let Wikipedia explain how NPS is calculated:

The Net Promoter Score is obtained by asking customers a single
question on a 0 to 10 rating scale, where 10 is "extremely likely" and
0 is "not at all likely": "How likely is it that you would recommend
our company to a friend or colleague?" Based on their responses,
customers are categorized into one of three groups: Promoters (9–10
rating), Passives (7–8 rating), and Detractors (0–6 rating). The
percentage of Detractors is then subtracted from the percentage of
Promoters to obtain a Net Promoter score (NPS). NPS can be as low as
-100 (everybody is a detractor) or as high as +100 (everybody is a promoter).

We have been running this survey periodically for several years. We get several hundred responses each time. The resulting score has varied by 20-30 points over the course of time. I'm trying to figure out which score movements are significant, if any.

If that simply proves too difficult, I'm also interested in trying to figure out the margin of error on the basics of the calculation. What's the margin of error of each "bucket" (promoter, passive, detractor)? Maybe even, what's the margin of error if I just look at the mean of the scores, reducing the data to just one number per survey run? Would that get me anywhere?

Any ideas here are helpful. Except "don't use NPS." That decision is outside my ability to change!

Best Answer

Suppose the population, from which we assume you are sampling randomly, contains proportions $p_1$ of promoters, $p_0$ of passives, and $p_{-1}$ of detractors, with $p_1+p_0+p_{-1}=1$. To model the NPS, imagine filling a large hat with a huge number of tickets (one for each member of your population) labeled $+1$ for promoters, $0$ for passives, and $-1$ for detractors, in the given proportions, and then drawing $n$ of them at random. The sample NPS is the average value on the tickets that were drawn. The true NPS is computed as the average value of all the tickets in the hat: it is the expected value (or expectation) of the hat.

A good estimator of the true NPS is the sample NPS. The sample NPS also has an expectation. It can be considered to be the average of all the possible sample NPS's. This expectation happens to equal the true NPS. The standard error of the sample NPS is a measure of how much the sample NPS's typically vary between one random sample and another. Fortunately, we do not have to compute all possible samples to find the SE: it can be found more simply by computing the standard deviation of the tickets in the hat and dividing by $\sqrt{n}$. (A small adjustment can be made when the sample is an appreciable proportion of the population, but that's not likely to be needed here.)
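The claim that the SE equals the standard deviation of the tickets divided by $\sqrt{n}$ can be checked by brute force. The following is a minimal simulation sketch, not part of the original answer: the proportions, the sample size, the number of repetitions, and all function names are hypothetical choices made for illustration, and drawing with replacement stands in for a hat so large that removing a ticket does not change its makeup.

```python
import math
import random
import statistics

# Hypothetical population, chosen only for illustration.
TICKETS = (1, 0, -1)             # promoter, passive, detractor
PROPORTIONS = (1 / 2, 1 / 3, 1 / 6)


def hat_sd(values, proportions):
    """Standard deviation of the tickets in the hat."""
    mean = sum(v * p for v, p in zip(values, proportions))
    variance = sum((v - mean) ** 2 * p for v, p in zip(values, proportions))
    return math.sqrt(variance)


def simulate_sample_nps(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n; return the NPS of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(TICKETS, weights=PROPORTIONS, k=n))
        for _ in range(repetitions)
    ]


n = 324
sample_nps = simulate_sample_nps(n, repetitions=2000)

# Spread of the simulated sample NPS values versus the shortcut formula.
simulated_se = statistics.pstdev(sample_nps)
formula_se = hat_sd(TICKETS, PROPORTIONS) / math.sqrt(n)
print(f"simulated SE: {simulated_se:.4f}, formula SE: {formula_se:.4f}")
```

The two printed numbers should agree closely, though not exactly, because the simulation itself is subject to sampling variation.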

For example, consider a population of $p_1=1/2$ promoters, $p_0=1/3$ passives, and $p_{-1}=1/6$ detractors. The true NPS is

$$\mbox{NPS} = 1\times 1/2 + 0\times 1/3 + (-1)\times 1/6 = 1/3.$$

The variance of the tickets in the hat is therefore

$$\eqalign{ \mbox{Var(NPS)} &= (1-\mbox{NPS})^2\times p_1 + (0-\mbox{NPS})^2\times p_0 + (-1-\mbox{NPS})^2\times p_{-1}\\ &=(1-1/3)^2\times 1/2 + (0-1/3)^2\times 1/3 + (-1-1/3)^2\times 1/6 \\ &= 5/9. }$$

The standard deviation is the square root of this, about equal to $0.75.$
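As a cross-check on the arithmetic, here is the same variance obtained from the standard identity $\mbox{Var}(X) = E[X^2] - (E[X])^2$, which the derivation above does not spell out. Because every ticket is $+1$, $0$, or $-1$, its square is $1$ for promoters and detractors and $0$ for passives, so

$$\mbox{Var} = (p_1 + p_{-1}) - (p_1 - p_{-1})^2 = (1/2 + 1/6) - (1/3)^2 = 2/3 - 1/9 = 5/9.$$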

In a sample of, say, $324$, you would therefore expect to observe an NPS around $1/3 \approx 33$% with a standard error of about $0.75/\sqrt{324} \approx 4.1$%.
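The worked example can be reproduced in a few lines. This is only a sketch of the arithmetic above: the function name is hypothetical, and exact fractions are used so that the intermediate results match the hand calculation.

```python
import math
from fractions import Fraction


def ticket_mean_and_variance(p_promoter, p_passive, p_detractor):
    """Mean and variance of a single ticket (+1, 0, or -1) drawn from the hat."""
    values = (1, 0, -1)
    probs = (p_promoter, p_passive, p_detractor)
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, variance


true_nps, variance = ticket_mean_and_variance(
    Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
)
sd = math.sqrt(variance)        # standard deviation of the tickets
se = sd / math.sqrt(324)        # standard error for a sample of 324

print(true_nps, variance)       # 1/3 5/9
print(round(sd, 3), round(se, 4))
```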

You don't, in fact, know the standard deviation of the tickets in the hat, so you estimate it by using the standard deviation of your sample instead. When divided by the square root of the sample size, it estimates the standard error of the NPS: this estimate is the margin of error (MoE).

Provided you observe substantial numbers of each type of customer (typically, about 5 or more of each will do), the distribution of the sample NPS will be close to Normal. This implies you can interpret the MoE in the usual ways. In particular, about 2/3 of the time the sample NPS will lie within one MoE of the true NPS and about 19/20 of the time (95%) the sample NPS will lie within two MoEs of the true NPS. In the example, if the margin of error really were 4.1%, we would have 95% confidence that the survey result (the sample NPS) is within 8.2% of the population NPS.
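In practice you start from the observed counts rather than known proportions. The sketch below estimates the NPS, its MoE, and a rough 95% interval from the three bucket counts. The function name and the counts are hypothetical, picked to mirror the example proportions in a sample of 324. It assumes a simple random sample and uses the plug-in variance (divisor $n$ rather than $n-1$), which makes no practical difference with several hundred responses.

```python
import math


def nps_and_moe(promoters, passives, detractors):
    """Return (sample NPS, estimated standard error) from bucket counts."""
    n = promoters + passives + detractors
    p_pro = promoters / n
    p_det = detractors / n
    nps = p_pro - p_det
    # Variance of a +1/0/-1 ticket: E[X^2] - (E[X])^2.
    variance = p_pro + p_det - nps ** 2
    return nps, math.sqrt(variance / n)


# Hypothetical survey: 162 promoters, 108 passives, 54 detractors.
nps, moe = nps_and_moe(162, 108, 54)
low, high = nps - 2 * moe, nps + 2 * moe    # "two MoEs" rule for about 95%

print(f"NPS = {100 * nps:.1f}%, MoE = {100 * moe:.1f}%")
print(f"approximate 95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

With these counts the output should match the figures above: an NPS near 33% with an MoE near 4.1%.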

Each survey will have its own margin of error. To compare two such results you need to account for the possibility of error in each. When survey sizes are about the same, the standard error of their difference can be found with a Pythagorean rule: take the square root of the sum of the squares of the two MoEs. For instance, if one year the MoE is 4.1% and another year the MoE is 3.5%, then roughly figure a margin of error of $\sqrt{3.5^2+4.1^2} \approx 5.4$% for the difference in those two results. In this case, you can conclude with 95% confidence that the population NPS changed from one survey to the next provided the difference in the two survey results is at least two such MoEs, that is, 10.8% or greater.
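The comparison of two surveys can be sketched as follows. The function names and the NPS values passed in are hypothetical; the code assumes the two surveys are independent random samples and applies the "two MoEs" rule for roughly 95% confidence.

```python
import math


def moe_of_difference(moe_a, moe_b):
    """Combine two margins of error: square root of the sum of squares."""
    return math.hypot(moe_a, moe_b)


def change_is_significant(nps_a, nps_b, moe_a, moe_b, multiplier=2.0):
    """True when the change is at least `multiplier` MoEs of the difference."""
    threshold = multiplier * moe_of_difference(moe_a, moe_b)
    return abs(nps_b - nps_a) >= threshold


moe_diff = moe_of_difference(4.1, 3.5)
print(round(moe_diff, 1))                               # 5.4

# Hypothetical results, in percent: a 12-point move and an 8-point move.
print(change_is_significant(33.0, 45.0, 4.1, 3.5))      # True
print(change_is_significant(33.0, 41.0, 4.1, 3.5))      # False
```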

When comparing many survey results over time, more sophisticated methods can help, because you have to cope with many separate margins of error. When the margins of error are all pretty similar, a crude rule of thumb is to consider a change of three or more MoEs as "significant." In this example, if the MoEs hover around 4%, then a change of around 12% or larger over a period of several surveys ought to get your attention and smaller changes could validly be dismissed as survey error. Regardless, the analysis and rules of thumb provided here usually provide a good start when thinking about what the differences among the surveys might mean.
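One possible reading of the three-MoE rule of thumb is sketched below: given a series of survey scores and a typical MoE, list every pair of surveys whose scores differ by at least three MoEs. The scores, the function name, and the choice to compare all pairs (rather than only consecutive surveys) are assumptions made for illustration.

```python
from itertools import combinations


def flag_large_changes(scores, typical_moe, multiplier=3.0):
    """Return index pairs (earlier, later) whose scores differ by at least
    `multiplier` times the typical margin of error."""
    threshold = multiplier * typical_moe
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(scores), 2)
        if abs(b - a) >= threshold
    ]


# Hypothetical NPS results (in percent) from five survey runs, MoE about 4%.
scores = [31, 35, 29, 44, 40]
flagged = flag_large_changes(scores, typical_moe=4.0)
print(flagged)      # [(0, 3), (2, 3)]
```

Here only the moves from the first and third surveys up to the fourth reach the 12-point threshold; the remaining differences would be treated as survey noise under this rule.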

Note that you cannot compute the margin of error from the observed NPS alone: it depends on the observed numbers of each of the three types of respondents. For example, if almost everybody is a "passive," the survey NPS will be near $0$ with a tiny margin of error. If the population is polarized equally between promoters and detractors, the survey NPS will still be near $0$ but will have the largest possible margin of error (equal to $1/\sqrt{n}$ in a sample of $n$ people).
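The two extreme cases can be compared numerically. In this sketch the counts and the function name are hypothetical: both surveys have 400 respondents and an NPS of exactly 0, yet their margins of error differ by a large factor.

```python
import math


def nps_and_moe(promoters, passives, detractors):
    """Return (sample NPS, estimated standard error) from bucket counts."""
    n = promoters + passives + detractors
    p_pro = promoters / n
    p_det = detractors / n
    nps = p_pro - p_det
    variance = p_pro + p_det - nps ** 2     # E[X^2] - (E[X])^2 for +1/0/-1
    return nps, math.sqrt(variance / n)


# Hypothetical surveys of 400 people each, both with NPS = 0.
passive_nps, passive_moe = nps_and_moe(4, 392, 4)        # almost all passive
polar_nps, polar_moe = nps_and_moe(200, 0, 200)          # evenly polarized

print(f"mostly passive: NPS = {passive_nps:.2f}, MoE = {100 * passive_moe:.2f}%")
print(f"polarized:      NPS = {polar_nps:.2f}, MoE = {100 * polar_moe:.2f}%")
print(f"1/sqrt(n) = {100 / math.sqrt(400):.2f}%")
```

The polarized survey should hit the upper bound $1/\sqrt{n}$, here 5%, while the mostly passive one stays well under 1%.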