Solved – How to perform a t-test (or another hypothesis test) on NPS (Net Promoter Score) results

hypothesis-testing, net-promoter-score, t-test

Here is an explanation of how NPS is calculated:

http://en.wikipedia.org/wiki/Net_promoter_score

I'm interested in testing two net promoter scores to determine if they are statistically different. I read a great answer about calculating the margin of error for NPS (see link below), but I'm really interested in testing whether there is a difference between two scores, because I suspect that our results aren't as "different" from year to year as they appear to be.

How can I calculate margin of error in a NPS (Net Promoter Score) result?

Is this at all possible? I understand t-tests are typically used to test whether two different record sets are statistically different. But is it possible to test Net Promoter Scores, either with a t-test or some other hypothesis test?

Any ideas you have would be a great help. Thank you!

Best Answer

Unless there is a huge imbalance resulting in almost no Promoters or no Detractors, a t-test should work fine.

Specifically, the NPS method reduces the data to a set of $-1,0,1$ values (representing "Detractors," "Passives," and "Promoters," respectively). In a given dataset $\mathcal{S}$ of $n$ values let the count of the value $x$ be $n_x.$ The NPS is the mean value,

$$NPS_{\,\mathcal{S}} = \frac{1}{n}\left(-n_{-1} + 0\cdot n_0 + n_1\right) = \frac{n_1 - n_{-1}}{n}$$

and its sample variance is an adjusted mean squared difference

$$s_\mathcal{S}^2 = \frac{1}{n-1}\left(n_{-1}(-1-NPS_\mathcal{S})^2 + n_0(0-NPS_\mathcal{S})^2 + n_1(1-NPS_\mathcal{S})^2\right).$$

As explained at https://stats.stackexchange.com/a/18609/919, the square of the standard error (there referred to as "margin of error") is the sample variance divided by the sample size,

$$\operatorname{se}_\mathcal{S}^2 = \frac{s^2_\mathcal{S}}{n}.$$
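These per-group quantities are easy to compute directly from the three counts. As a sketch (plain Python, no libraries; the function name is mine):

```python
def nps_stats(detractors, passives, promoters):
    """Return (NPS, sample variance, squared standard error) for one group,
    treating each response as a -1, 0, or +1 value."""
    n = detractors + passives + promoters
    nps = (promoters - detractors) / n          # mean of the -1/0/+1 values
    ss = (detractors * (-1 - nps) ** 2
          + passives  * ( 0 - nps) ** 2
          + promoters * ( 1 - nps) ** 2)
    var = ss / (n - 1)                          # adjusted (n - 1) denominator
    se2 = var / n                               # squared standard error
    return nps, var, se2
```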

Given two such sets of data to compare, say $A$ and $B$, the difference in their NPSes is $NPS_A-NPS_B$ and the squared standard error of that difference is $\operatorname{se}_A^2 + \operatorname{se}_B^2.$ The Student $t$ statistic is the ratio of the difference to its standard error,

$$t = \frac{NPS_A - NPS_B}{\sqrt{\operatorname{se}_A^2 + \operatorname{se}_B^2}}.$$

Because we have assumed a situation where there are some promoters or some detractors, the denominator is nonzero, so $t$ is well-defined. The only issue is how to interpret it.

When the size of $t$ is "large," we say the difference in NPS is "significant" and conclude there is some cause for this difference other than sampling error. The only issue concerns the determination of how large is "large." The Student t-test uses quantiles of a Student t distribution with $n_A-1 + n_B-1$ degrees of freedom to determine what is a "large" value of $t$ for any given level of statistical risk $\alpha$ you care to specify. This risk is the chance that two random samples from populations with equal NPSes will produce a "large" value of $t,$ thereby leading you to conclude, incorrectly, that there is a difference in NPS.

The "critical value," or threshold value to determine what "large" means, is the $1-\alpha/2$ quantile of the appropriate Student $t$ distribution.
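For a concrete check, this quantile can be looked up with SciPy (assuming `scipy` is available; the degrees of freedom here anticipate the $48$ of the example that follows):

```python
from scipy.stats import t as t_dist

alpha = 0.05        # chosen risk level
df = 48             # n_A - 1 + n_B - 1 in the example below
critical = t_dist.ppf(1 - alpha / 2, df)   # the 1 - alpha/2 quantile
print(round(critical, 3))                  # about 2.011
```

Any $|t|$ exceeding this critical value is "large" at the $\alpha = 5\%$ level.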

Let's work an example. Suppose group $A$ has $n_{-1}=2$ Detractors, $n_0=8$ Passives, and $n_1=10$ Promoters for a total of $n=20.$ Its NPS is $NPS_A = (-2 + 10)/20 = 0.4$ (the same as $40\%$ if you prefer to express values as percents) and its variance is $$s^2_A = (2(-1-0.4)^2 + 8(0-0.4)^2 + 10(1-0.4)^2)/19 = 0.463.$$

Similarly, let group $B$ have $5$ Detractors, $20$ Passives, and $5$ Promoters, for a total of $30.$ The balance of Detractors and Promoters shows $NPS_B$ is zero. Its variance is $s_B^2=0.345.$ Thus, the t statistic for comparing these groups is

$$t = \frac{0.4 - 0} {\sqrt{0.463/20 + 0.345/30}} = 2.15.$$

Its size is $|t|=2.15.$ To determine how large this is, we refer to the Student $t$ distribution with $20-1 + 30-1 = 48$ degrees of freedom. It assigns a chance of $3.7\%$ to a value at least this large in absolute value. This is the "p-value" of the t-test. If your risk threshold is only, say, $\alpha=5\%,$ then because the p-value is less than the threshold you will conclude this is a significant difference. If your risk threshold is smaller, say $\alpha=1\%,$ then because the p-value is greater than the threshold you will not conclude the observed difference in the samples is significant evidence of a real difference in the populations represented by those samples.
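The whole example can be reproduced in a few lines. This sketch assumes SciPy is available for the Student $t$ tail probability:

```python
import math
from scipy.stats import t as t_dist

def group_stats(detractors, passives, promoters):
    """Size, NPS, and sample variance for one group of -1/0/+1 responses."""
    n = detractors + passives + promoters
    nps = (promoters - detractors) / n
    var = (detractors * (-1 - nps) ** 2
           + passives  * ( 0 - nps) ** 2
           + promoters * ( 1 - nps) ** 2) / (n - 1)
    return n, nps, var

# Group A: 2 Detractors, 8 Passives, 10 Promoters
# Group B: 5 Detractors, 20 Passives, 5 Promoters
n_a, nps_a, var_a = group_stats(2, 8, 10)
n_b, nps_b, var_b = group_stats(5, 20, 5)

t_stat = (nps_a - nps_b) / math.sqrt(var_a / n_a + var_b / n_b)
df = (n_a - 1) + (n_b - 1)
p_value = 2 * t_dist.sf(abs(t_stat), df)   # two-sided p-value
print(round(t_stat, 2), round(p_value, 3)) # 2.15 0.037
```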

Simulation studies indicate the use of the Student $t$ distribution works well when each group has at least 20 people. It also works when there are huge differences in NPS between the groups, where the conclusion to make is obvious. For smaller groups with similar NPSes or where there are extreme imbalances, you should mistrust the p-value. In such circumstances conduct a permutation test or collect more data.
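Such a permutation test is straightforward to sketch with only the standard library: pool the individual $-1/0/1$ scores, repeatedly reshuffle them into two groups of the original sizes, and count how often the reshuffled NPS difference is at least as large as the observed one (the function name and permutation count here are mine):

```python
import random

def nps(values):
    """NPS of a list of -1/0/+1 scores is just their mean."""
    return sum(values) / len(values)

def nps_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in NPS.
    a, b: lists of -1/0/+1 scores.  Returns an estimated p-value."""
    rng = random.Random(seed)
    observed = abs(nps(a) - nps(b))
    pooled = a + b
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(nps(pooled[:n_a]) - nps(pooled[n_a:])) >= observed:
            count += 1
    return count / n_perm
```

On the example above (group $A$: $2/8/10$; group $B$: $5/20/5$) this yields a p-value close to the t-test's $3.7\%,$ but it remains trustworthy for small or badly imbalanced groups where the $t$ approximation does not.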


For greater insight, pay attention to the variances: even when the groups have comparable NPSes and those do not differ "significantly," if one of the groups has a much larger variance you might want to take that polarization of your customers into consideration. For instance, a group of $20$ Passives and another group comprised of $10$ Detractors and $10$ Promoters will have identical NPSes of $0,$ whence a $t$ statistic of $0$ (which is never "significant" for any $\alpha$), yet there is a clear difference in how those groups are reacting to your product. This failure to account for the variance in evaluating customers is, IMHO, the chief drawback of using the NPS.
