Suppose the population, from which we assume you are sampling randomly, contains proportions $p_1$ of promoters, $p_0$ of passives, and $p_{-1}$ of detractors, with $p_1+p_0+p_{-1}=1$. To model the NPS, imagine filling a large hat with a huge number of tickets (one for each member of your population) labeled $+1$ for promoters, $0$ for passives, and $-1$ for detractors, in the given proportions, and then drawing $n$ of them at random. The sample NPS is the average value on the tickets that were drawn. The true NPS is computed as the average value of all the tickets in the hat: it is the expected value (or expectation) of the hat.
A good estimator of the true NPS is the sample NPS. The sample NPS also has an expectation. It can be considered to be the average of all the possible sample NPS's. This expectation happens to equal the true NPS. The standard error of the sample NPS is a measure of how much the sample NPS's typically vary between one random sample and another. Fortunately, we do not have to compute all possible samples to find the SE: it can be found more simply by computing the standard deviation of the tickets in the hat and dividing by $\sqrt{n}$. (A small adjustment can be made when the sample is an appreciable proportion of the population, but that's not likely to be needed here.)
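This hat model is easy to simulate. The sketch below is purely illustrative: the six-ticket hat (in the proportions of the example that follows) and the repetition count are my own choices.

```python
import random
import statistics

# A six-ticket "hat": 3 promoters (+1), 2 passives (0), 1 detractor (-1),
# matching the proportions 1/2, 1/3, 1/6 of the worked example.
tickets = [1, 1, 1, 0, 0, -1]

true_nps = statistics.mean(tickets)   # the expected value of the hat
sd = statistics.pstdev(tickets)       # the SD of the tickets in the hat

# Draw many random samples of size n and record each sample NPS.
random.seed(42)
n = 324
sample_means = [statistics.mean(random.choices(tickets, k=n))
                for _ in range(2000)]

# The sample NPS's average out to the true NPS, and their spread
# is close to sd / sqrt(n), the standard error.
print(round(statistics.mean(sample_means), 3))   # close to 1/3
print(round(statistics.stdev(sample_means), 3))  # close to sd / 324**0.5 ≈ 0.041
```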
For example, consider a population of $p_1=1/2$ promoters, $p_0=1/3$ passives, and $p_{-1}=1/6$ detractors. The true NPS is
$$\mbox{NPS} = 1\times 1/2 + 0\times 1/3 + (-1)\times 1/6 = 1/2 - 1/6 = 1/3.$$
The variance of the ticket values is therefore
$$\eqalign{
\mbox{Var} &= (1-\mbox{NPS})^2\times p_1 + (0-\mbox{NPS})^2\times p_0 + (-1-\mbox{NPS})^2\times p_{-1}\\
&=(1-1/3)^2\times 1/2 + (0-1/3)^2\times 1/3 + (-1-1/3)^2\times 1/6 \\
&= 5/9.
}$$
The standard deviation is the square root of this, approximately $0.745.$
In a sample of, say, $324$, you would therefore expect to observe an NPS around $1/3 = 33$% with a standard error of $0.745/\sqrt{324}\approx 4.1$%.
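Here is a short Python check of these figures, using nothing beyond the formulas already stated (the variable names are mine):

```python
import math

# Population proportions from the example.
p1, p0, pm1 = 1/2, 1/3, 1/6          # promoters, passives, detractors

nps = 1 * p1 + 0 * p0 + (-1) * pm1   # true NPS
var = ((1 - nps) ** 2 * p1 + (0 - nps) ** 2 * p0
       + (-1 - nps) ** 2 * pm1)      # variance of the tickets
sd = math.sqrt(var)
se = sd / math.sqrt(324)             # standard error for n = 324

print(round(nps, 4))  # 0.3333
print(round(sd, 3))   # 0.745
print(round(se, 3))   # 0.041
```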
You don't, in fact, know the standard deviation of the tickets in the hat, so you estimate it by using the standard deviation of your sample instead. When divided by the square root of the sample size, it estimates the standard error of the NPS: this estimate is the margin of error (MoE).
Provided you observe substantial numbers of each type of customer (typically, about 5 or more of each will do), the distribution of the sample NPS will be close to Normal. This implies you can interpret the MoE in the usual ways. In particular, about 2/3 of the time the sample NPS will lie within one MoE of the true NPS and about 19/20 of the time (95%) the sample NPS will lie within two MoEs of the true NPS. In the example, if the margin of error really were 4.1%, we would have 95% confidence that the survey result (the sample NPS) is within 8.2% of the population NPS.
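Those coverage figures come straight from the standard Normal distribution, and Python's standard library can confirm them:

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal, the approximate distribution of the sample NPS

within_1 = z.cdf(1) - z.cdf(-1)   # chance of landing within one SE of the truth
within_2 = z.cdf(2) - z.cdf(-2)   # chance of landing within two SEs

print(round(within_1, 3))  # 0.683, about 2/3
print(round(within_2, 3))  # 0.954, about 19/20
```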
Each survey will have its own margin of error. To compare two such results you need to account for the possibility of error in each. When survey sizes are about the same, the standard error of their difference can be found by a Pythagorean theorem: take the square root of the sum of their squares. For instance, if one year the MoE is 4.1% and another year the MoE is 3.5%, then roughly figure a margin of error around $\sqrt{3.5^2+4.1^2}$ = 5.4% for the difference in those two results. In this case, you can conclude with 95% confidence that the population NPS changed from one survey to the next provided the difference in the two survey results is 10.8% or greater.
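In code the Pythagorean combination is a one-liner (the numbers are the ones from the example):

```python
import math

# MoEs (in percent) of the two surveys from the example.
moe_1, moe_2 = 4.1, 3.5

# Pythagorean combination: the MoE of the difference.
moe_diff = math.sqrt(moe_1 ** 2 + moe_2 ** 2)
print(round(moe_diff, 1))      # 5.4
print(round(2 * moe_diff, 1))  # 10.8, the 95% threshold for a real change
```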
When comparing many survey results over time, more sophisticated methods can help, because you have to cope with many separate margins of error. When the margins of error are all pretty similar, a crude rule of thumb is to consider a change of three or more MoEs as "significant." In this example, if the MoEs hover around 4%, then a change of around 12% or larger over a period of several surveys ought to get your attention and smaller changes could validly be dismissed as survey error. Regardless, the analysis and rules of thumb provided here usually provide a good start when thinking about what the differences among the surveys might mean.
Note that you cannot compute the margin of error from the observed NPS alone: it depends on the observed numbers of each of the three types of respondents. For example, if almost everybody is a "passive," the survey NPS will be near $0$ with a tiny margin of error. If the population is polarized equally between promoters and detractors, the survey NPS will still be near $0$ but will have the largest possible margin of error (equal to $1/\sqrt{n}$ in a sample of $n$ people).
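A quick Python illustration makes the contrast concrete (the sample size and compositions here are made up for the demonstration):

```python
import math
import statistics

n = 400

# Two made-up samples of n = 400, both with an NPS of zero:
mostly_passive = [0] * 398 + [1, -1]   # almost everybody is a passive
polarized = [1] * 200 + [-1] * 200     # equal promoters and detractors

for sample in (mostly_passive, polarized):
    nps = statistics.mean(sample)
    moe = statistics.stdev(sample) / math.sqrt(n)
    print(float(nps), round(moe, 4))
```

The polarized sample's margin of error comes out near $1/\sqrt{400} = 0.05,$ the largest possible, while the mostly-passive sample's is tiny.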
Unless there is a huge imbalance resulting in almost no Promoters or no Detractors, a t-test should work fine.
Specifically, the NPS method reduces the data to a set of $-1,0,1$ values (representing "Detractors," "Passives," and "Promoters," respectively). In a given dataset $\mathcal{S}$ of $n$ values let the count of the value $x$ be $n_x.$ The NPS is the mean value,
$$NPS_{\,\mathcal{S}} = \frac{1}{n}\left((-1)\times n_{-1} + 0\times n_0 + 1\times n_1\right) = \frac{n_1 - n_{-1}}{n}$$
and its sample variance is an adjusted mean squared difference
$$s_\mathcal{S}^2 = \frac{1}{n-1}\left(n_{-1}(-1-NPS_\mathcal{S})^2 + n_0(0-NPS_\mathcal{S})^2 + n_1(1-NPS_\mathcal{S})^2\right).$$
As explained at https://stats.stackexchange.com/a/18609/919, the square of the standard error (there referred to as "margin of error") is the sample variance divided by the sample size,
$$\operatorname{se}_\mathcal{S}^2 = \frac{s^2_\mathcal{S}}{n}.$$
Given two such sets of data to compare, say $A$ and $B$, the difference in their NPSes is $NPS_A-NPS_B$ and the squared standard error of that difference is $\operatorname{se}_A^2 + \operatorname{se}_B^2.$ The Student $t$ statistic is the ratio of the difference to its standard error,
$$t = \frac{NPS_A - NPS_B}{\sqrt{\operatorname{se}_A^2 + \operatorname{se}_B^2}}.$$
Because we have assumed a situation where there are some promoters or some detractors, the denominator is nonzero, so $t$ is well-defined. The only issue is how to interpret it.
When the size of $t$ is "large," we say the difference in NPS is "significant" and conclude there is some cause for this difference other than sampling error. The only issue concerns the determination of how large is "large." The Student t-test uses quantiles of a Student t distribution with $(n_A-1) + (n_B-1)$ degrees of freedom to determine what is a "large" value of $t$ for any given level of statistical risk $\alpha$ you care to specify. This risk is the chance that two random samples from populations with equal NPSes will produce a "large" value of $t,$ thereby causing you incorrectly to conclude there's a difference in NPS.
The "critical value," or threshold value to determine what "large" means, is the $1-\alpha/2$ quantile of the appropriate Student $t$ distribution.
Let's work an example. Suppose group $A$ has $n_{-1}=2$ Detractors, $n_0=8$ Passives, and $n_1=10$ Promoters for a total of $n=20.$ Its NPS is $NPS_A = (-2 + 10)/20 = 0.4$ (the same as $40\%$ if you prefer to express values as percents) and its variance is $$s^2_A = (2(-1-0.4)^2 + 8(0-0.4)^2 + 10(1-0.4)^2)/19 = 0.463.$$
Similarly, let group $B$ have $5$ Detractors, $20$ Passives, and $5$ Promoters, for a total of $30.$ The balance of Detractors and Promoters shows $NPS_B$ is zero. Its variance is $s_B^2=0.345.$ Thus, the t statistic for comparing these groups is
$$t = \frac{0.4 - 0} {\sqrt{0.463/20 + 0.345/30}} = 2.15.$$
Its size is $|t|=2.15.$ To determine how large this is, we refer to the Student $t$ distribution with $20-1 + 30-1 = 48$ degrees of freedom. It assigns a chance of $3.7\%$ to a value this large. This is the "p-value" of the t-test. If your risk threshold is only, say, $\alpha=5\%,$ then because the p-value is less than the threshold you will conclude this is a significant difference. If your risk threshold is smaller, say $\alpha=1\%,$ then because the p-value is greater than the threshold you will not conclude the observed difference in the samples is significant evidence of a real difference in the population represented by those samples.
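The whole calculation is easy to script. Here is a Python sketch of this worked example; the helper function is my own, and the critical value $t_{0.975,48}\approx 2.011$ is taken from a standard t table rather than computed:

```python
import math

def nps_stats(n_det, n_pas, n_pro):
    """Counts of -1, 0, +1 responses -> (n, NPS, sample variance)."""
    n = n_det + n_pas + n_pro
    nps = (n_pro - n_det) / n
    s2 = (n_det * (-1 - nps) ** 2 + n_pas * (0 - nps) ** 2
          + n_pro * (1 - nps) ** 2) / (n - 1)
    return n, nps, s2

nA, npsA, s2A = nps_stats(2, 8, 10)   # group A: 2 Detractors, 8 Passives, 10 Promoters
nB, npsB, s2B = nps_stats(5, 20, 5)   # group B: 5 Detractors, 20 Passives, 5 Promoters

t = (npsA - npsB) / math.sqrt(s2A / nA + s2B / nB)
df = (nA - 1) + (nB - 1)

print(round(npsA, 2), round(s2A, 3))  # 0.4 0.463
print(round(npsB, 2), round(s2B, 3))  # 0.0 0.345
print(round(t, 2), df)                # 2.15 48
print(abs(t) > 2.011)                 # True: significant at alpha = 5%
```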
Simulation studies indicate the use of the Student $t$ distribution works well when each group has at least 20 people. It also works when there are huge differences in NPS between the groups, where the conclusion to make is obvious. For smaller groups with similar NPSes or where there are extreme imbalances, you should mistrust the p-value. In such circumstances conduct a permutation test or collect more data.
For greater insight, pay attention to the variances: even when the groups have comparable NPSes and those do not differ "significantly," if one of the groups has a much larger variance you might want to take that polarization of your customers into consideration. For instance, a group of $20$ Passives and another group comprised of $10$ Detractors and $10$ Promoters will have identical NPSes of $0,$ whence a $t$ statistic of $0$ (which is never "significant" for any $\alpha$), yet there is a clear difference in how those groups are reacting to your product. This failure to account for the variance in evaluating customers is, IMHO, the chief drawback of using the NPS.
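The point about variance is easy to see numerically; these are the two groups described in the paragraph above:

```python
import statistics

group_1 = [0] * 20               # 20 Passives
group_2 = [1] * 10 + [-1] * 10   # 10 Promoters and 10 Detractors

# Identical NPSes of zero, so t = 0 and never "significant"...
print(float(statistics.mean(group_1)), float(statistics.mean(group_2)))  # 0.0 0.0

# ...but the variances expose the polarization of the second group.
print(float(statistics.variance(group_1)),
      round(float(statistics.variance(group_2)), 3))  # 0.0 1.053
```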
Best Answer
NPS isn't statistics but an attempt to measure perception of a company, product or service. So too, my answer isn't statistical but psychological or practical.
Don't scale down.
NPS' scale is deliberately 0 - 10 to allow both 9 and 10 to be promoters. There are many people who will refuse to give a "perfect" score but who are avid promoters. So you need both 9 and 10.
If you must go 1 to 5, then you need 4 and 5 to be promoters. That means 40% are now in the promoter range rather than 18%, but NPS isn't concerned with that*.
At the same time, the 0 - 6 detractor range scales down to roughly 1 - 3, and we're left with no "passives". So should detractors be just 1 - 2 then? No. Don't do that. Just get rid of passives. You need more detractor options than promoter options because detractors carry more weight. To get a net-positive score you need significantly more promoters.
Imagine you go to a conference and you speak to five people about some company. Two say "it's great". Two say "it sucks". With the fifth person scoring "3" (passive) you now have equal promoters and detractors. But psychologically you're much more likely to feel negative about the company. Is a neutral score (40% - 40% = 0 NPS) therefore valid? No.
So let's put person number five's score of '3' in the detractors. Now our score is 40% - 60% = -20 NPS. A negative score, much more representative of the actual opinions of those surveyed.
* However you do it, the concept remains. If you change it from 0-10 to 1-5, you can't compare to a competitor that uses 0-10 and just scale. There's more psychology than statistics at play here. What you can (and should) do is compare it to your results using 1-5 from your last survey. And that's what ultimately matters: has our market perception gone up or down?