Definition/statistics/calculations for “evenly spaced numbers”


It seems like this ought to have a name like "smoothness" or something similar, but smoothness relates to something entirely different, and I can't find terminology that applies to the following problem:

Consider a set $A$ containing 50 unique integers, with $1 \leq (a \in A) \leq 100$.

If the set contained all the even (or odd) integers between $1$ and $100$, then a reasonable observer would call the numbers evenly spaced within the interval. On the other hand, if the set were all the integers between $51$ and $100$ (or $1$ to $50$), this would be the worst possible "even spacing" on the interval.

Determining "perfect" even spacing for sets whose cardinality does not divide the length of the interval is more difficult. How, for instance, do we space out a set of $31$ integers? It looks like we need $24$ gaps of $3$ and $7$ gaps of $4$, but how should those gaps be arranged?

We can generalize this to any set of integers and any length of interval. My guess is that we could define the "perfect spacing" as, say, the nearest integers to the zeros of

$$f(x) = 1 - \cos\left(\frac{|A|\,\tau}{L}\,x\right)$$

where $|A|$ is the cardinality of $A$ and $L$ is the length of the interval. But then, given a set of integers that aren't "perfectly spaced," how do we determine how close they are to evenly-spaced?
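For reference, the zeros of $f$ fall at $x = kL/|A|$ for integer $k$, so the proposed "perfect spacing" can be checked numerically. Below is a minimal R sketch for the $31$-integer case. It assumes the interval is treated as circular, so the gap after the last point wraps around to the first; the variable names are illustrative only.

```r
n <- 31    # cardinality |A|
L <- 100   # length of the interval

# Zeros of f are at k * L / n; round each to the nearest integer
pos <- round((1:n) * L / n)

# Gaps between successive points, plus the wrap-around gap back to the first
# point (assumption: the interval is treated as circular)
gaps <- diff(c(pos, pos[1] + L))

table(gaps)   # 24 gaps of 3 and 7 gaps of 4
which(gaps == 4)   # where the longer gaps fall in the sequence
```

Under that assumption the rounding reproduces the $24$ gaps of $3$ and $7$ gaps of $4$, with the longer gaps spread through the sequence rather than clustered.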

I imagine it's possible to map "how close to evenly spaced" to a value between $0$ and $1$. I also imagine I'm not the first to ask this sort of question. Is there a proper terminology for these sorts of questions? Is there a specific distribution I should be looking at?

Best Answer

I think @DavidDStork's comment is an idea worth considering; I would use the standard deviation of the gaps between successive values instead of the variance. [See the note at the end for another possibility.]

If you have only odd numbers this 'index of evenness' will be zero. For less 'even' sequences this measure will be greater. [Computations in R.]

x = seq(1, 100, by= 2);  x
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25
[14] 27 29 31 33 35 37 39 41 43 45 47 49 51
[27] 53 55 57 59 61 63 65 67 69 71 73 75 77
[40] 79 81 83 85 87 89 91 93 95 97 99
diff(x)
 [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[21] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[41] 2 2 2 2 2 2 2 2 2
sd(diff(x))
[1] 0
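One caveat may be worth checking before relying on this index: it only looks at gaps between members of the set, so a block of consecutive integers such as $51$ to $100$ also scores zero. The sketch below illustrates this and tries one possible adjustment, which is an assumption and not part of the original suggestion: pad the sequence with sentinel values just outside the interval so the end gaps count too. The function names `gap_sd` and `gap_sd_padded` are hypothetical.

```r
# Gap-based index as used above: SD of differences between successive values
gap_sd <- function(s) sd(diff(sort(s)))

# Hypothetical variant: also count the gaps to the ends of the interval,
# using lo - 1 and hi + 1 as sentinel values (an assumption, not from the answer)
gap_sd_padded <- function(s, lo = 1, hi = 100) {
  sd(diff(c(lo - 1, sort(s), hi + 1)))
}

odds  <- seq(1, 100, by = 2)   # evenly spread over the interval
block <- 51:100                # bunched into the upper half

gap_sd(odds)           # 0
gap_sd(block)          # also 0, since every gap equals 1
gap_sd_padded(odds)    # small
gap_sd_padded(block)   # much larger, because of the gap from 0 to 51
```

With the padding, the bunched block is penalized for the long empty stretch below $51$, while the odd numbers still score close to zero.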

If the 50 numbers are chosen at random without replacement, you might wonder about the distribution of this measure. How small does the measure have to be in order for the sequence to be called 'relatively even'?

The following simulation shows the distribution for 100,000 such random choices:

set.seed(2020)
sd.dif = replicate(10^5, sd(diff(sort(sample(1:100, 50)))))
summary(sd.dif)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8165  1.2410  1.3534  1.3680  1.4777  2.9226 

hist(sd.dif, prob=TRUE, col="skyblue2")

[Figure: histogram of the simulated values in sd.dif]

So if you get an 'index of evenness' below $1$ you can be pretty sure the sequence of 50 wasn't randomly chosen.
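To make "pretty sure" a little more concrete, one could compare an observed index with the simulated distribution and report the fraction of random draws that are at least as even. This is a rough sketch, not a formal test; the helper name `evenness_index`, the example sequence `x`, and the smaller number of replications are all illustrative assumptions.

```r
set.seed(2020)

# Hypothetical helper: SD of the gaps between successive sorted values
evenness_index <- function(s) sd(diff(sort(s)))

# Reference distribution: 50 values drawn at random without replacement
# (10^4 replications here to keep the run short)
sd.dif <- replicate(10^4, evenness_index(sample(1:100, 50)))

# Made-up example: a nearly even sequence of 50 values with a single gap of 3
x <- c(seq(1, 89, by = 2), 92, 94, 96, 98, 100)
observed <- evenness_index(x)

# Fraction of random draws with an index at least as small as the observed one
frac <- mean(sd.dif <= observed)
observed
frac
```

A fraction near zero suggests the sequence is far more even than random selection would typically produce.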

Here are three randomly generated sequences and their 'index of evenness' values:

a = sort(sample(1:100, 50)); a
 [1]   1   2   4   5   8   9  13  17  19  22
[11]  26  27  28  29  30  32  33  34  35  40
[21]  44  49  50  52  53  57  60  61  63  64
[31]  68  70  71  72  74  78  80  82  84  85
[41]  86  87  89  93  94  96  97  98  99 100
sd(diff(a))
[1] 1.266389

b = sort(sample(1:100, 50)); b
 [1]   1   2   3   4   5  10  13  14  15  16
[11]  17  21  25  27  28  33  34  35  36  38
[21]  41  42  43  44  48  55  57  59  60  62
[31]  63  64  67  68  69  70  71  72  75  76
[41]  79  84  86  92  93  94  95  96  99 100
sd(diff(b))
[1] 1.547711

c = sort(sample(1:100, 50)); c
 [1]   1   4   5   6   9  10  15  17  19  21
[11]  22  24  25  27  29  32  33  36  38  39
[21]  40  42  43  44  47  48  50  52  56  57
[31]  60  63  64  65  68  69  73  75  79  80
[41]  81  82  84  85  86  96  97  98  99 100
sd(diff(c))
[1] 1.561113

By Stork's criterion, sequence a is 'smoother' than c, because its gaps vary less (about 1.27 versus 1.56). The plots below show each sequence against its index, together with a reference line of slope 2. [Note the big jump in the plot of c.]

par(mfrow=c(1,2))
 plot(a, pch=20); abline(0, 2, col="green2")
 plot(c, pch=20); abline(0, 2, col="green2")
par(mfrow=c(1,1))

[Figure: side-by-side plots of sequences a and c against index, each with a green reference line of slope 2]

Note: The plots may suggest another useful way to define 'smoothness': how nearly do the points match a straight line? One way to answer that is to look at the correlation of the sequence with the vector of numbers from 1 through 50. A correlation nearer to $1$ indicates a better fit to a straight line. By this criterion, c is smoother than a. [Note the wobble in the plot of a.]

So you need to think about which criterion best matches your personal view of 'smoothness'.

cor(1:50, a)
[1] 0.9974012
cor(1:50, c)
[1] 0.9980791