The first thing you have to understand is that notes are not uniquely defined. Everything depends on what tuning you use. I'll assume we're talking about equal temperament here. In equal temperament, a half-step is the same as a frequency ratio of $\sqrt[12]{2}$; that way, twelve half-steps makes up an octave. Why twelve?
At the end of the day, what we want out of our musical frequencies are nice ratios of small integers. For example, a perfect fifth is supposed to correspond to a frequency ratio of $3 : 2$, or $1.5 : 1$, but in equal temperament it doesn't; instead, it corresponds to a ratio of $2^{ \frac{7}{12} } : 1 \approx 1.498 : 1$. As you can see, this is not a fifth; however, it is quite close.
Similarly, a perfect fourth is supposed to correspond to a frequency ratio of $4 : 3$, or $1.333... : 1$, but in equal temperament it corresponds to a ratio of $2^{ \frac{5}{12} } : 1 \approx 1.335 : 1$. Again, this is not a perfect fourth, but is quite close.
And so on. What's going on here is a massively convenient mathematical coincidence: several of the powers of $\sqrt[12]{2}$ happen to be good approximations to ratios of small integers, and there are enough of these to play Western music.
Here's how this coincidence works. You get the white keys from $C$ using (part of) the circle of fifths. Start with $C$ and go up a fifth to get $G$, then $D$, then $A$, then $E$, then $B$. Then go down a fifth to get $F$. These are the "neighbors" of $C$ in the circle of fifths. You get the black keys from here using the rest of the circle of fifths. After you've gone up a "perfect" perfect fifth twelve times, you get a frequency ratio of $3^{12} : 2^{12} \approx 129.7 : 1$. This happens to be rather close to $2^7 : 1$, or seven octaves! And if we replace $3 : 2$ by $2^{ \frac{7}{12} } : 1$, then we get exactly seven octaves. In other words, the reason you can afford to identify these intervals is because $3^{12}$ happens to be rather close to $2^{19}$. Said another way,
$$\log_2 3 \approx \frac{19}{12}$$
happens to be a good rational approximation, and this is the main basis of equal temperament. (The other main coincidence here is that $\log_2 \frac{5}{4} \approx \frac{4}{12}$; this is what allows us to squeeze major thirds into equal temperament as well.)
It is a fundamental fact of mathematics that $\log_2 3$ is irrational, so it is impossible for any kind of equal temperament to have "perfect" perfect fifths regardless of how many notes you use. However, you can write down good rational approximations by looking at the continued fraction of $\log_2 3$ and writing down convergents, and these will correspond to equal-tempered scales with more notes.
Of course, you can use other types of temperament, such as well temperament; if you stick to $12$ notes (which not everybody does!), you will be forced to make some intervals sound better and some intervals sound worse. In particular, if you don't use equal temperament then different keys sound different. This is a major reason many Western composers composed in different keys; during their time, this actually made a difference. As a result when you're playing certain sufficiently old pieces you aren't actually playing them as they were intended to be heard - you're using the wrong tuning.
Edit: I suppose it is also good to say something about why we care about frequency ratios which are ratios of small integers. This has to do with the physics of sound, and I'm not particularly knowledgeable here, but this is my understanding of the situation.
You probably know that sound is a wave. More precisely, sound is a longitudinal wave carried by air molecules. You might think that there is a simple equation for the sound created by a single note, perhaps $\sin 2\pi f t$ if the corresponding tone has frequency $f$. Actually this only occurs for tones which are produced electronically; any tone you produce in nature carries with it overtones and has a Fourier series
$$\sum \left( a_n \sin 2 \pi n f t + b_n \cos 2 \pi n f t \right)$$
where the coefficients $a_n, b_n$ determine the timbre of the sound; this is why different instruments sound different even when they play the same notes, and has to do with the physics of vibration, which I don't understand too well. So any tone which you hear at frequency $f$ almost certainly also has components at frequency $2f, 3f, 4f, ...$.
If you play two notes of frequencies $f, f'$ together, then the resulting sound corresponds to what you get when you add their Fourier series. Now it's not hard to see that if $\frac{f}{f'}$ is a ratio of small integers, then many (but not all) of the overtones will match in frequency with each other; the result sounds a more complex note with certain overtones. Otherwise, you get dissonance as you hear both types of overtones simultaneously and their frequencies will be similar, but not similar enough.
Edit: You should probably check out David Benson's "Music: A Mathematical Offering", the book Rahul Narain recommended in the comments for the full story. There was a lot I didn't know, and I'm only in the introduction!
Since $7$ is prime and there is no $7$-note signature that sums to $12$ with all $7$ steps identical, we don't have to worry about periodicity; we can just divide by $7$ in the end. Thus, we just have to count the number of ways of distributing $12-7=5$ balls into $7$ bins with capacity $3-1=2$. There are $\binom75=21$ ways to have $5$ steps of $2$, $\binom7{1,3,3}=140$ ways to have $3$ steps of $2$ and $1$ step of $3$, and $\binom7{2,1,4}=105$ ways to have $1$ step of $2$ and $2$ steps of $3$, for a total of $21+140+105=266$ scales in $266/7=38$ cyclically inequivalent types.
In the present case, inclusion-exclusion would be a bit of an overkill, but since you said you'd like a method that generalises to any number of notes with any number of maximum steps, let's generalise: For $k$ notes with a maximum of $m$ steps that sum to $12$, we want to distribute $12-k$ balls into $k$ bins with capacity $m-1$. As explained at Balls In Bins With Limited Capacity, inclusion-exclusion yields a count of
$$
\sum_{t=0}^{12-k}(-1)^t\binom{12-k}t\binom{12-k+k-tm-1}{12-k-1}=\sum_{t=0}^{12-k}(-1)^t\binom{12-k}t\binom{11-tm}{11-k}\;,
$$
where, contrary to convention, binomial coefficients with negative upper index are taken to be zero. For the present case of $k=7$, $m=3$, this again yields
$$
\sum_{t=0}^7(-1)^t\binom7t\binom{11-3t}6=\binom{11}6-\binom71\binom86=266
$$
signatures. If $k$ isn't prime, or if it divides $12$, then you have to do a bit more to deal with periodicity; otherwise, you can just divide the above result by $k$.
Best Answer
Not a "proof" but a very interesting property that makes the diatonic scale unique. Summarizing from http://andrewduncan.net/cmt/ :
Diatonic scale (and its complementary, pentatonic scale) has the highest "entropy" (in other words, "variety") among all possible 7-note (or 5-note) scales (there are 66 of them). Therefore, the diatonic scale is the most rich in content 7-note scale which makes it a fertile ground for melodic ideas.
Neither 5 or 7 have common factors with 12 therefore it's not possible to distribute notes uniformly as it is with 6. Distributing 6 notes gives us the whole-tone scale {C, D, E, F♯, G♯, A♯, C} which is highly regular, has no tonality and creates a blurred, indistinct effect and thus, not very "useful".