Well, the number of ways you can arrange $a,a,a,b,b$ amounts to the number of ways to choose the locations for the three $a$s, since the $b$s will just go in the other two spots. But choosing locations for the $a$s is simply choosing a $3$-element subset of the set of all $5$ locations, of which there are $C[5,3],$ as you noticed.
Let's number our spots--$\{1,2,3,4,5\}.$ Then if we went with the arrangement $aabab,$ this would correspond to the subset $\{1,2,4\},$ for example. On the other hand, the subset $\{2,3,5\}$ would correspond to the arrangement $baaba.$
As you already observed, this only works with two types of objects to permute, since the locations of the first type of object tell the whole story. Otherwise, we can analogously use multinomial coefficients. (Note that the article uses "permutations with repetition" differently than you do.)
To your first question, yes, it does make sense to define $P(n,r)=\prod_{i=0}^{r-1}(n-i)$, thus extending domain of $n$ to all of $\mathbb C$ (or really any commutative ring with identity). Then $P(n,r)$ is also called the falling factorial, sometimes denoted $n^{\underline r}\,$.
Just like $C(n,k)$ counts sets of size $k$ for $n>0$, there is a nice combinatorial interpretation $|C(-n,k)|$; it counts multisets of size $k$. There is a similar duality for $P(n,k)$. Whereas $P(n,k)$ counts ordered lists without repetition, $|P(-n,k)|$ counts the number of ways to place $k$ distinct flags on $n$ distinct flagpoles, where the order of the stack of flags on each pole matters. Note that $|P(-n,k)|=n(n+1)\cdots (n+k-1)$ is the rising factorial, denoted $n^{\overline k}$. This is sort of like an ordered version of a multiset. You can think of the flagpoles as the objects being chosen, and putting a flag on it an object represents choosing it. You can put multiple flags on a pole, so you can choose an object multiple times. Since the flags are distinct, order matters.
Finally, the reason that Wolfram Alpha cannot compute $P(n,r)$ when $n$ is a negative integer is because $(-n)!$ is undefined. Specifically, the Gamma function has a simple pole at each nonnegative integer. However, we can still calculate $P(n,r)$ using $n!/(n-r)!=\Gamma(n+1)/\Gamma(n-r+1)$. Since we have a simple pole in the numerator and denominator, they effectively cancel. The ratio now has a removable singularity at each nonnegative integer, and we can define $P(n,r)$ as the limit for $n$ in the neighborhood of the negative integer. In this case, we recover the expected $P(n,r)=\prod_{i=0}^{r-1}(n-i)$ result.
Best Answer
The reason combinations come in can be seen in using a special example. The same logic applies in the general case but it becomes murkier through the abstraction.
Consider
$$(a+b)^3$$
If we were to multiply this out, and not group terms according to multiplication rules (for example, let $a^3$ remain as $aaa$ for the sake of our exercise), we see
$$(a+b)^3 = aaa + aab + aba + baa + abb + bab + bba + bbb$$
Notice that we can characterize the sum this way:
$$(a+b)^3 = (\text{terms with 3 a's}) + (\text{terms with 2 a's}) + (\text{terms with 1 a}) + (\text{terms with no a's})$$
(You can also do the same for $b$, the approach is equivalent.) Well, we see from our weird expansion that we have every possible sequence of length $3$ made up of only $a$'s and $b$'s. We also know some of these terms are going to group together, as, for example, $aba = aab = baa$.
So how many summands are actually equal? Well, since they all have the same length, two summands are equal if and only if they have the same number of $a$'s (or $b$'s, same thing). And we also know that every possible sequence of length $3$ and only $a$'s and $b$'s are here.
So we can conclude that
$$\begin{align} (\text{# of terms with 3 a's}) &= \binom{3}{3} = 1\\ (\text{# of terms with 2 a's}) &= \binom{3}{2} = 3\\ (\text{# of terms with 1 a}) &= \binom{3}{1} = 3\\ (\text{# of terms with no a's}) &= \binom{3}{0} = 1 \end{align}$$
Thus, we conclude:
Thus,
$$(a+b)^3 = \sum_{k=0}^3 \binom{3}{k}a^k b^{3-k}$$
and in general, for positive integers $n$,
$$(a+b)^n = \sum_{k=0}^n \binom{n}{k}a^k b^{n-k}$$
In short, the reason we use combinations is because the order does not matter, because we will get terms like $aab, baa, bab$ which are all equal in the expansion. Since multiplication is a commutative operation over the real numbers, then, we can say they're equal. Thus, the number of terms of that "type" (characterized by how many $a$'s or $b$'s they have) is given precisely by the number of sequences of length $n$ ($n=3$ in our example), made of only $a$ and $b$, that has exactly $k$ $a$'s (or $b$'s).
Of course this all relies on the central premise that multiplication commutes in the reals and thus ensures that the order of the factors does not matter. That suggests that it does not always hold in situations where multiplication does not commute - for example, the multiplication of a type of numbers known as quaternions is not commutative, and thus the binomial theorem does not hold there as it does here (since there $ab$ need not equal $ba$).
The nature of this commutativity, or the lack of it, and the consequences of each is better divulged in a discussion on abstract algebra, and this tangent is long enough as it is.