$\begin{Bmatrix}
n\\
k
\end{Bmatrix}$ is the number of ways to distribute $n$ distinct objects into $k$ identical boxes with no empty boxes allowed/each box containing at least $1$ object. We will first consider distributions into $k$ distinct boxes.
The total number of ways to distribute $n$ distinct objects into $k$ distinct boxes with empty boxes allowed is: $$k^n$$
Let $A_i$ be the number of ways to perform the distribution with the condition that box $i$ must be an empty box. There are no restrictions on the other boxes. (This is equivalent to distributing the $n$ objects into the remaining $k-1$ boxes, empty boxes allowed.)
For example $A_1$ is the number of ways to distribute the objects into the $k$ boxes with box 1 being empty and is equivalent to distributing the objects into the remaining $k-1$ boxes, empty boxes allowed.
$$\implies |A_1| = (k-1)^n $$
Now we notice that $|A_1|=|A_2|=|A_3|=\dots$ since it doesn't matter which box is empty. (We are always distributing into the remaining $k-1$ boxes).
We know there are $\binom{k}{1}$ possible $A_i $
$$\sum_{i=1}^k|A_i| = \binom{k}{1}(k-1)^n$$
Using a similar argument, $|A_i\cap A_j|=(k-2)^n$ because it involves distribution into $k-2$ remaining boxes. Also, using the fact that $|A_1\cap A_2|=|A_1\cap A_3|=|A_1\cap A_4|...$ and that there are $\binom{k}{2}$ such terms,
$$\sum_{1\leq i,j\leq k}|A_i\cap A_j| = \binom{k}{2}(k-2)^n$$
On a similar note, we try extending this to when $l$ boxes must be empty:
$$\sum_{1\leq i,j\dots l\leq k}|A_i\cap A_j\cap \dots\cap A_l| = \binom{k}{l}(k-l)^n$$
$|A_1\cup A_2\cup \dots \cup A_k|$ is the number of ways to distribute $n$ distinct objects into $k$ distinct boxes with at least one empty box.
We are trying to find the distribution with no empty boxes: $$|\overline{A_1\cup A_2\cup \dots \cup A_k}|=k^n -|A_1\cup A_2\cup \dots \cup A_k|$$
By the principle of inclusion and exclusion,
$$=k^n-(|A_1|+|A_2|+|A_3|+...)+(|A_1\cap A_2|+|A_1\cap A_3|+|A_1\cap A_4|+...)-...$$
$$=k^n-\binom{k}{1}(k-1)^n+\binom{k}{2}(k-2)^n-\binom{k}{3}(k-3)^n+\dots +(-1)^l\binom{k}{j}(k-j)^n$$
$$=\sum_{l=0}^{k}(-1)^l\binom{k}{l}(k-l)^n$$
Let $j=k-l$, when $l=0$, $j=k$. When $l=k$, $j=0$.
$$=\sum_{j=k}^{0}(-1)^{k-j}\binom{k}{k-(k-j)}j^n (\text{this is technically wrong notation})$$
This is just the sum backwards. So we have:
$$=\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n$$
Now we divide by $k!$ since we are dealing with identical boxes:
$$\begin{Bmatrix}
n\\
k
\end{Bmatrix}
=\frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n$$
Best Answer
If you're familiar with combinatorics, this being equal to $n^n$ means it counts functions $f:[n]\to[n]$ ($[n]$ denoting the set $\{1,2,\ldots,n\}$, since each of the $n$ elements in the domain have $n$ choices of image in the codomain.
I'll give you the idea of a proof and let you fill in the details. The given expression counts functions from $[n]$ to $[n]$ with some fixed subset $S$ of size $\ell$ of $[n]$. The term $n^{n-\ell}$ counts functions from $[n]-S$ to $[n]$, since each of the $n-\ell$ elements in the domain have $n$ choices o image in the codomain. The remaining sum $\sum_{k=1}^\ell \binom{n}{k}k! \begin{Bmatrix} \ell \\ k \end{Bmatrix}$ determines the image of $S$, where each $k$ corresponds to a possible size of $f(S)$. In other words, for each $k$, $\binom{n}{k}k! \begin{Bmatrix} \ell \\ k \end{Bmatrix}$ counts the number of functions $f:S\to [n]$ where $|f(S)|=k$.
The only other major fact you need is that $k! \begin{Bmatrix} \ell \\ k \end{Bmatrix}$ counts surjections $f:[\ell]\to[k]$. The Stirling number of the second kind counts partitions of $[\ell]$ into $k$ parts. There are $k!$ ways to associate each block with an element of $[k]$, creating a surjection from $[\ell]$ onto $[k]$.