$\begin{Bmatrix}
n\\
k
\end{Bmatrix}$ is the number of ways to distribute $n$ distinct objects into $k$ identical boxes with no empty boxes allowed/each box containing at least $1$ object. We will first consider distributions into $k$ distinct boxes.
The total number of ways to distribute $n$ distinct objects into $k$ distinct boxes with empty boxes allowed is: $$k^n$$
Let $A_i$ be the number of ways to perform the distribution with the condition that box $i$ must be an empty box. There are no restrictions on the other boxes. (This is equivalent to distributing the $n$ objects into the remaining $k-1$ boxes, empty boxes allowed.)
For example $A_1$ is the number of ways to distribute the objects into the $k$ boxes with box 1 being empty and is equivalent to distributing the objects into the remaining $k-1$ boxes, empty boxes allowed.
$$\implies |A_1| = (k-1)^n $$
Now we notice that $|A_1|=|A_2|=|A_3|=\dots$ since it doesn't matter which box is empty. (We are always distributing into the remaining $k-1$ boxes).
We know there are $\binom{k}{1}$ possible $A_i $
$$\sum_{i=1}^k|A_i| = \binom{k}{1}(k-1)^n$$
Using a similar argument, $|A_i\cap A_j|=(k-2)^n$ because it involves distribution into $k-2$ remaining boxes. Also, using the fact that $|A_1\cap A_2|=|A_1\cap A_3|=|A_1\cap A_4|...$ and that there are $\binom{k}{2}$ such terms,
$$\sum_{1\leq i,j\leq k}|A_i\cap A_j| = \binom{k}{2}(k-2)^n$$
On a similar note, we try extending this to when $l$ boxes must be empty:
$$\sum_{1\leq i,j\dots l\leq k}|A_i\cap A_j\cap \dots\cap A_l| = \binom{k}{l}(k-l)^n$$
$|A_1\cup A_2\cup \dots \cup A_k|$ is the number of ways to distribute $n$ distinct objects into $k$ distinct boxes with at least one empty box.
We are trying to find the distribution with no empty boxes: $$|\overline{A_1\cup A_2\cup \dots \cup A_k}|=k^n -|A_1\cup A_2\cup \dots \cup A_k|$$
By the principle of inclusion and exclusion,
$$=k^n-(|A_1|+|A_2|+|A_3|+...)+(|A_1\cap A_2|+|A_1\cap A_3|+|A_1\cap A_4|+...)-...$$
$$=k^n-\binom{k}{1}(k-1)^n+\binom{k}{2}(k-2)^n-\binom{k}{3}(k-3)^n+\dots +(-1)^l\binom{k}{j}(k-j)^n$$
$$=\sum_{l=0}^{k}(-1)^l\binom{k}{l}(k-l)^n$$
Let $j=k-l$, when $l=0$, $j=k$. When $l=k$, $j=0$.
$$=\sum_{j=k}^{0}(-1)^{k-j}\binom{k}{k-(k-j)}j^n (\text{this is technically wrong notation})$$
This is just the sum backwards. So we have:
$$=\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n$$
Now we divide by $k!$ since we are dealing with identical boxes:
$$\begin{Bmatrix}
n\\
k
\end{Bmatrix}
=\frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}j^n$$
explain this "identical boxes"
When we put distinguishable objects into "identical boxes", it means we put distinguishable objects into distinguishable boxes, but then we want to consider rearranging the boxes to arrive at an equivalent arrangement, and we count the number of equivalence classes.
A better way of thinking about it is that we have a set of boxes (not a list), and each box is a nonempty set. So the order of the boxes doesn't matter, and the order of objects within each box doesn't matter.
Perhaps an example will help.
The number of ways to put $n$ distinguishable objects $1,2,3,\ldots, n$ into $k$ identical boxes when $n = 4$ and $k = 2$ is $7$:
Box 1 Box 2
123 4
124 3
134 2
234 1
124 3
12 34
13 24
14 23
1 how to construct the formula for it (the recursive one) preferably using bijection principle and just using "box and objects language"
Let's prove the recurrence relation
$$
\newcommand{\stir}{\genfrac\{\}{0pt}{}}
\stir{n+1}{k} = k \stir{n}{k} + \stir{n}{k-1}.
$$
Consider an arrangement of $n+1$ objects $(1,2,3,\ldots,n+1)$ put in $k$ boxes, of the kind counted by $\stir{n+1}{k}$.
There are two cases to consider: either object $n+1$ stands in its own box, or it shares its box with some other objects.
If it stands in its own box, then we have to distribute the remaining $n$ objects into the remaining $k-1$ boxes, and there are $\stir{n}{k-1}$ ways to do so. If not, then we have to put $n$ remaining objects into $k$ boxes, one of them distinguished from the rest, such that each box is nonempty (the box that contained object $n+1$ has to be nonempty too, as that is the case we are considering). The number of ways to do this is $k \cdot \stir{n}{k}$: first $\stir{n}{k}$ ways to put the $n$ objects into $k$ identical boxes, and then $n$ ways to choose which box to select as the distinguished one, which we will put $n+1$ into. (Note that nonempty is key here: because all boxes are nonempty, despite being indistuishable boxes they are distinguishable once any object is put in them; thus, we cannot put object $n+1$ into two different boxes and end up with the same resulting arrangement.)
2 explain the same but now with " set partition language"
In a partition of $1,2,3,\ldots,n+1$ into $k$ sets, there are two possibilities: either $\{k\}$ is an element of the partition, or $\{k\}$ is not an element of the partition. In the forms case,
there is a one-to-one, onto map (bijection) from all such partitions of $1,2,3,\ldots,n+1$ into $k$ parts to all partitions of $1,2,3,\ldots,n$ into $k-1$ parts, given by
(for $P$ a partition of $\{1,2,3,\ldots,n+1\}$
$$
P \quad\mapsto\quad P \setminus \{\{n+1\}\}.
$$
In the latter case, there is a $k$-to-one, onto map from all such partitions of $1,2,3\ldots,n+1$ into $k$ parts to all partitions of $1,2,3, \ldots, n$ into $k$ parts, given by
$$
P \quad\mapsto\quad \{x \setminus \{n+1\} \;|\; x \in P\}.
$$
3 define and construct the explicit formula for it
Explicit formulas are not always that nice :)
You can find a combinatorial answer to building an explicit formula here.
Best Answer
There are a couple of things going on here.
Let's expand on the second issue, as the two problems are clearly related. Since there's some notational confusion going on, let's let $T(r,n)$ denote the number of ways to distribute $r$ distinct objects into $n$ distinct boxes with no empty box (i.e., your problem), and we'll let $S(r,n)$ denote the number of ways to distribute $r$ distinct objects into $n$ indistinguishable boxes with no empty box. Then we have the relationship $T(r,n) = n! S(r,n)$. This is because the indistinguishable boxes can be made distinguishable by applying $n$ different labels to them, and there are $n!$ ways to assign $n$ labels to $n$ boxes.
This fits with your computations above. Your (corrected) computations have $$T(r,n) = \sum_{k=0}^n(-1)^{n-k}\binom{n}{k}k^r.$$ The formula you cite for the Stirling numbers of the second kind has $$S(r,n) = \frac{1}{n!}\sum_{k=0}^{n}(-1)^k\binom{n}{k}(n-k)^r.$$ Since $\binom{n}{n-k} = \binom{n}{k}$, though, by reindexing the sum we get $$S(r,n) = \frac{1}{n!}\sum_{k=0}^{n}(-1)^{n-k}\binom{n}{k}k^r,$$ in agreement with the argument above that $T(r,n) = n!S(r,n)$.