The appropriate distribution is Wallenius's noncentral hypergeometric distribution. Using an urn analogy, the problem is equivalent to picking $n$ of $k$ balls without replacement, where each ball is a different color: the parameters $p$ are analogous to the weights of picking a particular color.
The catch: this distribution is not very convenient to work with, though there is an R package that implements it.
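Wallenius's urn scheme is straightforward to simulate directly, which is often enough in practice. Below is a minimal sketch (the function name and parameters are mine, not from any package): balls are drawn one at a time without replacement, each draw picking a color with probability proportional to the total remaining weight of that color.

```python
import random

def wallenius_draw(counts, weights, n, rng=random):
    """Sequentially draw n balls without replacement from an urn holding
    counts[i] balls of color i, each color-i ball carrying weight weights[i].
    Each draw picks a ball with probability proportional to its weight among
    the balls still in the urn (Wallenius's scheme).
    Returns the number drawn of each color."""
    remaining = list(counts)
    drawn = [0] * len(counts)
    for _ in range(n):
        # Total remaining weight per color; exhausted colors get weight 0.
        totals = [r * w for r, w in zip(remaining, weights)]
        pick = rng.choices(range(len(counts)), weights=totals)[0]
        remaining[pick] -= 1
        drawn[pick] += 1
    return drawn
```

For example, `wallenius_draw([5, 3, 2], [1.0, 2.0, 0.5], 4)` draws 4 balls from an urn with 5+3+2 balls of three colors, favoring the heavily weighted second color.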
A demonstration using "equations" was requested in a comment. Here is a short, simple one that is practically painless.
Notation and definitions
Let the random $K$-vector $X$ have a multinomial distribution with parameters $\mathbb p = (p_1, p_2, \ldots, p_K)$. This means that $p_1 + p_2 + \cdots + p_K=1$, $0 \le p_i$ for $i=1, 2, \ldots, K$, and the probability that $X = (m_1, m_2, \ldots, m_K) = \mathbb m$ is given by
$$\Pr(X=\mathbb m) =\binom{N}{\mathbb m}\mathbb p^\mathbb m$$
In this shorthand notation $\binom{N}{\mathbb m} = N!/(m_1! m_2! \ldots m_K!)$ is a multinomial coefficient (which is nonzero only when all the $m_i$ are natural numbers and sum to $N \ge 1$) and $\mathbb p ^ \mathbb m = p_1^{m_1}p_2^{m_2}\cdots p_K^{m_K}.$
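This pmf is easy to check numerically. A stdlib-only sketch (the function name is mine):

```python
from itertools import product
from math import factorial, prod

def multinomial_pmf(m, p):
    """Pr(X = m) = (N! / (m_1! ... m_K!)) * p_1^m_1 * ... * p_K^m_K."""
    N = sum(m)
    coeff = factorial(N)
    for mi in m:
        coeff //= factorial(mi)  # stays an integer at every step
    return coeff * prod(pi ** mi for pi, mi in zip(p, m))

# Sanity check: the pmf sums to 1 over all m with m_1 + ... + m_K = N.
p = (0.2, 0.3, 0.5)
N = 4
total = sum(multinomial_pmf(m, p)
            for m in product(range(N + 1), repeat=len(p))
            if sum(m) == N)
```

The final sum being $1$ is exactly the multinomial expansion of $(p_1+\cdots+p_K)^N$ used below.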
By definition, the expectation of $X$ is the vector
$$\mathbb E[X] = \sum_{\mathbb m} \Pr(X = \mathbb m)\mathbb m =\sum_{\mathbb m} \binom{N}{\mathbb m}\mathbb p^\mathbb m \mathbb m$$
where the sum extends over the (finite number of) values of $\mathbb m$ for which the probability is nonzero.
Solution
By expanding the sum using the definition of the multinomial coefficients, notice that
$$1 = 1^N = (p_1 + p_2 + \cdots + p_K)^N = \sum_{\mathbb m}\binom{N}{\mathbb m}\mathbb p^\mathbb m.$$
Viewing the $p_i$ as variables, we can recognize the component terms $\binom{N}{\mathbb m}\mathbb p^\mathbb m m_i$ in the expectation as the result of applying the differential operator $p_i\frac{\partial}{\partial p_i}$ to the right-hand side, because $p_i\frac{\partial}{\partial p_i} \left(p_i^{m_i}\right) = m_i p_i^{m_i}.$ Another way to compute the same thing is to use the chain rule to differentiate the penultimate term in the preceding multinomial expansion:
$$p_i\frac{\partial}{\partial p_i}(p_1 + p_2 + \cdots + p_K)^N = p_iN(p_1 + p_2 + \cdots + p_K)^{N-1}\frac{\partial p_i}{\partial p_i} = Np_i(1)^{N-1} = Np_i.$$
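The differentiation step can be spot-checked numerically with a central difference (the evaluation point is arbitrary, as long as its coordinates sum to $1$):

```python
# Numerically verify  p_i * d/dp_i (p_1 + ... + p_K)^N = N * p_i
# at a point with p_1 + ... + p_K = 1, via a central difference.
N = 5
p = [0.2, 0.3, 0.5]   # sums to 1
i, h = 0, 1e-6

def f(pi):
    # (p_1 + ... + p_K)^N with the i-th coordinate replaced by pi.
    q = p.copy()
    q[i] = pi
    return sum(q) ** N

deriv = (f(p[i] + h) - f(p[i] - h)) / (2 * h)
value = p[i] * deriv   # should be close to N * p[i]
```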
Therefore
$$\mathbb E[X] = (Np_1, Np_2, \ldots, Np_K),$$
QED.
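The result can also be verified by brute force for a small case, summing $\Pr(X=\mathbb m)\,\mathbb m$ directly from the definition (a stdlib-only sketch; names are mine):

```python
from itertools import product
from math import factorial, prod

def pmf(m, p):
    """Multinomial pmf: (N! / (m_1! ... m_K!)) * p^m."""
    N = sum(m)
    coeff = factorial(N)
    for mi in m:
        coeff //= factorial(mi)
    return coeff * prod(pi ** mi for pi, mi in zip(p, m))

p = (0.1, 0.3, 0.6)
N = 3
K = len(p)

# E[X] computed directly from the definition:
# sum over all valid m of Pr(X = m) * m_j, one component at a time.
expectation = [
    sum(pmf(m, p) * m[j]
        for m in product(range(N + 1), repeat=K)
        if sum(m) == N)
    for j in range(K)
]
# Each component should equal N * p_j, i.e. (0.3, 0.9, 1.8) here.
```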
Best Answer
Suppose you roll a 6-sided die $N$ times.
The outcome of roll $i$, $i=1,\ldots,N$, is represented by the random variable $X_i$. The tuple $\mathbf{X}=\left(X_1,\ldots,X_N\right)$ contains the outcome of each roll.
We can obtain category-level count information from $\mathbf{X}$ by taking $N_j=\sum_{i=1}^{N}\delta\left(X_i=j\right)$, $j=1,\ldots,6$, where $\delta(\cdot)$ equals $1$ when its argument is true and $0$ otherwise. The tuple $\mathbf{N}=\left(N_1,\ldots,N_6\right)$ contains the counts for each category.
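In code, going from the outcome tuple to the count tuple is a one-liner (a sketch with illustrative names):

```python
import random

N = 10
# X = (X_1, ..., X_N): one simulated outcome per roll of a fair die.
rolls = tuple(random.randint(1, 6) for _ in range(N))
# N_j = number of rolls equal to j, for j = 1, ..., 6.
counts = tuple(sum(1 for x in rolls if x == j) for j in range(1, 7))
```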
What's the difference between having $\mathbf{X}$ and $\mathbf{N}$? They both arise from $N$ trials of a multinomial distribution with six possible outcomes, each with equal probability of occurring. However, when we discuss probability with respect to $\mathbf{X}$ we are talking about the probability of a specific sequence of outcomes, whereas with respect to $\mathbf{N}$ we are talking about the probability of a specific set of counts. The count-level probability carries a combinatorial normalizing factor, the multinomial coefficient, which counts the orderings consistent with the counts; the corresponding factor at the trial level is just $1$, because there is only one way to get any specific sequence of outcomes.
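The difference is visible in a direct computation: a specific sequence has probability $\prod_i p_{x_i}$, while the corresponding count vector multiplies that by the multinomial coefficient. A small stdlib-only check (the example sequence is arbitrary):

```python
from math import factorial, prod

p = [1 / 6] * 6             # fair die
seq = (3, 1, 3, 6)          # one specific sequence of four rolls
counts = [sum(1 for x in seq if x == j) for j in range(1, 7)]

# Probability of this exact sequence of outcomes.
p_seq = prod(p[x - 1] for x in seq)

# Probability of the count vector: multinomial coefficient times p^m.
coeff = factorial(len(seq))
for c in counts:
    coeff //= factorial(c)  # here 4!/(1! 2! 1!) = 12
p_counts = coeff * prod(pj ** c for pj, c in zip(p, counts))

# The ratio p_counts / p_seq is exactly the number of orderings
# of the sequence that produce the same counts.
```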
EDIT The second section of the paper actually discusses when to use counts and when to use samples.