Suppose you roll a 6-sided die $N$ times.
The outcome of roll $i$, $i=1,\ldots,N$, is represented by the random variable $X_i$. The tuple $\mathbf{X}=\left(X_1,\ldots,X_N\right)$ contains the outcome of each roll.
We can obtain category-level count information from $\mathbf{X}$ by taking $N_j=\sum_{i=1}^{N}\delta\left(X_i=j\right)$, $j=1,\ldots,6$. The tuple $\mathbf{N}=\left(N_1,\ldots,N_6\right)$ contains the counts for each category.
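As a concrete sketch (in Python, with variable names of my choosing), the counts $\mathbf{N}$ can be tabulated directly from the individual rolls:

```python
from collections import Counter
import random

random.seed(1)
N = 1000
# X holds the outcome of each individual roll: X_1, ..., X_N
X = [random.randint(1, 6) for _ in range(N)]

# N_j = number of rolls that landed on face j
counts = Counter(X)
Nvec = [counts[j] for j in range(1, 7)]

print(Nvec)  # six counts, summing to N
```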
What's the difference between having $\mathbf{X}$ and $\mathbf{N}$? They both arise from $N$ trials of a multinomial distribution with six equally likely outcomes. However, when we discuss probability with respect to $\mathbf{X}$ we are talking about the probability of a specific sequence of outcomes, whereas with respect to $\mathbf{N}$ we are talking about the probability of a specific set of counts. The two differ by a combinatorial normalizing factor: the probability of $\mathbf{N}$ is the probability of any one matching sequence multiplied by the multinomial coefficient $\binom{N}{N_1,\ldots,N_6}$, the number of distinct sequences that produce those counts. For the trial-level information that factor is just $1$, because there is only one way to get any specific sequence of outcomes.
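To make the distinction concrete, here is a small Python sketch (function names are mine) comparing the probability of one specific sequence with the probability of its counts; the two differ by exactly the multinomial coefficient:

```python
from math import factorial

def seq_prob(seq, p=1/6):
    # probability of one specific ordered sequence of fair-die rolls
    return p ** len(seq)

def count_prob(counts, p=1/6):
    # probability of observing these category counts: the multinomial
    # coefficient times the probability of any single matching sequence
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * p ** n

seq = (1, 2, 3)              # one specific sequence of three rolls
cnt = (1, 1, 1, 0, 0, 0)     # its counts N_1, ..., N_6
print(seq_prob(seq))         # (1/6)^3
print(count_prob(cnt))       # 3! * (1/6)^3 -- six sequences share these counts
```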
EDIT: The second section of the paper actually discusses when to use counts and when to use samples.
The direct answer to the question - how to deal with small expected counts - is that one might either
(a) combine ranges of cells at the end (a very common approach),
(b) use a different (and perhaps better) goodness of fit test, or
(c) consider dropping the chi-square approximation, and see if one can deal with the discrete distribution of the test statistic more directly, perhaps by simulation.
Approach (a) can be found in many texts. There are many ways to go about combining cells, but many people simply work from one end or the other, combining cells to the left or to the right until the expected counts are sufficiently high for their purpose.
(However, the most commonly cited rule of thumb for the expected number - that it should be at least 5 for the chi-square approximation to hold - is unnecessarily strict for the sort of approximation most people would require. Many subsequent papers have suggested less stringent rules.)
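For approach (c), here is a pure-Python Monte Carlo sketch (names and defaults are mine): rather than relying on the asymptotic chi-square approximation, simulate the null distribution of the test statistic directly.

```python
import random

def chisq_stat(obs, exp):
    # Pearson chi-square statistic
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def mc_pvalue(obs, probs, n_sim=2000, seed=0):
    """Monte Carlo p-value for a goodness-of-fit test: the fraction of
    datasets simulated under the null whose statistic is at least as
    extreme as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    n = sum(obs)
    exp = [p * n for p in probs]
    stat_obs = chisq_stat(obs, exp)
    hits = 0
    for _ in range(n_sim):
        sim = [0] * len(probs)
        for _ in range(n):                  # draw one multinomial sample
            u, cum = rng.random(), 0.0
            for j, p in enumerate(probs):
                cum += p
                if u < cum:
                    sim[j] += 1
                    break
            else:
                sim[-1] += 1                # guard against rounding of cum
        if chisq_stat(sim, exp) >= stat_obs:
            hits += 1
    return (hits + 1) / (n_sim + 1)

p = mc_pvalue([12, 7, 4, 9, 15, 13], [1/6] * 6)
```

This sidesteps the chi-square approximation entirely, so small expected counts stop being a problem; for a sample of around a million, though, this naive per-trial sampler would be slow, and a vectorized multinomial sampler would be the practical choice.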
The other answer by user36381 suggests that with such a large sample size, goodness of fit tests are almost certain to reject; this is true. However, I'm not so sure comparing to other reference distributions will help, since they, too, would almost certainly be rejected by a decent goodness of fit test.
(Why are you testing whether it's Poisson? If you have around a million data points, the sample itself contains a lot of information about distributional shape - do you actually need a name for the distribution?)
Multinomial and Poisson are very different. Multinomial regression should be used when your response is categorical with more than two categories. Poisson regression is used when your response is a count of events.
Classic Poisson regression: response is # of accidents at an intersection in a day. This can be modeled using Poisson regression since there is no obvious upper bound to the # of accidents. Also note that the numbers actually mean something and have quantitative value.
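As an illustration of what Poisson regression actually fits, here is a minimal pure-Python Newton/IRLS sketch for a single predictor (a from-scratch sketch for exposition only; in practice you would use a GLM routine such as R's `glm` or statsmodels):

```python
import math

def poisson_fit(x, y, iters=50):
    """Fit the log-linear model log E[y] = b0 + b1*x by maximizing the
    Poisson log-likelihood with Newton's method (IRLS)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2x2), inverted in closed form
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

Here `x` might be a measure of traffic volume and `y` the accident count for each day (a hypothetical pairing); the fitted `exp(b0 + b1*x)` is the expected count, which is never negative and has no built-in upper bound.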
Classic example of multinomial regression: response is the dietary habit of an individual, Non-vegetarian, Vegetarian, or Vegan. Note that the variable is qualitative, not quantitative, and there are restrictions on the values the response can take (one of three).
There are almost no obvious scenarios that I can think of where these two methods can be used interchangeably. For count data, if you want other options, you might look at Negative-Binomial Regression.
If the goal is to model the effect of the treatments, where the effect can be categorical (got well, stayed the same, got better), a multinomial (or logistic) regression should be used with the treatment as a predictor.