I think this is a poorly chosen data set for exploring the advantages of zero inflated models, because, as you note, there isn't that much zero inflation.
plot(fitted(fm_pois), fitted(fm_zinb))
shows that the predicted values are almost identical.
In data sets with more zero-inflation, the ZI models give different (and usually better fitting) results than Poisson.
Another way to compare the fit of the models is to compare the size of residuals:
boxplot(abs(resid(fm_pois)), abs(resid(fm_zinb)))
shows that, even here, the residuals from the two models are very similar in size. If you have some idea of the magnitude of residual that is really problematic, you can check what proportion of the residuals in each model exceeds it. E.g., if being off by more than 1 were unacceptable,
sum(abs(resid(fm_pois)) > 1)
sum(abs(resid(fm_zinb)) > 1)
shows that the latter is a bit better, with 20 fewer large residuals.
Then the question is whether the added complexity of the models is worth it to you.
Both the Poisson distribution and the geometric distribution are special cases of the negative binomial (NB) distribution. One common notation is that the variance of the NB is $\mu + 1/\theta \cdot \mu^2$ where $\mu$ is the expectation and $\theta$ is responsible for the amount of (over-)dispersion. Sometimes $\alpha = 1/\theta$ is also used. The Poisson model has $\theta = \infty$, i.e., equidispersion, and the geometric has $\theta = 1$.
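The variance relationship above can be checked directly by simulation in base R. A minimal sketch (all values here are illustrative choices, not from the question): note that in `rnbinom()` the `size` argument plays the role of $\theta$.

```r
# Check Var(Y) = mu + mu^2/theta by simulation; 'size' is theta in rnbinom().
set.seed(1)
mu <- 4
theta <- 2
y <- rnbinom(1e6, size = theta, mu = mu)
mean(y)  # close to mu = 4
var(y)   # close to mu + mu^2/theta = 4 + 16/2 = 12
```

Setting `size` very large recovers (approximately) the Poisson's equidispersion, and `size = 1` gives the geometric case.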
So in case of doubt between these three models, I would recommend estimating the NB: the worst case is that you lose a little bit of efficiency by estimating one parameter too many. But, of course, there are also formal tests for assessing whether a certain value of $\theta$ (e.g., 1 or $\infty$) is sufficient. Or you can use information criteria etc.
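One such formal test is a likelihood-ratio test of the Poisson ($\theta = \infty$) against the NB. A hedged sketch, assuming `MASS` (which ships with R) and using simulated data purely for illustration:

```r
# LR test of Poisson (theta = Inf) against NB, via MASS::glm.nb.
library(MASS)
set.seed(42)
x <- runif(500)
y <- rnbinom(500, size = 1.5, mu = exp(0.5 + x))  # overdispersed counts
fm_p  <- glm(y ~ x, family = poisson)
fm_nb <- glm.nb(y ~ x)
# theta = Inf lies on the boundary of the parameter space, so the usual
# chi-squared reference distribution is conservative; halving the p-value
# is a common adjustment.
lrt   <- 2 * as.numeric(logLik(fm_nb) - logLik(fm_p))
p_val <- pchisq(lrt, df = 1, lower.tail = FALSE) / 2
p_val
```

A small p-value indicates the extra dispersion parameter is worth keeping; `AIC(fm_p, fm_nb)` gives the information-criterion view of the same comparison.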
Of course, there are also loads of other single- or multi-parameter count data distributions (including the compound Poisson you mentioned) which sometimes may or may not lead to significantly better fits.
As for excess zeros: the two standard strategies are to use either a zero-inflated count data distribution or a hurdle model, consisting of a binary model for zero vs. greater than zero plus a zero-truncated count data model. As you mention, excess zeros and overdispersion may be confounded, but considerable overdispersion often remains even after adjusting the model for excess zeros. Again, in case of doubt, I would recommend an NB-based zero-inflation or hurdle model, by the same logic as above.
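A quick diagnostic for excess zeros is to compare the observed fraction of zeros with the fraction a fitted Poisson model implies. A base-R sketch with simulated data (the data-generating numbers are illustrative assumptions, not from the question):

```r
# Compare observed vs. Poisson-implied zero fractions.
set.seed(7)
x <- runif(300)
y <- ifelse(runif(300) < 0.3, 0L, rpois(300, exp(0.2 + x)))  # extra zeros
fm <- glm(y ~ x, family = poisson)
obs_zero  <- mean(y == 0)                # observed zero fraction
pred_zero <- mean(dpois(0, fitted(fm)))  # model-implied zero fraction
c(observed = obs_zero, predicted = pred_zero)
```

A markedly larger observed fraction suggests reaching for a zero-inflated or hurdle model, e.g. via `zeroinfl()` or `hurdle()` in the pscl/countreg packages discussed in the JSS paper below.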
Disclaimer: This is a very brief and simple overview. When applying the models in practice, I would recommend consulting a textbook on the topic. Personally, I like the count data books by Winkelmann and by Cameron & Trivedi, but there are other good ones as well. For an R-based discussion you might also like our paper in JSS (http://www.jstatsoft.org/v27/i08/).
Best Answer
A one-inflated Poisson model for a count $Y_i$ is
$$\begin{align}\Pr(Y_i = 1) &= \pi_i +(1-\pi_i)\cdot\mu_i\mathrm{e}^{-\mu_i}\\ \Pr(Y_i = y_i) &= (1-\pi_i)\cdot\frac{\mu_i^{y_i}\mathrm{e}^{-\mu_i}}{y_i!} \qquad \text{when } y_i\neq 1 \end{align}$$
where the Poisson mean $\mu_i$ & Bernoulli probability $\pi_i$ are related to the predictors through appropriate link functions. You can define a similar model to inflate probabilities for any values you choose.
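The pmf above is easy to write down in base R. A minimal sketch, where `dipois` is a made-up name for illustration (fixed `mu` and `pi` rather than per-observation values, for simplicity):

```r
# One-inflated Poisson pmf: mass pi moved onto y = 1, the rest Poisson.
dipois <- function(y, mu, pi) {
  base <- dpois(y, mu)
  ifelse(y == 1, pi + (1 - pi) * base, (1 - pi) * base)
}
# The probabilities over 0, 1, 2, ... still sum to one:
sum(dipois(0:100, mu = 2, pi = 0.15))  # 1 (up to a negligible tail)
```

Inflating a different value only requires changing the `y == 1` condition, which is the "any values you choose" point above.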
Still, zero has a special (& once controversial) place among the counting numbers—in a sense representing the absence of anything to count. And it's the "nothing" vs "something" distinction, rather than the "one" vs "any other count" distinction that tends to be relevant across a wide range of phenomena we like to model: there's one process that gives a nought, one, two, ... count & another that gives no count at all.