I have data on the abundance of a particular organism across a sampling area. However, instead of counts, I have the estimated biomass of the organism at each sampling location (that is, the estimated total weight of all the organisms at that location, but not the actual number of organisms). I know that a Poisson or negative binomial model is often appropriate for count data, and the NB seems particularly appropriate for these data (the variance is large relative to the mean and the species is known to be spatially aggregated). Can continuous data that is really an index of a discrete variable be modeled using a discrete distribution? I've found one or two papers where biomass data were fit to a NB distribution, but they are light on details and statistical justification.
Solved – Fitting continuous data with zeros to a discrete distribution
Tags: continuous-data, count-data, distributions
Related Solutions
Probably the most common way to look at this kind of thing, if you're only interested in the proportions, is to assume that at the $i$th location $A_i$ & $B_i$ are independent Poisson variables with rates $\lambda_i$ & $\mu_i$ respectively. (That doesn't seem unreasonable for two types of car crashes at the same location over a limited period of time.) The joint mass function is
$$\newcommand{\e}{\mathrm{e}} f_{A_i,B_i}(a_i,b_i) = \frac{\lambda_i^{a_i} \e^{-\lambda_i}}{a_i!} \cdot \frac{\mu_i^{b_i} \e^{-\mu_i}}{b_i!}$$
Reparametrize with
$$\pi_i = \frac{\lambda_i}{\lambda_i+\mu_i}, \qquad \nu_i = \lambda_i+\mu_i,$$
let
$$N_i = A_i+B_i,$$
& the joint density can be written as
$$f_{A_i,N_i}(a_i,n_i)=\frac{1}{a_i!\,(n_i-a_i)!}\cdot\pi_i^{a_i} (1-\pi_i)^{n_i-a_i}\cdot \nu_i^{n_i} \e^{-\nu_i}$$
Note that $\pi_i$, what you're interested in, & $\nu_i$, the nuisance parameter, separate cleanly; $N_i$ is sufficient for $\nu_i$, & $(A_i,N_i)$ sufficient for $\pi_i$. Sum over $a_i$ to get the marginal distribution of $N_i$, which is also Poisson, with rate $\nu_i$:
$$f_{N_i}(n_i)= \frac{\nu_i^{n_i} \e^{-\nu_i}}{n_i!}$$
Conditioning on the observed value of the ancillary complement $N_i=n_i$ gives
$$f_{A_i|N_i=n_i}(a_i;n_i)=\frac{n_i!}{a_i!\,(n_i-a_i)!}\cdot\pi_i^{a_i} (1-\pi_i)^{n_i-a_i}$$
, i.e. a binomial distribution for $A_i$ successes out of $n_i$ trials.
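The conditioning step can be checked numerically. The sketch below is only an illustration: the rates `lam` and `mu` and the total `n` are made-up values, and the helper names are hypothetical. It divides the product of the two Poisson masses by the Poisson mass of the total and compares the result with the binomial mass function.

```python
import math

def poisson_pmf(k, rate):
    """Poisson probability mass at k for the given rate."""
    return rate**k * math.exp(-rate) / math.factorial(k)

def binomial_pmf(a, n, p):
    """Binomial probability of a successes in n trials."""
    return math.comb(n, a) * p**a * (1 - p)**(n - a)

# Hypothetical rates for the two event types at one location (assumed values).
lam, mu = 2.5, 4.0
pi = lam / (lam + mu)   # proportion of type-A events
nu = lam + mu           # total event rate
n = 6                   # an assumed observed total N_i = n_i

max_gap = 0.0
for a in range(n + 1):
    # Joint mass of (A = a, B = n - a) under independent Poissons.
    joint = poisson_pmf(a, lam) * poisson_pmf(n - a, mu)
    # Condition on the total, which is Poisson with rate nu.
    conditional = joint / poisson_pmf(n, nu)
    max_gap = max(max_gap, abs(conditional - binomial_pmf(a, n, pi)))

print(f"largest discrepancy from the binomial pmf: {max_gap:.2e}")
```

Any discrepancy should be at the level of floating-point rounding, whatever positive rates are chosen.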
I'm not sure what your concern is about locations where there are no events: there's simply no data at these to estimate the proportion of type-A crashes, because there weren't any crashes. That doesn't stop you estimating $\pi_i$ at other locations. If location is the only predictor, you have a simple $2\times k$ contingency table for the $k$ locations with data. If there are continuous predictors, you can use a logistic regression model. If you want to make estimates for the $n_i=0$ locations, you need in some way to borrow information from other locations: e.g. with predictors whose coefficients are estimated from other locations, or by treating location as a random effect. A Bayesian multi-level model might be quite useful, as some locations will have small, though non-zero, event counts, & estimates for these will be pulled further in the direction of the global model.
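To illustrate the idea of borrowing information across locations, here is a deliberately simplified stand-in for a multi-level model: a beta-binomial style shrinkage of each location's proportion towards the pooled proportion. The counts, the location names and the `prior_strength` value are all invented for the example, and a fitted hierarchical model would estimate the amount of shrinkage from the data rather than fix it.

```python
# Hypothetical (type-A count, total count) per location; loc2 has no events.
counts = {"loc1": (3, 10), "loc2": (0, 0), "loc3": (1, 2), "loc4": (40, 100)}

total_a = sum(a for a, n in counts.values())
total_n = sum(n for a, n in counts.values())
global_pi = total_a / total_n  # pooled proportion across all locations

# Assumed prior weight, expressed as a number of pseudo-events.
prior_strength = 10.0
alpha = global_pi * prior_strength

estimates = {}
for loc, (a, n) in counts.items():
    raw = a / n if n > 0 else None             # undefined when there are no events
    shrunk = (a + alpha) / (n + prior_strength)  # posterior mean under a Beta prior
    estimates[loc] = (raw, shrunk)

for loc, (raw, shrunk) in estimates.items():
    raw_text = "n/a" if raw is None else f"{raw:.3f}"
    print(f"{loc}: raw={raw_text}  shrunk={shrunk:.3f}")
```

In this sketch the location with no events simply gets the pooled proportion, and the location with only two events moves much further towards it than the location with a hundred.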
I would suggest using Canonical Correspondence Analysis (sometimes called Constrained Correspondence Analysis). In your case, the "sites" are temporal, rather than spatial, but it should work just fine. You'll need a sites by species abundance matrix (which you seem to have) and a sites by environmental data matrix (which I presume you have or can construct).
There is a great discussion of CCA (and associated methods) in Numerical Ecology with R. It's geared towards using the R programming language, but the underlying theory is well described and should be extendable to whatever programming language/software you use.
If you don't have free access to the book through your university, there are a few websites that describe how to use it. Just google it, but be careful not to confuse it with Canonical Correlation Analysis (which is different).
If that still doesn't work for you, try these tutorials on basic ordination analyses like PCA, DCA, and NMDS (the precursors to CCA), and work your way up.
Best Answer
If the counts are all likely to be large, the main potential issue I see here is the variance function, since you don't have anything that scales the biomass to an actual count. It's like having a noisy scaled count without knowing the scaling factor. That may not be such an issue with the negative binomial as it is with the Poisson, though.
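One way to read the variance-function point is to treat biomass as a count multiplied by an unknown per-organism weight. The sketch below works through the moments under that assumption; the `scale`, rate and `size` values are invented, and the function names are hypothetical. For a scaled Poisson the variance becomes `scale` times the mean instead of equal to it. For a scaled negative binomial only the linear term picks up the factor, while the quadratic term `mean**2 / size` is unchanged.

```python
def scaled_poisson_moments(rate, scale):
    """Mean and variance of scale * N where N ~ Poisson(rate)."""
    return scale * rate, scale**2 * rate

def scaled_negbin_moments(mean_count, size, scale):
    """Mean and variance of scale * N where N is negative binomial
    with the given mean and size (dispersion) parameter."""
    var_count = mean_count + mean_count**2 / size
    return scale * mean_count, scale**2 * var_count

scale = 0.25  # assumed weight per organism (unknown in practice)

p_mean, p_var = scaled_poisson_moments(200.0, scale)
nb_mean, nb_var = scaled_negbin_moments(200.0, 2.0, scale)

print(f"scaled Poisson: var/mean = {p_var / p_mean:.2f} (an unscaled Poisson has 1)")
print(f"scaled NB: var = {nb_var:.1f}, quadratic term alone = {nb_mean**2 / 2.0:.1f}")
```

With large counts the quadratic term dominates, so the unknown scaling changes the negative binomial variance function much less than it changes the Poisson one.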
If you have some atoms of probability but the data are otherwise continuous you have a mixed distribution (a mixture of continuous and discrete); when the only atom is at zero, it's sometimes called a zero-inflated continuous distribution.
Zero-inflated gamma and zero-inflated lognormal distributions are commonly used; either might suit your case. Typical models include zero-inflated and hurdle models (yes, the term zero-inflated is overloaded). These are often applied to discrete data (e.g. for otherwise Poisson data you have Poisson hurdle and zero-inflated Poisson, or ZIP, models), where the two kinds of model differ in how they treat zeros; the distinction is less clearly drawn for continuous models. If I used different variables to model the zeros than to model the continuous part, I'd tend to call it a hurdle model rather than zero-inflated. If I used the same form of linear predictor (but with different betas), or if I had a constant probability of zero, I'd probably call it a zero-inflated model. However, I'm not an expert on such models, so you may be better off following other people's way of dividing up models for continuous zero-inflated data.
There are some posts on our site relating to zero-inflated gamma models and other zero-inflated distributions, and on continuous zero-inflated and/or hurdle models.
On this page, Sean Anderson talks about gamma hurdle models and specifically mentions their use for modelling biomass.
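As a rough illustration of the two-part structure, with no covariates, a gamma hurdle fit can be sketched in two steps: estimate the probability of a zero, then fit a gamma to the positive values. The biomass values below are invented, and the gamma is fitted by the method of moments purely to keep the example short; a real analysis would normally use maximum likelihood with predictors in both parts.

```python
import statistics

# Hypothetical biomass values at 12 sampling locations (zeros = species absent).
biomass = [0.0, 0.0, 1.2, 0.0, 3.4, 0.8, 0.0, 12.5, 2.1, 0.0, 5.7, 0.4]

# Part 1: the hurdle, i.e. the probability that a location has zero biomass.
positives = [x for x in biomass if x > 0]
p_zero = 1 - len(positives) / len(biomass)

# Part 2: a gamma for the positive values, fitted by the method of moments
# (a simplifying assumption, not maximum likelihood).
m = statistics.mean(positives)
v = statistics.variance(positives)
shape = m**2 / v
scale = v / m

# Mean implied by the combined model: P(non-zero) * gamma mean.
overall_mean = (1 - p_zero) * shape * scale

print(f"P(zero) = {p_zero:.3f}, gamma shape = {shape:.3f}, scale = {scale:.3f}")
print(f"implied overall mean = {overall_mean:.3f}")
```

The implied overall mean equals the sample mean of all the data, zeros included, which is a quick sanity check on the two parts.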
Portion of older answer given under the original post (which stated the distribution was continuous):
I'd be inclined to model it as a gamma; it's continuous, and it arguably has roughly similar properties to the negative binomial.
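One way to make the similarity concrete (a sketch, with $\alpha$ for the gamma shape and $k$ for the negative binomial size parameter, both notation introduced here) is to compare the variance functions at a common mean $\mu$:

$$\operatorname{Var}_{\text{gamma}}(Y) = \frac{\mu^2}{\alpha}, \qquad \operatorname{Var}_{\text{NB}}(Y) = \mu + \frac{\mu^2}{k}$$

Both grow quadratically in the mean, and for large $\mu$ the two nearly coincide when $\alpha = k$.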
Is there a particular reason you need the negative binomial?