This is a (super) late answer, but I myself was looking for some information related to gamma-gamma models for monetary value, and came across this. The short answer is yes: negative values for expected transaction values expose issues with the underlying dataset used to fit the model.
In case it is helpful for you or others with similar questions, I'll try to illustrate why it's concerning to have $q<1$. The purpose of these spend models is to understand observed spend per transaction, with the goal of predicting future spend per transaction at the individual level. The use of a gamma distribution was first proposed by Colombo and Jiang (1999) and was motivated by two problems with modeling transaction values as normally distributed: 1) the normal distribution is not bounded below by $0$ for any choice of mean and variance parameters, and 2) it produces symmetric spend distributions, whereas the observed data consistently appear to be right skewed.
Following the paper you refer to, a customer with $x$ transaction values $z_1,\dots,z_x$ is modeled such that $z_i \sim \text{Gamma}(p,\nu)$ (with $\nu$ a rate parameter), and we allow for heterogeneity across customers by also having $\nu \sim \text{Gamma}(q,\gamma)$. A key observation is that, conditional on $p$ and $\nu$, a customer's mean transaction value is $\delta = p/\nu$. Now $\nu$ varies across customers, so you may want to know what the mean transaction value is across all individuals. Denote this random variable $D$. It can be shown that
$$E[D|p,q,\gamma] = \frac{p\gamma}{q-1}$$
which says that the mean transaction value across customers is $\frac{p\gamma}{q-1}$ (showing this is a bit involved, but the way to do it is to derive the distribution of $D$, show that it is an inverse-gamma distribution with specific parameters, and take the expected value of that distribution). In any gamma distribution the parameters are strictly positive, so $p>0$ and $\gamma>0$; hence if you have $q<1$, the expected transaction value across individuals must be negative.
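To see this identity concretely, here is a quick Monte Carlo sketch. It assumes the rate parameterization $\nu \sim \text{Gamma}(\text{shape}=q,\ \text{rate}=\gamma)$ implied by $\delta = p/\nu$, and the parameter values are illustrative, not from any fitted model:

```python
import numpy as np

# Monte Carlo sketch of E[D] = p*gamma/(q-1), assuming the rate
# parameterization nu ~ Gamma(shape=q, rate=gamma) and D = p/nu.
# Parameter values are illustrative, not from any fitted model.
rng = np.random.default_rng(0)
p, q, gam = 6.0, 4.0, 15.0

nu = rng.gamma(shape=q, scale=1.0 / gam, size=1_000_000)  # rate = gam
d = p / nu  # each customer's mean transaction value

print(d.mean())           # simulated mean across customers
print(p * gam / (q - 1))  # closed form: 30.0
```

With $q$ pushed below $1$ the closed form goes negative even though every simulated $p/\nu$ is positive, which is exactly the contradiction described above.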
This should give you pause: why would the expected transaction value be negative? You could try to rationalize it as compensating individuals for each transaction, but that is quite an odd situation, and there are other models better suited to it. The fact that your model finds $q<1$ should therefore immediately raise serious concerns for this reason alone.
As a final point, I think it's nice to better understand
$$
\begin{align}
\mathbb{E}(M\mid p, q, \gamma, m_x, x) & = \frac{(\gamma + m_xx)p}{px+q-1}\\
& = \bigg(\frac{q-1}{px+q-1}\bigg)\frac{\gamma p}{q-1}+\bigg(\frac{px}{px+q-1}\bigg)m_x\\
\end{align}
$$
by noting that it is simply a weighted average of the population mean transaction value $E[D|p,q,\gamma] = \frac{p\gamma}{q-1}$ and the observed average transaction value $m_x = \frac{1}{x}\sum_{i=1}^x z_i$ of a given customer. The weightings can be fully understood in a Bayesian framework: the population mean transaction value acts as a prior, and the weight placed on it goes down as you observe more data $x$ on a given individual!
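The algebra behind that weighted-average form is easy to check numerically. This sketch (with made-up parameter values) confirms that the direct formula and the blended form agree:

```python
# Numeric check that the conditional expectation
#   (gamma + m_x * x) * p / (p * x + q - 1)
# equals a weighted average of the population mean p*gamma/(q-1)
# and the customer's observed mean m_x.  All values are made up.
p, q, gam = 6.0, 4.0, 15.0
x, m_x = 5, 42.0  # 5 observed transactions averaging 42 each

direct = (gam + m_x * x) * p / (p * x + q - 1)

w = (q - 1) / (p * x + q - 1)              # weight on the population prior
blended = w * (p * gam / (q - 1)) + (1 - w) * m_x

print(direct, blended)  # agree up to floating point
```

Note that the result lands between the population mean ($30$ here) and the customer's own mean ($42$ here), as shrinkage estimators should.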
Imagine you're the newly appointed manager of a flower shop. You've got a record of last year's customers – the frequency with which they shop and how long since their last visit. You want to know how much business the listed customers are likely to bring in this year. There are a few things to consider:
[assumption (ii)] Customers have different shopping habits.
Some people like having fresh flowers all the time, while others only buy them on special occasions. It makes more sense to have a distribution for the transaction rate $\lambda$ than to assume that a single $\lambda$ explains everyone’s behaviour.
The distribution needs to have few parameters (you don’t necessarily have a lot of data), to be fairly flexible (you’re presumably not a mind-reading entrepreneurial guru and don’t know all about shopping habits), and to take values in the positive real numbers. The Gamma distribution ticks all of those boxes, and is well-studied and relatively easy to work with. It’s often used as a prior for positive parameters in different settings.
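Incidentally, this Gamma mixing is where the "NBD" in the model names comes from: a Poisson purchase count whose rate $\lambda$ is Gamma-distributed is marginally negative binomial. A small simulation sketch, with illustrative shape $r$ and rate $\alpha$ rather than values estimated from real data:

```python
import math
import numpy as np

# Sketch: a Poisson transaction count whose rate lambda is drawn from
# Gamma(shape=r, rate=alpha) is marginally negative binomial ("NBD")
# with success probability p = alpha / (alpha + 1).
# r and alpha are illustrative, not estimates from real data.
rng = np.random.default_rng(1)
r, alpha = 2, 0.5  # shape and rate of the Gamma prior on lambda

lam = rng.gamma(shape=r, scale=1.0 / alpha, size=200_000)  # rate = alpha
counts = rng.poisson(lam)

p_nb = alpha / (alpha + 1.0)
for k in range(5):
    pmf = math.comb(k + r - 1, k) * p_nb**r * (1 - p_nb)**k
    print(k, round((counts == k).mean(), 3), round(pmf, 3))
```

The simulated frequencies track the negative binomial pmf, which is why heterogeneous Poisson shoppers produce NBD-distributed transaction counts in aggregate.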
[assumption (iii)] You might have already lost some of the customers on the list.
If Andrea has bought flowers about once a month every month in the last year, it’s a fairly safe bet she’ll be returning this year. If Ben used to buy flowers weekly, but he hasn’t been around for months, then maybe he’s found a different flower shop. In making future business plans, you might want to count on Andrea but not on Ben.
Customers won’t tell you when they’ve moved on, which is where the “unobserved lifetime” assumption kicks in for both models. Imagine a third customer, Cary. The Pareto/NBD and BG/NBD models give you two different ways to think about Cary dropping out of the shop for good.
For the Pareto/NBD case, imagine that at any point in time, there is a small chance that Cary might come across a better shop than yours. This constant infinitesimal risk gives you the exponential lifetime – and the longer it’s been since Cary’s last visit, the longer he’s been exposed to other (potentially better) flower shops.
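The "constant infinitesimal risk" story can be sketched in a few lines: if Cary leaves with probability $\mu\,\Delta t$ in each tiny time step, his lifetime is (approximately) Exponential with mean $1/\mu$. The values of $\mu$ and the step size below are illustrative:

```python
import numpy as np

# Sketch of the Pareto/NBD dropout story: a constant instantaneous
# risk mu of defecting yields an Exponential(mu) lifetime, mean 1/mu.
# Approximate it by leaving with probability mu*dt in each tiny step.
# mu and dt are illustrative choices.
rng = np.random.default_rng(3)
mu, dt = 0.5, 1e-3

steps = rng.geometric(mu * dt, size=100_000)  # steps until defection
lifetimes = steps * dt

print(lifetimes.mean())  # close to 1/mu = 2.0
```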
The BG/NBD case is a little more contrived. Every time Cary arrives in your shop, he’s committed to buying some flowers. While browsing, he’ll consider the changes in price, quality and variety since his last visit, and that will ultimately make him decide whether to come back again next time, or look for another shop. So rather than being constantly at risk, Cary has some probability p of just deciding to leave after each purchase.
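The "coin flip after each purchase" story is a geometric mechanism: the number of repeat purchases Cary makes before leaving has mean $(1-p)/p$. A minimal sketch, with an illustrative dropout probability:

```python
import numpy as np

# Sketch of the BG/NBD dropout story: after each purchase the customer
# leaves with probability p_drop, so the number of *repeat* purchases
# before leaving is Geometric with mean (1 - p_drop) / p_drop.
# p_drop is an illustrative value.
rng = np.random.default_rng(2)
p_drop = 0.25

def repeat_purchases(rng, p_drop):
    """Count 'stay' decisions before the post-purchase coin says leave."""
    n = 0
    while rng.random() >= p_drop:  # stays with probability 1 - p_drop
        n += 1
    return n

sims = np.array([repeat_purchases(rng, p_drop) for _ in range(100_000)])
print(sims.mean())  # close to (1 - 0.25) / 0.25 = 3.0
```

Note the contrast with the Pareto/NBD story: here dropout risk accrues per purchase, not per unit of elapsed time.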
[assumption (iv)] Not all customers are equally committed to your shop.
Some customers are regulars, and only death – or a sharp price increase – will force them to leave. Others might like to explore, and would happily leave you for the sake of the new hipster flower shop across the street. Rather than a single drop-out rate for all customers, it makes more sense to have a distribution of drop-out rates (or probabilities in the BG/NBD case).
This works very much in the same vein as the shopping habits. We’re after a flexible, well-established distribution with few parameters. In the Pareto/NBD case we use a Gamma, since the rate $\mu$ is in the positive real numbers. In the BG/NBD case we use a Beta, which is the standard prior for parameters in $(0, 1)$.
I hope this helps. Have a look at the original paper (Schmittlein et al., 1987) if you haven't already -- they go through some of the intuition there.
I suspect the reason is primarily theoretical. While there may be mathematical implications as well, I think these are more pronounced in the BG/NBD model than in the Double-Gamma model. Consider that Fader & Hardie's research builds upon work by Ehrenberg, who began with modeling repeat transactions ... well, technically it began with modeling customer buying patterns, but it takes multiple transactions to generate a pattern.
We see this language in the papers: the initial transaction is seen as 'trial' rather than repeat and is considered to be of a different nature (see the initial statement of the problem in this paper: http://www.brucehardie.com/notes/006/creating_dor_summary_2004-05-04.pdf). It is a different problem to predict initial customer acquisition (in which a success is defined as a trial purchase) than to predict incremental transactions from already-acquired customers (in which a success is defined as an additional repeat transaction).