Solved – Ratio of Means or Mean of Ratios? Inference Across Groups

hypothesis testing, ratio

Edit:

For the purposes of the bounty, it would suffice to answer my third question below for the statistic $\hat{P}_g$.


I have two groups $g \in \{1, 2\}$ and customers $i$ within those groups who generate revenue $R_{ig} \in \mathbb{R}^+$ by purchasing a quantity $Q_{ig} \in \mathbb{R}^+$. There is price discrimination, so the effective price per unit of the good, $P_{ig} = R_{ig}/Q_{ig}$, differs across customers.

An "average" price can be estimated for each group in at least two ways:

  • as $\hat{P}_g = \dfrac{\sum_iR_{ig}}{\sum_iQ_{ig}}$.

    Dividing numerator and denominator by the sample size $N_g$ shows this is equivalent to the ratio-of-means estimator $\hat{P}_g = \dfrac{\overline{R}_{g}}{\overline{Q}_{g}}$; or

  • as $\tilde{P}_g = \dfrac{1}{N_g}\sum_i{\dfrac{R_{ig}}{Q_{ig}}}$.
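
To fix ideas, here is a minimal numerical sketch of the two estimators on simulated data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for one group.
Q = rng.uniform(1, 10, size=500)   # quantity bought by each customer
P = rng.uniform(2, 4, size=500)    # effective per-unit price, customer-specific
R = P * Q                          # revenue generated by each customer

# Ratio of means: total revenue over total quantity.
# Equivalently a quantity-weighted mean of the per-customer prices.
P_hat = R.sum() / Q.sum()          # same as R.mean() / Q.mean()

# Mean of ratios: every customer weighted equally.
P_tilde = (R / Q).mean()

print(P_hat, P_tilde)              # generally not equal in a given sample
```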

I have several questions:

  • Is there a difference in interpretation between the two estimators $\hat{P}_g$ and $\tilde{P}_g$; that is, are they estimating the same underlying population quantity? (I came across a reference on this, but there does not appear to be a version of it available online.)

  • From a statistical point of view, is one of the estimators better than the other (whether for a common target, or each for its respective population quantity), especially in terms of its bias properties?

  • Lastly, for either one of the estimators, what is the right way to compare them across groups to test the hypothesis that $P_1 = P_2$? Is there a variance estimator for either of the two estimators? I want to be able to say that the price is, on average, the same across the two groups.

I believe this is related to the Fieller-Creasy class of problems, but I am not familiar with the problem family.

Best Answer

I feel like the concern should be with the underlying data-generating process and where you suspect 'error' or noise in your data is coming from. The whole point of taking averages is to be able to invoke a law-of-large-numbers argument that noise 'cancels out'.

1) For example, suppose we have measurement error in the amount of revenue generated but not in quantity (in reality this might be due to rounding, which is a very specific and ugly type of error): a demon adds iid noise to our observations, so that $R_{ig}=R^*_{ig}+\epsilon_{ig}$, where $R^*$ is the true revenue and $R$ is what we observe. Then averaging over all observed $R$ minimizes the relative influence of the noise, and dividing by the average of $Q$ is the most efficient way to proceed. This is $\hat{P}$.

2) However, suppose that instead of observing $R$ and $Q$, we actually observe $Q$ and a noisy measure of price, $P=P^*+\epsilon$ (which raises the question of why we bother with $Q$ and $R$ at all, since we already have what we want, $P$, directly), and a spreadsheet computes $R$ by multiplying $Q$ and $P$. Then it makes much more sense to calculate the ratios $R/Q$ and average those, as in $\tilde{P}$.

Roughly speaking, how you want the noise to 'cancel' determines which average you take, but you cannot know how the noise cancels unless you first specify where it enters. What if instead of additive noise you had multiplicative noise (multiply $R$ by plus or minus a few percent; this is actually very similar to case 2)? Then you would want to take logs, average the logs (since multiplicative noise is additive in logs), and re-exponentiate, and so on.
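
A small simulation sketch (purely hypothetical parameter values) makes cases 1 and 2 concrete: under additive noise in $R$, $\hat{P}$ has the smaller spread; under noise entering through $P$ itself, $\tilde{P}$ does:

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 200, 5_000
P_true, sigma = 3.0, 2.0

draws = {"hat/noise-in-R": [], "tilde/noise-in-R": [],
         "hat/noise-in-P": [], "tilde/noise-in-P": []}

for _ in range(reps):
    Q = rng.uniform(1, 10, size=N)

    # Case 1: additive noise on revenue, R = Q*P + eps.
    R1 = Q * P_true + rng.normal(0, sigma, size=N)
    draws["hat/noise-in-R"].append(R1.sum() / Q.sum())
    draws["tilde/noise-in-R"].append((R1 / Q).mean())

    # Case 2: noise on the price itself, R = Q*(P + eps).
    R2 = Q * (P_true + rng.normal(0, sigma, size=N))
    draws["hat/noise-in-P"].append(R2.sum() / Q.sum())
    draws["tilde/noise-in-P"].append((R2 / Q).mean())

for name, d in draws.items():
    d = np.asarray(d)
    print(f"{name:18s} bias={d.mean() - P_true:+.4f}  sd={d.std():.4f}")
```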

Edit:

I'd argue that the above answers your 2nd point, since bias properties are a function of the error structure, of which I gave two examples, but I'll answer the 3rd question in particular.

If we assume a common price $P$ within each group, no noise in our observations of $Q$, and additive iid noise $\epsilon_i \sim N(0,\sigma^2)$ in $R$, then our modeling assumption is, within a group, $$R_i=R^*_i+\epsilon_i=Q_i P+\epsilon_i$$

Estimating $P$ is then just an OLS regression of $R_i$ on $Q_i$ with the intercept forced through zero, fit separately in the two subgroups $g$, which means we can run a Chow test for equality of $P$ across the subgroups. Using the example in Wikipedia, you would just have $y_t=b_1 x_{1t}+\epsilon$ and $y_t=b_2 x_{2t} + \epsilon$ for your two groups, with $y=R$, $b=P$, and $x=Q$.
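
Under those assumptions the Chow test is easy to carry out by hand. Here is a sketch (the helper name `chow_test` is mine, not a library function), using the standard F-statistic built from the pooled versus per-group residual sums of squares:

```python
import numpy as np
from scipy import stats

def chow_test(R1, Q1, R2, Q2):
    """Chow test for equal slope P in R = P*Q (no intercept) across two groups."""
    def ssr(R, Q):
        b = (Q @ R) / (Q @ Q)          # OLS slope through the origin
        e = R - b * Q
        return e @ e

    ssr_pooled = ssr(np.concatenate([R1, R2]), np.concatenate([Q1, Q2]))
    ssr_sep = ssr(R1, Q1) + ssr(R2, Q2)
    k = 1                               # one parameter (the slope) per model
    df2 = len(R1) + len(R2) - 2 * k
    F = ((ssr_pooled - ssr_sep) / k) / (ssr_sep / df2)
    return F, stats.f.sf(F, k, df2)     # F-statistic and p-value
```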

If you insist on using the quotient estimators directly (which you shouldn't, if you are willing to make enough assumptions for an OLS-based method), then the same additive-error assumption gives, within a group, $$\sum_i R_i \sim N\left(\sum_i R^*_i,\; N\sigma^2\right)\\ \frac{\sum_i R_i}{\sum_i Q_i} \sim N\left(P,\; \sigma^2 \frac{N}{\left(\sum_i Q_i\right)^2}\right)$$

But then you have to estimate $\sigma^2$, which is usually done by taking residuals after an OLS fit anyway. As before, any discussion of variance properties hinges on the properties of the underlying DGP and where the noise enters: if we assumed $Q$ was also measured with error, we couldn't even derive the variance analytically, since the variance of a quotient of two random variables is generally a mess.
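
For completeness, here is a sketch of the corresponding Wald test built directly on the quotient estimators (the helper name `ratio_z_test` is mine; $\sigma^2$ is estimated crudely from the ratio fits' own residuals, which is consistent under this DGP):

```python
import numpy as np
from scipy import stats

def ratio_z_test(R1, Q1, R2, Q2):
    """Wald test of P_1 = P_2 for the ratio estimators, assuming additive
    iid N(0, sigma^2) noise in R only, so that
    Var(P_hat_g) = sigma^2 * N_g / (sum_i Q_ig)^2."""
    P1, P2 = R1.sum() / Q1.sum(), R2.sum() / Q2.sum()

    # Estimate sigma^2 from the residuals of the two ratio fits.
    res = np.concatenate([R1 - P1 * Q1, R2 - P2 * Q2])
    sigma2 = res @ res / (len(res) - 2)

    var = sigma2 * (len(R1) / Q1.sum()**2 + len(R2) / Q2.sum()**2)
    z = (P1 - P2) / np.sqrt(var)
    return z, 2 * stats.norm.sf(abs(z))    # two-sided p-value
```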
