Hoeffding’s Inequality – Is My Statistical Test Derivation Correct?

hoeffdings-inequalityhypothesis testingprobabilityprobability-inequalitiestype-i-and-ii-errors

I'm trying to deduce from samples of observations from two finite sets of random variables $$X_{1}, …, X_{n}$$ and $$Y_{1}, …, Y_{m}$$ that the expected values of the average of those random variables are not the same. Let's denote $$\frac{1}{n}\sum\limits_{i=1}^{n}{X_{i}}$$ and $$\frac{1}{m}\sum\limits_{i=1}^{m}{Y_{i}}$$, $$\bar{X}$$ and $$\bar{Y}$$ respectively. So I'm trying to deduce from observations $$x_{1}, …, x_{n}$$ and $$y_{1}, …, y_{m}$$ that $$E[\bar{X}] \neq E[\bar{Y}]$$.

So my null hypothesis, $$H_{0}$$, would be $$E[\bar{X}] = E[\bar{Y}]$$ and the alternative one, $$H_{1}$$, would be $$E[\bar{X}] \neq E[\bar{Y}]$$. I thought of using Hoeffding's inequality this way:

If my random variables are all bounded and between $$0$$ and $$1$$ (which I have in practice), we then have: $$P(|\bar{X}-\bar{Y}-(E[\bar{X}]-E[\bar{Y}])|\geq \varepsilon) \leq e^{-\frac{2\varepsilon^{2}}{\frac{1}{n}+\frac{1}{m}}}$$

Under the null hypothesis we have then : $$P(|\bar{X}-\bar{Y})|\geq \varepsilon) \leq 2e^{-\frac{2\varepsilon^{2}}{\frac{1}{n}+\frac{1}{m}}}$$, we can fix the type $$I$$ error rate to a maximum of $$\delta$$ if we take $$\varepsilon_{\delta} = \sqrt{\frac{1}{2(\frac{1}{n}+\frac{1}{m})}ln(\frac{2}{\delta})}$$.
Now, suppose we get observations such as $$|\bar{X}-\bar{Y}| < \varepsilon_{\delta}$$, there's a probability of $$P(|\bar{X}-\bar{Y}| < \varepsilon_{\delta}) = 1 – P(|\bar{X}-\bar{Y}| \geq \varepsilon_{\delta})$$ under $$H_{0}$$, so our type $$II$$ error is at most $$1 – \delta$$.

I hope you can help me understand if what I'm doing is correct or not. Thank you in advance.

Your $$\epsilon$$ increases with $$m$$ and $$n$$ for fixed $$\delta$$, so the test becomes less powerful with more data. There must either be an error in the calculation or in the idea.
Starting with $$P(|\bar X-\bar Y|>\epsilon)\leq 2\exp\left( \frac{-2\epsilon^2}{\frac{1}{n}+\frac{1}{m}}\right)$$ we're going to write $$\alpha$$ for the right-hand side, because we'll need $$\delta$$ for more standard purposes late. I'm also going to write $$\tilde n$$ for $$1/(1/n)+(1/m)$$ to cut down on the opportunities for getting the reciprocals wrong (and cut down on the typesetting). We want to guarantee the LHS is less than $$\delta$$, so we want $$\alpha \leq 2\exp\left( -2\tilde n\epsilon^2\right)$$ Rearranging: $$\frac{\alpha}{2}\leq exp\left( -2\tilde n\epsilon^2\right)$$ $$\log \frac{\alpha}{2}\leq -2\tilde n\epsilon^2$$ $$\frac{1}{2\tilde n}\log \frac{\alpha}{2}\leq -\epsilon^2$$ $$-\frac{1}{2\tilde n}\log \frac{\alpha}{2}\geq \epsilon^2$$ $$\sqrt{-\frac{1}{2\tilde n}\log \frac{\alpha}{2}}\geq \epsilon$$ $$\sqrt{\frac{1}{2\tilde n}\log \frac{2}{\alpha}}\geq \epsilon$$ which is like your bound except it decreases with increasing sample size.
Now, the other thing that's wrong is that you haven't specified a distance of departure from the null in getting your power -- the power must decrease to $$\delta$$ as $$E[X]-E[Y]\to 0$$, so you need to specify what $$|E[X]-E[Y]|$$ is under the alternative for which you want the specified power. Without loss of generality, suppose $$X$$ has the larger mean and write $$\delta=E[X]-E[Y]$$. Also write $$Z$$ for $$X-\delta$$, so that the means of $$Z$$ and $$Y$$ are the same. Now, $$\bar Z-\bar Y$$ is controlled by Hoeffding's Theorem, since $$Z$$ is still bounded, and if $$P[|\bar X-\bar Y| >\delta/2]\leq \alpha$$ under the null and $$P[|\bar Z-\bar Y| >\delta/2]\leq \alpha$$ under the alternative, then you have a useful test, since at most one of $$|\bar X-\bar Y|>\delta/2$$ and $$|\bar Z-\bar Y|>\delta/2$$ can be true