First, note that the two draws should be Beta($\alpha, 1$) and Beta($\beta, 1$). Look at the numerator:
$$
P\left(V_1 / W \le x,\, W \le 1\right)
= \int_0^1 \int_0 ^ {wx} f_{V_1}(v) f_{V_2}(w - v) \ dv \ dw,
$$
using a substitution to get $f_{V_1, W}$ from $f_{V_1, V_2}$. Next, interchange order of integration
$$
\int_0 ^ x \int_{v/x} ^ 1 \alpha v^{\alpha - 1} \beta (w - v)^{\beta - 1} \ dw \ dv
= \int_0 ^ x \alpha v^{\alpha - 1}\left[(1 - v)^\beta - v^\beta \left(\frac{1 - x}{x}\right)^\beta\right] \, dv.
$$
If $g(x)$ is the density of our proposed rejection sampler, then $g(x)$ is equal to the derivative of this expression, up-to a factor of $P(W \le 1)$,
$$
g(x) \propto \frac{d}{dx} \int_0 ^ x \alpha v^{\alpha - 1}\left[(1 - v)^\beta - v^\beta \left(\frac{1 - x}{x}\right)^\beta\right] \, dv.
$$
By the Leibniz rule,
$$
g(x) \propto \alpha \underbrace{x^{\alpha - 1} \left[(1 - x)^\beta - x^\beta \left(\frac{1 - x}{x}\right)^\beta\right]}_{= 0} + \int_0 ^ x \frac{d}{dx} \alpha v^{\alpha - 1}\left[(1 - v)^\beta - v^\beta \left(\frac{1 - x}{x}\right)^\beta\right] \ dv
\\
\propto -\int_0 ^ x v^{\alpha + \beta - 1} \frac{d}{dx} \left(\frac{1 - x}{x}\right)^\beta \ dv.
$$
Since the $x$-derivative does not depend on $v$, it can be pulled outside the integral; using $-\frac{d}{dx}\left(\frac{1 - x}{x}\right)^\beta = \frac{\beta (1 - x)^{\beta - 1}}{x^{\beta + 1}}$ and $\int_0^x v^{\alpha + \beta - 1} \, dv = \frac{x^{\alpha + \beta}}{\alpha + \beta}$, this final integral evaluates (up to constants) to
$$
g(x) \propto x^{\alpha - 1}(1 - x)^{\beta - 1}.
$$
This argument is valid for $x \in [0, 1]$ while the density is trivially $0$ otherwise. This is the kernel of a Beta($\alpha, \beta$) distribution, so the conclusion follows.
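The sampler this derivation justifies (often called Jöhnk's algorithm) can be sketched in Python; the function name and parameter names below are my own illustration, not from the original post:

```python
import random

def johnk_beta(alpha, beta_, rng=random.random):
    """Draw one Beta(alpha, beta_) sample via the derivation above.

    V1 ~ Beta(alpha, 1) and V2 ~ Beta(beta_, 1) are obtained by inverse
    CDF (U ** (1 / a) has density a * x ** (a - 1) on [0, 1]).  The pair
    is kept only on the event {W = V1 + V2 <= 1}, and V1 / W is returned.
    """
    while True:
        v1 = rng() ** (1.0 / alpha)   # V1 ~ Beta(alpha, 1)
        v2 = rng() ** (1.0 / beta_)   # V2 ~ Beta(beta_, 1)
        w = v1 + v2
        if w <= 1.0:                  # condition on {W <= 1}
            return v1 / w
```

Accepted ratios $V_1 / W$ then follow Beta($\alpha, \beta$); for example, with $\alpha = 2$, $\beta = 3$ the sample mean approaches $\alpha / (\alpha + \beta) = 0.4$.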
You should think of the algorithm as producing draws from a random variable. To show that the algorithm works, it suffices to show that it draws from the random variable you want it to.
Let $X$ and $Y$ be scalar random variables with pdfs $f_X$ and $f_Y$ respectively, where $Y$ is something we already know how to sample from. We also assume that we can bound $f_X$ by $Mf_Y$, where $M \ge 1$.
We now form a new random variable $A$ with $A \mid y \sim \text{Bernoulli}\left(\frac{f_X(y)}{Mf_Y(y)}\right)$; that is, $A$ takes the value $1$ with probability $\frac{f_X(y)}{Mf_Y(y)}$ and $0$ otherwise. A value of $1$ represents the algorithm 'accepting' a draw from $Y$.
Now we run the algorithm and collect all the draws from $Y$ that are accepted; let's call this random variable $Z = Y \mid A = 1$.
To show that $Z \equiv X$, for any event $E$, we must show that $P(Z \in E) =P(X \in E)$.
So let's try that. First, by the definition of conditional probability,
$P(Z \in E) = P(Y \in E \mid A = 1) = \frac{P(Y \in E,\, A = 1)}{P(A = 1)},$
and we write the numerator as
\begin{align*}P(Y \in E,\, A=1) &= \int_E f_{Y, A}(y,1) \, dy \\ &= \int_E f_{A \mid Y}(1 \mid y)f_Y(y) \, dy =\int_E f_Y(y) \frac{f_X(y)}{Mf_Y(y)} \, dy =\frac{P(X \in E)}{M}.\end{align*}
The denominator is simply
$P(A=1) = \int_{-\infty}^{\infty}f_{Y,A}(y,1) \, dy = \frac{1}{M}$,
by the same reasoning as above, setting $E=(-\infty, +\infty)$.
These combine to give $P(Z \in E) = P(X \in E)$, which is what we wanted: $Z \equiv X$.
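A minimal Python sketch of this accept/reject scheme (the Beta(2, 2) target, Uniform(0, 1) proposal, and all names here are my own hypothetical example, not from the question):

```python
import random

def rejection_sample(f_x, draw_y, f_y, M, rng=random.random):
    """Draw one sample of Z = Y | A = 1, where A | y is Bernoulli
    with success probability f_x(y) / (M * f_y(y)).

    Requires f_x(x) <= M * f_y(x) for all x, with M >= 1.
    """
    while True:
        y = draw_y()                        # propose Y ~ f_y
        if rng() <= f_x(y) / (M * f_y(y)):  # A = 1: accept
            return y

# Example target: Beta(2, 2) density 6x(1 - x), bounded by M = 1.5
# times the Uniform(0, 1) density (its maximum is 1.5, at x = 1/2).
beta22 = lambda x: 6.0 * x * (1.0 - x)
z = rejection_sample(beta22, random.random, lambda x: 1.0, M=1.5)
```

On average one draw in $M$ is accepted, matching $P(A = 1) = 1/M$ above.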
That is how the algorithm works, but at the end of your question you seem to be concerned with a more general idea: when does an empirical distribution converge to the distribution sampled from? If I understand you correctly, this is a general phenomenon concerning any sampling whatsoever.
In this case, let $X_1, \dots, X_n$ be iid random variables, each distributed as $X$. Then for any event $E$, the empirical frequency $\frac{1}{n}\sum_{i=1}^n 1_{X_i \in E}$ has expectation $P(X \in E)$ by linearity of expectation.
Furthermore, given suitable assumptions you could use the strong law of large numbers to show that the empirical probability converges almost surely to the true probability.
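As a concrete (hypothetical) check of this convergence: with $X \sim \text{Uniform}(0, 1)$ and $E = [0, 0.3]$, the empirical frequency should settle near $P(X \in E) = 0.3$:

```python
import random

# Empirical frequency (1/n) * sum(1{X_i in E}) for iid X_i ~ Uniform(0, 1)
# and the event E = [0, 0.3]; by the strong law of large numbers it
# converges almost surely to P(X in E) = 0.3 as n grows.
random.seed(1)
n = 50_000
estimate = sum(random.random() <= 0.3 for _ in range(n)) / n
```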
Best Answer
I will try to explain how rejection sampling works "in general". If I am clear enough, you should then be able to answer your own question. I won't give proofs, only intuitive (I hope) "facts".
Fact 1 If $X$ is drawn from a distribution with density $g(x)$ and $U$ from a uniform $U(0,1)$, then the point $(X, U\times M\times g(X))$ is drawn uniformly in the subgraph of $M g(x)$. Illustration with the normal density and $M = 5$:
This fact should be intuitive from the definition of the density: the height $g(x)$ of the subgraph above $x$ is the "probability" of falling at $x$.
Fact 2 If you can draw points $(X,Y)$ uniformly in the subgraph of $f(x)$ (a density function), then $X$ is distributed with density $f(x)$.
Fact 3 If $f(x) \le M g(x)$ and you draw points uniformly in the subgraph of $M g(x)$, the points which fall in the subgraph of $f(x)$ are uniformly distributed in it.
Illustration with the tent function $f(x) = \begin{cases} 0 & \text{if } |x| \ge 1 \\ x+1 & \text{if } -1 < x \le 0 \\ 1-x & \text{if } 0 < x < 1 \end{cases}$
All these facts together imply that if you keep only the points $X$ such that $U \times M \times g(X) \le f(X)$, these points are distributed with density $f$.
Of course this example is only intended for illustration; the proposed algorithm would be a bad way to sample from the tent distribution.
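Still, the three facts combine into a short sketch for the tent density (the Uniform(-1, 1) proposal with $M = 2$ is my own choice; any dominating proposal works):

```python
import random

def tent(x):
    """Tent density: 1 - |x| on (-1, 1), 0 elsewhere."""
    return max(0.0, 1.0 - abs(x))

def sample_tent(rng=random.random):
    """Draw points uniformly under M * g(x) (Facts 1 and 3) and keep
    each X whose point falls under f(x); that X has density f (Fact 2)."""
    M, g = 2.0, 0.5                   # g: Uniform(-1, 1) density, so M * g(x) = 1 >= f(x)
    while True:
        x = 2.0 * rng() - 1.0         # X ~ g
        if rng() * M * g <= tent(x):  # U * M * g(X) <= f(X): keep the point
            return x
```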
Note that one nice feature of rejection sampling, which you can deduce from the above argument, is that if you know $f(x)$ only up to some unknown multiplicative constant, you can use the algorithm as well.