I think I understand: your problem is one of model building, namely how to gauge whether a difference in the $R^2$ value before and after adjusting for a certain variable is of notable magnitude rather than a consequence of spurious associations in the data. This calls to mind information criteria (AIC and BIC), which encourage parsimonious models by penalizing the likelihood by the number of parameters. In OLS models $R^2$ is related to the likelihood, so if your question stems from this problem, there have been related developments along these lines.
To address your specific question: one minimal assumption is that any subsequent covariate must not be collinear with the covariates in your provisional model ($k$ taking any one of $1, 2, \ldots, p-1$); this assumption allows the inequality to be strict. Nothing further may be said in general, because the change $R_{k+1}^2 - R_{k}^2$ will be a function of the residuals $r_k$ or the design $\mathbf{X}_k$ of the provisional model, as well as of the distribution of $x_{k+1}$. I suspect it is possible to choose sequences of either $\{x_{k+1}\}_i$ or $\{r_k\}_i$ such that $R_{k+1}^2 - R_{k}^2 \rightarrow_p 0$. To see this, consider $x_{k+1}$ an indicator for the 1st observation, and take $y_1 \rightarrow x_{k+1}'\mathbf{X}_k \beta$, the provisional fitted value for that observation. Then $R_{k+1}^2 - R_{k}^2 \rightarrow_p 0$. Furthermore, conditioning on all of these factors is trivial, since it is equivalent to conducting the very inference you are interested in outright.
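A minimal numpy sketch of that indicator example (my own construction, not from the question, using made-up data): the added covariate is a dummy for observation 1, and as $y_1$ moves toward the value a fit excluding observation 1 would predict for it, the $R^2$ increment collapses toward zero.

```
import numpy as np

rng = np.random.default_rng(0)
n = 50
Xk = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # provisional design X_k
y = Xk @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def r2(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return 1 - e @ e / ((y - y.mean()) @ (y - y.mean()))

x_new = np.zeros(n)
x_new[0] = 1.0                                   # x_{k+1}: indicator for observation 1
loo_b = np.linalg.lstsq(Xk[1:], y[1:], rcond=None)[0]
fit1 = Xk[0] @ loo_b                             # fitted value for obs 1, estimated without it

for gap in (5.0, 1.0, 0.1, 0.0):
    y_mod = y.copy()
    y_mod[0] = fit1 + gap                        # move y_1 toward that fitted value
    delta = r2(np.column_stack([Xk, x_new]), y_mod) - r2(Xk, y_mod)
    print(f"gap={gap:4.1f}  R2 increment={delta:.6f}")
```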
It can be said, however, that with a design matrix of full rank $n$ you will have $R^2=1$ exactly, but that too is trivial and sheds no light on any of the previous covariates. If you were interested in the random process of adding subsequent, unrelated vectors to a model, there might be some specific derivations under a number of highly sensitive assumptions such as multivariate normality, independence, or orthogonality.
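To make the contrast with the information-criterion view concrete, here is a quick simulation (my own illustration on simulated data): appending pure-noise regressors never lowers $R^2$ and drives it to exactly 1 once the design reaches rank $n$, while adjusted $R^2$ and a Gaussian AIC penalize the extra parameters.

```
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = np.ones((n, 1))                        # start with just an intercept
y = rng.normal(size=n)                     # outcome unrelated to any regressor

def fit_stats(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    rss, tss, k = e @ e, ((y - y.mean()) ** 2).sum(), X.shape[1]
    r2 = 1 - rss / tss
    adj = 1 - (1 - r2) * (n - 1) / (n - k) if n > k else float("nan")
    aic = n * np.log(max(rss, 1e-300) / n) + 2 * k   # Gaussian AIC up to a constant
    return r2, adj, aic

for _ in range(n - 1):                     # keep appending unrelated noise columns
    X = np.column_stack([X, rng.normal(size=n)])
    r2, adj, aic = fit_stats(X, y)
    print(f"p={X.shape[1]:2d}  R2={r2:.4f}  adjR2={adj:7.4f}  AIC={aic:8.2f}")
```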
I will give you an answer considering a regression of $Y$ on $X_1$ and $X_2$ only, since it is clearer and simplifies the algebra. It does not change anything in terms of intuition or results.
Note that I am assuming that the intercept is part of $X_1$ and that it enters only once (which is all that matters; you could equally assume that the vector of $1$s is part of $X_2$).
First we regress $Y = X_2 \beta_2 + \epsilon$ (so, without $X_1$)
The coefficient $\hat{\beta}_2 = (X_2'X_2)^{-1}X_2'Y$
Therefore the residual is $e = Y - X_2(X_2'X_2)^{-1}X_2'Y = [\mathbf{I} - X_2(X_2'X_2)^{-1}X_2']Y = M_2 Y$, where $M_2$ projects onto the orthogonal complement of the column space of $X_2$.
Second, we run the auxiliary regression $X_1 = X_2 \gamma_2 + \epsilon$
Following the same steps you will find that $e_{aux} = M_2X_1$
As a final step, we regress $e = e_{aux}\delta + \epsilon$ and we want to show that $\hat{\delta} = \hat{\alpha}_1$, where $\alpha_1$ is the coefficient on $X_1$ in the "full" regression $Y = X_1\alpha_1 + X_2\alpha_2 + \epsilon$
Substituting from the first two parts:
$M_2Y = M_2X_1\delta +\epsilon$
Therefore $\hat{\delta} = (X_1'M_2'M_2X_1)^{-1}X_1'M_2'M_2Y = (X_1'M_2X_1)^{-1}X_1'M_2Y$, since $M_2$ is symmetric and idempotent.
Finally, what is the FWL theorem telling you? That running the whole regression and following these steps give the same coefficient, since we are removing from each variable the part already explained by the others.
How to see that it is actually the same?
Set up the normal equations for the full regression, compute $\hat{\alpha}_1$ and $\hat{\alpha}_2$, and plug one of the two into the expression for the other. You will eventually obtain the same matrix result I show for $\delta$.
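If it helps, here is a short numerical check of that claim (my own sketch; the data and variable names are made up): the coefficients on $X_1$ from the full regression coincide with $\hat{\delta}$ from the residual-on-residual regression.

```
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = rng.normal(size=(n, 2))                                   # variables of interest
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])    # includes the intercept
y = X1 @ np.array([1.5, -0.5]) + X2 @ np.array([2.0, 0.3, 0.0, -1.0]) + rng.normal(size=n)

# Full regression: the first two entries are the coefficients on X1
full = np.linalg.lstsq(np.column_stack([X1, X2]), y, rcond=None)[0][:2]

# FWL: annihilator of X2, then regress residuals on residuals
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
delta = np.linalg.lstsq(M2 @ X1, M2 @ y, rcond=None)[0]

print(full)    # coefficients on X1 from the full regression
print(delta)   # identical up to floating-point error
```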
Best Answer
We have $$\begin{align*} R^2 = 1 - \frac{\sum{e_i^2}}{\sum{(y_i - \bar{y})^2}} = 1 - \frac{e^\prime e}{\tilde{y}^\prime\tilde{y}}, \end{align*}$$ where $\tilde{y}$ is the demeaned vector of $y$.
Recall that $\hat{\beta} = (X^\prime X)^{-1} X^\prime y$, implying that $e = y - X\hat{\beta} = y - X(X^\prime X)^{-1}X^\prime y$. Regression on a vector of 1s, written as $l$, gives the mean of $y$ as the predicted value, and the residuals from that model are the demeaned $y$ values: $\tilde{y} = y - \bar{y}\,l = y - l(l^\prime l)^{-1}l^\prime y$.
Let $H = X(X^\prime X)^{-1}X^\prime$ and let $M = l(l^\prime l)^{-1}l^\prime$, where $l$ is a vector of 1's. Also, let $I$ be an identity matrix of the requisite size. Then we have
$$\begin{align*} R^2 &= 1- \frac{e^\prime e}{\tilde{y}^\prime\tilde{y}} \\ &= 1 - \frac{y^\prime(I - H)^\prime(I-H)y}{y^\prime (I - M)^\prime(I-M)y} \\ &= 1 - \frac{y^\prime(I-H)y}{y^\prime (I-M)y}, \end{align*}$$
where the last line follows from the fact that $H$, $M$, and $I$ are symmetric and idempotent, so that $(I-H)^\prime(I-H) = I-H$ and $(I-M)^\prime(I-M) = I-M$.
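As a quick sanity check (my own sketch, not part of the derivation), the projector form can be verified numerically against the usual residual-based $R^2$:

```
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -2.0, 0.0]) + rng.normal(size=n)
l = np.ones((n, 1))
I = np.eye(n)

H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
M = l @ np.linalg.solve(l.T @ l, l.T)      # projector onto the constant

r2_matrix = 1 - (y @ (I - H) @ y) / (y @ (I - M) @ y)

e = y - H @ y                              # residuals
r2_usual = 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))

print(r2_matrix, r2_usual)                 # agree up to floating-point error
```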
In the weighted case, let $\Omega$ be the weighting matrix used in the weighted least squares objective function, $e^\prime \Omega e$. Additionally, let $H_w = \Omega^{1/2} X (X^\prime \Omega X)^{-1} X^\prime \Omega^{1/2}$ and $M_w = \Omega^{1/2} l (l^\prime \Omega l)^{-1} l^\prime \Omega^{1/2}$. Then, $$\begin{align*} R^2 &= 1 - \frac{y^\prime \Omega^{1/2} (I-H_w) \Omega^{1/2} y}{y^\prime \Omega^{1/2}(I-M_w) \Omega^{1/2}y}. \end{align*}$$
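And a corresponding check for the weighted case, assuming a diagonal $\Omega$ of observation weights (the setup and names are mine): the projector expression reproduces $1 - e^\prime \Omega e / \tilde{y}^\prime \Omega \tilde{y}$ computed directly from the weighted least squares fit, where the $\Omega$-weighted mean plays the role of $\bar{y}$.

```
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
l = np.ones((n, 1))
w = rng.uniform(0.5, 2.0, size=n)          # observation weights
Om = np.diag(w)
Om_half = np.diag(np.sqrt(w))
I = np.eye(n)

Hw = Om_half @ X @ np.linalg.solve(X.T @ Om @ X, X.T) @ Om_half
Mw = Om_half @ l @ np.linalg.solve(l.T @ Om @ l, l.T) @ Om_half

r2_w = (1 - (y @ Om_half @ (I - Hw) @ Om_half @ y)
            / (y @ Om_half @ (I - Mw) @ Om_half @ y))

# Direct computation from the WLS residuals and the weighted mean of y
beta = np.linalg.solve(X.T @ Om @ X, X.T @ Om @ y)
e = y - X @ beta
ybar_w = (w @ y) / w.sum()
r2_direct = 1 - (e @ Om @ e) / ((y - ybar_w) @ Om @ (y - ybar_w))

print(r2_w, r2_direct)                     # agree up to floating-point error
```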