Okay, the proof is actually simple from this point. Restrict the step size to $\lambda \in (0, \frac{2}{\beta})$: this interval is chosen because at $\lambda = \frac{2}{\beta}$ we get $|1 - \lambda\beta| = 1$, so the map is merely non-expansive rather than a strict contraction (this is easy to verify).
Recall the bound we had earlier:
$$
\|Tx - Ty \|^{2} \leq \left(1 - 2\lambda\left(\frac{\alpha\beta}{\alpha + \beta}\right)\right)\|x - y\|^{2} + \left(\lambda^{2} - \frac{2\lambda}{\alpha + \beta}\right)\|\nabla f(x) - \nabla f(y)\|^{2}
$$
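(As a reminder of where this came from: writing $Tx = x - \lambda\nabla f(x)$, expand $\|Tx - Ty\|^{2}$ and apply the standard co-coercivity inequality for an $\alpha$-strongly convex, $\beta$-smooth $f$,
$$
\langle \nabla f(x) - \nabla f(y), x - y \rangle \geq \frac{\alpha\beta}{\alpha + \beta}\|x - y\|^{2} + \frac{1}{\alpha + \beta}\|\nabla f(x) - \nabla f(y)\|^{2},
$$
to the cross term.)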
When $\lambda \leq \frac{2}{\alpha + \beta}$, the coefficient of the second term on the right-hand side is nonpositive, so replacing $\|\nabla f(x) - \nabla f(y)\|$ with a lower bound can only increase the right-hand side and the inequality is preserved. We will use the lower bound that follows from $\alpha$-strong convexity (combine gradient monotonicity, $\langle \nabla f(x) - \nabla f(y), x - y \rangle \geq \alpha\|x - y\|^{2}$, with Cauchy–Schwarz):
$$
\|\nabla f(x) - \nabla f(y)\| \geq \alpha \|x - y\|
$$
Plugging this in then gives:
$$
\|Tx - Ty \|^{2} \leq \left(1 - 2\lambda\left(\frac{\alpha\beta}{\alpha + \beta}\right)\right)\|x - y\|^{2} + \left(\lambda^{2}\alpha^{2} - \frac{2\alpha^{2}\lambda}{\alpha + \beta}\right)\|x - y\|^{2}
$$
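To see the simplification explicitly, collect the coefficients of $\|x - y\|^{2}$; the two fractions combine:
$$
\frac{2\lambda\alpha\beta}{\alpha + \beta} + \frac{2\lambda\alpha^{2}}{\alpha + \beta} = \frac{2\lambda\alpha(\alpha + \beta)}{\alpha + \beta} = 2\lambda\alpha
$$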
Rearranging and simplifying then yields:
$$
\|Tx - Ty\|^{2} \leq (\lambda^{2}\alpha^{2} - 2\lambda\alpha + 1)\|x-y\|^{2}
$$
Factorising, and noting that $1 - \lambda\alpha \geq 0$ (since $\lambda \leq \frac{2}{\alpha + \beta} \leq \frac{1}{\alpha}$ when $\alpha \leq \beta$), we can take square roots:
$$
\|Tx - Ty\|^{2} \leq (1- \lambda\alpha)^{2}\|x-y\|^{2} \iff \|Tx - Ty\| \leq (1- \lambda\alpha)\|x - y\|
$$
Note that when $\lambda \leq \frac{2}{\alpha + \beta}$, we have $1 - \lambda\alpha = \max\{|1 - \lambda\alpha|, |1 - \lambda\beta|\}$:
- $1 - \lambda\alpha \geq 1- \lambda\beta$ follows from $\beta > \alpha$
- $1 - \lambda\alpha \geq \lambda\beta - 1 \iff 2 \geq \lambda(\alpha + \beta) \iff \frac{2}{\alpha + \beta} \geq \lambda$ which is true by assumption.
- $1 - \lambda\alpha \geq \alpha\lambda - 1$ follows from the previous point together with $\beta > \alpha$, since $\alpha\lambda - 1 \leq \beta\lambda - 1$.
Thus we have proved that gradient descent is a $\max \{|1 - \lambda\alpha|, |1 - \lambda\beta|\}$ contraction for all $\lambda \leq \frac{2}{\alpha + \beta}$. For $\lambda \geq \frac{2}{\alpha + \beta}$, note that the coefficient of the second term on the RHS of our inequality is nonnegative, and thus we can replace $\|\nabla f(x) - \nabla f(y)\|$ with an upper bound using $\beta$-smoothness:
$$
\|\nabla f(x) - \nabla f(y)\| \leq \beta \|x - y\|
$$
Plugging this in yields:
$$
\|Tx - Ty \|^{2} \leq \left(1 - 2\lambda\left(\frac{\alpha\beta}{\alpha + \beta}\right)\right)\|x - y\|^{2} + \left(\lambda^{2}\beta^{2} - \frac{2\beta^{2}\lambda}{\alpha + \beta}\right)\|x - y\|^{2}
$$
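The coefficients of $\|x - y\|^{2}$ combine in the same way as before:
$$
\frac{2\lambda\alpha\beta}{\alpha + \beta} + \frac{2\lambda\beta^{2}}{\alpha + \beta} = \frac{2\lambda\beta(\alpha + \beta)}{\alpha + \beta} = 2\lambda\beta
$$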
Simplifying and factorising as before (and noting that $\lambda\beta - 1 \geq 0$, since $\lambda \geq \frac{2}{\alpha + \beta} \geq \frac{1}{\beta}$), we find that:
$$
\|Tx - Ty\|^{2} \leq (\lambda\beta - 1)^{2}\|x-y\|^{2} \iff \|Tx - Ty\| \leq (\lambda\beta - 1)\|x - y\|
$$
Similarly to before, when $\lambda \geq \frac{2}{\alpha + \beta}$, we see that $\lambda\beta - 1 = \max \{|1 - \lambda\alpha|, |1 - \lambda\beta|\}$. Thus we have shown that gradient descent is a $\max \{|1 - \lambda\alpha|, |1 - \lambda\beta|\}$ contraction for all $\lambda$ such that $\frac{2}{\beta} > \lambda \geq \frac{2}{\alpha + \beta}$.
Combining both cases ($\lambda \geq \frac{2}{\alpha+\beta}$, $\lambda \leq \frac{2}{\alpha +\beta}$) we see that gradient descent is a $\max \{|1 - \lambda\alpha|, |1 - \lambda\beta|\}$ contraction for all $\lambda \in \left(0, \frac{2}{\beta}\right)$, i.e.
$$
\|Tx - Ty\| \leq \max\{|1 - \lambda\alpha|, |1 - \lambda\beta|\}\|x - y\|
$$
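As a quick numerical sanity check (separate from the proof), here is a minimal sketch in Python, assuming the quadratic $f(x) = \frac{1}{2}x^{\top}\operatorname{diag}(\alpha, \beta)\,x$, for which the bound is tight; the constants `alpha = 1`, `beta = 10` and the step sizes tested are illustrative choices:

```python
import numpy as np

# f(x) = 0.5 * x^T diag(alpha, beta) x is alpha-strongly convex and
# beta-smooth; its gradient is diag(alpha, beta) @ x.
alpha, beta = 1.0, 10.0
hessian_diag = np.array([alpha, beta])
rng = np.random.default_rng(0)

def T(x, lam):
    """One gradient-descent step: x - lam * grad f(x)."""
    return x - lam * hessian_diag * x

for lam in [0.5 / beta, 2 / (alpha + beta), 1.9 / beta]:
    bound = max(abs(1 - lam * alpha), abs(1 - lam * beta))
    ratios = []
    for _ in range(1000):
        x, y = rng.standard_normal(2), rng.standard_normal(2)
        # Empirical contraction ratio ||Tx - Ty|| / ||x - y||.
        ratios.append(np.linalg.norm(T(x, lam) - T(y, lam))
                      / np.linalg.norm(x - y))
    print(f"lam={lam:.4f}  bound={bound:.4f}  max ratio={max(ratios):.4f}")
```

For this quadratic, $Tx - Ty = \operatorname{diag}(1 - \lambda\alpha,\, 1 - \lambda\beta)(x - y)$, so the maximum observed ratio should sit at (and never exceed) the bound.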
It is now also immediately obvious why $\frac{2}{\beta}$ is an upper bound on the step size. For $\lambda \geq \frac{2}{\alpha + \beta}$ we have
$$
\lambda\beta - 1 = \max\{|1 - \lambda\alpha|, |1 - \lambda\beta|\}
$$
But
$$
\lambda\beta - 1 < 1 \iff \lambda < \frac{2}{\beta}
$$
So bounding $\lambda$ above by $\frac{2}{\beta}$ ensures $\max\{|1 - \lambda\alpha|, |1 - \lambda\beta|\}$ is strictly less than 1 (in the other case, $0 < \lambda \leq \frac{2}{\alpha + \beta}$, the factor $1 - \lambda\alpha$ is already strictly less than 1).
**Best Answer**
A convex function $f$ is non-differentiable at a point $x$ iff the subdifferential $\nabla f|_x$ (the set of subgradients at $x$) contains more than one vector. I presume the inequality $\|\nabla f|_x - \nabla f|_y\| \le \beta \|x - y \|$ means $\|v - w\| \le \beta \|x - y\|$ for all $v \in \nabla f|_x$ and $w \in \nabla f|_y$. In particular, taking $y = x$ forces $\|v - w\| = 0$ for all $v, w \in \nabla f|_x$, so $\nabla f|_x$ must contain only one vector for the inequality to hold.
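For instance, $f(x) = |x|$ has $\nabla f|_0 = [-1, 1]$; taking $v = 1$ and $w = -1$ at $x = y = 0$ gives $\|v - w\| = 2 > \beta\|x - y\| = 0$, so no finite $\beta$ satisfies the inequality and $\beta$-smoothness indeed rules out points of non-differentiability.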