We have the quadratically constrained linear program (QCLP) in $\beta \in \mathbb R^p$
$$\begin{array}{ll} \text{minimize} & \langle \mathrm w, \beta \rangle\\ \text{subject to} & \| \mathrm X (\beta - \hat{\beta}) \|_2^2 \leq \gamma^2 \| \mathrm y - \mathrm X \hat{\beta} \|_2^2\end{array}$$
where $\gamma > 0$. Let
$$r := \gamma \| \mathrm y - \mathrm X \hat{\beta} \|_2$$
and
$$\mathrm z := \mathrm X (\beta - \hat{\beta})$$
If $\mathrm X$ has full column rank, then $\mathrm X^{\top} \mathrm X$ is invertible, and left-multiplying the definition of $\mathrm z$ by $(\mathrm X^{\top} \mathrm X)^{-1} \mathrm X^{\top}$ yields
$$\beta = \hat{\beta} + (\mathrm X^{\top} \mathrm X)^{-1} \mathrm X^{\top} \mathrm z$$
Hence, we have the following QCLP in $\mathrm z \in \mathbb R^n$ (this relaxes the implicit requirement that $\mathrm z$ lie in the column space of $\mathrm X$, but the relaxation is tight: the minimizer found below does lie in that column space)
$$\begin{array}{ll} \text{minimize} & \langle \mathrm w, (\mathrm X^{\top} \mathrm X)^{-1} \mathrm X^{\top} \mathrm z \rangle\\ \text{subject to} & \| \mathrm z \|_2^2 \leq r^2\end{array}$$
which can be rewritten as
$$\begin{array}{ll} \text{minimize} & \langle \tilde{\mathrm w}, \mathrm z \rangle\\ \text{subject to} & \| \mathrm z \|_2^2 \leq r^2\end{array}$$
where $\tilde{\mathrm w} := \mathrm X (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w$. Since the objective function is linear (and nonconstant whenever $\mathrm w \neq 0$) and the feasible region is a closed Euclidean ball, the minimum is attained on the boundary of the ball of radius $r$ centered at the origin. Hence, let us solve the following equality-constrained (non-convex) QCLP instead
$$\begin{array}{ll} \text{minimize} & \langle \tilde{\mathrm w}, \mathrm z \rangle\\ \text{subject to} & \| \mathrm z \|_2^2 = r^2\end{array}$$
We define the Lagrangian
$$\mathcal L (\mathrm z, \lambda) := \langle \tilde{\mathrm w}, \mathrm z \rangle - \frac{\lambda}{2} (\mathrm z^{\top} \mathrm z - r^2)$$
Setting $\nabla_{\mathrm z} \mathcal L = \tilde{\mathrm w} - \lambda \mathrm z = 0$ gives $\mathrm z = \tilde{\mathrm w} / \lambda$; the constraint $\| \mathrm z \|_2 = r$ then forces $|\lambda| = \| \tilde{\mathrm w} \|_2 / r$, and since the objective at such a point equals $\| \tilde{\mathrm w} \|_2^2 / \lambda$, the minimizing choice is $\lambda < 0$. This produces the minimizer
$$\mathrm z_{\min} := - r \, \frac{\tilde{\mathrm w} \,\,}{\| \tilde{\mathrm w} \|_2} = - r \, \frac{\mathrm X (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w \,\,}{\| \mathrm X (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w \|_2}$$
Thus, the minimizer is
$$\boxed{\beta_{\min} := \hat{\beta} + (\mathrm X^{\top} \mathrm X)^{-1} \mathrm X^{\top} \mathrm z_{\min} = \hat{\beta} - r \, \frac{(\mathrm X^{\top} \mathrm X)^{-1} \mathrm w \,\,}{\| \mathrm X (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w \|_2}}$$
and the minimum is
$$\begin{array}{rl} \langle \mathrm w, \beta_{\min} \rangle &= \langle \mathrm w, \hat{\beta} \rangle - r \, \dfrac{\mathrm w^{\top} (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w \,\,}{\| \mathrm X (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w \|_2}\\\\ &= \langle \mathrm w, \hat{\beta} \rangle - r \, \dfrac{\mathrm w^{\top} (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w}{\sqrt{\mathrm w^{\top} (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w}}\\\\ &= \langle \mathrm w, \hat{\beta} \rangle - r \sqrt{\mathrm w^{\top} (\mathrm X^{\top} \mathrm X)^{-1} \mathrm w}\end{array}$$
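To double-check the closed form, here is a small numerical sketch (NumPy, random problem data, and a Monte-Carlo sample of feasible points; all names and tolerances here are illustrative, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))   # full column rank almost surely
y = rng.standard_normal(n)
w = rng.standard_normal(p)
gamma = 0.5

# OLS fit and constraint radius r = gamma * ||y - X beta_hat||
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
r = gamma * np.linalg.norm(y - X @ beta_hat)

# Closed-form minimizer and optimal value from the derivation above
XtX_inv_w = np.linalg.solve(X.T @ X, w)
beta_min = beta_hat - r * XtX_inv_w / np.linalg.norm(X @ XtX_inv_w)
val = w @ beta_hat - r * np.sqrt(w @ XtX_inv_w)
assert np.isclose(w @ beta_min, val)

# The minimizer is feasible, and no randomly sampled feasible point beats it
assert np.linalg.norm(X @ (beta_min - beta_hat)) <= r + 1e-9
for _ in range(10_000):
    d = rng.standard_normal(p)
    d *= rng.uniform() * r / np.linalg.norm(X @ d)  # scale into the ball
    assert w @ (beta_hat + d) >= val - 1e-9
print("closed-form solution verified")
```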
Unless you can argue that equality holds at optimality even after relaxing the constraint to $\max(b,c) \leq a$ (e.g., because the objective pushes $a$ downward), you have to, as you say, suck it up and go for a MILP representation.
The representation you are looking for is a disjunction of the two cases $a = b, b\geq c$ and $a = c, c \geq b$. A simple big-M model for that would be to introduce two binaries $\delta_1$ and $\delta_2$ and
$$\begin{array}{l} -M(1-\delta_1) \leq a-b \leq M(1-\delta_1), \quad c \leq b + M(1-\delta_1),\\ -M(1-\delta_2) \leq a-c \leq M(1-\delta_2), \quad b \leq c + M(1-\delta_2),\\ \delta_1+\delta_2 = 1\end{array}$$
with carefully selected, as-small-as-possible constants $M$ (a different one for each constraint).
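As a concrete illustration, here is a sketch of this model in PuLP (my choice of modeling library, not something from the question; $b$ and $c$ are fixed to sample values, and $M = 100$ is simply a bound large enough to deactivate each constraint here). The objective maximizes $a$, which is exactly the situation where the relaxation $\max(b,c) \leq a$ alone would be wrong:

```python
# pip install pulp
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary

b, c = 4.0, 7.0   # sample data; in general these could be variables
M = 100           # big-M constant, large enough to relax each constraint

prob = LpProblem("a_equals_max_b_c", LpMaximize)
a = LpVariable("a", lowBound=0, upBound=M)
d1 = LpVariable("d1", cat=LpBinary)  # case a = b, b >= c
d2 = LpVariable("d2", cat=LpBinary)  # case a = c, c >= b

prob += a  # maximizing a: with max(b,c) <= a alone, a would hit its bound M

prob += a - b <= M * (1 - d1)
prob += a - b >= -M * (1 - d1)
prob += b + M * (1 - d1) >= c
prob += a - c <= M * (1 - d2)
prob += a - c >= -M * (1 - d2)
prob += c + M * (1 - d2) >= b
prob += d1 + d2 == 1

prob.solve()
print(a.value())  # 7.0, i.e. max(4, 7)
```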
Yes, this is part of a set of standard tricks that can be used to linearize polynomial terms involving integer, and especially binary, variables. There are lots of such transformations, and most good integer programming (and especially mixed-integer nonlinear programming) texts summarize them.
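The transformation under discussion, reconstructed here since the constraints themselves are not quoted (bounds $L_i \leq y_i \leq U_i$ on the continuous variable are assumed), replaces the product $z_i = \delta_i y_i$ of a binary $\delta_i$ and a bounded continuous $y_i$ with the linear constraints

$$L_i \delta_i \leq z_i \leq U_i \delta_i, \qquad y_i - U_i (1 - \delta_i) \leq z_i \leq y_i - L_i (1 - \delta_i)$$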
The best way to understand the transformation is to take it case by case. First consider $\delta_i = 0$: the first set of inequalities then guarantees that $z_i = 0$, which is what we want, and all the other constraints are redundant. If $\delta_i = 1$, the first set of inequalities simply enforces the bounds, since $z_i = y_i$ in this case. Moreover, $1 - \delta_i = 0$ means that the last two constraints enforce $z_i = y_i$.
There might be other ways to transform the quadratic term. For instance, you could use a big-M-type model, but those are usually not desirable since they yield weak relaxations if the big-M parameter is chosen poorly.
You can do away with some of the constraints if your objective function "pushes" your variables in the right direction.
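To see that the four inequalities above really pin down $z_i = \delta_i y_i$, one can check that for each fixed $\delta_i$ and $y_i$ the feasible interval for $z_i$ collapses to a single point. A minimal plain-Python check, with arbitrary sample bounds:

```python
# For fixed delta and y, intersect the two intervals for z:
#   [L*delta, U*delta]  and  [y - U*(1-delta), y - L*(1-delta)]
L, U = -2.0, 3.0  # arbitrary sample bounds on y

for delta in (0, 1):
    for y in (L, -0.5, 0.0, 1.7, U):
        lo = max(L * delta, y - U * (1 - delta))
        hi = min(U * delta, y - L * (1 - delta))
        # the intersection is exactly the single point delta * y
        assert abs(lo - delta * y) < 1e-12 and abs(hi - delta * y) < 1e-12
print("constraints force z = delta * y")
```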