The best you can do is a Lipschitz estimate at zero itself, that is, there is an $A > 0$ such that
\begin{equation*}
\sup \left\{ \|p\| \, \mid \, p \in \partial f(x), \, \, x \in B(0,r) \right\} \leq A r.
\end{equation*}
(Notice $f$ is differentiable at $0$ and $Df(0) = 0$.) In fact, there is also a lower bound:
\begin{equation*}
\|p\| \geq c_{1}\|x\| \quad \text{if} \, \, p \in \partial f(x).
\end{equation*}
The lower bound is much simpler to obtain. If $x \in \mathbb{R}^{n} \setminus \{0\}$ and $p \in \partial f(x)$, then
\begin{equation*}
0 = f(0) \geq f(x) - \langle p, x \rangle \geq c_{1} \|x\|^{2} - \langle p, x \rangle.
\end{equation*}
Hence $\langle p, \frac{x}{\|x\|} \rangle \geq c_{1}\|x\|$, implying $\|p\| \geq c_{1} \|x\|$.
To obtain the upper bound, fix $r > 0$ and assume that $\|x\| \leq r$. If $p \in \partial f(x)$ and $\xi \in S^{n-1}$, then
\begin{equation*}
c_{2} \|x + r \xi\|^{2} \geq f(x + r \xi) \geq f(x) + r \langle p, \xi \rangle \geq c_{1} \|x\|^{2} + r \langle p, \xi \rangle.
\end{equation*}
Thus, using $\langle x, \xi \rangle \leq \|x\| \leq r$ and $c_{1} \leq c_{2}$,
\begin{equation*}
\langle p, \xi \rangle \leq \frac{1}{r} \left( c_{2} \|x\|^{2} + c_{2} r^{2} + 2 c_{2} r \langle x, \xi \rangle - c_{1} \|x\|^{2} \right) \leq (4c_{2} - c_{1}) r.
\end{equation*}
Taking the supremum over $\xi \in S^{n-1}$ gives the desired upper bound with $A = 4c_{2} - c_{1}$.
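As a sanity check, here is a minimal numerical sketch with a hypothetical example $f(x) = c_1\|x\|^2 + (c_2-c_1)x_1^2$ on $\mathbb{R}^2$ (which satisfies $c_1\|x\|^2 \le f(x) \le c_2\|x\|^2$); both gradient bounds can be verified on random samples:

```python
import numpy as np

# Hypothetical example satisfying c1*||x||^2 <= f(x) <= c2*||x||^2 on R^2:
#     f(x) = c1*||x||^2 + (c2 - c1)*x_1^2.
c1, c2 = 1.0, 3.0

def f(x):
    return c1 * np.dot(x, x) + (c2 - c1) * x[0] ** 2

def grad_f(x):
    g = 2 * c1 * x.copy()
    g[0] += 2 * (c2 - c1) * x[0]
    return g

rng = np.random.default_rng(0)
r = 0.5
for _ in range(1000):
    x = rng.standard_normal(2)
    x *= r * rng.uniform() / np.linalg.norm(x)       # sample x in B(0, r)
    p = grad_f(x)
    assert np.linalg.norm(p) >= c1 * np.linalg.norm(x) - 1e-12   # lower bound
    assert np.linalg.norm(p) <= (4 * c2 - c1) * r + 1e-12        # upper bound
print("both gradient bounds hold on B(0, r)")
```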
The reason this is the best we could hope for (i.e. we cannot get $Df$ Lipschitz in a neighborhood of $0$) is that $c_{1} < c_{2}$. As soon as you move away from $0$, $f$ has a positive amount of room to wiggle between the two parabolas, and if $f$ can wiggle that much, there is no reason for $Df$ to be continuous.
To make this more precise, let us work in dimension $n = 1$. Let $h : [0,\infty) \to [c_{1},c_{2}]$ be a non-decreasing function with a sequence of discontinuities converging to zero. Define $g : \mathbb{R} \to \mathbb{R}$ as the odd extension (i.e. $g(-x) = -g(x)$) of
\begin{equation*}
g(x) = 2 h(x) x, \quad x \geq 0.
\end{equation*}
$g$ is non-decreasing, since both factors are non-negative and non-decreasing on $[0,\infty)$. Therefore, the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \int_{0}^{x} g(s) \, ds$ is convex. Furthermore, if $x > 0$, then
\begin{equation*}
c_{1}|x|^{2} \leq f(x) \leq c_{2} |x|^{2}.
\end{equation*}
Since $f(x) = f(-x)$, $f$ satisfies the desired inequalities, but $f'$ is not continuous in any neighborhood of zero.
We can generalize the previous example to $n > 1$ simply by setting $F(x) = f(\|x\|)$.
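To illustrate the construction, here is a minimal numerical sketch; the choice of $h$ below, with jumps at $x = 2^{-k}$ accumulating at zero, is one hypothetical concrete instance of the recipe above:

```python
import numpy as np

c1, c2 = 1.0, 2.0

def h(x):
    # Hypothetical concrete h: non-decreasing, values in [c1, c2],
    # with jump discontinuities at x = 2^{-k} accumulating at zero.
    if x >= 1.0:
        return c2
    k = np.ceil(np.log2(1.0 / x))        # x lies in [2^{-k}, 2^{-k+1})
    return c1 + (c2 - c1) * 2.0 ** (-k)

def g(x):
    # odd extension of g(x) = 2*h(x)*x
    if x == 0.0:
        return 0.0
    return np.sign(x) * 2.0 * h(abs(x)) * abs(x)

def f(x):
    # trapezoid-rule approximation of f(x) = int_0^x g(s) ds
    s = np.linspace(0.0, x, 20001)
    v = np.array([g(t) for t in s])
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(s)))

for x in [0.1, 0.25, 0.4, 0.7]:
    assert c1 * x**2 - 1e-3 <= f(x) <= c2 * x**2 + 1e-3   # growth bounds
eps = 1e-9
assert g(0.5 + eps) - g(0.5 - eps) > 0.1   # f' = g jumps at x = 2^{-1}
```

The jumps of $g = f'$ shrink as $x \to 0$, but their locations accumulate at zero, so $f'$ is discontinuous in every neighborhood of the origin.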
By the fundamental theorem of calculus,
$$
f(y) - f(x) - \nabla f(x)^T(y-x) = \int_0^1 (\nabla f(x+t(y-x))-\nabla f(x))^T(y-x) \, dt.
$$
Let me first prove the claim for $O=\mathbb R^n$.
Using (2), this implies
$$
|f(y) - f(x) - \nabla f(x)^T(y-x)| \le \frac12 \|y-x\|^2.
$$
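To sanity-check the displayed remainder bound, here is a minimal numerical sketch with a hypothetical quadratic $f(x)=\frac12 x^TAx$ whose symmetric matrix $A$ has eigenvalues in $[0,1]$, so $f$ is convex and the bound holds:

```python
import numpy as np

# Hypothetical quadratic test function f(x) = 0.5 * x^T A x with a symmetric
# A whose eigenvalues lie in [0, 1], so f is convex and the remainder bound holds.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag(rng.uniform(0.0, 1.0, 4)) @ Q.T

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    rem = f(y) - f(x) - grad(x) @ (y - x)
    assert abs(rem) <= 0.5 * np.dot(y - x, y - x) + 1e-12
print("Taylor remainder bound verified")
```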
Now take $d\in \mathbb R^n$. Applying the inequality above to the pairs $(x,\,y+d)$, $(y,\,x-d)$, $(y,\,y+d)$, and $(x,\,x-d)$ gives
$$
f(y+d) - f(x) - \nabla f(x)^T(y+d-x) \le \frac12 \|y+d-x\|^2\\
f(x-d) - f(y) - \nabla f(y)^T(x-d-y) \le \frac12 \|y+d-x\|^2\\
-(f(y+d) - f(y) - \nabla f(y)^Td) \le \frac 12\|d\|^2\\
-(f(x-d) - f(x) - \nabla f(x)^T(-d)) \le \frac 12\|d\|^2.
$$
Adding all these inequalities results in
$$
(\nabla f(x) - \nabla f(y))^T(x-y-2d)
\le \|y+d-x\|^2 + \|d\|^2.
$$
Let me set $g:=\nabla f(x) - \nabla f(y)$ and choose $d$ such that $x-y-2d=g$.
Then $d = \frac12(x-y-g)$ and $y+d-x = -\frac12(x-y+g)$.
This results in
$$
\|g\|^2 \le \frac14\|x-y+g\|^2 + \frac14\| x-y-g\|^2
=\frac12 \|x-y\|^2 + \frac12\|g\|^2,
$$
which implies (1).
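The equality in the last display is the parallelogram law; spelled out, for any $a, b \in \mathbb R^n$,
$$
\frac14\|a+b\|^2 + \frac14\|a-b\|^2 = \frac12\|a\|^2 + \frac12\|b\|^2,
$$
applied here with $a = x-y$ and $b = g$.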
Now let $x,y\in O$ with $O\ne \mathbb R^n$. Since the segment between $x$ and $y$ is a compact subset of the open set $O$, there is $\rho>0$ such that $B_\rho(\lambda x+(1-\lambda) y)\subset O$ for all $\lambda \in [0,1]$.
Now take $\tilde x,\tilde y$ on the line between $x$ and $y$
such that $\|\tilde x-\tilde y\|<\rho$.
Take $d$ such that $\tilde y+d,\tilde x-d\in O$.
As above, we get
$$
(\nabla f(\tilde x) - \nabla f(\tilde y))^T(\tilde x-\tilde y-2d)
\le \|\tilde y+d-\tilde x\|^2 + \|d\|^2.
$$
Denote $\tilde g:=\nabla f(\tilde x) - \nabla f(\tilde y)$.
Now I would like to set $d:=\frac12(\tilde x-\tilde y - t\tilde g)$ for some $t>0$ small.
Note
$$
\tilde y+d = \frac12(\tilde x+\tilde y) -\frac t2 \tilde g,\quad
\tilde x-d = \frac12(\tilde x+\tilde y) +\frac t2 \tilde g.
$$
By the identities above, $\tilde y+d$ and $\tilde x-d$ lie within distance $\frac t2\|\tilde g\|$ of the midpoint $\frac12(\tilde x+\tilde y)$, so it is enough to choose $t := \min(1,\frac\rho{\|\tilde g\|})$, which guarantees $\frac t2\|\tilde g\|\le\frac\rho2<\rho$. Putting this choice of $d$ into the inequality above results in
$$
t \|\tilde g\|^2
\le \frac14\|\tilde x-\tilde y+t\tilde g\|^2 + \frac14\| \tilde x-\tilde y-t\tilde g\|^2
=\frac12 \|\tilde x-\tilde y\|^2 + \frac{t^2}2\|\tilde g\|^2.
$$
If $t=\frac\rho{\|\tilde g\|}$, i.e. if $\|\tilde g\|\ge\rho$, then
$$
\rho \|\tilde g\|
\le \frac12 \|\tilde x-\tilde y\|^2 + \frac{\rho^2}2 < \rho^2,
$$
so $\|\tilde g\|<\rho$, contradicting $\|\tilde g\|\ge\rho$. Hence $t=1$, the inequality above reduces to $\|\tilde g\|^2 \le \frac12\|\tilde x-\tilde y\|^2 + \frac12\|\tilde g\|^2$, and
$$
\|\nabla f(\tilde x) - \nabla f(\tilde y)\| \le \|\tilde x-\tilde y\|
$$
for all $\tilde x,\tilde y$ on the line between $x$ and $y$ such that $\|\tilde x-\tilde y\|< \rho $.
Now let $n>\frac{1}\rho\|x-y\|$. Divide the segment between $x$ and $y$ into $n$ parts of equal length by setting
$x_i:=x+\frac in(y-x)$, $i=0,\dots,n$, so that $x_0=x$ and $x_n=y$.
Then $\|x_i-x_{i+1}\|<\rho$ and hence $\|\nabla f(x_i) - \nabla f(x_{i+1})\| \le \|x_i-x_{i+1}\|$. Then
$$
\|\nabla f(x)-\nabla f(y)\|
\le \sum_{i=0}^{n-1} \|\nabla f(x_i) - \nabla f(x_{i+1})\| \\
\le \sum_{i=0}^{n-1}\|x_i-x_{i+1}\| \\
= \|x-y\|,
$$
which finishes the proof.
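The subdivision step is generic; here is a minimal numerical sketch of it, using the hypothetical test function $f(x)=\frac12\|x\|^2$, whose gradient is $1$-Lipschitz:

```python
import numpy as np

# Chaining a local estimate: if ||grad f(a) - grad f(b)|| <= ||a - b|| whenever
# ||a - b|| < rho, subdividing the segment gives the same bound for any x, y.
# Hypothetical test function with 1-Lipschitz gradient: f(x) = 0.5*||x||^2.
grad = lambda x: x

rho = 0.1
rng = np.random.default_rng(3)
x, y = rng.standard_normal(3), rng.standard_normal(3)

n = int(np.linalg.norm(x - y) / rho) + 1              # ensures n > ||x - y|| / rho
pts = [x + (i / n) * (y - x) for i in range(n + 1)]   # x_0 = x, ..., x_n = y
assert all(np.linalg.norm(pts[i + 1] - pts[i]) < rho for i in range(n))
local = sum(np.linalg.norm(grad(pts[i + 1]) - grad(pts[i])) for i in range(n))
assert np.linalg.norm(grad(x) - grad(y)) <= local + 1e-12   # triangle inequality
assert local <= np.linalg.norm(x - y) + 1e-12               # sum of local estimates
print("chained gradient estimate holds")
```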
Here's how to do it if $f$ is defined on all of $\mathbb{R}^n$: If $\nabla f(x)=0$, we're done; otherwise, take $y=x+\nabla f(x)$, so that $y-x=\nabla f(x)$. Then by Lipschitzness and convexity we have $$L\|\nabla f(x)\| = L\|y-x\|\geq f(y)-f(x)\geq \left<\nabla f(x),y-x\right>=\|\nabla f(x)\|^2,$$ which gives $\|\nabla f(x)\|\leq L$.
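A minimal numerical sketch of this argument, with a hypothetical smooth convex $L$-Lipschitz example $f(x)=L\sqrt{1+\|x\|^2}$:

```python
import numpy as np

# Hypothetical smooth convex L-Lipschitz example: f(x) = L * sqrt(1 + ||x||^2).
L = 2.0
f = lambda x: L * np.sqrt(1.0 + x @ x)
grad = lambda x: L * x / np.sqrt(1.0 + x @ x)

rng = np.random.default_rng(2)
for _ in range(1000):
    x = 5.0 * rng.standard_normal(3)
    g = grad(x)
    y = x + g                                     # the choice y = x + grad f(x)
    assert L * np.linalg.norm(y - x) >= f(y) - f(x) - 1e-12   # Lipschitzness
    assert f(y) - f(x) >= g @ (y - x) - 1e-9                  # convexity
    assert np.linalg.norm(g) <= L + 1e-12                     # conclusion
print("||grad f(x)|| <= L verified")
```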
If $f$ is defined on a convex set and $x$ is an interior point, the same argument but with $y=x+\eta\nabla f(x)$ for some small $\eta>0$ gives us the same bound.
The annoying case is when $x$ is a point where the gradient is defined but moving in the direction of $\nabla f(x)$ takes us out of the convex set $K$. This can happen when $x$ is on the boundary of $K$ and yet we can approach it from each coordinate direction from within $K$ to get a one-sided partial derivative. Then, by convexity of the set, either $x+\eta\nabla f(x)$ or $x-\eta\nabla f(x)$ lies in $K$ for all small $\eta>0$. If it's $x+\eta\nabla f(x)$, we already saw how to deal with it. So suppose that for all $\eta>0$ small enough, $x-\eta\nabla f(x)\in K$. Then we use the coordinate-free formulation of the gradient, which says that $$\lim_{\|y-x\|\to0} \frac{f(y)-f(x)-\left<\nabla f(x),y-x\right>}{\|y-x\|} =0,$$ which in our case (with $y=x-\eta\nabla f(x)$) gives $$ \lim_{\eta\to 0} \frac{f(x-\eta\nabla f(x))-f(x) +\eta\|\nabla f(x)\|^2}{\eta\|\nabla f(x)\|} = 0$$ and thus $$ \|\nabla f(x)\| = \lim_{\eta\to0}\frac{f(x)-f(x-\eta\nabla f(x))}{\eta\|\nabla f(x)\|},$$ and this is a limit of quantities each of which is $\leq L$ by Lipschitzness.