To elaborate on my comment, suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^2$. Then Taylor's theorem gives
$$f(x) = f(x_0) + \frac{\partial f (x_0)}{\partial x} (x-x_0) + \int_0^1 (1-t) (x-x_0)^T \frac{\partial^2 f (x_0+t (x-x_0))}{\partial x^2} (x-x_0) \, dt .$$
Now suppose that $m \leq \frac{\partial^2 f (x)}{\partial x^2} \leq M$ on some convex set $K \subset \mathbb{R}^n$. Then is is easy to see that if $x,x_0 \in K$, then
$$f(x_0) + \frac{\partial f (x_0)}{\partial x} (x-x_0) + \frac{m}{2} \|x-x_0\|^2 \leq f(x) \leq f(x_0) + \frac{\partial f (x_0)}{\partial x} (x-x_0) + \frac{M}{2} \|x-x_0\|^2 .$$
This is true for any $f \in C^2$, not just convex $f$. If $f$ happens to be convex, then you know that $m=0$ will serve as a lower bound. By considering the convex function $f(x) = x^4$ near $0$, you can see that even a strictly convex function may have $m=0$ as the 'best' lower bound in some sense.
I assume that your question concerns convex functions only; without convexity much of it would be false.
Question 2: strictly speaking, being Lipschitz smooth ($C^{1,1}$) does not imply $\nabla^2 f$ exists. But the statement is true if we interpret $\nabla^2 f\preceq LI$ as holding almost everywhere. Indeed, $\nabla^2 f$ is a positive semidefinite matrix, so having $\nabla^2 f\preceq LI$ a.e. is equivalent to $\nabla^2 f\in L^\infty$. And it is well known that having $L^\infty$ derivative is equivalent to being Lipschitz; thus $$\nabla^2 f\in L^\infty \iff \nabla f\in C^{0,1} \iff f\in C^{1,1}$$
Question 3: You misremembered. The correct inequality characterizing $\alpha$-strong convexity is
$$f(x+y) \ge f(x) + y^\top\nabla f(x) + \frac{\alpha}{2} \| x - y \|^2 \tag{1}$$
Indeed, (1) is equivalent to saying that the function $g(x)=f(x)-\frac{\alpha}{2} \| x \|^2$ is convex. The latter is equivalent to $\nabla^2 g\succeq 0$, which is $\nabla^2 f\succeq \alpha\, I$.
Question 4. Yes, there is a direct and important relation: a function is strongly convex if and only if its convex conjugate (a.k.a. Legendre-Fenchel transform) is Lipschitz smooth. Indeed, the gradients maps are inverses of each other, which implies that the Hessian of convex conjugate of $f$ is the inverse of the Hessian of $f$ (at an appropriate point). So, a uniform upper bound on $\nabla^2 f$ is equivalent to a uniform lower bound on $\nabla^2 (f^{*}) $, and vice versa. One can also argue without referring to the Hessian (which may fail to exist at some points): the Lipschitz smoothness of $f$, by your item 1, gives us at every $x_0$ a quadratic function $q$ so that $q(x_0)=f(x_0)$ and $f \le q$ everywhere. Taking convex conjugate reverses the order: $q^*\le f^*$; and this means that $f^*$ is strongly convex.
Question 1. The converse is true, but the only proof I see goes through the convex conjugate as described in Q4. Since strong convexity is characterized by the comparison property (1), taking the conjugate gives a matching characterization of Lipschitz smoothness.
Reference: Chapter 5 of Convex functions by Jonathan M. Borwein and Jon D. Vanderwerff.
Best Answer
The best you can do is a Lipschitz estimate at zero itself, that is, there is an $A > 0$ such that \begin{equation*} \sup \left\{ \|p\| \, \mid \, p \in \partial f(x), \, \, x \in B(0,r) \right\} \leq A r. \end{equation*} (Notice $f$ is differentiable at $0$ and $Df(0) = 0$.) In fact, there is also a lower bound: \begin{equation*} \|p\| \geq c_{1}\|x\| \quad \text{if} \, \, p \in \partial f(x). \end{equation*}
The lower bound is much simpler to obtain. If $x \in \mathbb{R}^{n} \setminus \{0\}$ and $p \in \partial f(x)$, then \begin{equation*} 0 = f(0) \geq f(x) - \langle p, x \rangle \geq c_{1} \|x\|^{2} - \langle p, x \rangle. \end{equation*} Hence $\langle p, \frac{x}{\|x\|} \rangle \geq c_{1}\|x\|$, implying $\|p\| \geq c_{1} \|x\|$.
To obtain the upper bound, fix $r > 0$ and assume that $\|x\| \leq r$. If $p \in \partial f(x)$ and $\xi \in S^{n-1}$, then \begin{equation*} c_{2} \|x + r \xi\|^{2} \geq f(x + r \xi) \geq f(x) + r \langle p, \xi \rangle \geq c_{1} \|x\|^{2} + r \langle p, \xi \rangle. \end{equation*} Thus, \begin{equation*} \langle p, \xi \rangle \leq \frac{1}{r} \left( c_{2} \|x\|^{2} + c_{2} r^{2} + 2 c_{2} r \langle x, \xi \rangle - c_{1} \|x\|^{2} \right) \leq (4c_{2} - c_{1}) r. \end{equation*} This gives us the desired upper bound with $A = 4c_{2} - c_{1}$.
The reason this is the best we could hope for (i.e. we can't get $Df$ Lipschitz in a neighborhood of $0$) is $c_{1} < c_{2}$. Therefore, as soon as you get away from $0$, $f$ has a positive amount of space to wiggle around --- if $f$ can wiggle around a bunch, then certainly $Df$ can't be constrained, there's no reason for it to be continuous.
To make this more precise, let's go to $n = 1$. Let $h : [0,\infty] \to [c_{1},c_{2}]$ be a non-decreasing function with a sequence of discontinuities converging at zero. Define $g : \mathbb{R} \to \mathbb{R}$ with $g(x) = -g(-x)$ by \begin{equation*} g(x) = 2 h(x) x. \end{equation*} $g$ is non-decreasing. Therefore, the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \int_{0}^{x} g(s) \, ds$ is convex. Furthermore, if $x > 0$, then \begin{equation*} c_{1}|x|^{2} \leq f(x) \leq c_{2} |x|^{2}. \end{equation*} Since $f(x) = f(-x)$, $f$ satisfies the desired inequalities, but $f'$ is not continuous in any neighborhood of zero.
We can generalize the previous example to $n > 1$ simpy by setting $F(x) = f(\|x\|)$.