I assume that your question concerns convex functions only; without convexity much of it would be false.
Question 2: strictly speaking, being Lipschitz smooth ($C^{1,1}$) does not imply $\nabla^2 f$ exists. But the statement is true if we interpret $\nabla^2 f\preceq LI$ as holding almost everywhere. Indeed, $\nabla^2 f$ is a positive semidefinite matrix, so having $\nabla^2 f\preceq LI$ a.e. is equivalent to $\nabla^2 f\in L^\infty$. And it is well known that having $L^\infty$ derivative is equivalent to being Lipschitz; thus $$\nabla^2 f\in L^\infty \iff \nabla f\in C^{0,1} \iff f\in C^{1,1}$$
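For a concrete instance of a $C^{1,1}$ function that is not $C^2$, consider the Huber function (an assumed standalone example, not part of the question): its gradient is $1$-Lipschitz everywhere, yet $f''$ exists only almost everywhere, jumping at $|x|=1$, which matches the a.e. interpretation of $\nabla^2 f \preceq LI$.

```python
import random

# Huber function: C^{1,1} but not C^2. Its second derivative exists only
# a.e. (it jumps at |x| = 1) yet is bounded by L = 1 where it exists.
def huber(x):
    return 0.5 * x * x if abs(x) <= 1 else abs(x) - 0.5

def huber_grad(x):
    return max(-1.0, min(1.0, x))   # clamp(x, -1, 1); 1-Lipschitz

random.seed(0)
for _ in range(10_000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    # check the Lipschitz bound |f'(x) - f'(y)| <= 1 * |x - y|
    assert abs(huber_grad(x) - huber_grad(y)) <= abs(x - y) + 1e-12
print("gradient is 1-Lipschitz although f'' jumps at |x| = 1")
```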
Question 3: You misremembered. The correct inequality characterizing $\alpha$-strong convexity is
$$f(x+y) \ge f(x) + y^\top\nabla f(x) + \frac{\alpha}{2} \| y \|^2 \tag{1}$$
Indeed, (1) is equivalent to saying that the function $g(x)=f(x)-\frac{\alpha}{2} \| x \|^2$ is convex. The latter is equivalent to $\nabla^2 g\succeq 0$, which is $\nabla^2 f\succeq \alpha\, I$.
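To make (1) concrete, here is a small numerical sanity check on an assumed example function $f(x) = x^2 + e^x$ (so $f''(x) = 2 + e^x \ge 2$, i.e., $\alpha = 2$ works); the function and the constant are illustrative choices, not from the question:

```python
import math
import random

def f(x):
    return x**2 + math.exp(x)       # f''(x) = 2 + e^x >= 2

def grad_f(x):
    return 2 * x + math.exp(x)

alpha = 2.0
random.seed(0)
for _ in range(10_000):
    x = random.uniform(-5, 5)
    y = random.uniform(-5, 5)
    # inequality (1): f(x + y) >= f(x) + y * f'(x) + (alpha/2) * y^2
    lhs = f(x + y)
    rhs = f(x) + y * grad_f(x) + 0.5 * alpha * y**2
    assert lhs >= rhs - 1e-9, (x, y)
print("inequality (1) holds on all samples")
```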
Question 4. Yes, there is a direct and important relation: a function is strongly convex if and only if its convex conjugate (a.k.a. Legendre-Fenchel transform) is Lipschitz smooth. Indeed, the gradient maps are inverses of each other, which implies that the Hessian of the convex conjugate of $f$ is the inverse of the Hessian of $f$ (at the appropriate point). So a uniform upper bound on $\nabla^2 f$ is equivalent to a uniform lower bound on $\nabla^2 (f^{*})$, and vice versa. One can also argue without referring to the Hessian (which may fail to exist at some points): the Lipschitz smoothness of $f$, by your item 1, gives us at every $x_0$ a quadratic function $q$ such that $q(x_0)=f(x_0)$ and $f \le q$ everywhere. Taking the convex conjugate reverses the order: $q^*\le f^*$; and this means that $f^*$ is strongly convex.
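As a quick numerical sketch of this duality (the quadratic $f(x) = \frac{L}{2}x^2$ and the grid-based conjugate are assumed for the example): the conjugate of this $L$-smooth function is $f^*(y) = \frac{y^2}{2L}$, which is $\frac{1}{L}$-strongly convex.

```python
import numpy as np

L = 4.0
xs = np.linspace(-50, 50, 200_001)   # grid for the sup in the conjugate

def f(x):
    return 0.5 * L * x**2            # L-smooth: f''(x) = L everywhere

def conjugate(y):
    # f*(y) = sup_x (x*y - f(x)), approximated by a max over the grid
    return np.max(xs * y - f(xs))

for y in np.linspace(-5, 5, 11):
    exact = y**2 / (2 * L)           # closed form: (1/L)-strongly convex
    assert abs(conjugate(y) - exact) < 1e-3
print("f* matches y^2/(2L), i.e. strong convexity modulus 1/L")
```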
Question 1. The converse is true, but the only proof I see goes through the convex conjugate as described in Q4. Since strong convexity is characterized by the comparison property (1), taking the conjugate gives a matching characterization of Lipschitz smoothness.
Reference: Chapter 5 of *Convex Functions* by Jonathan M. Borwein and Jon D. Vanderwerff.
For concrete example 1:
The function $f(x) = \frac{1}{2}\|x\|^{2}$ is $m$-strongly convex for every $m\in (0, 1]$. We say it is $1$-strongly convex because $m=1$ is the largest constant for which the inequality $(\nabla f(x)-\nabla f(y))^{T}(x-y) \geq m \|x-y\|^{2}$ holds.
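A quick numeric check of this (the random sampling setup is just for illustration): with $\nabla f(z) = z$, the inequality holds with equality at $m = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # (grad f(x) - grad f(y))^T (x - y) with grad f(z) = z
    lhs = (x - y) @ (x - y)
    rhs = np.linalg.norm(x - y)**2
    assert abs(lhs - rhs) < 1e-9     # equality, so m = 1 is tight
print("m = 1 holds with equality for f(x) = ||x||^2 / 2")
```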
For concrete example 2:
If $(\nabla f(x)-\nabla f(y))^{T}(x-y) > 0$, it is not necessarily true that there is "some constant factor $m$, extremely small, that makes this inequality true". For example, $f(x) = \frac{1}{4}x^{4}$ is convex, but there is no $m>0$ for which $(\nabla f(x)-\nabla f(y))^{T}(x-y) \geq m \|x-y\|^{2}$ holds everywhere. Indeed, if there were, then $(x^3-y^3)(x-y) \geq m \|x-y\|^{2}$, and setting $y=0$ gives $x^4 \geq m x^2$, i.e., $x^2(x^2-m) \geq 0$. But when $0 < |x| < \sqrt{m}$ we have $x^2(x^2-m)<0$, so for any $m>0$ we can find an $x$ small enough that the inequality fails.
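One can also see this numerically (a minimal sketch): with $y = 0$, the ratio $(\nabla f(x) - \nabla f(0))(x - 0)/|x - 0|^2 = x^2$ tends to $0$ as $x \to 0$, so no fixed $m > 0$ can serve as a lower bound.

```python
# For f(x) = x^4 / 4 the monotonicity ratio with y = 0 is
# (x^3 - 0)(x - 0) / (x - 0)^2 = x^2, which tends to 0 as x -> 0,
# so no uniform m > 0 works.
for x in [1.0, 0.1, 0.01, 0.001]:
    ratio = (x**3 - 0.0) * (x - 0.0) / (x - 0.0)**2
    print(x, ratio)
    assert abs(ratio - x**2) < 1e-12
```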
Question: does $m$ matter when we say that a function is $m$-strongly convex?
The reason that $m$ matters is that algorithms such as gradient descent will not converge if the step size $\alpha_{k}$ isn't selected appropriately, where $k$ indexes the iterations. There are many step size rules, depending on the assumptions made on $f$. You can use diminishing step sizes $\alpha_{k} = \frac{1}{k}$, but as the algorithm runs longer the steps get smaller and smaller, i.e., progress slows down. The algorithm converges faster if we can use a constant step size $\alpha_k = \alpha$. For convex functions whose gradient is $L$-Lipschitz, gradient descent converges if $\alpha \in (0,\frac{2}{L})$; see page 15 (pdf page 13) of https://stanford.edu/~boyd/papers/pdf/monotone_primer.pdf. For $m$-strongly convex functions of this class, the optimal constant step size is $\alpha = \frac{2}{m+L}$. So $m$ is important because the optimal step size, and hence the convergence rate, depends directly on it: using a smaller $m$ gives a smaller $\alpha$ and slower convergence. Additionally, depending on the algorithm or the type of problem, the admissible range of constant step sizes may itself depend on $m$. For example, using gradient descent to find a zero of an $m$-strongly monotone and $L$-Lipschitz operator requires $\alpha \in (0, \frac{2m}{L^{2}})$; see page 16 (pdf page 14) of https://stanford.edu/~boyd/papers/pdf/monotone_primer.pdf.
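A minimal sketch of this effect on an assumed quadratic test problem (the matrix, dimensions, and iteration count are illustrative, not from the text): the diminishing step $\alpha_k = 1/k$ makes much less progress in 100 iterations than the constant step $\alpha = \frac{2}{m+L}$.

```python
import numpy as np

# Assumed toy problem: f(x) = 0.5 x^T A x with eigenvalues m and L,
# so f is m-strongly convex and grad f is L-Lipschitz; minimizer at 0.
m, L = 0.5, 10.5
A = np.diag([m, L])
grad = lambda x: A @ x

def run(step_rule, iters=100):
    x = np.array([1.0, 1.0])
    for k in range(1, iters + 1):
        x = x - step_rule(k) * grad(x)
    return np.linalg.norm(x)         # distance to the minimizer

err_diminishing = run(lambda k: 1.0 / k)       # alpha_k = 1/k
err_constant = run(lambda k: 2.0 / (m + L))    # optimal constant step
print(err_diminishing, err_constant)
assert err_constant < err_diminishing
```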
For example, if $f(x) = \|x\|^2$, then $\nabla f(x) = 2x$, so $(\nabla f(x)-\nabla f(y))^\top(x-y) = 2\|x-y\|^2$ and your inequality holds with equality when $\mu = 2$.