Definition: A nonempty set $F \subseteq A$ is a face of $A$ if whenever $\alpha x+(1-\alpha) y \in F$ for some $0 < \alpha < 1$ and $x, y \in A$, then $x, y \in F$.
Lemma: Take any $\ell \in X'$ (a continuous linear functional). Then the set $F_{\ell}:=\left\{y \in A \mid \ell(y)=\max _{x \in A} \ell(x)\right\}$ is a face of $A$.
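For completeness, here is a short verification of the lemma, assuming $A$ is compact so that the maximum is attained. The shorthand $M_\ell$ is introduced only for this check. Write $M_\ell := \max_{x \in A} \ell(x)$ and suppose $\alpha x+(1-\alpha) y \in F_\ell$ with $0 < \alpha < 1$ and $x, y \in A$. Then
$$
\begin{align}
M_\ell &= \ell(\alpha x+(1-\alpha) y) = \alpha \ell(x) + (1-\alpha) \ell(y) \\
&\le \alpha M_\ell + (1-\alpha) M_\ell = M_\ell.
\end{align}
$$
Equality must hold throughout. Since $\alpha$ and $1-\alpha$ are both positive, this forces $\ell(x) = \ell(y) = M_\ell$, i.e., $x, y \in F_\ell$.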
Let $\mathcal F$ be the collection of all compact faces of $A$. Then $A \in\mathcal F$ and thus $\mathcal F \neq \emptyset$. We endow $\mathcal F$ with a partial order $<$ such that $F_1<F_2\iff F_2 \subset F_1$. Let $(F_i)_{i\in I}$ be a chain in $\mathcal F$. First, $\bigcap_{i\in I} F_i$ is compact, and it is nonempty by the finite intersection property, because the $F_i$ are nonempty, compact, and totally ordered by inclusion. If $\alpha x+(1-\alpha) y \in \bigcap_{i\in I} F_i$ for some $0 < \alpha < 1$ and $x, y \in A$, then $\alpha x+(1-\alpha) y \in F_{i}$ for all $i \in I$. Because each $F_{i}$ is a face of $A$, we get $x, y\in F_{i}$ for all $i \in I$. Hence $x,y\in\bigcap_{i\in I} F_i$. So $\bigcap_{i\in I} F_i$ is a compact face of $A$. It follows that $\bigcap_{i\in I} F_i$ is an upper bound of $(F_i)_{i\in I}$ in $(\mathcal F, <)$.
Then $(\mathcal F, <)$ satisfies the hypothesis of Zorn's lemma. Thus it contains a maximal element $F^*$, i.e., a compact face of $A$ that is minimal with respect to inclusion. Suppose $F^*$ contains two points $x \neq y$. By the Hahn-Banach theorem, there is $\ell \in X'$ such that $\ell (x) > \ell (y)$. By our lemma, applied with $F^*$ in place of $A$, the set $F_{\ell}:=\left\{z \in F^* \mid \ell(z)=\max _{w \in F^*} \ell(w)\right\}$ is a compact face of $F^*$, and $F_\ell \subsetneq F^*$ because $y \notin F_\ell$. Assume $\alpha u + (1-\alpha)v \in F_\ell$ for some $0 < \alpha < 1$ and $u, v \in A$. Then $u,v \in F^*$ because $F_\ell \subseteq F^*$ and $F^*$ is a face of $A$. Combined with the fact that $F_\ell$ is a face of $F^*$, this gives $u,v\in F_\ell$. Hence $F_\ell$ is a compact face of $A$ strictly contained in $F^*$, which contradicts the maximality of $F^*$. Hence $F^*$ is a singleton, and its only element is an extreme point of $A$.
Let $E$ be the set of all extreme points of $A$. Clearly, $\overline{\operatorname{conv} E} \subseteq A$ is convex. Suppose $\overline{\operatorname{conv} E} \subsetneq A$. Then there is $a \in A \setminus \overline{\operatorname{conv} E}$. By the Hahn-Banach theorem, there is $\ell \in X'$ such that $\max_{x\in \overline{\operatorname{conv} E}} \ell (x) < \ell (a)$. Let $F_{\ell}:=\left\{y \in A \mid \ell(y)=\max _{x \in A} \ell(x)\right\}$. Then $F_\ell$ is nonempty, compact, and convex. As shown previously, $F_\ell$ has an extreme point $b$. Since $\ell(b) = \max_{x \in A} \ell(x) \ge \ell(a) > \max_{x\in \overline{\operatorname{conv} E}} \ell (x)$, we have $b \notin \overline{\operatorname{conv} E}$. Moreover, $F_\ell$ is a face of $A$, so $b$ is also an extreme point of $A$, i.e., $b \in E \subseteq \overline{\operatorname{conv} E}$, which is a contradiction. Hence $\overline{\operatorname{conv} E} = A$.
We need the following lemmas:
Lemma 1: Let $C$ be an open convex subset of a Banach space $X$ and $f: C \to \mathbb{R}$ convex. If $f$ is l.s.c., then $f$ is continuous on $C$.
and
Lemma 2: Let $X$ be a n.v.s. Recall that $B(x, r)$ (resp. $\overline B(x, r)$) denotes the open (resp. closed) ball of radius $r$ and center $x$. Fix $a \in X, r>0, \varepsilon \in (0, r)$, and $m, M \in \mathbb R$. Let $f: \overline B(a, r) \to \mathbb R$ be convex.
- If $f(x) \le m$ for all $x \in \overline B(a, r)$, then $|f(x)| \le |m| + 2|f(a)|$ for all $x \in B(a, r)$.
- If $|f(x)| \le M$ for all $x \in \overline B(a, r)$, then $f$ is $\frac{2M}{\varepsilon}$-Lipschitz on $\overline B(a, r - \varepsilon)$.
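As a sanity check of Lemma 2, consider the illustrative case $X = \mathbb R$, $a = 0$, $r = 1$, and $f(x) = x^2$ (this example is not part of the original argument). Here $0 \le f(x) \le 1$ on $\overline B(0, 1) = [-1, 1]$, so we may take $m = M = 1$, and the first bound reads $|f(x)| \le |m| + 2|f(0)| = 1$. For $x, y \in \overline B(0, 1-\varepsilon)$, we have
$$
\begin{align}
|f(x) - f(y)| &= |x+y| \, |x-y| \\
&\le 2(1-\varepsilon) |x-y| \\
&\le \frac{2}{\varepsilon} |x-y|,
\end{align}
$$
where the last step uses $\varepsilon(1-\varepsilon) \le \frac{1}{4} < 1$. So the true Lipschitz constant $2(1-\varepsilon)$ is indeed at most $\frac{2M}{\varepsilon}$.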
We define $g:X \to \mathbb R$ by $g(x) := \sup_{f\in \mathcal F} f(x)$. Clearly, $g$ is also lower semi-continuous because, for all $\alpha \in \mathbb R$, we have
$$
\begin{align}
\{x\in X \mid g(x) \le \alpha\} &= \{x\in X \mid f(x) \le \alpha \text{ for all } f\in \mathcal F\} \\
&= \bigcap_{f \in \mathcal F} \underbrace{\{x\in X \mid f(x) \le \alpha\}}_{\text{closed in } X}.
\end{align}
$$
Since $g$ is convex as a pointwise supremum of convex functions, Lemma 1 implies that $g$ is continuous on $C$. It follows that $g$ is locally bounded on $C$. This in turn implies $\mathcal F$ is locally equi-bounded from above on $C$. The claim then follows from Lemma 2.
Best Answer
WLOG, we assume $a:=0$. By convexity of $f$, we get $$ f(0) \le \frac{1}{2} f(x) + \frac{1}{2} f(-x) \quad \forall x\in \overline B(0, r). $$
Notice that $x \in \overline B(0, r) \iff -x \in \overline B(0, r)$, so $$ f(x) \ge 2f(0)-f(-x) \ge 2f(0)-m \quad \forall x \in \overline B(0, r). $$
It follows that $$ |f(x)| \le \max\{|m|, |2f(0)-m|\} \le 2|f(0)|+|m| \quad \forall x \in \overline B(0, r). $$
WLOG, we assume $a:=0$. Fix $x,y \in \overline B(0, r - \varepsilon)$ such that $x\neq y$. Consider $$ \varphi: \mathbb R \to \mathbb R, t \mapsto \| t(y-x)+x \|. $$
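Then $\varphi$ is continuous, $\varphi(1) = \|y\| \le r - \varepsilon$, and $\varphi(t) \to \infty$ as $t \to \infty$. Let $T := \{t \in \mathbb R \mid \varphi(t) \le r\}$. By the intermediate value theorem, there are $t_1, t_2 \in T$ such that $1<t_1<t_2$, $\varphi (t_1)= r - \varepsilon/2$ and $\varphi(t_2) = r$. With a slight abuse of notation, let $\varphi_1 := t_1(y-x)+x$ and $\varphi_2 := t_2(y-x)+x$ denote the corresponding points, so that $\|\varphi_1\| = r - \varepsilon/2$ and $\|\varphi_2\| = r$. Then $x, y, \varphi_1, \varphi_2$ lie in this order on a line, and $\|\varphi_2-\varphi_1\| \ge \|\varphi_2\| - \|\varphi_1\| = \varepsilon/2$.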
It follows that $$ y = \frac{\|\varphi_1-y\| x + \|y-x\| \varphi_1}{\|x-\varphi_1\|}. $$
By convexity of $f$, we have $$ f(y) \le \frac{\|\varphi_1-y\| }{\|x-\varphi_1\|} f(x) + \frac{\|y-x\|}{\|x-\varphi_1\|} f(\varphi_1), $$ which implies $$ \frac{f(y)-f(x)}{\|y-x\|} \le \frac{f(\varphi_1)-f(y)}{\|\varphi_1-y\|}. $$
Similarly, we get $$ \frac{f(\varphi_1)-f(y)}{\|\varphi_1-y\|} \le \frac{f(\varphi_2)-f(\varphi_1)}{\|\varphi_2-\varphi_1\|}. $$
It follows that $$ \frac{f(y)-f(x)}{\|y-x\|} \le \frac{f(\varphi_2)-f(\varphi_1)}{\|\varphi_2-\varphi_1\|} \le \frac{4M}{\varepsilon}. $$
By symmetry, we obtain $$ \frac{f(x)-f(y)}{\|x-y\|} \le \frac{f(\varphi_2)-f(\varphi_1)}{\|\varphi_2-\varphi_1\|} \le \frac{4M}{\varepsilon}. $$
Finally, $$ \frac{|f(x)-f(y)|}{\|x-y\|} \le \frac{4M}{\varepsilon}. $$
I have found a cleaner approach for the second claim of Lemma 2, which also gives the sharper constant $\frac{2M}{\varepsilon}$, as follows.
WLOG, we assume $a:=0$. Fix $x,y \in \overline B(0, r - \varepsilon)$ such that $x\neq y$. We fix $\lambda>0$ such that $$ z_\lambda := y + \lambda \frac{y-x}{\|y-x\|} \in \overline B(0, r). $$
It follows that $$ y = t_\lambda x+(1-t_\lambda) z_\lambda \quad \text{with} \quad t_\lambda := \frac{\lambda}{\lambda+\|y-x\|}. $$
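To check this identity explicitly, write $d := \|y-x\|$ (a shorthand used only here), so that $t_\lambda = \frac{\lambda}{\lambda+d}$, $1-t_\lambda = \frac{d}{\lambda+d}$, and $z_\lambda = y + \frac{\lambda}{d}(y-x)$. Then
$$
\begin{align}
t_\lambda x+(1-t_\lambda) z_\lambda &= \frac{\lambda x + d y + \lambda (y-x)}{\lambda+d} \\
&= \frac{(\lambda+d) y}{\lambda+d} = y.
\end{align}
$$
In particular, $\frac{1-t_\lambda}{t_\lambda} = \frac{d}{\lambda}$, which is the ratio that converts the convexity inequality into a bound on the difference quotient.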
By convexity of $f$, we get $$ f(y) \le t_\lambda f(x)+(1-t_\lambda)f(z_\lambda), $$ which implies $$ \frac{f(y)-f(x)}{1-t_\lambda} \le \frac{f(z_\lambda) - f(y)}{t_\lambda}. $$
It follows that $$ \frac{f(y)-f(x)}{\|y-x\|} \le \frac{f(z_\lambda) - f(y)}{\lambda} \le \frac{2M}{\lambda}. $$
We have $$ \|z_\lambda\| \le \|y\| + \lambda \le r - \varepsilon+\lambda. $$
For $z_\lambda \in \overline B(0, r)$, it suffices to pick $\lambda>0$ such that $r - \varepsilon+\lambda \le r$, i.e., $\lambda \le \varepsilon$. Taking $\lambda := \varepsilon$, we get $$ \frac{f(y)-f(x)}{\|y-x\|} \le \frac{2M}{\varepsilon}. $$
By symmetry, we also have $$ \frac{f(x)-f(y)}{\|x-y\|}\le \frac{2M}{\varepsilon}. $$
This completes the proof.