Here are some basic facts about strongly continuous semigroups that will be useful:
- Let $A$ be the infinitesimal generator of $S(\cdot)$. Then $D(A)$ (the domain of $A$) is dense and $A$ is closed.
- For any $x\in D(A)$ we have $S(t)Ax=\frac{d}{dt}S(t)x$.
- If two strongly continuous semigroups have the same infinitesimal generator, then in fact they are the same semigroup.
Fix $\lambda>0$, $x\in H$ and define $J(t):=\int_0^t S(\tau)x\,d\tau$ ($J$ depends on $x$ as well, but I will omit it for simplicity).
By hypothesis we have $\|J(t)\|\le t\|x\|$, so the integral
$$R(\lambda)x:=\lambda\int_0^\infty e^{-\lambda t}J(t)\,dt$$
makes sense. Let us check that $R(\lambda)=(\lambda I-A)^{-1}$.
$\bullet$ $(S(\epsilon)-I)\int_0^T e^{-\lambda t}J(t)\,dt=\int_0^T e^{-\lambda t}\left(\int_0^t(S(\epsilon)-I)S(\tau)x\,d\tau\right)\,dt$
$=\int_0^T e^{-\lambda t}\left(\int_\epsilon^{t+\epsilon}S(\tau)x\,d\tau-\int_0^t S(\tau)x\,d\tau\right)\,dt$
$=\int_0^T e^{-\lambda t}\left( J(t+\epsilon)-J(\epsilon)-J(t)\right)\,dt$
$=e^{\lambda\epsilon}\int_\epsilon^{T+\epsilon}e^{-\lambda t}J(t)\,dt-\int_0^T e^{-\lambda t}J(t)\,dt-\frac{1-e^{-\lambda T}}{\lambda}J(\epsilon)$
and taking the limit as $T\to\infty$ at the beginning and the end of this chain of equalities and dividing by $\epsilon$ we get
$$\frac{S(\epsilon)-I}{\epsilon}\int_0^\infty e^{-\lambda t}J(t)\,dt=\frac{e^{\lambda\epsilon}-1}{\epsilon}\int_0^\infty e^{-\lambda t}J(t)\,dt-\frac{J(\epsilon)}{\lambda\epsilon}-\frac{e^{\lambda\epsilon}}{\epsilon}\int_0^\epsilon e^{-\lambda t}J(t)\,dt$$
But the RHS possesses a limit as $\epsilon\to 0$, namely $R(\lambda)x-\frac{x}{\lambda}$
(the last term tends to $0$ since $e^{-\lambda t}J(t)=o(1)$ as $t\to 0$): thus
$\frac{R(\lambda)x}{\lambda}=\int_0^\infty e^{-\lambda t}J(t)\,dt\in D(A)$ and
$$A\frac{R(\lambda)x}{\lambda}=R(\lambda)x-\frac{x}{\lambda}$$
i.e. $(\lambda I-A)R(\lambda)x=x$.
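In case it helps, here is how I would spell out the limit of each term on the right-hand side as $\epsilon\to 0$. This is only a sketch, using the bound $\|J(t)\|\le t\|x\|$, the inequality $e^{-\lambda t}\le 1$, and strong continuity for the middle term:
$$\frac{e^{\lambda\epsilon}-1}{\epsilon}\to\lambda,\qquad \frac{J(\epsilon)}{\epsilon}=\frac{1}{\epsilon}\int_0^\epsilon S(\tau)x\,d\tau\to x,\qquad \left\|\frac{e^{\lambda\epsilon}}{\epsilon}\int_0^\epsilon e^{-\lambda t}J(t)\,dt\right\|\le\frac{e^{\lambda\epsilon}}{\epsilon}\int_0^\epsilon t\|x\|\,dt=\frac{\epsilon e^{\lambda\epsilon}}{2}\|x\|\to 0.$$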
$\bullet$ Suppose now $x\in D(A)$. The second fact stated at the beginning gives
$\int_0^T e^{-\lambda t}\left(\int_0^t S(\tau)Ax\,d\tau\right)\,dt
=\int_0^T e^{-\lambda t}(S(t)x-x)\,dt$
$=e^{-\lambda T}J(T)+\lambda\int_0^T e^{-\lambda t}J(t)\,dt-\frac{1-e^{-\lambda T}}{\lambda}x$ (in the last equality we integrated by parts).
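For completeness, the integration by parts can be written out as follows, using $\frac{d}{dt}J(t)=S(t)x$ and $J(0)=0$:
$$\int_0^T e^{-\lambda t}S(t)x\,dt=\Big[e^{-\lambda t}J(t)\Big]_0^T+\lambda\int_0^T e^{-\lambda t}J(t)\,dt=e^{-\lambda T}J(T)+\lambda\int_0^T e^{-\lambda t}J(t)\,dt,\qquad \int_0^T e^{-\lambda t}x\,dt=\frac{1-e^{-\lambda T}}{\lambda}x.$$
The boundary term is harmless in the limit, since $\|e^{-\lambda T}J(T)\|\le Te^{-\lambda T}\|x\|\to 0$ as $T\to\infty$.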
Sending $T\to\infty$ we obtain
$$\frac{R(\lambda)Ax}{\lambda}=\lim_{T\to\infty}\int_0^T e^{-\lambda t}\left(\int_0^t S(\tau)Ax\,d\tau\right)\,dt=R(\lambda)x-\frac{x}{\lambda}$$
so $R(\lambda)(\lambda I-A)x=x$. Moreover $R(\lambda):H\to D(A)$ is a bounded operator.
This proves that $\lambda$ belongs to the resolvent set of $A$ and that $R(\lambda)=(\lambda I-A)^{-1}$.
Finally $\|R(\lambda)x\|\le \lambda\int_0^\infty e^{-\lambda t}\|J(t)\|\,dt
\le \lambda\int_0^\infty e^{-\lambda t}t\|x\|\,dt=\frac{\|x\|}{\lambda}$,
so $\|(\lambda I-A)^{-1}\|\le\frac{1}{\lambda}$.
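The last equality is the elementary Laplace integral, obtained by one integration by parts:
$$\int_0^\infty te^{-\lambda t}\,dt=\left[-\frac{te^{-\lambda t}}{\lambda}\right]_0^\infty+\frac{1}{\lambda}\int_0^\infty e^{-\lambda t}\,dt=\frac{1}{\lambda^2},\qquad\text{hence}\qquad \lambda\int_0^\infty e^{-\lambda t}t\|x\|\,dt=\frac{\|x\|}{\lambda}.$$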
So the Hille-Yosida theorem for contraction semigroups implies that $A$ generates a contraction semigroup, which coincides with $S(\cdot)$ by the third fact stated at the beginning.
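As a numerical sanity check of the identity $R(\lambda)=(\lambda I-A)^{-1}$ and of the bound $\|R(\lambda)x\|\le\|x\|/\lambda$, here is a minimal sketch on a toy example of my own choosing (not part of the argument above): $H=\mathbb{C}$ and $S(t)=e^{it}$, so that $A=i$ and $\|S(t)\|=1$. The helper names `S`, `J`, `R`, the truncation point `T` and the step count `n` are all illustrative assumptions.

```python
import cmath
import math

# Toy assumption: H = C and S(t) = exp(i t), so the generator is A = i.
A = 1j

def S(t):
    return cmath.exp(A * t)

def J(t, x):
    # J(t) = \int_0^t S(tau) x dtau, in closed form for this toy semigroup.
    return (S(t) - 1) / A * x

def R(lam, x, T=60.0, n=60000):
    # lam * \int_0^T exp(-lam t) J(t) dt via the composite trapezoid rule.
    # The tail beyond T is negligible for the values of lam used below.
    h = T / n
    total = 0.5 * (J(0.0, x) + math.exp(-lam * T) * J(T, x))
    for k in range(1, n):
        t = k * h
        total += math.exp(-lam * t) * J(t, x)
    return lam * h * total

x = 2.0 - 1.0j
for lam in (0.5, 1.0, 3.0):
    approx = R(lam, x)
    exact = x / (lam - A)  # (lam I - A)^{-1} x in this one-dimensional case
    # Print the quadrature error and whether the resolvent bound holds.
    print(lam, abs(approx - exact), abs(approx) <= abs(x) / lam)
```

In this example the printed error should be small (limited only by the quadrature), and the bound holds with room to spare because $|(\lambda-i)^{-1}|=1/\sqrt{\lambda^2+1}<1/\lambda$.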
For all $t\ge 0$ and all $u$,
$$\lim_{h\to 0^+}S(t+h)u=\lim_{h\to 0^+}S(t)S(h)u=S(t)\lim_{h\to 0^+}S(h)u=S(t)u,$$
because $S(t)$ is continuous (i.e. bounded) and $S(h)u\to u$ by strong continuity. If $h>0$ is small, then for all $s$ in $[t,t+h]$ we have
$$S(s)u\approx S(t)u$$
(the approximation in norm), so the averages of both sides over $[t,t+h]$ are also close:
$$\frac{1}{h}\int_t^{t+h}S(s)uds\approx S(t)u.$$
To formalize this, use the triangle inequality for integrals and the fact that $S(t)u=\frac{1}{h}\int_t^{t+h}S(t)u\,ds$.
More details for the last part: Let $\epsilon>0$. Choose $\delta>0$ such that $0\le h<\delta$ implies $\Vert S(h)u-u\Vert<\epsilon/(\Vert S(t)\Vert+1)$. Then if $s\in[t,t+\delta)$, we have
$$\Vert S(s)u-S(t)u\Vert=\Vert S(t)(S(s-t)u-u)\Vert\leq\Vert S(t)\Vert \Vert S(s-t)u-u\Vert\leq\epsilon,$$
because $0\le s-t<\delta$.
Let $0<h<\delta$. If we integrate the constant function $s\mapsto S(t)u$, we have
$$\int_t^{t+h}S(t)uds=h S(t)u$$
so
\begin{align*}
\left\Vert\int_t^{t+h}S(s)u\,ds-hS(t)u \right\Vert
&=\left\Vert\int_t^{t+h}(S(s)u-S(t)u)\,ds\right\Vert\\
&\leq\int_t^{t+h}\Vert S(s)u-S(t)u\Vert ds\\
&\leq\int_t^{t+h}\epsilon ds\\
&=\epsilon h,
\end{align*}
where the first inequality is the triangle inequality for integrals (the norm of an integral is at most the integral of the norm). This means that for all $0<h<\delta$,
$$\left\Vert\frac{1}{h}\int_t^{t+h}S(s)u\,ds-S(t)u\right\Vert\leq\epsilon.$$
Since $\epsilon>0$ was arbitrary, this is precisely what it means to have $\lim_{h\to 0^+}\frac{1}{h}\int_t^{t+h}S(s)u\,ds=S(t)u$.
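To see the convergence of the averages numerically, here is a small sketch on a toy example that I am assuming purely for illustration: $S(s)=e^{is}$ acting on $\mathbb{C}$. The function `average`, the midpoint rule and the sample values of `t`, `u` and `h` are my own choices, not part of the argument.

```python
import cmath

# Toy assumption: S(s) = exp(i s) acting on H = C (a contraction semigroup).
def S(s):
    return cmath.exp(1j * s)

def average(t, h, u, n=2000):
    # (1/h) * \int_t^{t+h} S(s) u ds, approximated by the midpoint rule.
    step = h / n
    total = sum(S(t + (k + 0.5) * step) * u for k in range(n))
    return total * step / h

t, u = 1.3, 1.0 + 2.0j
errors = []
for h in (1.0, 0.1, 0.01, 0.001):
    # Distance between the average over [t, t+h] and S(t)u.
    err = abs(average(t, h, u) - S(t) * u)
    errors.append(err)
    print(h, err)
```

For this example the error shrinks roughly in proportion to $h$, consistent with the estimate above, since here $\Vert S(s)u-S(t)u\Vert\le |s-t|\,|u|$.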
Usually:
A semigroup on a given Banach space $\mathcal N$ is a family $(P^t)_{0\leq t<\infty}$ of bounded linear operators defined everywhere in $\mathcal N$ such that $$P^0=1,\qquad P^tP^s=P^{t+s}, \qquad t,s\ge 0$$ where $1$ is the identity operator on $\mathcal{N}$.
A contraction semigroup is a semigroup (in the sense defined above) which satisfies the extra condition $$\|P^t\|_{\mathcal{L}}\leq 1,\quad\forall\ t\geq 0$$ where $\|P^t\|_{\mathcal{L}}$ is the norm defined in your post (which is the usual operator norm).
The generator of a semigroup is defined by $$ A\psi=\lim_{t\to 0^+}\frac{P^t\psi-\psi}{t} $$ with domain $D(A)=\{\psi\mid \text{the above limit exists (in the sense of $\mathcal{N}$)}\}$.
A semigroup is strongly continuous if it satisfies $$\lim_{t\to 0^+} \|P^t\phi-\phi\|_{\mathcal{N}}=0,\quad\forall \ \phi\in\mathcal{N}.$$
A semigroup is uniformly continuous if it satisfies $$\lim_{t\to 0^+} \|P^t-1\|_{\mathcal{L}}=0.$$
If the right-hand side means the exponential of a bounded operator, it holds if and only if the semigroup is uniformly continuous (see Theorem 1.2, Theorem 1.3 and Corollary 1.4 in Section 1.1 of Pazy's book).
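To illustrate the bounded case concretely, here is a minimal sketch in finite dimensions, where every operator is bounded. The matrix `A` (a rotation generator), the helpers `matmul`, `expm` and `dist`, and the truncation at 40 series terms are hypothetical choices of mine. The code checks the semigroup law and the uniform continuity at $0$ for $P^t=\sum_n (tA)^n/n!$.

```python
import math

def matmul(X, Y):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(A, t, terms=40):
    # Truncated power series sum_{n < terms} (tA)^n / n!.
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    tA = [[t * a for a in row] for row in A]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in matmul(term, tA)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result

def dist(X, Y):
    # Frobenius distance; it dominates the operator-norm distance.
    return math.sqrt(sum((X[i][j] - Y[i][j]) ** 2
                         for i in range(2) for j in range(2)))

A = [[0.0, -1.0], [1.0, 0.0]]  # hypothetical bounded generator (rotation)
I2 = [[1.0, 0.0], [0.0, 1.0]]

# Semigroup law: P^t P^s should equal P^{t+s} up to rounding.
t, s = 0.7, 1.1
law_defect = dist(matmul(expm(A, t), expm(A, s)), expm(A, t + s))

# Uniform continuity at 0: the distance from P^h to the identity shrinks with h.
gaps = [dist(expm(A, h), I2) for h in (1.0, 0.1, 0.01)]
print(law_defect, gaps)
```

This only covers the easy direction in a finite-dimensional setting; the statement that uniform continuity forces the generator to be bounded is the content of the results in Pazy's book cited above.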
Often the right-hand side is used only as a notation.
If you are using definitions which are different from the definitions in my post, please clarify.