Split into the cases $0 < t \leq T$ and $t > T$, to conclude that
$$
(s*h)(t) = \int_0^{\min \{ t,T\} } {1 \cdot h(t - \tau )d\tau } = \int_0^{\min \{ t,T\} } {\alpha e^{ - \alpha (t - \tau )} d\tau } .
$$
It follows that
$$
(s*h)(t) = 1 - e^{-\alpha t}, \;\; 0 < t \leq T,
$$
$$
(s*h)(t) = e^{ - \alpha (t - T)} - e^{ - \alpha t} ,\;\; t > T.
$$
(Since $(s*h)(t) = 0$ for $t \leq 0$, we see that $(s*h)(t)$ is continuous on $\mathbb{R}$.)
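As a sanity check, the closed form can be compared against a direct numerical evaluation of the convolution integral. The sketch below assumes $s$ is the rectangular pulse of height $1$ on $[0,T]$ and $h(t)=\alpha e^{-\alpha t}u(t)$, as in the integral above; the values $\alpha = 2$, $T = 1.5$ are arbitrary examples:

```python
import numpy as np

alpha, T = 2.0, 1.5  # arbitrary example parameters

def s_conv_h(t, n=200_000):
    """Evaluate (s*h)(t) = integral of alpha*exp(-alpha*(t-tau)) over [0, min(t,T)]
    by the trapezoidal rule."""
    upper = min(t, T)
    if upper <= 0:
        return 0.0
    tau = np.linspace(0.0, upper, n + 1)
    f = alpha * np.exp(-alpha * (t - tau))
    return float(np.sum(f[:-1] + f[1:]) * (upper / n) / 2)

def closed_form(t):
    """The two-case closed form derived above."""
    if t <= 0:
        return 0.0
    if t <= T:
        return 1.0 - np.exp(-alpha * t)
    return np.exp(-alpha * (t - T)) - np.exp(-alpha * t)

for t in (0.5, 1.5, 3.0):
    assert abs(s_conv_h(t) - closed_form(t)) < 1e-6
```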
Relation to probability theory: With $s$ and $h$ as above, let $X$ be an exponential random variable with density function $h$, and $Y$ an independent uniform$[0,T]$ random variable, so that $Y$ has density function $\tilde s = s/T$.
By the law of total probability, conditioning on $Y$, we have
$$
{\rm P}(X + Y \le t) = \int_0^T {{\rm P}(X \le t - \tau )\frac{1}{T}d\tau }.
$$
It follows that if $0 < t \leq T$, then
$$
{\rm P}(X + Y \le t) = \int_0^t {{\rm P}(X \le t - \tau )\frac{1}{T}d\tau } = \frac{1}{T}\int_0^t {(1 - e^{ - \alpha (t - \tau )} )d\tau } = \frac{{t - (1 - e^{ - \alpha t} )/\alpha }}{T},
$$
while if $t > T$, then
$$
{\rm P}(X + Y \le t) = \int_0^T {{\rm P}(X \le t - \tau )\frac{1}{T}d\tau } = \frac{1}{T}\int_0^T {(1 - e^{ - \alpha (t - \tau )} )d\tau } = \frac{{T - (e^{ - \alpha (t - T)} - e^{ - \alpha t} )/\alpha }}{T}.
$$
Hence, the density function of $X+Y$ is given by
$$
f_{X+Y} (t) = \frac{{1 - e^{ - \alpha t} }}{T} ,\;\; 0 < t \leq T,
$$
$$
f_{X+Y} (t) = \frac{{e^{ - \alpha (t - T)} - e^{ - \alpha t} }}{T}, \;\; t > T.
$$
On the other hand, since $X$ and $Y$ are independent with respective densities $h$ and $\tilde s \,(=s/T)$,
$$
f_{X+Y} (t) = (h*\tilde s)(t) = (\tilde s * h)(t) = \frac{{(s*h)(t)}}{T},
$$
from which it follows that
$$
(s*h)(t) = 1 - e^{ - \alpha t} ,\;\; 0 < t \leq T,
$$
$$
(s*h)(t) = e^{ - \alpha (t - T)} - e^{ - \alpha t} ,\;\; t > T
$$
(as we have already seen above).
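The two-case CDF of $X+Y$ can also be checked by Monte Carlo simulation. The sketch below again uses the arbitrary example values $\alpha = 2$, $T = 1.5$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, T = 2.0, 1.5                              # arbitrary example parameters
n = 1_000_000

x = rng.exponential(scale=1.0 / alpha, size=n)   # X ~ Exp(alpha)
y = rng.uniform(0.0, T, size=n)                  # Y ~ Uniform[0, T], independent of X

def cdf_closed(t):
    """P(X+Y <= t) from the two cases derived above."""
    if t <= 0:
        return 0.0
    if t <= T:
        return (t - (1 - np.exp(-alpha * t)) / alpha) / T
    return (T - (np.exp(-alpha * (t - T)) - np.exp(-alpha * t)) / alpha) / T

for t in (0.8, 2.5):
    empirical = np.mean(x + y <= t)              # empirical CDF at t
    assert abs(empirical - cdf_closed(t)) < 5e-3
```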
Your questions are not stupid questions. In fact, they are among the most important questions, and you should keep them in mind whenever you analyze signals and systems. If you apply the unit step function to a particular signal incorrectly, the analysis will surely be wrong (in engineering, this could bring down a bridge, almost without exaggeration...).
If you have not completely internalized these ideas or do not understand the answers here, check my answer to the following question, which explains the matter in a little more detail: Unilateral Laplace Transform vs Bilateral Fourier Transform.
1. Why can we do this multiplication?
This multiplication is done to make a system acquire the behavior of a physical system, or more specifically, the behavior of a causal system, since:
$$
h(t) \mbox{ is a causal system} \quad\Leftrightarrow\quad \forall t\in\mathbb{R},\, t < 0:\quad h(t) = 0
$$
$$
\therefore\quad h(t)u(t) \mbox{ is always a causal system}
$$
Note that you should not always multiply a system by the unit step function. In fact, if a system is not causal, then you should never multiply the system function $h(t)$ by the unit step function. In summary:
- Causal System (Non-anticipative System): always multiply by the unit step.
- Non-Causal System (Anticipative System): never multiply by the unit step.
The reason for this is explained in my answer in the link above.
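The truncation rule above can be illustrated numerically. In the sketch below, the symmetric impulse response $e^{-|t|}$ is just an arbitrary example of a non-causal $h$:

```python
import numpy as np

t = np.linspace(-5, 5, 1001)
u = (t >= 0).astype(float)      # unit step function u(t)

h = np.exp(-np.abs(t))          # a non-causal impulse response (nonzero for t < 0)
h_causal = h * u                # multiplying by u(t) forces causality

assert np.all(h_causal[t < 0] == 0)   # zero for all t < 0, hence causal
assert np.any(h[t < 0] != 0)          # the original h was not causal
```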
2. Why is it not a convolution?
Because the very objective of the Laplace transform is to avoid convolution. Convolution is difficult to calculate and needs a lot of computing power, while a transform reduces the process of convolution to a simple multiplication.
$$
y(t) = h(t) \ast x(t) \quad\xrightarrow{\mathcal{L}}\quad Y(s) = H(s) X(s)
$$
Again, the reason for this is explained in my answer in the link above.
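The convolution-to-multiplication property can be verified numerically. The sketch below uses the discrete Fourier transform (NumPy's FFT) rather than the Laplace transform, since the discrete analogue of the same theorem holds there; the signals are arbitrary random examples:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(64)
b = rng.standard_normal(64)

# Direct (full) linear convolution: length 64 + 64 - 1 = 127.
direct = np.convolve(a, b)

# Transform-domain route: zero-pad to the full length to avoid circular
# wrap-around, multiply the spectra, and transform back.
N = len(a) + len(b) - 1
via_fft = np.fft.irfft(np.fft.rfft(a, N) * np.fft.rfft(b, N), N)

assert np.allclose(direct, via_fft)   # convolution in time = multiplication in frequency
```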
Best Answer
Firstly, $\theta^{\star1}(x)=\theta(x) = \theta(x)\frac{x^0}{0!}$.
Now, assume statement is true for some $k\in\Bbb N$. Then, for $x\ge0$,
$$\begin{align} \theta^{\star (k+1)}(x)=&\left(\theta\star\theta^{\star k}\right)(x)\\ =& \int_{-\infty}^{+\infty} \theta(x-y)\cdot\theta(y)\frac{y^{k-1}}{(k-1)!}dy\\ =& \int_{0}^{x} \frac{y^{k-1}}{(k-1)!}dy\\ =& \left[\frac{y^k}{k(k-1)!}\right]_0^x\\ =& \frac{x^k}{k!}\\ \end{align}$$
For $x<0$, $\theta^{\star (k+1)}(x)=0$ as $\theta(x-y)\cdot\theta(y)=0$ for all $y\in\Bbb R$. Therefore, combining two cases, $\theta^{\star (k+1)}(x) = \theta(x)\frac{x^k}{k!}$, and this finishes the induction.
It remains to prove that convolution is associative, so that this repeated convolution can be performed in any order.
If you know the Laplace transform, you can also show this result easily using it.
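A quick numerical sanity check of the closed form, using nothing beyond NumPy: discretize $\theta$ on a grid, convolve repeatedly, and compare with $x^k/k!$. The grid spacing and the loose tolerance (which absorbs the $O(\mathrm{d}x)$ discretization error, growing with $k$ and $x$) are arbitrary choices:

```python
import numpy as np
from math import factorial

dx = 0.001
x = np.arange(0.0, 5.0, dx)
theta = np.ones_like(x)                  # theta(x) sampled on x >= 0 (zero elsewhere)

conv_k = theta.copy()                    # theta^{*1}
for k in range(1, 4):                    # build theta^{*2}, theta^{*3}, theta^{*4}
    # Discrete convolution approximates the integral when scaled by dx;
    # truncate to the original grid since we only care about x in [0, 5).
    conv_k = np.convolve(conv_k, theta)[:len(x)] * dx
    expected = x**k / factorial(k)       # claimed closed form theta(x) * x^k / k!
    assert np.max(np.abs(conv_k - expected)) < 0.05
```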