According to the mean you give, you use the following parametrisation for the Weibull distribution:
$$
\textrm{if }X\sim \textrm{Weibull}(\lambda, \alpha) \textrm{ then } f_X(x) = \lambda \alpha x^{\alpha - 1} \exp(-\lambda x^\alpha),
$$
with $\lambda > 0$ a scale parameter, and $\alpha > 0$ a shape parameter.
dweibull() in R, as well as Wikipedia, uses a different parametrisation. The conversion is as follows:
$$
\textrm{shape} = \alpha \quad \textrm{and} \quad \textrm{scale} = \left(\frac{1}{\lambda} \right)^{\tfrac{1}{\alpha}},
$$
where $\textrm{shape}$ and $\textrm{scale}$ are the parameters used by dweibull() and Wikipedia.
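As a quick sanity check of the conversion, here is a minimal R sketch that evaluates the density written in the $(\lambda, \alpha)$ parametrisation and compares it with dweibull() on a grid. The helper names `to_r_params()` and `dweibull_la()` are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: convert (lambda, alpha) from the parametrisation above
# into the (shape, scale) arguments expected by dweibull(), pweibull(), rweibull()
to_r_params <- function(lambda, alpha) {
  list(shape = alpha, scale = (1 / lambda)^(1 / alpha))
}

# Hypothetical helper: density written directly in the (lambda, alpha) form
dweibull_la <- function(x, lambda, alpha) {
  lambda * alpha * x^(alpha - 1) * exp(-lambda * x^alpha)
}

lambda <- 3
alpha <- 0.88
p <- to_r_params(lambda, alpha)
x <- seq(0.05, 3, by = 0.05)

# The two densities should agree up to floating-point error
all.equal(dweibull_la(x, lambda, alpha),
          dweibull(x, shape = p$shape, scale = p$scale))
## [1] TRUE
```

The agreement follows from $(x/\textrm{scale})^{\alpha} = \lambda x^{\alpha}$ when $\textrm{scale} = \lambda^{-1/\alpha}$.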
Let $\mathbf{x}'\mathbf{\beta} = x_1\beta_1 + x_2\beta_2 + \dotsb$ be the linear predictor.
Assuming a proportional hazards structure with a $\textrm{Weibull}(\lambda, \alpha)$ baseline distribution, the hazard rate is
\begin{align*}
h(t) & = h_0(t) \exp(\mathbf{x}'\mathbf{\beta}) \\
& = \lambda \alpha t^{\alpha - 1} \exp(\mathbf{x}'\mathbf{\beta}).
\end{align*}
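Since $f(t) = h(t)S(t)$, with $S(t) = \exp\left(-\int_0^t h(u)\,du\right) = \exp\left(-\lambda t^\alpha \exp(\mathbf{x}'\mathbf{\beta})\right)$, the corresponding pdf is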
$$
f(t) = \lambda \alpha t^{\alpha - 1} \exp(\mathbf{x}'\mathbf{\beta}) \exp \left( - \lambda t^\alpha \exp(\mathbf{x}'\mathbf{\beta}) \right).
$$
That is, $T$ has a Weibull distribution with the same shape $\alpha$ but the scale parameter is changed from $\lambda$ to $\lambda \exp(\mathbf{x}'\mathbf{\beta})$:
$$
T \sim \textrm{Weibull}(\lambda \exp(\mathbf{x}'\mathbf{\beta}), \alpha)
$$
and we have
$$
E[T] = \frac{\Gamma(1 + \tfrac{1}{\alpha})}{\left(\lambda\exp(\mathbf{x}'\mathbf{\beta})\right)^{\tfrac{1}{\alpha}}}.
$$
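The two claims above (the pdf is a Weibull density with $\lambda$ replaced by $\lambda\exp(\mathbf{x}'\mathbf{\beta})$, and the closed-form mean) can be checked numerically. This is a minimal R sketch; the covariate vector `x` and coefficients `beta` are hypothetical values chosen only for illustration. The mean is compared against numerical integration with integrate() and against a simulation.

```r
lambda <- 3
alpha <- 0.88
# Hypothetical covariates and coefficients, for illustration only
x <- c(1, 0.5)
beta <- c(0.3, -0.4)
eta <- sum(x * beta)                # linear predictor x'beta

rate <- lambda * exp(eta)           # lambda is replaced by lambda * exp(x'beta)
scale_x <- (1 / rate)^(1 / alpha)   # same quantity on dweibull()'s scale

# 1. Density: f(t) = h(t) S(t) matches dweibull() with the modified parameter
tt <- seq(0.05, 2, by = 0.05)
h <- lambda * alpha * tt^(alpha - 1) * exp(eta)   # PH hazard
S <- exp(-lambda * tt^alpha * exp(eta))           # exp(-cumulative hazard)
all.equal(h * S, dweibull(tt, shape = alpha, scale = scale_x))
## [1] TRUE

# 2. Mean: closed form vs numerical integration of t f(t) vs simulation
mean_closed <- gamma(1 + 1 / alpha) / rate^(1 / alpha)
mean_num <- integrate(function(t) t * dweibull(t, shape = alpha, scale = scale_x),
                      lower = 0, upper = Inf)$value
set.seed(1)
mean_mc <- mean(rweibull(1e5, shape = alpha, scale = scale_x))
c(closed = mean_closed, numeric = mean_num, simulated = mean_mc)
```

The closed form and the integral should agree to several digits; the simulated mean only approximately, since it is a Monte Carlo estimate.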
An example without covariates:
```r
#------ scale and shape parameters in your parametrisation ------
lambda <- 3
alpha <- 0.88

#------ conversion ------
shape <- alpha
scale <- (1 / lambda)^(1 / alpha)

#------ some data ------
# (named 'times' rather than 'T' to avoid masking R's T == TRUE)
times <- rweibull(n = 10000, shape = shape, scale = scale)

#------ theoretical and empirical means ------
gamma(1 + 1 / alpha) / (lambda^(1 / alpha))
## [1] 0.305765
mean(times)   # varies from run to run, no seed was set
## [1] 0.3026293
```
The Weibull distribution is widely used in reliability and life data analysis, most likely because of its versatility: depending on its parameters, it can model a variety of failure laws.
For example, this source http://www.weibull.com/hotwire/issue14/relbasics14.htm gives a good explanation of how the shape parameter (called β there) produces this versatility. To quote:

> Effect of beta on Weibull failure rate
>
> This is one of the most important aspects of the effect of β on the Weibull distribution. As is indicated by the plot, Weibull distributions with β < 1 have a failure rate that decreases with time, also known as infantile or early-life failures. Weibull distributions with β close to or equal to 1 have a fairly constant failure rate, indicative of useful life or random failures. Weibull distributions with β > 1 have a failure rate that increases with time, also known as wear-out failures. These comprise the three sections of the classic "bathtub curve." A mixed Weibull distribution with one subpopulation with β < 1, one subpopulation with β = 1 and one subpopulation with β > 1 would have a failure rate plot that was identical to the bathtub curve. An example of a bathtub curve is shown in the following chart.
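The three regimes in the quote can be reproduced directly. In the quote, β is the shape parameter (α in the parametrisation at the top, shape in dweibull()). This minimal R sketch computes the hazard $h(t) = f(t)/S(t)$ from dweibull() and pweibull() for shapes 0.5, 1 and 2 (scale fixed at 1, an arbitrary choice) and checks the direction in which each hazard moves; `weibull_hazard()` is a hypothetical helper name.

```r
# Hypothetical helper: Weibull hazard h(t) = f(t) / S(t) in R's (shape, scale) form
weibull_hazard <- function(t, shape, scale = 1) {
  dweibull(t, shape = shape, scale = scale) /
    pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}

tt <- seq(0.1, 3, by = 0.1)
shapes <- c(0.5, 1, 2)   # beta < 1, beta = 1, beta > 1 in the quote's notation
haz <- sapply(shapes, function(k) weibull_hazard(tt, shape = k))
colnames(haz) <- paste0("shape_", shapes)

# Direction of change over the grid: -1 decreasing, 0 constant, 1 increasing
apply(haz, 2, function(h) unique(sign(round(diff(h), 12))))
```

With scale 1 these hazards are $0.5\,t^{-1/2}$, $1$ and $2t$, so the result should show a decreasing, a constant and an increasing failure rate, respectively.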
If the failure times have distribution function $F(t)$, then the corresponding survival function is $S(t)=1-F(t)$. That's a critical thing to keep in mind.
A Weibull plot is just a plot of a transformed $F(t)=1-S(t)$, namely $\log\left(-\log(1-F(t))\right) = \log\left(-\log S(t)\right)$, against $\log t$. So the Weibull plot is just a particular replotting of the survival curve.
A Weibull model can be written in the form:
$$\log T= \alpha + \sigma W, $$
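where $W$ has a standard minimum extreme-value distribution and $\sigma$ is a scale factor. (Here $\alpha$ is a location parameter on the log scale, not the shape $\alpha$ used earlier; in dweibull() terms, $\textrm{scale} = e^{\alpha}$ and $\textrm{shape} = 1/\sigma$.) Since $\log\left(-\log(1-F(t))\right) = (\log t - \alpha)/\sigma$, the y-axis transformation of $F(t)$ in a Weibull plot gives a straight line in $\log t$ if the underlying distribution of $W$ is minimum extreme-value. If a line fits well, $\alpha$ and $\sigma$ can be deduced from the plot: the slope is $1/\sigma$ and the intercept is $-\alpha/\sigma$.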
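To make the plot-reading step concrete, here is a minimal R sketch: it simulates from $\log T = \alpha + \sigma W$ (using the fact that $\log E$ has a standard minimum extreme-value distribution when $E \sim \textrm{Exp}(1)$), builds the Weibull-plot coordinates, and fits the line. The plotting positions $(i - 0.5)/n$ and the choice to regress $\log t$ on the transformed $F$ (often called rank regression on X) are assumptions made for the sketch, not the only way to fit the line.

```r
set.seed(42)
n <- 2000
alpha_loc <- 0.5   # alpha in log T = alpha + sigma * W (location on the log scale)
sigma <- 0.8       # scale factor sigma

# Standard minimum extreme-value W: if E ~ Exp(1), then log(E) has this distribution
w <- log(rexp(n))
tt <- sort(exp(alpha_loc + sigma * w))

# Simple plotting positions for the empirical F(t); (i - 0.5) / n is one common choice
F_hat <- (seq_len(n) - 0.5) / n

# Weibull-plot coordinates: x = log t, y = log(-log(1 - F(t)))
x_plot <- log(tt)
y_plot <- log(-log(1 - F_hat))

# On the line, log t = alpha + sigma * y, so regressing log t on y gives
# intercept ~ alpha and slope ~ sigma (equivalently, y-slope 1/sigma)
fit <- lm(x_plot ~ y_plot)
alpha_hat <- unname(coef(fit)[1])
sigma_hat <- unname(coef(fit)[2])
c(alpha = alpha_hat, sigma = sigma_hat)

# To draw the plot itself:
# plot(x_plot, y_plot, xlab = "log t", ylab = "log(-log(1 - F(t)))")
```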
My sense is that the plots were most important before modern computational technology became available. I don't see that there's anything they do that can't be done with more general survival modeling, which doesn't restrict you to a Weibull form. You can always generate a Weibull plot from your survival model if you want to.
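As an illustration of that last point, this minimal R sketch uses the survival package (a recommended package distributed with R) on simulated, right-censored Weibull data. It draws a Weibull plot from the nonparametric Kaplan-Meier curve and, separately, fits a parametric Weibull model with survreg(), whose location-scale form is $\log T = \alpha + \sigma W$. The simulation settings and the restriction $0.05 < S(t) < 0.95$ for the line fit are arbitrary choices for the example.

```r
library(survival)  # recommended package distributed with R

set.seed(7)
n <- 1000
true_shape <- 1.5
true_scale <- 2

# Simulated Weibull failure times with independent exponential censoring
fail <- rweibull(n, shape = true_shape, scale = true_scale)
cens <- rexp(n, rate = 0.2)
obs_time <- pmin(fail, cens)
status <- as.numeric(fail <= cens)   # 1 = failure observed, 0 = censored

# Kaplan-Meier estimate: no Weibull assumption
km <- survfit(Surv(obs_time, status) ~ 1)

# Weibull plot generated from the survival curve: log(-log S(t)) against log t
keep <- km$surv > 0.05 & km$surv < 0.95
x_plot <- log(km$time[keep])
y_plot <- log(-log(km$surv[keep]))
km_slope <- unname(coef(lm(y_plot ~ x_plot))[2])   # approx. the shape if Weibull fits
# plot(x_plot, y_plot, xlab = "log t", ylab = "log(-log S(t))")

# Parametric Weibull fit: exp(intercept) is dweibull()'s scale,
# and fit$scale is sigma, so 1 / fit$scale is dweibull()'s shape
fit <- survreg(Surv(obs_time, status) ~ 1, dist = "weibull")
shape_hat <- 1 / fit$scale
scale_hat <- unname(exp(coef(fit)[1]))
c(km_slope = km_slope, shape_hat = shape_hat, scale_hat = scale_hat)
```

If the Weibull form is appropriate, the slope read off the Kaplan-Meier-based plot and the shape from survreg() should be close to each other and to the simulated value.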