Suppose that $\left(X_{i}\right)_{1\leq i\leq n}$
is independent and identically distributed according to an exponential law with parameter $\lambda>0$
. For all $x\geq 0$
, we have$$\mathbb{P}\left[X_{i}\leq x\right]=1-e^{-\lambda x}.$$
Now, it is straightforward to use characteristic functions: if $S_{n}=\sum_{i=1}^{n}X_{i}$
, then we have for all $t\in\mathbb{R}$
$$\chi_{_{S_{n}}}\left(t\right)=\prod_{i=1}^{n}\chi_{_{X_{i}}}\left(t\right)=\left(\chi_{_{X_{1}}}\left(t\right)\right)^{n}=\left(\left(1-\frac{it}{\lambda}\right)^{-1}\right)^{n}=\left(1-\frac{it}{\lambda}\right)^{-n}$$
which is the characteristic function of a random variable that follows a gamma law with parameters $\left(n,\lambda\right)$ (I used independence and identical distribution here). Thus, the density of the law of $S_{n}$
is the density of the gamma law (see for example http://en.wikipedia.org/wiki/Gamma_distribution).
You can also show by induction that the density of a sum of INDEPENDENT random variables is the convolution of their densities.
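As a numerical sketch of this convolution claim (with an arbitrary illustrative rate $\lambda = 1.5$ and $n = 3$ summands), one can repeatedly convolve the exponential density on a grid and compare the result with the gamma density:

```python
import math

lam = 1.5   # rate parameter, arbitrary choice for illustration
n = 3       # number of i.i.d. summands
h = 0.01    # grid step for the numerical convolution
grid = [i * h for i in range(2000)]   # covers [0, 20)

def exp_pdf(x):
    return lam * math.exp(-lam * x)

def gamma_pdf(x, shape, rate):
    return rate**shape * x**(shape - 1) * math.exp(-rate * x) / math.gamma(shape)

# density of S_1 is just the exponential density
dens = [exp_pdf(x) for x in grid]

# convolve (n - 1) more times: f_{S_{k+1}}(x) = (f_{S_k} * f_X)(x),
# approximating each convolution integral by a Riemann sum on the grid
for _ in range(n - 1):
    dens = [sum(dens[j] * exp_pdf(x - grid[j]) for j in range(i + 1)) * h
            for i, x in enumerate(grid)]

# dens[k] now approximates the Gamma(n, lam) density at x = k * h
```

On this grid the convolved values track the $\operatorname{Gamma}(n,\lambda)$ density to within the discretization error of the Riemann sum.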
Don't make this so hard for yourself. Simply compute the kernels rather than explicitly integrating. I will change your notation because $\bar X$ is typically used for the sample mean $$\bar X = \frac{1}{n} \sum_{i=1}^n X_i$$ rather than the sample total. Then
$$p(\lambda \mid X, \alpha, \beta) \propto \lambda^n e^{-\lambda n \bar X} \lambda^{\alpha - 1} e^{-\beta \lambda} = \lambda^{n + \alpha - 1} e^{-(n\bar X + \beta)\lambda}$$ which is the kernel of a gamma density with posterior shape hyperparameter $\alpha^* = n + \alpha$ and rate hyperparameter $\beta^* = n \bar X + \beta$, which agrees with your computation (keeping in mind my $\bar X$ differs from yours by a factor of $1/n$). In performing the computation, we discarded any factors that were not functions of $\lambda$. You inadvertently did this by ignoring the fact that your computation resulted in an expression for the posterior which does not integrate to unity; i.e., your expression for $p(\lambda \mid X, \alpha, \beta)$ is not a proper density, since the constant terms with respect to $\lambda$ are not the required normalizing factors for a gamma density with shape $\alpha^*$ and rate $\beta^*$.
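The conjugate update is a two-line computation in code. Here is a minimal sketch in Python, with a hypothetical true rate and arbitrary prior hyperparameters chosen purely for illustration:

```python
import random

random.seed(0)
lam_true = 2.0                     # hypothetical true rate
alpha, beta = 3.0, 1.0             # prior hyperparameters, arbitrary choice
X = [random.expovariate(lam_true) for _ in range(100)]

n = len(X)
xbar = sum(X) / n                  # sample mean, matching the notation above

# conjugate update: Gamma(alpha, beta) prior -> Gamma(alpha*, beta*) posterior
alpha_star = n + alpha
beta_star = n * xbar + beta

# posterior mean of lambda, which should sit near lam_true for moderate n
post_mean = alpha_star / beta_star
```

Note that `random.expovariate` is parameterized by the rate $\lambda$, matching the convention used here.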
Next, if we know that the original gamma prior has a normalizing factor of $\beta^\alpha/\Gamma(\alpha)$ since $$p(\lambda \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} K(\lambda \mid \alpha, \beta)$$ where $K$ is the kernel, and the posterior is gamma with hyperparameters $\alpha^*$, $\beta^*$, it immediately follows that the marginal likelihood is the ratio of the prior normalizing factor to the posterior normalizing factor; i.e., $$p(X \mid \alpha, \beta) = \frac{\beta^\alpha/\Gamma(\alpha)}{(\beta^*)^{\alpha^*}/\Gamma(\alpha^*)} = \frac{\beta^\alpha \Gamma(n+\alpha)}{(n\bar X + \beta)^{n + \alpha} \Gamma(\alpha)},$$ because of Bayes' rule: $$p(\lambda \mid X, \alpha, \beta) = \frac{p(X \mid \lambda)p(\lambda \mid \alpha,\beta)}{p(X \mid \alpha, \beta)}.$$ No integration is required. It is important to note that $p(X \mid \alpha, \beta)$ is multivariate with respect to the sample $X = (X_1, \ldots, X_n)$, thus is not itself gamma distributed.
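Although no integration is required, the ratio-of-normalizers formula can be checked against a brute-force integral. A sketch in Python, using a small hypothetical data set and arbitrary prior hyperparameters:

```python
import math

# hypothetical small sample and prior hyperparameters, purely for illustration
X = [0.3, 1.2, 0.7, 0.5, 2.1]
alpha, beta = 2.0, 1.0
n = len(X)
xbar = sum(X) / n

def likelihood(lam):
    return lam**n * math.exp(-lam * n * xbar)

def prior(lam):
    return beta**alpha / math.gamma(alpha) * lam**(alpha - 1) * math.exp(-beta * lam)

# closed form: prior normalizing factor over posterior normalizing factor
closed = (beta**alpha * math.gamma(n + alpha)
          / ((n * xbar + beta)**(n + alpha) * math.gamma(alpha)))

# brute-force check: integrate p(X | lam) p(lam | alpha, beta) over lam
h = 0.001
numeric = sum(likelihood(k * h) * prior(k * h) for k in range(1, 50000)) * h
```

The two numbers agree up to the Riemann-sum discretization error.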
For the posterior predictive distribution, we apply the same principles as described above. First, by Bayes' rule, $$p(x \mid X , \alpha, \beta) = \frac{p(x \mid \lambda) p(\lambda \mid X, \alpha, \beta)}{p(\lambda \mid X, x, \alpha, \beta)}$$ where the denominator is the posterior given the sample $X$ and the new observation $x$, and the numerator is the likelihood; thus the RHS is again the ratio of normalizing factors, but this time the normalizing factors correspond to the posterior density in the numerator, and the posterior plus new observation in the denominator: $$p(x \mid X, \alpha, \beta) = \frac{(\beta^*)^{\alpha^*}/\Gamma(\alpha^*)}{(\beta')^{\alpha'}/\Gamma(\alpha')},$$ where $$\alpha' = \alpha^* + 1 = n+\alpha + 1,$$ and $$\beta' = \beta^* + x = n\bar X + \beta + x.$$ Note that the posterior predictive is a univariate density in the new observation $x$, hence it is instructive to consider its kernel: $$p(x \mid X, \alpha, \beta) \propto \frac{1}{(\beta')^{\alpha'}} = (n \bar X + \beta + x)^{-\alpha'}$$ which is proportional to a Pareto (Type II) density with minimum value parameter $0$, scale parameter $\beta^* = n \bar X + \beta$ and shape parameter $\alpha^* = n + \alpha$.
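The identity between the ratio of normalizing factors and the Lomax density can be verified pointwise; since $\Gamma(\alpha^*+1)/\Gamma(\alpha^*) = \alpha^*$, the ratio reduces exactly to $\alpha^* (\beta^*)^{\alpha^*} (\beta^* + x)^{-(\alpha^*+1)}$. A sketch in Python, with hypothetical posterior hyperparameter values:

```python
import math

# hypothetical posterior hyperparameters alpha* and beta* (e.g. n + alpha and
# n*xbar + beta from some sample); the identity below holds for any values
alpha_star, beta_star = 7.0, 5.8

def predictive(x):
    """Ratio of normalizing factors, as derived above."""
    a_prime = alpha_star + 1          # alpha' = alpha* + 1
    b_prime = beta_star + x           # beta'  = beta* + x
    return ((beta_star**alpha_star / math.gamma(alpha_star))
            / (b_prime**a_prime / math.gamma(a_prime)))

def lomax_pdf(x):
    """Pareto Type II (Lomax) density with shape alpha* and scale beta*."""
    return alpha_star * beta_star**alpha_star / (beta_star + x)**(alpha_star + 1)
```

The two functions agree for every $x \geq 0$, up to floating-point rounding.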
Best Answer
You seem to have a fundamental misconception of the relationship between functions of random variables, and functions of their densities.
Suppose $X$ and $Y$ are IID exponential random variables, say each with density $$f_X(x) = \lambda e^{-\lambda x}, \quad f_Y(y) = \lambda e^{-\lambda y}.$$ If I multiply the densities together, I get a joint density on a quadrant of $\mathbb R^2$, $$f_{X,Y}(x,y) = \lambda e^{-\lambda x} \lambda e^{-\lambda y},$$ for ordered pairs of positive reals $(x,y)$. This is a bivariate distribution and is not gamma or anything of that sort.
However, if I take the sum of the random variables $X + Y$, I obtain a new univariate random variable, say $W = X + Y$, whose support is on the positive reals and is Gamma distributed with shape $2$ and rate $\lambda$: $$f_W(w) = \lambda^2 w e^{-\lambda w}.$$ When I add random variables together, I get another random variable of the same dimension as their summands, just like when I add $2 + 3$ I do not get a vector, but a scalar; and when I add $(3,4) + (2,1)$ I don't get a scalar, I get an ordered pair.
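A quick Monte Carlo check of this claim (a sketch in Python, with an arbitrary choice of $\lambda$): the empirical CDF of simulated sums $X + Y$ should match the $\operatorname{Gamma}(2,\lambda)$ CDF, $F_W(w) = 1 - e^{-\lambda w}(1 + \lambda w)$, obtained by integrating $f_W$.

```python
import math
import random

random.seed(1)
lam = 1.0                    # arbitrary rate for the illustration
N = 200_000
w_samples = [random.expovariate(lam) + random.expovariate(lam) for _ in range(N)]

def gamma2_cdf(w):
    # CDF of Gamma(shape=2, rate=lam): 1 - e^{-lam w} (1 + lam w)
    return 1.0 - math.exp(-lam * w) * (1.0 + lam * w)

# empirical probability that W <= 2, to compare with the Gamma(2, lam) CDF
w0 = 2.0
empirical = sum(1 for w in w_samples if w <= w0) / N
```

With this sample size the empirical value lands within about a tenth of a percentage point of the exact CDF value.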
To address your computation for the original question, $\Pr[X < Y + Z]$, both methods are correct and equivalent but not in the way that you are conceptualizing it. This becomes apparent if you write out the statement precisely as follows. Let $W = Y + Z \sim \operatorname{Gamma}(2,\lambda)$ as stated above. Then $\Pr[X < Y + Z] = \Pr[X < W]$ and the first method gives $$\Pr[X < W] = \int_{w=0}^\infty \Pr[X < w \mid W = w]f_W(w) \, dw \\ = \int_{w=0}^\infty (1 - e^{-\lambda w})\lambda^2 w e^{-\lambda w} \, dw.$$ The second statement is $$\Pr[X < Y + Z] = \int_{y=0}^\infty \int_{z=0}^\infty \Pr[X < Y + Z \mid (Y,Z) = (y,z)] f_{Y,Z}(y,z) \, dy \, dz \\ = \int_{y=0}^\infty \int_{z=0}^\infty (1 - e^{-\lambda (y+z)}) \lambda e^{-\lambda y} \lambda e^{-\lambda z} \, dy \, dz.$$ They integrate to the same result but the integrands are different and they are not comparable, because the variables of integration are not the same. In particular, $w$ is neither $y$ nor $z$, so just because the limits of integration are $(0, \infty)$, you can't assume the single integral in the first method represents one of the two integrals in the second.
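Both integrals can be evaluated numerically to confirm they give the same probability (which works out to $3/4$ for any $\lambda$). A sketch in Python, with an arbitrary $\lambda$ and crude Riemann sums:

```python
import math

lam = 1.3    # arbitrary rate; the probability is 3/4 for any lam

# Method 1: single integral against the Gamma(2, lam) density of W = Y + Z
h = 0.001
p1 = sum((1 - math.exp(-lam * w)) * lam**2 * w * math.exp(-lam * w)
         for w in (k * h for k in range(1, 30000))) * h

# Method 2: double integral against the joint density of (Y, Z); coarser grid
h2 = 0.02
p2 = sum((1 - math.exp(-lam * (y + z)))
         * lam * math.exp(-lam * y) * lam * math.exp(-lam * z)
         for y in (k * h2 for k in range(1, 1000))
         for z in (k * h2 for k in range(1, 1000))) * h2 * h2
```

The double integral uses a coarser grid purely to keep the runtime reasonable, so its discretization error is larger, but both approximations converge to the same value.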
Now, you can perform the convolution of $f_Y$ and $f_Z$ to prove that $W = Y + Z$ has the density that we claim it does, but this is not the point of your question.