The general proof of this can be found in Feller (*An Introduction to Probability Theory and Its Applications*, Vol. 2). It is an inversion problem involving Laplace transform theory. Did you notice that the mgf bears a striking resemblance to the Laplace transform? For the use of Laplace transforms, see Widder (*Calculus*, Vol. I).
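To spell out that resemblance: for a continuous random variable with density $f$, the mgf is
$$M_X(t)=\mathbb{E}[e^{tX}]=\int_{-\infty}^{\infty}e^{tx}f(x)\,dx,$$
which is the (two-sided) Laplace transform of $f$ evaluated at $-t$. So the uniqueness of the mgf is essentially the uniqueness (inversion) theorem for Laplace transforms.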
Proof of a special case:
Suppose that $X$ and $Y$ are random variables, both taking values in $\{0, 1, 2,\dots, n\}$.
Further, suppose that $X$ and $Y$ have the same mgf for all $t$:
$$\sum_{x=0}^ne^{tx}f_X(x)=\sum_{y=0}^ne^{ty}f_Y(y)$$
For simplicity, we will let $s = e^t$ (note that $s$ ranges over all of $(0,\infty)$ as $t$ ranges over $\mathbb{R}$),
and we will define $c_i = f_X(i) - f_Y(i)$ for $i = 0, 1,\dots,n$.
Now
$$\sum_{x=0}^ne^{tx}f_X(x)-\sum_{y=0}^ne^{ty}f_Y(y)=0$$
$$\Rightarrow \sum_{x=0}^ns^xf_X(x)-\sum_{y=0}^ns^yf_Y(y)=0$$
$$\Rightarrow \sum_{x=0}^ns^xf_X(x)-\sum_{x=0}^ns^xf_Y(x)=0$$
$$\Rightarrow\sum_{x=0}^ns^x[f_X(x)-f_Y(x)]=0$$
$$\Rightarrow \sum_{x=0}^ns^xc_x=0\quad\text{for all } s>0$$
The left-hand side is simply a polynomial in $s$ with coefficients $c_0, c_1,\dots,c_n$. A nonzero polynomial of degree at most $n$ has at most $n$ roots, so the only way it can be zero for all $s>0$ (infinitely many points) is if $c_0=c_1=\cdots= c_n=0$. So, we have that $0=c_i=f_X(i)-f_Y(i)$ for $i=0, 1,\dots,n$.
Therefore, $f_X(i)=f_Y(i)$ for $i=0,1,\dots,n$.
In other words, the probability mass functions of $X$ and $Y$ are exactly the same; that is, $X$ and $Y$ have the same distribution.
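As a sanity check of the polynomial argument, here is a minimal numerical sketch (plain Python with NumPy; the pmf values are made-up examples, not from the answer above). Evaluating the difference of the two mgfs at $n+1$ distinct values of $s=e^t$ and solving the resulting Vandermonde system recovers the coefficients $c_0,\dots,c_n$, which come out as zero exactly when the pmfs agree:

```python
import numpy as np

n = 4
# Two probability mass functions on {0, 1, ..., n} (arbitrary example values).
f_X = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
f_Y = f_X.copy()                           # same distribution, so every c_i should be 0

# Evaluate both mgfs at n+1 distinct points s = e^t > 0.
s = np.linspace(0.5, 2.5, n + 1)
V = np.vander(s, n + 1, increasing=True)   # V[j, i] = s_j ** i
mgf_diff = V @ f_X - V @ f_Y               # sum_i s^i [f_X(i) - f_Y(i)] at each s_j

# A polynomial of degree <= n vanishing at n+1 distinct points has all-zero
# coefficients; solving the Vandermonde system recovers c_0, ..., c_n.
c = np.linalg.solve(V, mgf_diff)
print(np.allclose(c, 0))                   # True: f_X and f_Y agree pointwise
```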
Adding to Bey's answer, there's a reason you might care about this. The idea is that the MGF is a Laplace transform, and for it to exist your (continuous) probability density $f(x)$ must decrease at least exponentially fast for large $x$, i.e. $e^{tx}f(x)\rightarrow 0$ as $x\rightarrow\infty$. This can be somewhat weakened, but the main idea survives.
Anyway, it's usually the case that this fails if $t$ is too large. For example, if $f(x)=2e^{-2x}$ for $x\ge 0$, then the MGF exists (i.e. is finite) for $t\in[0,2)$. As long as $f(x)$ is a density, everything is fine for $t<0$, but it turns out that $t>0$ carries a wealth of additional information. In general, saying that the MGF exists in a neighborhood of $0$ means that there is some $\epsilon>0$ such that your MGF is finite for all $t\in[0,\epsilon)$. Once your MGF exists, by abstract nonsense it corresponds to a unique distribution (your $f(x)$) and you can exploit all of its nice properties, for example using it to bound probabilities. In a similar vein to characteristic functions (i.e. Fourier transforms), the regularity of your MGF near $t=0$ is intimately connected to the rate of decay of your density $f(x)$ as $x\rightarrow\infty$, an example of which you can see in the last link.
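To make the $f(x)=2e^{-2x}$ example concrete, here is a small sketch (plain Python with SciPy; the helper name is my own) that integrates $e^{tx}\,f(x)$ numerically and compares it with the closed form $M(t)=2/(2-t)$, which is finite exactly for $t<2$:

```python
import numpy as np
from scipy.integrate import quad

def mgf_exp2(t):
    # MGF of the Exponential(rate=2) density f(x) = 2*exp(-2x), x >= 0,
    # computed by direct numerical integration of e^{tx} f(x).
    val, _ = quad(lambda x: np.exp(t * x) * 2 * np.exp(-2 * x), 0, np.inf)
    return val

for t in [-1.0, 0.0, 1.0, 1.9]:
    print(t, mgf_exp2(t), 2 / (2 - t))   # numeric and closed form agree for t < 2
# For t >= 2 the integrand no longer decays, and the integral diverges.
```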
Perhaps more familiar to you: derivatives of the MGF, evaluated at $t=0$, give you back the moments of your distribution, so perhaps you can believe why you really only need to know what your MGF looks like near $t=0$ to extract almost everything about your random variable.
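A quick symbolic illustration of this (a sketch using SymPy, carrying over the Exponential(2) example from above): differentiating $M(t)=2/(2-t)$ at $t=0$ reproduces the known moments $\mathbb{E}[X^k]=k!/2^k$.

```python
import sympy as sp

t = sp.symbols('t')
M = 2 / (2 - t)          # MGF of Exponential(rate=2), valid for t < 2

for k in range(1, 4):
    moment = sp.diff(M, t, k).subs(t, 0)        # k-th derivative at t = 0
    print(k, moment, sp.factorial(k) / 2**k)    # matches k!/2^k in every case
```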
Best Answer
Answered in comments:
The trick works – Dilip Sarwate
More details:
Wikipedia on moment generating functions gives a slightly more general result in an obvious place to look (what were you searching for?). Also see math.SE: Moment generating function of $X+Y$ using convolution of $X$ and $Y$. – Glen_b
MGFs and characteristic functions are conceptually the same. Concerning the latter, see 1 for worked examples. – whuber
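Regarding the convolution link in Glen_b's comment: here is a minimal numerical sketch (plain Python with NumPy; the pmfs are arbitrary choices) of the fact that $M_{X+Y}(t)=M_X(t)\,M_Y(t)$ for independent $X$ and $Y$, where the pmf of $X+Y$ is the convolution of the two pmfs:

```python
import numpy as np

# Arbitrary pmfs for independent X and Y on small integer supports.
f_X = np.array([0.2, 0.5, 0.3])        # support {0, 1, 2}
f_Y = np.array([0.6, 0.4])             # support {0, 1}
f_sum = np.convolve(f_X, f_Y)          # pmf of X + Y on {0, 1, 2, 3}

def mgf(pmf, t):
    k = np.arange(len(pmf))
    return np.sum(np.exp(t * k) * pmf)

for t in [-0.5, 0.3, 1.0]:
    print(np.isclose(mgf(f_sum, t), mgf(f_X, t) * mgf(f_Y, t)))   # True each time
```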