In a sense, an mgf is simply a way of encoding a set of moments in a convenient function, one that you can then do some useful things with.
The variable $t$ in no way relates to the random variable $X$. You could just as readily write $M_X(s)$ or $M_X(u)$; it is, in essence, a kind of dummy variable. It doesn't stand for anything beyond being the argument of the mgf.
Herbert Wilf [1] calls a generating function "a clothesline on which we hang up a sequence of numbers for display".
It really wouldn't matter which exact clothesline you hung them on; another would do just as well.
Is there any way to derive the functions from anywhere?
There's more than one way to turn a set of moments into a generating function (e.g., a discrete distribution has a probability generating function, a moment generating function, a cumulant generating function and a characteristic function, and you can recover the moments from any of them, in some cases less directly than others).
So there's not a unique way to encode a set of moments into a function; it's a matter of choice about how you set it up. While they're similar (and, naturally, related), some are more convenient for particular kinds of tasks.
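To make this concrete, here is a minimal numerical sketch (standard library only) for a Poisson variable with an assumed rate $\lambda = 2.5$. Each of the four generating functions is built from the pgf, and the mean is recovered from each one by a finite-difference derivative: $G'(1)$, $M'(0)$, $K'(0)$ and $\varphi'(0)/i$. The rate, the step size `h` and all function names are illustrative choices, not anything taken from the discussion above.

```python
import cmath

LAM = 2.5  # assumed Poisson rate, chosen only for illustration


def pgf(z):
    # Probability generating function G(z) = E[z^X] = exp(lam * (z - 1))
    return cmath.exp(LAM * (z - 1))


def mgf(t):
    # Moment generating function M(t) = E[e^{tX}] = G(e^t)
    return pgf(cmath.exp(t))


def cgf(t):
    # Cumulant generating function K(t) = log M(t)
    return cmath.log(mgf(t))


def cf(t):
    # Characteristic function phi(t) = E[e^{itX}] = M(it)
    return mgf(1j * t)


def derivative(f, x, h=1e-5):
    # Central finite difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


mean_from_pgf = derivative(pgf, 1.0).real        # G'(1)  = E[X]
mean_from_mgf = derivative(mgf, 0.0).real        # M'(0)  = E[X]
mean_from_cgf = derivative(cgf, 0.0).real        # K'(0)  = first cumulant = E[X]
mean_from_cf = (derivative(cf, 0.0) / 1j).real   # phi'(0) = i * E[X]
```

All four should agree with $\lambda$ up to finite-difference error; the point is only that the same moment sits in each function, extracted in a slightly different way.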
I see a certain analogy between mgf and Laplace transform and cf and Fourier transform.
Not merely an analogy, at least if we consider the bilateral Laplace transform (which I'll still denote as $\mathcal{L}$ here, writing $\mathcal{L}_X(s) = E[e^{-sX}] = \int e^{-sx} f_X(x)\,dx$ for the transform of the density of $X$). We see that $M_X(t) = \mathcal{L}_X(-t)$ is (up to a change of sign) really a Laplace transform; indeed, $\mathcal{L}_X(-t) =\mathcal{L}_{-X}(t)$, so it's the bilateral Laplace transform of a flipped variate. One can convert readily from one to the other and happily use results for Laplace transforms on mgfs (and, for that matter, tables of Laplace transforms, if we keep that sign issue in mind). Similarly, characteristic functions are not merely analogous to Fourier transforms; they are Fourier transforms (again, up to the sign of the argument, which is of no consequence beyond the obvious effect that flipping the sign of the argument has on a function).
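A quick numerical check of these identities, as a sketch: assume $X \sim N(1,1)$ (chosen so that $X$ and $-X$ have different densities), approximate the integrals by the trapezoid rule on $[-30, 30]$, and compare with the known closed forms $M_X(t)=e^{t+t^2/2}$ and $\varphi_X(t)=e^{it-t^2/2}$. The Fourier convention $\int e^{-i\omega x} f(x)\,dx$ used here is one common choice among several; with it, $\varphi_X(t)$ is the transform evaluated at $-t$. The integration range and grid size are assumptions that happen to be ample for this density.

```python
import cmath
import math

MU, SIGMA = 1.0, 1.0  # assumed: X ~ Normal(1, 1), deliberately not symmetric about 0


def density(x):
    return math.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))


def flipped_density(x):
    # Density of -X
    return density(-x)


def integrate(g, a=-30.0, b=30.0, n=6000):
    # Composite trapezoid rule; very accurate here because the integrands decay like a Gaussian
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for k in range(1, n):
        total += g(a + k * h)
    return total * h


def bilateral_laplace(f, s):
    # L{f}(s) = integral over the whole real line of e^{-s x} f(x) dx
    return integrate(lambda x: cmath.exp(-s * x) * f(x))


def fourier(f, w):
    # F{f}(w) = integral of e^{-i w x} f(x) dx (one common sign convention)
    return integrate(lambda x: cmath.exp(-1j * w * x) * f(x))


def mgf_exact(t):
    return math.exp(MU * t + 0.5 * (SIGMA * t) ** 2)


def cf_exact(t):
    return cmath.exp(1j * MU * t - 0.5 * (SIGMA * t) ** 2)


t = 0.7
m_via_laplace = bilateral_laplace(density, -t)        # L_X(-t)
m_via_flipped = bilateral_laplace(flipped_density, t)  # L_{-X}(t)
cf_via_fourier = fourier(density, -t)                  # F_X(-t)
```

All three numerical values should match the corresponding closed forms to many decimal places.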
If Fourier transforms and Laplace transforms help give you intuition about what mgfs and cfs "are", you should certainly exploit those intuitions; on the other hand, it's not always necessary to have intuition when manipulating these things.
In fact, when playing with cfs, because they always exist and are unique, I often tend to think of them as just the distribution looked at through a different lens.
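As an illustration of "they always exist": the standard Cauchy distribution has no mgf ($E[e^{tX}]$ is infinite for every $t \neq 0$), yet its cf is $\varphi(t) = e^{-|t|}$. The sketch below estimates the cf from a simulated sample; since every term $e^{itx_j}$ has modulus 1, the average is always finite. The seed, sample size and names are arbitrary assumptions.

```python
import cmath
import math
import random


def cauchy_sample(n, rng):
    # Standard Cauchy draws via the inverse CDF: tan(pi * (U - 1/2))
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def empirical_cf(xs, t):
    # (1/n) * sum_j e^{i t x_j}; bounded in modulus by 1, so it never blows up
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)


rng = random.Random(12345)   # fixed seed for reproducibility
xs = cauchy_sample(100_000, rng)
phi_hat = empirical_cf(xs, 1.0)
phi_exact = math.exp(-1.0)   # cf of the standard Cauchy at t = 1
```

By contrast, an empirical mgf computed from the same sample would be dominated by the few largest draws and would not settle down as the sample grows.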
I can see that taking the derivative of the function and evaluating at $t=0$ gives the moment (if the integral is absolutely convergent), but why?
Because the particular generating function we chose to use (the mgf) was set up to work that way. In order to extract the set of moments from the function again, you need something like that: a way to eliminate all the lower ones (such as differentiation) and to eliminate all the higher ones (such as setting the argument to 0), so that you can pick out exactly the one you need. For that to happen you already need something that works rather like an mgf. At the same time, it's nice if it has some other properties you can exploit (as the various generating functions we use with random variables do), so that restricts our set of choices even further.
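Here is a small exact sketch of that "eliminate the lower, eliminate the higher" mechanism, using a fair six-sided die as an assumed example. Its mgf $M(t) = (e^t + \dots + e^{6t})/6$ expands to $\sum_j c_j t^j$ with $c_j = E[X^j]/j!$; differentiating $k$ times drops every term below $t^k$, and then setting $t=0$ drops every term above it, leaving $k!\,c_k = E[X^k]$. The function names are hypothetical, and the series is truncated, which is harmless because only the first $k+1$ coefficients matter for the $k$-th moment.

```python
from fractions import Fraction
from math import factorial

FACES = range(1, 7)  # fair six-sided die, an illustrative choice


def die_mgf_series(n_terms):
    # Coefficients c_j of M(t) = sum_j c_j t^j for M(t) = (e^t + ... + e^{6t}) / 6.
    # Expanding each e^{it} as a power series gives c_j = (sum_i i^j) / (6 * j!) = E[X^j] / j!.
    return [Fraction(sum(i ** j for i in FACES), 6 * factorial(j)) for j in range(n_terms)]


def differentiate(coeffs):
    # d/dt sum_j c_j t^j = sum_j j c_j t^(j-1): the constant (lowest) term drops out
    return [j * c for j, c in enumerate(coeffs)][1:]


def at_zero(coeffs):
    # Setting t = 0 keeps only the constant term; every higher power of t vanishes
    return coeffs[0] if coeffs else Fraction(0)


def moment(coeffs, k):
    # Differentiate k times, then evaluate at t = 0
    for _ in range(k):
        coeffs = differentiate(coeffs)
    return at_zero(coeffs)


series = die_mgf_series(12)
moments = [moment(series, k) for k in range(5)]
direct = [Fraction(sum(i ** k for i in FACES), 6) for k in range(5)]  # E[X^k] computed directly
```

For example, `moments[1]` is $7/2$ and `moments[2]` is $91/6$, the familiar first two moments of a fair die.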
[1] Wilf, H. (1994). generatingfunctionology, 2nd ed. Academic Press Inc., San Diego. https://www.math.upenn.edu/~wilf/DownldGF.html
You are right that mgf's can seem somewhat unmotivated in introductory courses, so here are some examples of their use. First, in discrete probability problems we often use the probability generating function, but that is only a different packaging of the mgf; see "What is the difference between moment generating function and probability generating function?". The pgf can be used to solve some probability problems that could be hard to solve otherwise; for recent examples on this site, see "PMF of the number of trials required for two successive heads" or "sum of $N$ gamma distributions with $N$ being a poisson distribution". Some not-so-obvious applications that could still be used in an introductory course are given in "Expectation of reciprocal of a variable", "Expected value of $1/x$ when $x$ follows a Beta distribution" and "For independent RVs $X_1,X_2,X_3$, does $X_1+X_2\stackrel{d}{=}X_1+X_3$ imply $X_2\stackrel{d}{=}X_3$?".
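As a sketch of how a pgf does this kind of work, take $N$ = number of tosses until two successive heads, with $P(\text{head}) = p$ and $q = 1-p$. A first-step analysis (one derivation among several) gives $G(z) = p^2 z^2 / (1 - qz - pqz^2)$. Multiplying out the denominator turns the pgf into a recurrence for the PMF, and the mean is $G'(1)$ by the quotient rule. Exact rational arithmetic is used so the output can be compared with hand calculations; function names are hypothetical.

```python
from fractions import Fraction


def hh_pmf(p, n_max):
    # From G(z) * (1 - q z - p q z^2) = p^2 z^2, matching coefficients of z^n gives
    #   a_n = q a_{n-1} + p q a_{n-2} for n >= 3, with a_0 = a_1 = 0 and a_2 = p^2,
    # where a_n = P(N = n).
    q = 1 - p
    a = [Fraction(0)] * (n_max + 1)
    if n_max >= 2:
        a[2] = p * p
    for n in range(3, n_max + 1):
        a[n] = q * a[n - 1] + p * q * a[n - 2]
    return a


def hh_mean(p):
    # E[N] = G'(1), via the quotient rule on G(z) = A(z) / B(z)
    q = 1 - p
    A, dA = p * p, 2 * p * p                # A(1), A'(1) for A(z) = p^2 z^2
    B, dB = 1 - q - p * q, -q - 2 * p * q   # B(1), B'(1) for B(z) = 1 - q z - p q z^2
    return (dA * B - A * dB) / (B * B)


p = Fraction(1, 2)
pmf = hh_pmf(p, 200)
```

For a fair coin the PMF starts $1/4, 1/8, 1/8, 3/32, \dots$ (Fibonacci numbers over powers of 2) and the mean works out to 6, which agrees with $(1+p)/p^2$.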
Another kind of use is constructing approximations of probability distributions. One example is the saddlepoint approximation, which takes as its starting point the natural logarithm of the mgf, called the cumulant generating function. See "How does saddlepoint approximation work?", and for some examples, see "Bound for weighted sum of Poisson random variables" and "Generic sum of Gamma random variables".
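A minimal sketch of the saddlepoint density approximation $\hat f(x) = \exp\{K(\hat s) - \hat s x\} / \sqrt{2\pi K''(\hat s)}$, where $\hat s$ solves $K'(\hat s) = x$. As an assumed test case, $X$ is a sum of 5 iid Exp(1) variables, i.e. Gamma(shape 5, scale 1), with cgf $K(s) = -5\log(1-s)$ for $s<1$. For the gamma family the saddlepoint approximation is known to be exact up to a constant factor (it replaces $\Gamma(\alpha)$ by Stirling's approximation), so the ratio to the exact density should be the same at every $x$ and close to 1. The bisection bounds and names are illustrative choices.

```python
import math

ALPHA = 5.0  # assumed: sum of 5 iid Exp(1) variables, i.e. Gamma(shape=5, scale=1)


def K(s):
    # cgf of Gamma(ALPHA, 1), valid for s < 1
    return -ALPHA * math.log1p(-s)


def K1(s):
    return ALPHA / (1.0 - s)          # K'(s)


def K2(s):
    return ALPHA / (1.0 - s) ** 2     # K''(s)


def solve_saddlepoint(x, lo=-1e3, hi=1.0 - 1e-12, iters=200):
    # Bisection for K'(s) = x; K' is increasing, so the root is unique
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def saddlepoint_density(x):
    s = solve_saddlepoint(x)
    return math.exp(K(s) - s * x) / math.sqrt(2.0 * math.pi * K2(s))


def exact_density(x):
    return x ** (ALPHA - 1) * math.exp(-x) / math.gamma(ALPHA)


ratios = [saddlepoint_density(x) / exact_density(x) for x in (1.0, 3.0, 5.0, 10.0, 20.0)]
```

The ratios should all be about 1.017; renormalising the approximation would make it exact in this particular case, although that is not true in general.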
Mgf's can also be used to prove limit theorems; for instance, the Poisson limit of binomial distributions (see "Intuitively understand why the Poisson distribution is the limiting case of the binomial distribution") can be proved via mgf's.
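The mgf argument in brief: with $p = \lambda/n$, the binomial mgf is $\left(1 + \lambda(e^t - 1)/n\right)^n \to \exp\{\lambda(e^t-1)\}$, which is the Poisson($\lambda$) mgf, and convergence of mgfs on a neighbourhood of 0 implies convergence in distribution. The sketch below only checks that pointwise convergence numerically for assumed values $\lambda = 3$, $t = 0.5$; it is an illustration, not a proof.

```python
import math


def binomial_mgf(n, p, t):
    # M(t) = (1 - p + p e^t)^n, computed as exp(n * log1p(p * (e^t - 1))) for accuracy
    return math.exp(n * math.log1p(p * math.expm1(t)))


def poisson_mgf(lam, t):
    # M(t) = exp(lam * (e^t - 1))
    return math.exp(lam * math.expm1(t))


LAM, T = 3.0, 0.5  # assumed values for illustration
errors = [abs(binomial_mgf(n, LAM / n, T) - poisson_mgf(LAM, T))
          for n in (10, 100, 1_000, 10_000, 100_000)]
```

The gap shrinks roughly like $1/n$, as expected from expanding $n\log(1 + \lambda(e^t-1)/n)$.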
Some examples (exercise sets with solutions) of actuarial use of mgf's can be found here: https://faculty.math.illinois.edu/~hildebr/370/370mgfproblemssol.pdf Searching the internet for "moment generating function actuarial" will give lots of similar examples. Actuaries seem to use mgf's to solve some problems (arising, for instance, in premium calculations) that are difficult to solve otherwise; for one example, see section 3.5, page 21, and books about actuarial risk theory. One source of (estimated) mgf's for such applications could be empirical mgf's (strangely, I cannot find even one post here about empirical moment generating functions).
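For readers unfamiliar with the term, the empirical mgf is just the plug-in estimate $\hat M(t) = \frac{1}{n}\sum_j e^{t x_j}$. The sketch below computes it from simulated data, assumed to be Exp(1) "claim sizes" purely for illustration, and compares it with the true mgf $1/(1-t)$. One caution worth knowing: for Exp(1) the estimator has finite variance only for $t < 1/2$, so heavy tails quickly make the empirical mgf unreliable at larger $t$.

```python
import math
import random


def empirical_mgf(xs, t):
    # (1/n) * sum_j e^{t x_j}: a plug-in estimate of M_X(t) = E[e^{tX}]
    return sum(math.exp(t * x) for x in xs) / len(xs)


rng = random.Random(2024)  # fixed seed for reproducibility
claim_sizes = [rng.expovariate(1.0) for _ in range(200_000)]  # hypothetical simulated data, Exp(1)
t = 0.3
estimate = empirical_mgf(claim_sizes, t)
exact = 1.0 / (1.0 - t)  # mgf of Exp(1), valid for t < 1
```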
Exam P is entry-level probability, so you can't get much more basic than that. The SOA syllabus has a list of suggested texts, all of which are good. Older textbooks are just as good as new ones, without the added expense.
Depending on your style of learning, if you just want a quick overview, the Schaum's "Introduction to Probability and Statistics" by Seymour Lipschutz is good, or anything else in the Schaum's series you can find at your local library. This is not enough to prepare you for passing the exam, however.
The syllabus textbooks are all good, and for all intents and purposes equivalent in material they cover. Whichever one is "best" depends on your own particular learning style.