For the first question, I think the answer is no. Consider the following example:
$X = \operatorname{Spec} k[x,y,z]/(xy - z^2)$, a quadric cone. Consider the Cartier divisor $D = V(z)$. It has two irreducible components, corresponding to the ideals $(x,z)$ and $(y,z)$ respectively (these are $\mathbb{Q}$-Cartier but not Cartier divisors). Both components are smooth and they meet at the origin, so the multiplicity of $D$ at the origin (a nodal singularity) is $2$. By multiplicity, I assume you mean the multiplicity of the scheme $D$ at a point.
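The multiplicity claim can be checked directly from the coordinate ring of $D$:

$$\mathcal{O}(D) = k[x,y,z]/(xy - z^2,\ z) \cong k[x,y]/(xy),$$

so $D$ is the union of two lines in the plane $z = 0$ meeting at the origin, cut out by $xy$, whose lowest-degree term has degree $2$. Hence $\operatorname{mult}_0 D = 2$.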
On the other hand, if you blow up the origin $(x,y,z)$, one of the charts is $$k[x/z, y/z, z]/( (x/z)(y/z) - 1).$$ The pullback of $D$ on this chart is just $z = 0$ (one copy of the exceptional divisor), so the order of $\mu^*(D)$ along $E$ is equal to $1$. (The order of each component along the exceptional divisor is $1/2$, which makes sense because the components are only $\mathbb{Q}$-Cartier.)
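Written out, with $x' = x/z$ and $y' = y/z$ the coordinates on this chart, the computation is:

$$\begin{aligned}
x = x'z,\quad y = y'z \quad&\Longrightarrow\quad xy - z^2 = z^2\,(x'y' - 1),\\
\text{on the strict transform } x'y' = 1: \quad& E = \{z = 0\},\quad \mu^*z = z,\quad \operatorname{ord}_E(\mu^*D) = 1,\\
\operatorname{div}_X(x) = 2\,V(x,z),\quad \mu^*x = x'z \text{ with } x' \text{ a unit} \quad&\Longrightarrow\quad \operatorname{ord}_E\bigl(\mu^*V(x,z)\bigr) = \tfrac12 .
\end{aligned}$$

Symmetrically $\operatorname{div}_X(y) = 2\,V(y,z)$ gives $\tfrac12$ for the other component, and the two halves add up to the order $1$ of $\mu^*D$, since $\operatorname{div}_X(z) = V(x,z) + V(y,z)$.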
There's a deeper problem with your first question, though. If I recall correctly, when you blow up a point $x \in X$ on a singular variety, there is in general no unique prime exceptional divisor lying over $x$; there may be several. To make matters worse, the pullback of your given Cartier divisor can have different multiplicities along these different exceptional divisors.
For the second question:
You assume that the pair $(X, \Delta)$ is klt, and you define the discrepancy at $E$ to be the order along $E$ of $K_Y - \mu^*(K_X + \Delta)$. Then you say that you know that $a(E, X, \Delta) \leq 1$ if $X$ is smooth. This isn't true.
I assume you know that the definition of klt implies that these discrepancies are all $> -1$. However, consider the following example.
$X = \operatorname{Spec} k[x,y,z]$ and $\Delta = 0$. This pair is certainly klt. When you blow up the origin, though, the relative canonical divisor is $K_{Y/X} = 2E$, two copies of the exceptional divisor (if you blow up the origin in $\mathbb{A}^n$, you get $K_{Y/X} = (n-1)E$). If you blow up points on that exceptional divisor (and repeat), you get further exceptional divisors with greater and greater discrepancy.
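The growth can be tracked with the usual formula for blowing up a point $p$ on a smooth $n$-fold: the new exceptional divisor $F$ has $a(F, X) = (n-1) + \sum_{p \in E_i} a(E_i, X)$, provided each $E_i$ through $p$ is smooth at $p$ (so it pulls back as $E_i' + F$). The Python sketch below applies this to a chain of blowups starting from $\mathbb{A}^n$; the function names are hypothetical, for illustration only.

```python
from typing import Dict, Iterable, List


def blowup_point_discrepancy(n: int, a: Dict[str, int], through: Iterable[str]) -> int:
    """Discrepancy a(F, X) of the exceptional divisor F of blowing up a point p.

    Assumes the current model is smooth of dimension n, that
    K = mu^* K_X + sum_i a[E_i] E_i, and that p lies on exactly the divisors
    named in `through`, each smooth at p (so each pulls back as E_i' + F).
    """
    # K_new = pi^* K_current + (n - 1) F, and pi^* E_i = E_i' + F for E_i through p.
    return (n - 1) + sum(a[name] for name in through)


def chain_of_blowups(n: int, steps: int) -> List[int]:
    """Blow up the origin of A^n, then repeatedly a general point of the newest divisor."""
    a: Dict[str, int] = {}
    history: List[int] = []
    for k in range(1, steps + 1):
        # A general point of E_{k-1} lies on no older strict transform.
        through = [f"E{k - 1}"] if k > 1 else []
        a[f"E{k}"] = blowup_point_discrepancy(n, a, through)
        history.append(a[f"E{k}"])
    return history


print(chain_of_blowups(3, 4))  # [2, 4, 6, 8]
```

So in $\mathbb{A}^3$ the discrepancies grow like $2k$; blowing up a point on the intersection of $E_1'$ and $E_2$ instead gives $2 + 2 + 4 = 8$ in a single step.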
Hopefully I didn't misunderstand the question.
These things are quite involved and any reference that says something holds by [BCHM] without specifying the actual statement and how it needs to be applied is completely unfair.
Also, as you indicate, [BCHM] is not the easiest read. An account of most of what is in [BCHM] is also given in Hacon-K10, so you can at least try looking at two sources when you get stuck.
Also, whatever you do, in order for $X$ to admit a terminalization as stated, $X$ must have canonical singularities to begin with. Otherwise you cannot get a crepant morphism from something with terminal singularities. In other words, the existence of $Y$ implies that $X$ has canonical singularities. I assume that you get that for the particular $X$ this is applied to.
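The reason is a one-line discrepancy computation. Let $f\colon Y \to X$ be the crepant morphism, so $K_Y = f^*K_X$, and let $F$ be a prime divisor over $X$, appearing on some model $g\colon Y' \to Y$. Then

$$a(F, X) = \operatorname{ord}_F\bigl(K_{Y'} - g^*f^*K_X\bigr) = \operatorname{ord}_F\bigl(K_{Y'} - g^*K_Y\bigr) = a(F, Y).$$

If $F$ is exceptional over $Y$, terminality gives $a(F, Y) > 0$; if $F$ is an $f$-exceptional divisor on $Y$ itself, then $a(F, Y) = 0$. Either way $a(F, X) \ge 0$, which is exactly the statement that $X$ has canonical singularities.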
There are a few things one can say towards proving terminalization. For threefolds, terminalization was proved by Reid in '83 and $\mathbb Q$-factorialization by Kawamata in '88; both can be found in Kollár-Mori98, p.195. People often cite [BCHM] for the general statements because it supplies a missing piece that had previously been known only up to dimension $3$, so many statements that had been known only up to dimension $3$ now hold in arbitrary dimension. Of course, one should actually go through and check that everything works. In particular, Reid's proof of terminalization in dimension $3$ uses some classification of $3$-dimensional canonical singularities, so it does not immediately adapt to arbitrary dimension.
One approach of trying to do both terminalization and $\mathbb Q$-factorialization is the following:
The main theorem of [BCHM] is that the minimal model program can be run under fairly general conditions. One often-overlooked part of their statement is that the models they obtain are all $\mathbb Q$-factorial. So the idea is this: run the minimal model program, and you end up with a klt $\mathbb Q$-factorial model. You probably don't even need this part, since you should have an $X$ with canonical singularities as remarked above.
Anyway, once there you can take an arbitrary resolution and try to contract divisors that are non-crepant. The usual way to do this is to run a well-chosen mmp, that is, run the mmp with a well-chosen boundary divisor. This is vaguely explained in Hacon-K10, p.57. The relevant statement from [BCHM] is Corollary 1.4.3. See the paragraph following the statement. I suppose this may have been what Namikawa was referring to.
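A simple version of this idea can be sketched as follows, assuming $X$ already has canonical singularities and that the relative MMP below runs and terminates (this is where [BCHM] is used). Take a log resolution $f\colon W \to X$ with exceptional divisors $E_i$ and write

$$K_W = f^*K_X + \sum_i a_i E_i, \qquad a_i \ge 0.$$

Run the $K_W$-MMP over $X$. Since $K_W \equiv_X \sum_i a_i E_i$, every step is negative on this effective exceptional divisor, so only components with $a_i > 0$ can be contracted. At the end, $\sum_i a_i E_{i,Y}$ is nef over $X$ and exceptional, so the negativity lemma forces it to vanish, and the output $g\colon Y \to X$ satisfies

$$K_Y = g^*K_X.$$

The model $Y$ is $\mathbb{Q}$-factorial because the MMP preserves $\mathbb{Q}$-factoriality, and it is terminal: a divisor exceptional over $Y$ is either a contracted $E_i$ (discrepancy $a_i > 0$) or exceptional over the smooth $W$, where its discrepancy is at least $1$ because the boundary $-\sum_i a_i E_i$ has nonpositive coefficients.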
A variant of this idea is worked out in detail in Theorem 3.1 of Kollár-K10.
I don't know what's going on with the homotopy equivalence part of your question. I know that in general it is very hard to follow how the ample cone changes, so I am guessing that to get what you want you will need to use some special properties of your situation.
I am also just learning this stuff, and I'm partly writing this out for my own benefit. Experts, please correct and up/down vote as appropriate!
The goal of the minimal model program is to give a standard, nonsingular representative for each birational class of algebraic varieties. As stated, this goal is too ambitious, but it will help us to understand the minimal model program if we think of it as a partially successful attempt at this goal.
Let $X$ be a compact, smooth algebraic variety of dimension $n$, and let $\omega$ be the top exterior power of the holomorphic cotangent bundle. Then the vector space $V := H^0(X, \omega)$ of holomorphic $n$-forms on $X$ is a birational invariant of $X$. This means that we should be able to see $V$ from just the field of meromorphic functions on $X$; here is a sketch of how to do that. From $V$ we get a rational map $X \dashrightarrow \mathbb{P}(V^*)$ by the standard recipe. More generally, we can replace $\mathbb{P}(V^*)$ with $\operatorname{Proj}$ of the ring $\bigoplus_{m \ge 0} H^0(X, \omega^{\otimes m})$. This is called the canonical ring; you may have heard of the recent breakthrough proving that the canonical ring is finitely generated. We can map $X$ rationally to this $\operatorname{Proj}$; the image is usually called the canonical model (below I call it the log model). This is a partial success: it is a canonical, birationally invariant construction, but the result may not be birational to $X$ and may not be smooth.
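Curves already show the failure of birationality. For a smooth projective curve of genus $g$, Riemann-Roch gives the plurigenera $P_m = h^0(\omega^{\otimes m})$ explicitly: for $g = 0$ the canonical ring is just $k$, so its $\operatorname{Proj}$ is empty; for $g = 1$ it is $k[t]$ and the $\operatorname{Proj}$ is a point; only for $g \ge 2$, where $\omega$ is ample, is the $\operatorname{Proj}$ the curve itself. The Python sketch below encodes these formulas; the helper names are hypothetical.

```python
from typing import Optional


def curve_plurigenus(g: int, m: int) -> int:
    """h^0(C, omega^m) for a smooth projective curve C of genus g (Riemann-Roch)."""
    if g < 0 or m < 0:
        raise ValueError("need g >= 0 and m >= 0")
    if m == 0:
        return 1  # constant functions
    if g == 0:
        return 0  # deg omega^m = -2m < 0
    if g == 1:
        return 1  # omega is trivial
    if m == 1:
        return g  # holomorphic 1-forms
    # deg omega^m = m(2g - 2) > 2g - 2, so h^1 = 0 and h^0 = deg - g + 1
    return (2 * m - 1) * (g - 1)


def canonical_model_dimension(g: int, max_m: int = 10) -> Optional[int]:
    """dim Proj of the canonical ring of a genus-g curve, read off from the plurigenera.

    None means the ring is just k in degree 0, so Proj is empty.
    Only meaningful for curves, where the answer is None, 0 or 1.
    """
    p = [curve_plurigenus(g, m) for m in range(1, max_m + 1)]
    if all(v == 0 for v in p):
        return None  # empty canonical model
    if max(p) == min(p):
        return 0  # bounded plurigenera: Proj is a point
    return 1  # linear growth: Proj is a curve


for genus in (0, 1, 2):
    print(genus, [curve_plurigenus(genus, m) for m in range(4)], canonical_model_dimension(genus))
```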
There are certain well-understood rules of thumb for how various subobjects of $X$ behave in the log model. For example, if $X$ is a surface of general type and $C$ is a smooth rational curve with self-intersection $-1$ or $-2$, then $C$ will be blown down in the log model.
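Which curves get contracted is governed by $K_X \cdot C$ rather than by $C^2$ alone, and adjunction, $2g - 2 = C^2 + K_X \cdot C$ for a smooth curve $C$ of genus $g$, computes it. On a minimal surface of general type, the curves contracted in the canonical model are exactly those with $K_X \cdot C = 0$, which turn out to be smooth rational $(-2)$-curves, while smooth rational $(-1)$-curves have $K_X \cdot C = -1$ and are already contracted on the way to the minimal model. The tiny Python helper below (hypothetical name) makes the arithmetic concrete; note that a genus-$2$ curve with $C^2 = -3$ has $K_X \cdot C = 5 > 0$.

```python
def canonical_degree(genus: int, self_intersection: int) -> int:
    """K_X . C for a smooth curve C of the given genus on a smooth projective surface X.

    Adjunction formula: 2g - 2 = C^2 + K_X . C.
    """
    return 2 * genus - 2 - self_intersection


# (genus, C^2): a (-1)-curve, a (-2)-curve, and a genus-2 curve with C^2 = -3
for g, c2 in [(0, -1), (0, -2), (2, -3)]:
    print(f"g={g}, C^2={c2}: K.C = {canonical_degree(g, c2)}")
```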
Here is a more complicated example, which is relevant to your question. Let $Y$ be some variety that locally looks like the cone over the Segre embedding of $\mathbb{P}^1 \times \mathbb{P}^1$. So $Y$ is a $3$-fold with an isolated singularity. If you are familiar with the toric¹ picture, it looks like the tip of a square pyramid. Inside $Y$, let $Z$ be the cone over one of the lines $\mathbb{P}^1 \times \{\mathrm{pt}\}$. This is a surface, but not a Cartier divisor. Let $X$ be $Y$ blown up along $Z$, so that the isolated singularity is replaced by a line. In the toric picture, the point of the pyramid has lengthened into a line segment, and two of the faces which used to touch only at the point now border along an entire edge. In the log model, the line blows back down to a point. So the log model can turn a smooth variety, like $X$, into a singular one, like $Y$.
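In toric language (standard coordinates assumed), $Y$ is the affine toric variety of the cone over a unit square, spanned by the rays $v_1 = (0,0,1)$, $v_2 = (1,0,1)$, $v_3 = (1,1,1)$, $v_4 = (0,1,1)$. This cone has four rays in dimension $3$, so it is not simplicial and $Y$ is singular; cutting the square along either diagonal gives a fan of two smooth cones, and these are the two small resolutions (the sketch does not decide which diagonal corresponds to blowing up which surface). A $3$-dimensional cone is smooth exactly when its three rays form a lattice basis, i.e. have determinant $\pm 1$, which the Python sketch below checks; all names in it are hypothetical.

```python
from typing import Sequence, Tuple

Ray = Tuple[int, int, int]

# Rays of the cone over the unit square at height 1 (cone over P^1 x P^1).
V1: Ray = (0, 0, 1)
V2: Ray = (1, 0, 1)
V3: Ray = (1, 1, 1)
V4: Ray = (0, 1, 1)


def det3(a: Ray, b: Ray, c: Ray) -> int:
    """Determinant of the 3x3 integer matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def is_smooth_cone(rays: Sequence[Ray]) -> bool:
    """A 3-dimensional cone gives a smooth chart iff it has 3 rays forming a lattice basis."""
    return len(rays) == 3 and abs(det3(*rays)) == 1


# Y: a single cone with four rays, hence not simplicial and singular.
Y_FAN = [(V1, V2, V3, V4)]
# The two subdivisions of the square along a diagonal: the two small resolutions.
DIAGONAL_13 = [(V1, V2, V3), (V1, V3, V4)]
DIAGONAL_24 = [(V1, V2, V4), (V2, V3, V4)]

for name, fan in [("Y", Y_FAN), ("diagonal v1v3", DIAGONAL_13), ("diagonal v2v4", DIAGONAL_24)]:
    print(name, all(is_smooth_cone(c) for c in fan))
```

The shared diagonal face of each subdivision corresponds to the exceptional line.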
Now, birational geometers did not rest on their laurels when they had constructed the log model. They made other constructions, which are smoother but less canonical. Many of these constructions can be thought of as taking the log model and modifying it in some way. If the log model looks like the example of the previous paragraph, they want to take the singular point of $Y$ and replace it by a line, to look like $X$. But there are two ways to do this: they can blow up the cone over a line from one ruling of $\mathbb{P}^1 \times \mathbb{P}^1$ or from the other, giving either $X$ or $X'$. Often, replacing $X$ by $X'$ is crucial in order to improve the model somewhere else. The relationship between $X$ and $X'$ is called a flop (this one is the Atiyah flop, a close relative of the flips of the minimal model program), because we take the line inside $X$ and flip it around to point in a different direction.
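Toric geometry also computes $K \cdot C$ for the exceptional line $C$. For a wall $\tau = \operatorname{cone}(w_1, w_2)$ between two smooth $3$-dimensional cones whose remaining rays are $u$ and $u'$, there is a unique relation $u + u' + b_1 w_1 + b_2 w_2 = 0$, and the wall curve satisfies $D_u \cdot C = D_{u'} \cdot C = 1$, $D_{w_i} \cdot C = b_i$, and $D \cdot C = 0$ for all other torus-invariant divisors. Since $K = -\sum_\rho D_\rho$, this gives $K \cdot C = -(2 + b_1 + b_2)$. The Python sketch below (hypothetical helper names, rays of the cone over a square as $v_1 = (0,0,1)$, $v_2 = (1,0,1)$, $v_3 = (1,1,1)$, $v_4 = (0,1,1)$) finds $K \cdot C = 0$ on both sides, i.e. $K$ is trivial on the line being flopped.

```python
from fractions import Fraction
from itertools import combinations
from typing import Tuple

Ray = Tuple[int, int, int]

V1: Ray = (0, 0, 1)
V2: Ray = (1, 0, 1)
V3: Ray = (1, 1, 1)
V4: Ray = (0, 1, 1)


def wall_relation(w1: Ray, w2: Ray, u: Ray, u_prime: Ray) -> Tuple[Fraction, Fraction]:
    """Solve u + u' + b1*w1 + b2*w2 = 0 for (b1, b2)."""
    target = [-(u[k] + u_prime[k]) for k in range(3)]
    for i, j in combinations(range(3), 2):
        d = w1[i] * w2[j] - w1[j] * w2[i]
        if d == 0:
            continue
        # Cramer's rule on coordinates i, j, then check the remaining coordinate.
        b1 = Fraction(target[i] * w2[j] - target[j] * w2[i], d)
        b2 = Fraction(w1[i] * target[j] - w1[j] * target[i], d)
        if all(b1 * w1[k] + b2 * w2[k] == target[k] for k in range(3)):
            return b1, b2
        raise ValueError("u + u' is not in the span of the wall")
    raise ValueError("wall rays are linearly dependent")


def canonical_degree_of_wall_curve(w1: Ray, w2: Ray, u: Ray, u_prime: Ray) -> Fraction:
    """K . C = -(D_u.C + D_u'.C + D_w1.C + D_w2.C) = -(2 + b1 + b2)."""
    b1, b2 = wall_relation(w1, w2, u, u_prime)
    return -(2 + b1 + b2)


print(canonical_degree_of_wall_curve(V1, V3, V2, V4))  # wall along diagonal v1v3
print(canonical_degree_of_wall_curve(V2, V4, V1, V3))  # wall along diagonal v2v4
```

As a sanity check of the formula, a line in the exceptional $\mathbb{P}^2$ of the blowup of $\mathbb{A}^3$ at the origin (wall $\operatorname{cone}(e_1, e_1+e_2+e_3)$, remaining rays $e_2, e_3$) comes out as $K \cdot C = -2$, matching $K_{Y/X} = 2E$ and $E \cdot \ell = -1$.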
¹ Cautionary note: although the toric picture is excellent for visualizing what is going on locally, you shouldn't take $X$ itself to be a toric variety. A complete toric variety has no nonzero global sections of $\omega^{\otimes m}$ for any $m \ge 1$, so its log model is empty. You want $X$ to look locally like a toric variety, but to have global geometry that is nontoric in a way that creates lots of sections of $\omega$.
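For a complete toric variety with rays $v_\rho$, the sections of $\mathcal{O}(\sum_\rho a_\rho D_\rho)$ have a basis indexed by lattice points $u$ with $\langle u, v_\rho \rangle \ge -a_\rho$ for all $\rho$, and $K = -\sum_\rho D_\rho$. For $mK$ with $m \ge 1$ this asks for $\langle u, v_\rho \rangle \ge m > 0$ for every ray, which is impossible: the rays of a complete fan positively span, so they satisfy a relation $\sum_\rho c_\rho v_\rho = 0$ with all $c_\rho > 0$. The brute-force Python sketch below (hypothetical helper names; the search box size is an assumption and must contain the polytope) confirms this for $\mathbb{P}^2$ and $\mathbb{P}^1 \times \mathbb{P}^1$, and contrasts it with the anticanonical divisor, which has many sections.

```python
from itertools import product
from typing import Sequence, Tuple


def h0_toric(rays: Sequence[Tuple[int, ...]], coeffs: Sequence[int], bound: int = 10) -> int:
    """h^0(O(D)) for D = sum_rho coeffs[rho] * D_rho on a complete toric variety.

    Counts lattice points u with <u, v_rho> >= -a_rho for every ray v_rho,
    searching only the box [-bound, bound]^dim.
    """
    dim = len(rays[0])
    return sum(
        1
        for u in product(range(-bound, bound + 1), repeat=dim)
        if all(sum(x * y for x, y in zip(u, v)) >= -a for v, a in zip(rays, coeffs))
    )


def plurigenus(rays: Sequence[Tuple[int, ...]], m: int) -> int:
    # K = -sum_rho D_rho, so omega^m = O(mK) has every coefficient equal to -m.
    return h0_toric(rays, [-m] * len(rays))


P2 = [(1, 0), (0, 1), (-1, -1)]
P1_X_P1 = [(1, 0), (-1, 0), (0, 1), (0, -1)]

print([plurigenus(P2, m) for m in range(4)])  # [1, 0, 0, 0]
print(h0_toric(P2, [1, 1, 1]), h0_toric(P1_X_P1, [1, 1, 1, 1]))  # 10 9 (anticanonical)
```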