I think you're missing that the partial derivative in $D_\mu$ acts on the phase factor. If you start only with the phase transformation of the Dirac term:
$$\mathcal{L}_D' = \bar \psi e^{i\alpha(x)} \left[ i\gamma^\mu(\partial_\mu + ieA_\mu) - m \right] \psi e^{-i\alpha(x)} $$
you can almost pull the $e^{i\alpha(x)}$ through and cancel it against $e^{-i\alpha(x)}$. However, $\partial_\mu=\partial/\partial x^\mu$ also acts on $e^{-i\alpha(x)}$ and gives you an additional term (product rule, then chain rule). So after cancelling, $\partial_\mu \rightarrow \partial_\mu - i\,\partial_\mu\alpha(x)$. Then perform the substitution $A_\mu \rightarrow A_\mu'$, and the extra terms cancel exactly.
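Spelled out, the computation goes roughly as follows. This assumes that the gauge-field law matching these sign conventions ($D_\mu = \partial_\mu + ieA_\mu$, $\psi \to e^{-i\alpha}\psi$) is $A'_\mu = A_\mu + e^{-1}\partial_\mu\alpha$; that law is not written out above, so treat it as the assumption of this sketch.

$$\begin{aligned}
\partial_\mu\big(\psi\,e^{-i\alpha(x)}\big) &= e^{-i\alpha(x)}\big(\partial_\mu\psi - i(\partial_\mu\alpha)\psi\big),\\
\mathcal{L}_D' &= \bar\psi\left[i\gamma^\mu\big(\partial_\mu - i\,\partial_\mu\alpha + ieA_\mu\big) - m\right]\psi,\\
\mathcal{L}_D'\big|_{A_\mu \to A'_\mu} &= \bar\psi\left[i\gamma^\mu\big(\partial_\mu - i\,\partial_\mu\alpha + ieA_\mu + i\,\partial_\mu\alpha\big) - m\right]\psi = \bar\psi\left[i\gamma^\mu(\partial_\mu + ieA_\mu) - m\right]\psi = \mathcal{L}_D.
\end{aligned}$$

The mass term needs no help: there the two phases simply cancel.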
A gauge transformation is defined to be the simultaneous transformation
\begin{align}
A_\mu & \mapsto A_\mu + e^{-1} \partial_\mu \Lambda \\
\phi & \mapsto \mathrm{e}^{\mathrm{i}\Lambda}\phi.
\end{align}
It is not that the transformation of the gauge field "necessarily" leads to the transformation of the charged field; rather, only under this joint transformation is everything properly covariant. More abstractly, we regard both transformations as different representations (realizations) of the same abstract gauge algebra, consisting of functions $\Lambda : \mathbb{R}^4\to\mathbb{R}$. $\Lambda$ acts in one way on gauge fields and in another on charged fields. This is perfectly analogous to how the rotation group acts differently on scalars, vectors and tensors, yet "a rotation" transforms all scalars, vectors and tensors, not just the vectors.
In a way, the covariance of $D_\mu\phi$ is the whole reason we invent the gauge field: $\partial_\mu \phi$ would not be covariant under the transformation of $\phi$ alone. It's less that the transformation of $\phi$ derives from that of $A$ and more that the transformation of $A$ derives from that of $\phi$, at least if we take the viewpoint that the local symmetry is the fundamental object here. If we want to build a theory with such a local symmetry depending on a spacetime function $\Lambda(x)$, we need to modify the derivative in order to build an invariant Lagrangian. So the first thing to try (perhaps motivated by the addition of the Christoffel symbols to the ordinary derivative in GR, which are effectively also a kind of gauge field) is to modify $\partial_\mu$ to $D_\mu := \partial_\mu + A_\mu$ for some $A_\mu$.
Now examine $D_\mu (\mathrm{e}^{\mathrm{i}\Lambda}\phi)$, which we want to equal $\mathrm{e}^{\mathrm{i}\Lambda}D_\mu\phi$, and look only at the extra terms. After staring at them long enough, you will realize that they are exactly the terms that cancel if we let $A_\mu$ transform as usual; that is, the transformation of $A_\mu$ is engineered so that the covariant derivative is truly covariant.
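For the record, here is that computation in the convention $D_\mu = \partial_\mu + A_\mu$ used above, assuming $A_\mu$ absorbs both the charge and the factor of $i$ (so it is purely imaginary for $U(1)$):

$$\begin{aligned}
D_\mu\big(e^{i\Lambda}\phi\big) &= e^{i\Lambda}\partial_\mu\phi + i(\partial_\mu\Lambda)\,e^{i\Lambda}\phi + A_\mu e^{i\Lambda}\phi = e^{i\Lambda}\big(D_\mu\phi + i(\partial_\mu\Lambda)\,\phi\big),\\
(\partial_\mu + A'_\mu)\big(e^{i\Lambda}\phi\big) &= e^{i\Lambda}D_\mu\phi \quad\text{if}\quad A'_\mu = A_\mu - i\,\partial_\mu\Lambda.
\end{aligned}$$

Writing $A_\mu = -ie\,\mathcal{A}_\mu$ with a real potential $\mathcal{A}_\mu$ (a relabelling introduced only for this comparison), the last condition reads $\mathcal{A}_\mu \mapsto \mathcal{A}_\mu + e^{-1}\partial_\mu\Lambda$, the usual form with the charge $e$ made explicit.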
The scalar QED Lagrangian in your question is for a complex scalar field $\psi(x)$ interacting with an electromagnetic field described by the potential $A_{\mu}(x)$. At any point $x$, the scalar field is a complex number. We model this situation by constructing a space, a vector bundle $V$, which is isomorphic to $M \times \mathbb{C}$. In general a bundle is only locally isomorphic to the product space, since it might have twists in it, but we'll ignore this here. The bundle has a projection map onto the spacetime manifold, $\pi: V \rightarrow M$. The set of points that project onto $x \in M$ is called the fibre over $x$ and is denoted $F_x$. Each $F_x$ is isomorphic to the complex numbers $\mathbb{C}$.
Now, to make such an isomorphism explicit, we effectively choose a coordinate $z$ on each fibre, so our bundle $V$ has coordinates $\{x^{\mu}, z \}$. Our spacetime field $\psi(x)$, viewed as a map $M \rightarrow V$, is called a section of $V$: if we compose a section with the projection $\pi$, we get back the spacetime point we started with ($\pi \circ \psi = \mathrm{id}_M$). We can think of the choice of fibre coordinates as a gauge choice. A gauge transformation is the choice of a new fibre coordinate, related to the old one by $z \rightarrow g z$ with $g \in U(1)$. For a local gauge transformation, the new coordinate is $z \rightarrow g(x) z$, where $g$ is now a function of $x$.
Now, given the interpretation of $\psi(x)$ as a section of $V$, in order to construct the Lagrangian we need to be able to differentiate it, i.e. to compute a limit $$ \lim_{\Delta x \to 0}\frac{\psi(x+\Delta x)-\psi(x)}{\Delta x} $$ The problem is that $\psi(x)$ lives in the fibre $F_x$ over $x$, while $\psi(x+\Delta x)$ lives in the fibre $F_{x+\Delta x}$ over $x+\Delta x$. These are different spaces, so we can't perform the subtraction unless we can map points in $F_{x+\Delta x}$ to points in $F_x$. If we've chosen a gauge, this is no problem: we have an explicit mapping of both fibres to the complex numbers, so we can perform the subtraction. But we want something that still makes sense when we change gauge, in particular under local changes of gauge.
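A small numerical illustration of why the naive derivative is not good enough. The sketch uses one coordinate $x$, a section $\psi$ written in some fixed gauge, and a local gauge change $g(x) = e^{i\alpha(x)}$; the particular functions `psi` and `alpha` and the finite-difference helper `derivative` are hypothetical choices made only for illustration. Differentiating in the new gauge gives $g\,\partial\psi$ plus a leftover $(\partial g)\psi$ term, so the naive result depends on the gauge.

```python
import cmath

def derivative(f, x, h=1e-5):
    # Central finite difference for a complex-valued function of one variable.
    return (f(x + h) - f(x - h)) / (2 * h)

def psi(x):
    # Hypothetical section, expressed in the original fibre coordinate.
    return (1.0 + x * x) * cmath.exp(0.5j * x)

def alpha(x):
    # Hypothetical smooth gauge function.
    return 0.3 * x ** 3

def g(x):
    # Local U(1) gauge change g(x) = exp(i alpha(x)).
    return cmath.exp(1j * alpha(x))

def psi_prime(x):
    # The same section, expressed in the new fibre coordinate z -> g(x) z.
    return g(x) * psi(x)

x0 = 0.7
lhs = derivative(psi_prime, x0)        # d(g psi)/dx
rhs = g(x0) * derivative(psi, x0)      # g * d(psi)/dx
extra = derivative(g, x0) * psi(x0)    # the leftover (dg/dx) psi term

print(abs(lhs - rhs))          # clearly nonzero: naive derivative is not covariant
print(abs(lhs - rhs - extra))  # tiny: the mismatch is exactly (dg/dx) psi
```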
The recipe for doing this is to introduce a connection. If we start at a point $p$ in the fibre $F_x$ and infinitesimally displace $x$ to $x+dx$, then to specify where $p$ moves to we need, in general, a horizontal component (in the $M$ coordinate directions) and a vertical component (in the fibre direction). Given a gauge, describing an infinitesimal fibrewise displacement is easy: we just apply an infinitesimal element of the gauge group. Such an infinitesimal element belongs to the Lie algebra of the group. For $U(1)$, this Lie algebra is the imaginary line $i\mathbb{R}$ (isomorphic to the real numbers), so the vertical displacements corresponding to movement of $p$ in the 4 spacetime coordinate directions are given by four purely imaginary numbers. As functions of the spacetime coordinates, they become four functions $A_{\mu}(x)$. The gauge covariant derivative is then just $$ D_{\mu}\psi(x) = \partial_{\mu}\psi(x) + A_{\mu}(x)\psi(x)$$
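One way to make the connection concrete is a finite-difference sketch in one dimension. It assumes a fixed gauge (so fibre values are plain complex numbers), a hypothetical imaginary-valued `A(x)`, and a hypothetical helper `transport` that carries a value from $F_x$ to $F_{x+h}$ by the phase $e^{-A(x)h}$, i.e. $(1 - A\,h)$ to first order; the sign is chosen to match $D_\mu = \partial_\mu + A_\mu$. Subtracting within the same fibre then gives a difference quotient that tends to $\partial\psi + A\psi$:

```python
import cmath
import math

def A(x):
    # Hypothetical connection component, valued in the U(1) Lie algebra i*R.
    return 1j * 0.8 * math.cos(x)

def psi(x):
    # Hypothetical section in the chosen gauge.
    return (1.0 + x * x) * cmath.exp(0.5j * x)

def dpsi(x):
    # Exact derivative of psi, used only for comparison.
    return (2 * x + 0.5j * (1.0 + x * x)) * cmath.exp(0.5j * x)

def transport(x, h, value):
    # Carry a fibre value from F_x to F_{x+h} using the connection;
    # to first order in h this is (1 - A(x) h) * value.
    return cmath.exp(-A(x) * h) * value

def covariant_difference(x, h):
    # psi(x+h) and the transported psi(x) both live in F_{x+h}, so they can be subtracted.
    return (psi(x + h) - transport(x, h, psi(x))) / h

x0 = 0.7
target = dpsi(x0) + A(x0) * psi(x0)  # D psi = d psi + A psi
errors = [abs(covariant_difference(x0, h) - target) for h in (1e-2, 1e-3, 1e-4)]
print(errors)  # shrinks roughly in proportion to h
```

The exponential is used rather than $1 - A h$ only so that the transport is exactly a $U(1)$ phase; both agree to first order in $h$, which is all the limit sees.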
If we perform a local gauge transformation $$\psi(x) \rightarrow \psi'(x) = g(x)\psi(x)$$ then, provided we make a corresponding transformation $$ A_{\mu}(x) \rightarrow A'_{\mu}(x) = A_{\mu}(x) + g(x)\partial_{\mu}g^{-1}(x) \qquad (1) $$ the gauge covariant derivative $D'_{\mu} = \partial_{\mu} + A'_{\mu}$ built from the transformed potential acts on $\psi'$ as
$$\begin{aligned}
D'_{\mu}\psi'(x) &= \partial_{\mu}\big(g(x)\psi(x)\big) + A'_{\mu}(x)g(x)\psi(x) \\
&= \partial_{\mu}\big(g(x)\psi(x)\big) + \big[A_{\mu}(x)+g(x)\partial_{\mu}g^{-1}(x)\big]g(x)\psi(x) \\
&= g(x)\partial_{\mu}\psi(x) + (\partial_{\mu}g(x))\psi(x) + A_{\mu}(x)g(x)\psi(x) + g(x)(\partial_{\mu}g^{-1}(x))g(x)\psi(x) \\
&= g(x)\big(\partial_{\mu}\psi(x) + A_{\mu}(x)\psi(x)\big) = g(x)D_{\mu}\psi(x)
\end{aligned}$$
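The covariance statement can also be checked numerically. This sketch (one coordinate, hypothetical functions `psi`, `A`, `alpha`, and a central-difference helper `derivative`) builds $A'$ from law (1) and compares $D'\psi'$ with $gD\psi$; for contrast it also shows that keeping the old $A$ fails by the $(\partial g)\psi$ term.

```python
import cmath
import math

def derivative(f, x, h=1e-5):
    # Central finite difference for a complex-valued function of one variable.
    return (f(x + h) - f(x - h)) / (2 * h)

def psi(x):
    return (1.0 + x * x) * cmath.exp(0.5j * x)  # hypothetical section

def A(x):
    return 1j * 0.8 * math.cos(x)  # hypothetical imaginary-valued potential

def g(x):
    return cmath.exp(1j * 0.3 * x ** 3)  # hypothetical gauge function g = exp(i alpha)

def g_inv(x):
    return 1 / g(x)

def A_prime(x):
    # Transformation law (1): A' = A + g d(g^{-1}).
    return A(x) + g(x) * derivative(g_inv, x)

def psi_prime(x):
    return g(x) * psi(x)

def D(potential, field, x):
    # Covariant derivative D = d + A applied to a section.
    return derivative(field, x) + potential(x) * field(x)

x0 = 0.7
lhs = D(A_prime, psi_prime, x0)     # D' psi'
rhs = g(x0) * D(A, psi, x0)         # g D psi
wrong = D(A, psi_prime, x0)         # new psi but old A

print(abs(lhs - rhs))    # tiny: covariant
print(abs(wrong - rhs))  # order one: off by (dg/dx) psi
```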
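For the last step we used $$ 0 = \partial_{\mu}1 = \partial_{\mu}(g(x)g^{-1}(x)) = (\partial_{\mu}g(x))g^{-1}(x)+g(x)(\partial_{\mu}g^{-1}(x)), $$ so that $g(\partial_{\mu}g^{-1})g = -\partial_{\mu}g$, together with the fact that $A_{\mu}$ and $g$ commute for $U(1)$. So $D_{\mu}\psi(x)$ transforms covariantly, in a way that ensures the Lagrangian is gauge invariant: since $|g(x)| = 1$, terms such as $(D_{\mu}\psi)^*D^{\mu}\psi$ and $\psi^*\psi$ are unchanged. If we write $g(x) = e^{i\alpha(x)}$, then $g\,\partial_{\mu}g^{-1} = -i\partial_{\mu}\alpha$ and the transformation law (1) becomes $$ A_{\mu}(x) \rightarrow A'_{\mu}(x) = A_{\mu}(x) - i\partial_{\mu}\alpha(x) $$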
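A quick numerical check of these last statements, again with hypothetical one-dimensional choices for `alpha`, `psi` and `A` and a finite-difference helper `derivative`: the identity $(\partial g)g^{-1} + g\,\partial g^{-1} = 0$, the shift $g\,\partial g^{-1} = -i\,\partial\alpha$, and the invariance of the kinetic term $|D\psi|^2$ under $\psi \to g\psi$, $A \to A - i\,\partial\alpha$.

```python
import cmath
import math

def derivative(f, x, h=1e-5):
    # Central finite difference for a complex- or real-valued function.
    return (f(x + h) - f(x - h)) / (2 * h)

def alpha(x):
    return 0.3 * x ** 3  # hypothetical gauge function

def g(x):
    return cmath.exp(1j * alpha(x))

def g_inv(x):
    return cmath.exp(-1j * alpha(x))

def psi(x):
    return (1.0 + x * x) * cmath.exp(0.5j * x)  # hypothetical section

def A(x):
    return 1j * 0.8 * math.cos(x)  # hypothetical imaginary-valued potential

def A_prime(x):
    return A(x) - 1j * derivative(alpha, x)  # the law A' = A - i d(alpha)

def psi_prime(x):
    return g(x) * psi(x)

x0 = 0.7
identity = derivative(g, x0) * g_inv(x0) + g(x0) * derivative(g_inv, x0)  # should vanish
shift = g(x0) * derivative(g_inv, x0)  # should equal -i d(alpha)/dx
kinetic = abs(derivative(psi, x0) + A(x0) * psi(x0)) ** 2
kinetic_prime = abs(derivative(psi_prime, x0) + A_prime(x0) * psi_prime(x0)) ** 2

print(abs(identity))
print(abs(shift + 1j * derivative(alpha, x0)))
print(kinetic, kinetic_prime)  # should agree up to finite-difference error
```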