The scalar QED Lagrangian in your question is for a complex scalar field $\psi(x)$ interacting with an electromagnetic field given by potential $A_{\mu}(x)$. At any point $x$, the scalar field is a complex number. We model this situation by constructing a space - a vector bundle $V$ - which is isomorphic to $M \times \mathbb{C}$. In general a bundle is only locally isomorphic to the product space, since it might have twists in it, but we'll ignore this here. The bundle has a projection map onto the spacetime manifold $\pi: V \rightarrow M$. The set of points which are projected onto $x \in M$ is called the fibre over $x$, and denoted $F_x$. Each $F_x$ is isomorphic to the complex numbers $\mathbb{C}$.
Now to make explicit such an isomorphism, we effectively choose a coordinate $z$ for each fibre. So our bundle $V$ then has coordinates $\{x^{\mu}, z \}$. Our spacetime field $\psi(x)$ as a map from $M \rightarrow V$ is called a section of $V$. If we compose a section with the projection $\pi$ we get back the spacetime point we started with. We can think of the choice of fibre coordinates as a gauge choice. A gauge transformation is the choice of a new fibre coordinate, related to the old by $z \rightarrow g.z$ where $g \in U(1)$. In the case of a local gauge transformation, this new choice of coordinate becomes $z \rightarrow g(x).z$ where $g$ is now a function of $x$.
Now, given the interpretation of $\psi(x)$ as a section of $V$, in order to construct the Lagrangian, we need to be able to differentiate it i.e. we need to be able to compute a derivative which is a limit $$ \lim_{\Delta x \to 0}\frac{\psi(x+\Delta x)-\psi(x)}{\Delta x} $$ The problem is: $\psi(x)$ lives in the fibre $F_x$ over $x$, and $\psi(x+dx)$ lives in the fibre $F_{x+dx}$ over $x+dx$. These are different spaces, so we can't perform the subtraction unless we can map points in $F_{x+dx}$ to points in $F_x$. If we've chosen a gauge, this is no problem - we have an explicit mapping of both fibres to the complex numbers, so we can perform the subtraction, but we want something that makes sense when we make changes of gauge, in particular local changes of gauge.
The recipe to do this is to introduce a connection. If we start at a point $p$ in the fibre $F_x$ and infinitesimally perturb the point $x$ to $x+dx$, to specify where $p$ moves to, we need to give it in general a horizontal component (in the $M$ coordinate directions), and a vertical component (in the fibre directions). Given a gauge, describing an infinitesimal fibrewise displacement is easy - we just apply an infinitesimal element of the gauge group. Such an infinitesimal element belongs to the Lie algebra of the group. For $U(1)$, this Lie algebra is just the real numbers (up to a conventional factor of $i$, which is absorbed into $A_{\mu}$ in the convention used here), so the vertical displacements corresponding to movement of $p$ in the 4 spacetime coordinate directions are just given by four numbers. As a function of spacetime coordinates, they become four functions $A_{\mu}(x)$. The gauge covariant derivative is then just $$ D_{\mu}\psi(x) = \partial_{\mu}\psi(x) + A_{\mu}(x)\psi(x)$$
If we perform a local gauge transformation $$\psi(x) \rightarrow \psi'(x) = g(x)\psi(x)$$ then, provided we make a corresponding transformation $$ A_{\mu}(x) \rightarrow A'_{\mu}(x) = A_{\mu}(x) + g(x)\partial_{\mu}g^{-1}(x) \ \ (1) $$ the gauge covariant derivative built from the transformed potential, $D'_{\mu} = \partial_{\mu} + A'_{\mu}$, acts on the transformed field as $$ D'_{\mu}\psi'(x) = D'_{\mu}(g(x)\psi(x))$$ $$ = \partial_{\mu}(g(x)\psi(x)) + A'_{\mu}(x)g(x)\psi(x)$$ $$ = \partial_{\mu}(g(x)\psi(x)) + [A_{\mu}(x)+g(x)\partial_{\mu}g^{-1}(x)]g(x)\psi(x)$$ $$ = g(x)\partial_{\mu}\psi(x) + (\partial_{\mu}g(x))\psi(x) + A_{\mu}(x)g(x)\psi(x) + g(x)(\partial_{\mu}g^{-1}(x))g(x)\psi(x)$$ $$ = g(x)(\partial_{\mu}\psi(x) + A_{\mu}(x)\psi(x)) = g(x)D_{\mu}\psi(x)$$
Where for the last step we used $$ 0 = \partial_{\mu}1 = \partial_{\mu}(g(x)g^{-1}(x)) = (\partial_{\mu}g(x))g^{-1}(x)+g(x)(\partial_{\mu}g^{-1}(x)) $$ So $D_{\mu}\psi(x)$ transforms covariantly, in a way that ensures the Lagrangian is gauge invariant. If we write $g(x) = e^{i\alpha(x)}$ then the transformation law (1) becomes $$ A_{\mu}(x) \rightarrow A'_{\mu}(x) = A_{\mu}(x) - i\partial_{\mu}\alpha(x) $$
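As a quick sanity check of the covariance property, here is a minimal numerical sketch in one dimension, using only the Python standard library. The test functions `psi`, `a` and `alpha` are arbitrary choices made up for illustration, and derivatives are approximated by central finite differences, so agreement is only expected up to discretisation error. It follows the convention above, $D = \partial + A$ with $A' = A - i\partial\alpha$, and assumes $A = i\,a$ with $a$ real.

```python
import cmath
import math

# Hypothetical 1D test functions, chosen only for illustration.
def psi(x):
    """Complex 'matter field' in the original gauge."""
    return complex(math.cos(x), math.sin(2 * x))

def a(x):
    """Real function; the connection is taken to be A = i * a (assumption)."""
    return 0.3 * x * x

def alpha(x):
    """Local gauge parameter alpha(x)."""
    return math.sin(x) + 0.5 * x

def g(x):
    """Local U(1) element g(x) = exp(i * alpha(x))."""
    return cmath.exp(1j * alpha(x))

def deriv(f, x, h=1e-5):
    """Central finite-difference approximation to df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

def A(x):
    return 1j * a(x)

def A_prime(x):
    # Transformation law (1): A' = A + g d(g^{-1}) = A - i d(alpha)
    return A(x) - 1j * deriv(alpha, x)

def psi_prime(x):
    return g(x) * psi(x)

def D(field, conn, x):
    """Covariant derivative (d + conn) applied to field, evaluated at x."""
    return deriv(field, x) + conn(x) * field(x)

x0 = 0.7
lhs = D(psi_prime, A_prime, x0)      # D' psi'
rhs = g(x0) * D(psi, A, x0)          # g * D psi
covariance_error = abs(lhs - rhs)

# The ordinary derivative is NOT covariant: d(g psi) - g d(psi) = (dg) psi
naive_error = abs(deriv(psi_prime, x0) - g(x0) * deriv(psi, x0))

print("covariant mismatch:", covariance_error)
print("naive mismatch:    ", naive_error)
```

The first mismatch should be at the level of finite-difference noise, while the second is of order one: the leftover $(\partial g)\psi$ term is exactly what the shift in $A_{\mu}$ is there to cancel.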
However, this conserved current is only obtained when the matter fields are on-shell, but we (presumably) require gauge invariance of the action even when the fields are off-shell. (Please correct me if this is incorrect).
We only require gauge invariance on-shell. In an Abelian theory, something special happens and the equation $L_\mu = \partial^\nu F_{\nu\mu} - j_\mu = 0$ is gauge invariant off-shell, i.e. $\delta L_\mu = 0$. However, in non-Abelian gauge theories this is not true: there you have $\delta L_\mu = - i [ L_\mu , \Lambda ]$, which vanishes only on-shell.
Why should we want to do this anyway? Even if we have the extra term $j^\mu\partial_\mu\Lambda$, Maxwell's equations are unaffected, which was the reason we wanted gauge invariance in the first place.
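To spell out the Abelian statement: taking the gauge variation to be $\delta A_\mu = \partial_\mu \Lambda$ (the sign and normalisation here are an assumed convention), the field strength $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ changes by $$ \delta F_{\mu\nu} = \partial_\mu \partial_\nu \Lambda - \partial_\nu \partial_\mu \Lambda = 0 $$ because partial derivatives commute. So $\partial^\nu F_{\nu\mu}$ is unchanged without using any equation of motion, and as long as the current $j_\mu$ is itself gauge invariant (as it is in scalar QED when written with covariant derivatives), $\delta L_\mu = 0$ holds identically.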
Why are the Maxwell equations unchanged by this term? For instance, this is not true in scalar QED, where the "conserved" current depends on the scalar field $\phi$ as well as the gauge field $A_\mu$. Also, the presence of this term will screw with the matter equations of motion.
A thought has also occurred to me when writing this post; if we allow the term $j^\mu\partial_\mu\Lambda$ to appear in the Lagrangian then $\partial_\mu\Lambda$ itself becomes a dynamical field.
Yes, you can do this. This is closely related to the Stueckelberg action.
Another related question that might help you is: why do we need gauge invariance in the first place?
Best Answer
I think you're missing that the partial derivative in $D_\mu$ acts on the phase factor. If you start only with the phase transformation of the Dirac term:
$$\mathcal{L}_D' = \bar \psi e^{i\alpha(x)} \left[ i\gamma^\mu(\partial_\mu + ieA_\mu) - m \right] \psi e^{-i\alpha(x)} $$
you can almost pull the $e^{i\alpha(x)} $ through and cancel them. However, the $\partial_\mu=\partial/\partial x^\mu$ acts on $ e^{i\alpha(x)} $ and gives you an additional term (product, then chain rule). So after canceling, $\partial_\mu \rightarrow \partial_\mu + $ something with $\alpha(x)$. Then perform the $A_\mu \rightarrow A_\mu'$ substitution, and the terms should exactly cancel out.
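Written out, with the sign conventions of the expression above, the product rule gives $$ \partial_\mu\left(\psi e^{-i\alpha(x)}\right) = e^{-i\alpha(x)}\left(\partial_\mu - i\,\partial_\mu\alpha(x)\right)\psi $$ so after cancelling the phases $$ \mathcal{L}_D' = \bar\psi \left[ i\gamma^\mu\left(\partial_\mu - i\,\partial_\mu\alpha + ieA_\mu\right) - m \right]\psi = \mathcal{L}_D + \bar\psi\gamma^\mu\psi\,\partial_\mu\alpha $$ The answer does not spell out $A_\mu'$; with these conventions the choice that works is $A_\mu' = A_\mu + \frac{1}{e}\partial_\mu\alpha$, since then $ieA_\mu' = ieA_\mu + i\,\partial_\mu\alpha$ and the two $\partial_\mu\alpha$ terms cancel, leaving $\mathcal{L}_D$ unchanged.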