It is actually possible, and not too difficult, to prove this without expanding the exponentials to first order only.
What you are trying to prove is $S^\dagger \gamma^0 = \gamma^0 S^{-1}$, this is equivalent to
$$ \gamma^0 S^\dagger \gamma^0 = S^{-1} $$
because $( \gamma^0 )^2 = 1$.
Expand $S^\dagger = \sum_n \frac{1}{n!} \left( \frac i 4 \omega_{\mu\nu} \sigma^{\mu\nu\dagger} \right)^n$ and you see that it is sufficient to prove
$$ \gamma^0 \left( \sigma^{\mu\nu\dagger} \right)^n \gamma^0 = \left( \sigma^{\mu\nu} \right)^n \;. $$
Now, the left hand side is equal to $\left( \gamma^0 \sigma^{\mu\nu\dagger} \gamma^0 \right)^n$, hence we only need to prove that
$$ \gamma^0 ( \sigma^{\mu\nu} )^\dagger \gamma^0 = \sigma^{\mu\nu} \;. $$
This is obvious from your equation (2).
By the way that is not the Dirac equation, but the Dirac Lagrangian/action.
1) $m$ is a scalar. Mass terms for fermionic fields are allowed in the Standard Model, you are confusing mass terms for gauge fields which are not allowed on their own, but come in through spontaneous symmetry breaking (Higgs mechanism).
2) $m\Psi \bar{\Psi}$ is allowed, as any phase change (local of global) will cancel out. Again, you are confusing a mass term for a gauge field $mA^\mu A_\mu$. This would violate gauge invariance, $A_\mu \rightarrow A_\mu + \partial_\mu\Lambda$.
EDIT The above two answers are true for Dirac Lagrangians and EM interactions, as stated in the question. In the presence of a weak interaction, fermions are affected differently depending on their chirality. This then introduces a gauge-dependent mass term, only saved by the Higgs mechanism.
3) No, it's only the equation of motion for a spin-1/2 fermion. If you construct the spin operator $\mathbf{S}^2$, you'll find that the eigenvalues are $\frac{3}{4}\hbar^2$, corresponding to $S(S+1)$ with $S = 1/2$.
For spin 3/2 fermions, the equation is this, etc.
4) What is the interpretaion of the complex conjugate of a number? Really, you just make up whatever term in the Lagrangian gives you the correct Dirac equation (when applying the Euler-Lagrange equations), which you know is correct from experiment.
You can always justify the form of the Lagrangian, for example having $\Psi \bar{\Psi}$ means that you have local and global phase invariance, and that the resulting potential $\propto \Psi^2$ has a minimum, thereby leading to a stable field theory.
5) $\bar{\Psi}$ is not an interaction. The Dirac equation is the equation obeyed by the a free massive spin-1/2 fermion. Or, more correctly, by its field operator (whereby I am making the distinction between relativistic quantum mechanism and quantum field theory).
NB though that you can just set the mass to $0$, and you get the so called Weyl fermions.
To get interacations, you need non-linear terms.
The one that usually comes up is $\propto J^\mu A_\mu = \Psi \gamma^\mu \bar{\Psi}A_\mu$, where $A_\mu$ is the electro-magnetic gauge potential. This term is not linear, and it represents the interaction between a spin-1/2 fermion $\Psi$ and a spin-1 vector boson $A_\mu$.
You can also make two different fermions interact by having a term that goes like $\Psi_1 \cdot \Psi_2$, where both obey their individual Dirac equation.
Wikipedia is really bad for this stuff unless you already know roughly what is going on, I would recomend looking any undergraduate lecture series on gauge field theories. The Cambridge one is quite good.
Best Answer
A scalar is, like other scalars, merely just a number. Think about their matrix representation: $$ \psi=(\phi_R\; \phi_L)^T$$ and $$\bar{\psi}=\psi^\dagger\gamma^0 =(\phi^*_L \; \phi_R^*).$$ It is clear that $\bar{\psi}\psi$ is a 1x1 matrix (scalar), and of course the operation is legitimate.
Those other forms are also 1x1 matrices. However under Lorentz transformation, an expression transforms like a scalar, or like a vector, etc. So it has nothing to do with the term "scalar" as a 1×1 matrix.