Your definitions are in fact those for proper, orthochronous Lorentz transformations, not for general Lorentz transformations; that's why you're having trouble telling the difference! (If it makes you feel any better, yesterday a colleague and I were trying to debug his test setup, and two hours of complex testing passed before we two geniuses realised we hadn't switched on the power to a key bit of kit!)
A general Lorentz transformation is defined by criterion 1) alone - it is simply any linear transformation that preserves the quadratic form $t^2 - x^2 - y^2 - z^2$.
The proper, orthochronous transformations are those that belong to the identity connected component $SO^+(1,\,3)$ of the full Lorentz group $O(1,\,3)$. That is, the proper, orthochronous transformations are those that can be reached from the $4\times4$ identity matrix by following a continuous path through the Lorentz group. Equivalently, they are the matrices that are on paths through the Lorentz group defined by the differential equation:
$$\begin{array}{lcl}\mathrm{d}_s L &=& (a_x(s)\, J_x + a_y(s)\, J_y+a_z(s)\, J_z + b_x(s)\, K_x + b_y(s)\, K_y+b_z(s)\, K_z)\,L\\L(0) &=& \mathrm{id}\end{array}\tag{1}$$
where $\mathrm{id}$ is the $4\times 4$ identity, $a_j(s),\,b_j(s)$ are continuous functions of the parameter $s$ and the $J_j,\,K_j$ are six $4\times 4$ matrices that span the Lie algebra of the Lorentz group, i.e. the real vector space of all possible "tangents to the identity", i.e. all possible values of $\mathrm{d}_s L|_{s=0}$. One possible set is:
$$\begin{array}{lcllcllcl}J_x&=&\left(\begin{array}{cccc}0&0&0&0\\0&0&0&0\\0&0&0&-1\\0&0&1&0\end{array}\right)&J_y&=&\left(\begin{array}{cccc}0&0&0&0\\0&0&0&1\\0&0&0&0\\0&-1&0&0\end{array}\right)&J_z&=&\left(\begin{array}{cccc}0&0&0&0\\0&0&-1&0\\0&1&0&0\\0&0&0&0\end{array}\right)\\K_x&=&\left(\begin{array}{cccc}0&1&0&0\\1&0&0&0\\0&0&0&0\\0&0&0&0\end{array}\right)&K_y&=&\left(\begin{array}{cccc}0&0&1&0\\0&0&0&0\\1&0&0&0\\0&0&0&0\end{array}\right)&K_z&=&\left(\begin{array}{cccc}0&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&0\end{array}\right)\end{array}\tag{2}$$
(See how the $J_j$ are skew-Hermitian, thus have pure imaginary eigenvalues, so that $\exp(a_j\, J_j)$ has stuff like $\sin,\,\cos$ of an angle and is a rotation matrix, whereas the $K_j$ are Hermitian, with purely real eigenvalues, so that $\exp(b_j\, K_j)$ has stuff like $\sinh,\,\cosh$ of a rapidity and is a pure boost matrix).
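As a rough numerical check of that parenthetical claim, here is a minimal Python sketch. It assumes the coordinate order $(t,\,x,\,y,\,z)$ and the matrices $J_z$ and $K_x$ from (2). The helper names (`matmul`, `expm`, `preserves_metric`) are my own, and the matrix exponential is a truncated Taylor series standing in for a library routine.

```python
import math

def matmul(A, B):
    """Multiply two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def expm(A, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for small 4x4 inputs)."""
    n = len(A)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term becomes A^k / k!
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Quadratic form t^2 - x^2 - y^2 - z^2 as a matrix.
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

# Two of the six generators from (2).
J_z = [[0, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
K_x = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

def preserves_metric(L, tol=1e-9):
    """True if L^T ETA L equals ETA, i.e. L preserves the quadratic form."""
    M = matmul(transpose(L), matmul(ETA, L))
    return all(abs(M[i][j] - ETA[i][j]) < tol for i in range(4) for j in range(4))

theta, phi = 0.3, 0.7            # rotation angle and rapidity (arbitrary test values)
R = expm(scale(theta, J_z))      # rotation about z: cos/sin entries in the x-y block
B = expm(scale(phi, K_x))        # boost along x: cosh/sinh entries in the t-x block

print(preserves_metric(R), preserves_metric(B), preserves_metric(matmul(R, B)))
```

The $x$-$y$ block of `R` holds $\cos\theta$ and $\pm\sin\theta$, and the $t$-$x$ block of `B` holds $\cosh\phi$ and $\sinh\phi$, which is the rotation versus boost pattern described above.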
An intuitive description: imagine you are sitting at the console of your spaceship's "hyperdrive": it has two trackballs, each with its own lever, marked "spin" and "boost", and a set of accelerometers - linear and rotational. Your spaceship is initially moving inertially. You roll the trackballs around to set the axis of rotation and direction of boost respectively. When you pull on the levers, the spin lever accelerates the angular speed about the rotation axis, the boost lever accelerates the linear velocity in the boost direction. Otherwise put, the "spin" trackball and its lever set the superposition weights $a_j(s)$ of the $J_j$ in (1) when we use the definitions in (2) and the "boost" trackball and its lever set the weights $b_j(s)$ of the $K_j$. You go through a control sequence, ending so that your accelerometers read nought, so that now a set of $x,\,y,\,z$ axes attached to your spaceship is moving inertially relative to the beginning frame. The proper, orthochronous transformations are precisely every transformation between the beginning frame and an inertial frame that you can reach with your controls.
However, there are other transformations possible that preserve the quadratic form $t^2 - x^2 - y^2 - z^2$ that don't fulfill your criteria 2. and 3. but they follow only a "simple" pattern that makes them "not much different" from the identity connected component. A discrete subgroup of the full Lorentz group is $\{\mathrm{id},\,P,\,T,\,P\,T\}$ with
$$P=\text{"parity flipper"} = \mathrm{diag}[1,\,-1,\,-1,\,-1];\\T=\text{''time flipper''} = \mathrm{diag}[-1,\,1,\,1,\,1]$$
With the exception of $\mathrm{id}$, none of these can be reached from the identity by paths fulfilling (1). They belong to different connected components from the identity component $SO^+(1,\,3)$. Indeed, the identity connected component is a normal subgroup of the full Lorentz group $O(1,\,3)$ and the quotient $O(1,\,3) / SO^+(1,\,3)$ is the four-element group $\{\mathrm{id},\,P,\,T,\,P\,T\}$. So any full Lorentz transformation can be represented as a proper orthochronous transformation followed by one of $\mathrm{id},\,P,\,T$ or $P\,T$. There are four separate connected components to the full Lorentz group. (An aside: $\{\mathrm{id},\,P,\,T,\,P\,T\}$ is the Klein four-group: the only possible group of four elements aside from $\mathbb{Z}_4$.)
To sniff out a non-proper or non-orthochronous transformation, you do one of two things:
1. Compute the matrix's determinant. If it is $-1$, then you know it has to include one of $P$ or $T$, so it's not proper or not orthochronous. You can further differentiate the $P$ and $T$ cosets by looking at the $L_0^0$ component of the transformation: the $T$ coset has $L_0^0<0$, since such a transformation swaps the roles of the "future" and "past" (actually reflects Minkowski vector space in the $t=0$ hyperplane).
2. If the determinant is $+1$, then it may belong to the $P\,T$ coset of $O(1,\,3)$. As in point 1, the $T$ coset and the $P\,T$ coset can be recognised as transformations with $L_0^0<0$.
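The determinant and $L_0^0$ checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: matrices are nested lists in $(t,\,x,\,y,\,z)$ order, and `det`, `is_lorentz` and `classify` are hypothetical helper names of my own.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def det(M):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def is_lorentz(L, tol=1e-9):
    """Criterion 1): L preserves t^2 - x^2 - y^2 - z^2, i.e. L^T ETA L == ETA."""
    M = matmul(transpose(L), matmul(ETA, L))
    return all(abs(M[i][j] - ETA[i][j]) < tol for i in range(4) for j in range(4))

def classify(L):
    """Name the connected component of O(1,3) that L belongs to."""
    if not is_lorentz(L):
        raise ValueError("not a Lorentz transformation")
    proper = det(L) > 0            # determinant is +1 or -1 for a Lorentz matrix
    orthochronous = L[0][0] > 0    # sign of the time-time component
    if proper and orthochronous:
        return "proper orthochronous"
    if orthochronous:
        return "P coset"
    if not proper:
        return "T coset"
    return "PT coset"

# Example inputs: a boost along x with rapidity 0.5, plus the two flippers.
ch, sh = math.cosh(0.5), math.sinh(0.5)
BOOST = [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
P = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
T = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

for name, L in [("boost", BOOST), ("P boost", matmul(P, BOOST)),
                ("T boost", matmul(T, BOOST)), ("PT boost", matmul(matmul(P, T), BOOST))]:
    print(name, "->", classify(L))
```

Note that the determinant alone cannot separate the identity component from the $P\,T$ coset, which is why the sign of $L_0^0$ is always checked as well.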
I think the clearest way to think about this is to say that the gamma matrices don't transform. In other words, the fact that they carry a vector index doesn't mean that they form a four-vector. This is analogous to how the Pauli matrices work in regular quantum mechanics, so let me talk a little bit about that.
Suppose you have a spin $1/2$ particle in some state $|\psi\rangle$. You can calculate the mean value of $\sigma_x$ by doing $\langle \psi | \sigma_x | \psi\rangle$. Now let's say you rotate your particle by an angle $\theta$ around the $z$-axis. (Warning: There is about a 50% chance my signs are incorrect.) You now describe your particle with a different ket, given by $|\psi'\rangle = \exp(-i \sigma_z \theta /2)\,|\psi\rangle$. Remember that we are leaving the coordinates fixed and rotating the system, as is usually done in quantum mechanics. Now the expectation value is given by
$$\langle \psi' | \sigma_x | \psi' \rangle = \langle \psi |\, e^{i\sigma_z \theta /2}\, \sigma_x\, e^{-i \sigma_z \theta / 2}\, | \psi\rangle$$
There is a neat theorem, not too hard to prove, that says that
$$e^{i\sigma_z \theta /2}\, \sigma_x\, e^{-i \sigma_z \theta / 2} = \cos \theta\, \sigma_x -\sin \theta\, \sigma_y$$
So it turns out that the expectation value for the rotated system is also given by $\langle \psi |\, \cos \theta\, \sigma_x -\sin \theta\, \sigma_y \, |\psi\rangle = \cos \theta\, \langle \sigma_x \rangle - \sin \theta\, \langle \sigma_y \rangle$. It's as if we left our particle alone and rotated the Pauli matrices. But note that if we apply the rotation to $|\psi\rangle$, then we don't touch the matrices. Also, I never said that I transformed the matrices. I just transformed the state, and then found out that I could leave it alone and rotate the matrices.
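The conjugation identity for the Pauli matrices is easy to check numerically. The following Python sketch assumes the standard representation of $\sigma_x$, $\sigma_y$, $\sigma_z$ and uses the fact that $\exp(-i\sigma_z\theta/2)$ is diagonal in that representation; the helper names are mine.

```python
import cmath
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]

def U(theta):
    """exp(-i sigma_z theta / 2); diagonal because sigma_z = diag(1, -1)."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

theta = 0.4  # arbitrary test angle

# Left side: rotate the state, i.e. sandwich sigma_x between U^dagger and U.
lhs = matmul(dagger(U(theta)), matmul(SIGMA_X, U(theta)))

# Right side: leave the state alone and "rotate the matrices".
rhs = [[math.cos(theta) * SIGMA_X[i][j] - math.sin(theta) * SIGMA_Y[i][j]
        for j in range(2)] for i in range(2)]

max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("largest entrywise difference:", max_error)
```

The difference is at the level of floating-point rounding, so the signs in the identity as written are consistent with the rotation $\exp(-i\sigma_z\theta/2)$ applied to the state.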
The situation for a Dirac spinor is similar. The analogous identity is that $S(\Lambda) \gamma^\mu S^{-1}(\Lambda) \Lambda^\nu_{\ \mu} = \gamma^\nu$. This is just something that is true; nobody said anything about transforming $\gamma^\mu$. There's no $\gamma^\mu \to \dots$ here.
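To make the identity concrete, here is a hedged numerical check for a single boost along $x$. Everything specific in it is my assumption and not taken from the text above: the Dirac representation of the gamma matrices, the metric signature $(+,-,-,-)$, the rapidity value, and the choice $S = \cosh(\phi/2) + \sinh(\phi/2)\,\gamma^0\gamma^1$, whose sign is picked so that it pairs with the $\Lambda$ written in the code.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def lincomb(coeffs, mats):
    """Return sum_k coeffs[k] * mats[k] for 4x4 matrices."""
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats)) for j in range(4)]
            for i in range(4)]

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(4) for j in range(4))

I4 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Gamma matrices in the Dirac representation (assumed convention).
GAMMA = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]],
    [[0, 0, 0, 1], [0, 0, 1, 0], [0, -1, 0, 0], [-1, 0, 0, 0]],
    [[0, 0, 0, -1j], [0, 0, 1j, 0], [0, 1j, 0, 0], [-1j, 0, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, -1], [-1, 0, 0, 0], [0, 1, 0, 0]],
]

phi = 0.8  # rapidity, arbitrary test value
ch, sh = math.cosh(phi), math.sinh(phi)

# LAMBDA[nu][mu] plays the role of Lambda^nu_mu for a boost along x.
LAMBDA = [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Spinor representation of the same boost (sign convention is an assumption).
G01 = matmul(GAMMA[0], GAMMA[1])
S = lincomb([math.cosh(phi / 2), math.sinh(phi / 2)], [I4, G01])
S_INV = lincomb([math.cosh(phi / 2), -math.sinh(phi / 2)], [I4, G01])

# S gamma^mu S^{-1} for each mu, then contract with Lambda^nu_mu.
conjugated = [matmul(S, matmul(g, S_INV)) for g in GAMMA]
errors = [max_abs_diff(lincomb(LAMBDA[nu], conjugated), GAMMA[nu]) for nu in range(4)]
print("largest deviation from gamma^nu:", max(errors))
```

With the opposite sign convention for $S$, the same check goes through with $\Lambda$ replaced by its inverse, so the sketch illustrates the structure of the identity without fixing which convention a given textbook uses.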
Now let's take the Dirac equation, $(i \gamma^\mu \partial_\mu - m)\psi = 0$, and apply a Lorentz transformation. This time I will change coordinates instead of boosting the system, but there's no real difference. Let's say we have new coordinates given by $x'^\mu = \Lambda^\mu_{\ \nu} x^\nu$, and we want to see if the Dirac equation looks the same in those coordinates. The field $\psi'$ as seen in the $x'^\mu$ frame is given by $\psi' = S(\Lambda) \psi \iff \psi = S^{-1}(\Lambda) \psi'$, and the derivatives are related by $\partial_\mu = \Lambda^\nu_{\ \mu} \partial'_\nu$. Plugging in we get $(i\gamma^\mu \Lambda^\nu_{\ \mu} \partial'_\nu-m) S^{-1}(\Lambda)\psi' = 0$, which doesn't really look like our original equation. But let's multiply on the left by $S(\Lambda)$. $m$ is a scalar so $S$ goes right through it and cancels with $S^{-1}$. And in the first term we get $S(\Lambda)\gamma^\mu S^{-1}(\Lambda) \Lambda^\nu_{\ \mu}$, which according to our trusty identity is just $\gamma^\nu$. Our equation then simplifies to
$$(i\gamma^\mu \partial'_\mu - m)\psi'=0$$
This is the same equation, but written in the primed frame. Notice how the gamma matrices are the same as before; when you're in class and the teacher writes them on the board, you don't need to ask in what coordinate system they are valid. Everyone uses the same gamma matrices. They're not really a four-vector, but their "transformation law" guarantees that anything written as if they were a four-vector is Lorentz invariant as long as the appropriate spinors are present.
Sometimes, diagrams are more useful than equations!
Note that $p^2 = - (p^0)^2 + |\vec{p}|^2$ remains unchanged under any Lorentz transformation. Suppose that $p^2 < 0$ (say $p^2 = -1$). A plot of this hypersurface (a two-sheeted hyperboloid, with one branch at $p^0 > 0$ and the other at $p^0 < 0$) is shown below
Lorentz transformations that are connected to the identity are continuous transformations that take a single point on this line to a different point on this line. Clearly, if I start off on the branch of the line with $p^0 > 0$, then no continuous transformation will ever take me to the $p^0 < 0$ branch. Thus, if $p^2 < 0$, Lorentz transformations connected to the identity cannot change the sign of $p^0$. Note that this is not true for the discrete Lorentz transformations such as $T$ which, of course, does change the sign of $p^0$.
On the other hand, if $p^2 > 0$ (say $p^2 = 1$), then the hypersurface (a one-sheeted hyperboloid, which is connected) looks like
It is now possible to have a continuous Lorentz transformation that changes the sign of $p^0$.
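Since the argument leans on pictures, a small numerical illustration may help. The sketch below assumes units with $c = 1$, the signature $p^2 = -(p^0)^2 + |\vec{p}|^2$ used above, and a boost along $x$ with velocity $v$; the function names and the two sample momenta are my own choices.

```python
import math

def boost_x(p, v):
    """Boost the four-momentum p = (p0, p1, p2, p3) along x with velocity v, |v| < 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    p0, p1, p2, p3 = p
    return (g * (p0 - v * p1), g * (p1 - v * p0), p2, p3)

def p_squared(p):
    """Invariant -(p0)^2 + |p|^2 in the signature used above."""
    return -p[0] ** 2 + p[1] ** 2 + p[2] ** 2 + p[3] ** 2

timelike = (math.sqrt(2.0), 1.0, 0.0, 0.0)    # p^2 = -1, starts with p0 > 0
spacelike = (1.0, math.sqrt(2.0), 0.0, 0.0)   # p^2 = +1, starts with p0 > 0

for v in (-0.99, -0.5, 0.0, 0.5, 0.9, 0.99):
    t = boost_x(timelike, v)
    s = boost_x(spacelike, v)
    print(f"v={v:+.2f}  timelike p0={t[0]:+.3f}  spacelike p0={s[0]:+.3f}")
```

For the timelike momentum, $p^0$ stays positive for every $|v| < 1$, while for the spacelike one it changes sign once $v$ exceeds $p^0/p^1 = 1/\sqrt{2}$, with $p^2$ unchanged in both cases.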