Paul,
I have always thought the way this problem is written up in the article was sloppy as well. The most confusing part of the discussion is the statement "The continuity equation is as before". At first one writes the continuity equation as:
$$\nabla \cdot J + \dfrac{\partial\rho}{\partial t} = 0$$
Although the del operator can be defined to be infinite dimensional, it is frequently reserved for three dimensions and so the construction of the sentence does not provide a clear interpretation. If you look up conserved current you find the 4-vector version of the continuity equation:
$$\partial_\mu j^\mu = 0$$
What is important about the derivation in the Wikipedia article is the replacement of the usual density, which contains no time derivative, by one built from time derivatives, or rather:
$$\rho = \psi^*\psi$$
becomes
$$\rho = \dfrac{i\hbar}{2m}(\psi^*\partial_t\psi - \psi\partial_t\psi^*)$$
The intent is clear: they want to make the time component have the same form as the space components. The equation of the current is now:
$$J^\mu = \dfrac{i\hbar}{2m}(\psi^*\partial^\mu\psi - \psi\partial^\mu\psi^*)$$
which now contains the time component. So the continuity equation that should be used is:
$$\partial_\mu J^\mu = 0$$
where the capitalization of $J$ appears to be an arbitrary choice in the derivation.
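To spell out the step, here is a short sketch of how the covariant form unpacks, assuming natural units ($c = 1$), coordinates $x^\mu = (t, \vec{x})$, and the identification $J^0 = \rho$ (the labels $J^0$ and $\vec{J}$ are my notation, not the article's):
$$\partial_\mu J^\mu = \partial_t J^0 + \partial_i J^i = \dfrac{\partial\rho}{\partial t} + \nabla\cdot\vec{J} = 0$$
So the four-vector statement reproduces the three-dimensional continuity equation, provided that $\rho$ is read as the time component of $J^\mu$.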
One can verify that this is the intent by referring to the article on probability current.
From the above I can see that the sudden insertion of the statement that one can arbitrarily pick $\psi$ and $\dfrac{\partial \psi}{\partial t}$ isn't well explained. This part of the article was a source of confusion for me as well until I realized that the author was trying to get to a discussion about the Klein-Gordon equation.
A quick web search for "probability current and Klein-Gordon equation" finds good links, including a good one from the physics department at UC Davis. If you follow the discussion in the paper you can see it confirms that the argument is really trying to get to a discussion about the Klein-Gordon equation and make the connection to probability density.
Now, if one does another quick search for "negative solutions to the Klein-Gordon equation" one can find a nice paper from the physics department of the Ohio University. There we get some good discussion around equation 3.13 in the paper, which reiterates that, when we redefined the density, we introduced some additional variability. So the equation:
$$\rho = \dfrac{i\hbar}{2mc^2}(\psi^*\partial_t\psi - \psi\partial_t\psi^*)$$
(where in the original, c was set to 1)
really is at the root of the problem (confirming the intent in the original article). However, it probably still doesn't satisfy the question,
"can anyone show me why the expression for density not positive definite?",
but if one goes on a little shopping spree you can find the book Quantum Field Theory Demystified by David McMahon (and there are some free downloads out there, but I won't link to them out of respect for the author), and if you go to pg 116 you will find the discussion:
Remembering the free particle solution
$$\varphi(\vec{x},t) = e^{-ip\cdot x} = e^{-i(Et- px)}$$
the time derivatives are
$$\dfrac{\partial\varphi}{\partial t} = -iEe^{-i(Et- px)}$$
$$\dfrac{\partial\varphi^*}{\partial t} = iEe^{i(Et- px)}$$
We have
$$\varphi^*\dfrac{\partial\varphi}{\partial t} = e^{i(Et- px)}[-iEe^{-i(Et- px)}] = -iE$$
$$\varphi\dfrac{\partial\varphi^*}{\partial t} = e^{-i(Et- px)}[iEe^{i(Et- px)}] = iE$$
So the probability density is
$$\rho = i\left(\varphi^*\dfrac{\partial\varphi}{\partial t} - \varphi\dfrac{\partial\varphi^*}{\partial t}\right) = i(-iE-iE) = 2E$$
Looks good so far, except for those pesky negative energy solutions. Remember that
$$E = \pm\sqrt{p^2+m^2}$$
In the case of the negative energy solution
$$\rho = 2E =-2\sqrt{p^2+m^2}<0$$
which is a negative probability density, something which simply does not make sense.
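The arithmetic in the quoted passage can be checked numerically. What follows is only a minimal sketch, not anything from the book: it assumes natural units, one space dimension, and illustrative values $p = 3$, $m = 4$; the function names `plane_wave` and `kg_density` are made up for this example, and the time derivative is approximated by a central difference.

```python
import cmath
import math

def plane_wave(E, p, t, x):
    """Free-particle solution phi = exp(-i(E t - p x)) in natural units."""
    return cmath.exp(-1j * (E * t - p * x))

def kg_density(E, p, t=0.3, x=0.7, h=1e-6):
    """rho = i(phi* d_t phi - phi d_t phi*), with d_t taken by central difference."""
    phi = plane_wave(E, p, t, x)
    dphi_dt = (plane_wave(E, p, t + h, x) - plane_wave(E, p, t - h, x)) / (2 * h)
    rho = 1j * (phi.conjugate() * dphi_dt - phi * dphi_dt.conjugate())
    return rho.real  # the bracket is purely imaginary, so rho itself is real

# Illustrative (assumed) values: p = 3, m = 4, so |E| = 5.
p, m = 3.0, 4.0
E = math.sqrt(p**2 + m**2)

rho_pos = kg_density(+E, p)  # positive-energy branch, close to +2E
rho_neg = kg_density(-E, p)  # negative-energy branch, close to -2E
print(rho_pos, rho_neg)
```

The sign of the density simply follows the sign of $E$, independent of the point $(t, x)$ at which it is evaluated.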
Hopefully that helps. The notion of a negative probability does not make sense because we define probability on the interval [0,1], so by definition negative probabilities have no meaning. This point is sometimes lost on people when they try to make sense of things, but logically any discussion of negative probabilities is nonsense. This is why QFT ended up reinterpreting the Klein-Gordon equation and repurposing it as an equation that governs creation and annihilation operators.
The other answers are correct but it's worth stating, given your title's question, what the continuity equation is actually telling us.
A continuity equation is the expression of balance between the rate of change of the amount of "stuff" inside a region $M$ on the one hand and the total flux of that stuff through the boundary $\partial M$ on the other. It is the translation into mathematics of the statement, "what goes in, stays in unless it comes out again through the boundary". Take a fluid, for example: the amount of fluid in a volume can only change by the total flux of fluid through that volume's boundary. By shrinking the test volume and taking the limit, one can show that this notion is the same as the equation in Nate Stemen's answer if one takes $\rho$ to be the density of a fluid and $\vec{j}$ to be the mass flow rate.
In your case, the "stuff" is the probability that a position measurement yields a result within that volume. Probability flux is perhaps a little more abstract than mass flow rate, but the fulfilling of a continuity equation over the whole of space simply means that the probability that the measurement will be somewhere is constant. Which it is: the probability that the measurement will lie somewhere in all space is unity! And the continuity equation results from applying this principle to an arbitrary volume, the volume's boundary and the volume's complement. The decrease in probability of measurement within the volume must match the increase in the probability of measurement outside it, which, in turn, must match the integrated flux through the boundary.
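This balance can be illustrated with a small numerical experiment. The sketch below is my own toy setup, not taken from the other answers: it assumes $\hbar = m = 1$, a one-dimensional infinite well of width 1, an equal superposition of the two lowest eigenstates, and the standard current $j = \operatorname{Im}(\psi^*\partial_x\psi)$. The test interval $[0.2, 0.6]$ and all function names are arbitrary choices for the example.

```python
import cmath
import math

def psi(x, t):
    """Equal superposition of the two lowest infinite-well states (hbar = m = L = 1)."""
    total = 0j
    for n in (1, 2):
        energy = (n * math.pi) ** 2 / 2
        total += math.sqrt(2) * math.sin(n * math.pi * x) * cmath.exp(-1j * energy * t)
    return total / math.sqrt(2)

def current(x, t, h=1e-5):
    """Probability current j = Im(psi* d_x psi), with d_x by central difference."""
    dpsi_dx = (psi(x + h, t) - psi(x - h, t)) / (2 * h)
    return (psi(x, t).conjugate() * dpsi_dx).imag

def prob_inside(a, b, t, steps=2000):
    """Probability of finding the particle in [a, b], by the midpoint rule."""
    dx = (b - a) / steps
    return sum(abs(psi(a + (k + 0.5) * dx, t)) ** 2 for k in range(steps)) * dx

a, b, t, dt = 0.2, 0.6, 0.1, 1e-4

# Rate of change of the probability inside [a, b] ...
rate = (prob_inside(a, b, t + dt) - prob_inside(a, b, t - dt)) / (2 * dt)
# ... versus the net flux entering through the two boundary points.
net_inflow = current(a, t) - current(b, t)
print(rate, net_inflow)
```

The two printed numbers agree up to discretization error, while the probability over the whole well stays equal to one.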
Now, ponder these thoughts for a bit, and with them in mind, see whether you can reproduce, from your own reasoning, AlphaGo's answer.
Best Answer
First note that Schrödinger's equation can be understood to come from an action. The Lagrangian is $$L = \int~\mathrm d^3x \,\left[\psi^\dagger(x) \left(i \frac{\partial}{\partial t} + \frac{\nabla^2}{2m}\right)\psi(x) - \psi^\dagger(x)\psi(x)V(x)\right]$$
The Euler-Lagrange equation for $\psi^\dagger(x)$ is exactly the Schrödinger equation. Since the dynamics of $\psi(x)$ are determined by Lagrangian mechanics in this way, Noether's theorem applies without any caveats.^^
In particular, this Schrödinger Lagrangian has a $U(1)$ symmetry corresponding to $\psi(x) \mapsto e^{i\alpha}\psi(x)$. The corresponding conserved charge and current densities are $$\rho = j^0 = \frac{\partial L}{\partial \dot{\psi}}\delta \psi = \psi^\dagger(x)\psi(x)$$ $$j^i = \frac{\partial L}{\partial(\partial_i\psi)}\delta \psi+\frac{\partial L}{\partial(\partial_i\psi^\dagger)}\delta \psi^\dagger=\frac{i}{2m}\left((\partial^i\psi^\dagger)\psi-\psi^\dagger\partial^i\psi\right),$$ which is the well-known probability current density.
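As a cross-check, one can sketch why these densities satisfy the continuity equation, assuming $\hbar = 1$, a real potential $V$, and the Schrödinger equation in the form $i\,\partial_t\psi = -\frac{\nabla^2}{2m}\psi + V\psi$:
$$\partial_t\rho = \psi^\dagger\partial_t\psi + (\partial_t\psi^\dagger)\psi = \frac{i}{2m}\left(\psi^\dagger\nabla^2\psi - (\nabla^2\psi^\dagger)\psi\right)$$
$$\nabla\cdot\vec{j} = \frac{i}{2m}\left((\nabla^2\psi^\dagger)\psi - \psi^\dagger\nabla^2\psi\right) = -\partial_t\rho$$
The potential terms cancel in the first line because $V$ is real, and the cross terms $\nabla\psi^\dagger\cdot\nabla\psi$ cancel in the second, so $\partial_t\rho + \nabla\cdot\vec{j} = 0$.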
^^ In non-relativistic quantum mechanics the wavefunction $\psi(x)$ is a "classical" variable in that it is simply a function from space and time to $\mathbb{C}$. Noether's theorem works exactly the same for it as in classical mechanics. In quantum field theory the relevant objects $\psi(x)$ become quantum operators and the usual arguments have to be modified somewhat.