The "branch-cuts" along the imaginary axis seem to be a software artifact. The value of $\sqrt{z^2-1}$ is (apparently) computed by first computing $z^2-1$, and then taking the square root, choosing the square root with non-negative real part [and probably the one with non-negative imaginary part when the square root is purely imaginary]. That loses the information where the value whose square root shall be taken comes from, and thus leads to an even function (since $z^2-1$ is even), instead of a continuous branch, which would be an odd function (with the branch-cut $[-1,1]$). To avoid that, you need to remember where $z$ lies, and for $z$ in the left [or in the right, if you want the other branch] half-plane multiply the computed value with $-1$ [also for $z$ in the upper or lower half of the imaginary axis].
Then you should get a nice smooth plot, with a branch-cut on the interval $[-1,1]$.
For branch cuts I thought we had to identify where the argument of the square root is zero, which gives us the branch points and then we are free to take lines from those branch points to infinity in any way we like.
That describes the situation for $\sqrt{z}$ and similar, but generally, there are a few other considerations. If you have a rational function $f$, and consider $\sqrt{f(z)}$, the potential branch points are the zeros and poles of $f$.
But zeros and poles of even multiplicity are not actual branch points (in a neighbourhood of such a point you have $f(z) = (z-z_0)^{2k}\cdot h(z)$ with a holomorphic $h$ that does not vanish at $z_0$, so $\sqrt{h(z)}$ is holomorphic in a neighbourhood of $z_0$ and $\sqrt{f(z)} = (z-z_0)^k\sqrt{h(z)}$ is a well-defined meromorphic function in a neighbourhood of $z_0$), so only the zeros and poles of odd multiplicity are actual branch-points. [The situation is similar for $k$-th roots, zeros and poles whose multiplicity is a multiple of $k$ aren't branch-points, zeros and poles of other multiplicities are; essential singularities of $f$ may or may not be branch-points of $\sqrt[k]{f(z)}$, consider $\exp \frac{1}{z}$ and $z\exp \frac{1}{z}$.] Then you need to connect each branch-point to at least one other branch-point with a branch-cut to obtain a domain on which a holomorphic branch of $\sqrt[k]{f(z)}$ is defined.
In the situation here, $\infty$ is a pole of multiplicity $2$ of $z^2-1$, so we have only the two branch-points $1$ and $-1$ of $\sqrt{z^2-1}$. Any curve connecting the two branch-points gives a region on which a holomorphic branch of $\sqrt{z^2-1}$ is defined, whether that curve passes through $\infty$ or not. The most common choices of branch-cuts are the interval $[-1,1]$ and the other arc of the real axis (plus $\infty$) connecting the two points, $\bigl(\mathbb{R}\setminus (-1,1)\bigr) \cup \{\infty\}$ [the latter branch-cut makes $\sqrt{z^2-1}$ an even function on its domain].
The image shows the branch cuts in red for the first identity on the left and the second on the right.
I don't think splitting the radicand into factors is a good idea here. For $\sqrt{z+1}$ and $\sqrt{z-1}$ (resp. $\sqrt{1+z}$ and $\sqrt{1-z}$), you need a branch-cut connecting $\mp 1$ to $\infty$, but for $\sqrt{z^2-1}$, $\infty$ is in no way a special point. The branch-cuts of a product can [often] be chosen quite differently from the branch-cuts of the factors.
Why is there a branch cut extending the imaginary axis in the second case?
There shouldn't be. It's most likely an artifact of how the values are computed by the software.
How would I tailor the inbuilt python function to choose my own branch cuts?
Write a wrapper that multiplies the result of the inbuilt square root function with $-1$ for $z$ in the appropriate parts of the plane, according to your chosen branch-cuts.
Given a multivalued analytic expression, i.e., an expression containing surds or logs, a branch cut, according to the definition in wikipedia, is a curve (or a collection of curves) $\Gamma$ in the complex plane such that it is possible to define a single valued function $f:\>\Omega\to{\mathbb C}$ in the domain $\Omega:={\mathbb C}\setminus \Gamma$ realizing an accepted value of the given expression at each point $z\in\Omega$.
Such a branch cut is by no means uniquely determined. In the case of $\sqrt{z}$ one could as well take away the full imaginary axis, if the concrete problem at hand involves only points in the right half-plane. But of course one is inclined to make the cut as small as possible. Since each "singularity" necessitates a cut originating there it suggests itself that in the case of two singularities one hopes that a single cut connecting them would suffice. But this need not to be the case. If, e.g., the given expression is $\sqrt{z}\>\root 3\of{z-1}$ you cannot obtain singlevaluedness by cutting out the segment $[0,1]$.
In the case of the example $f(z):=\sqrt{z}\sqrt{z-1}$ each surd picks up a factor $-1$ when continued analytically around its singularity. Therefore the product $f$ picks up the factor $(-1)^2=1$ when continued analytically around the cut $[0,1]$. It follows that the two possible choices $\pm\sqrt{2}$ for $f(2)$ each lead to a well defined branch of $f$ in $\Omega$.
In your second example we have the four "square-root points" $\pm1\pm i$. We could make a "cross-cut" as you proposed, resulting in a doubly connected domain. But making two vertical (or two horizontal) cuts leaves a topologically more interesting domain (having two holes), and still a single valued branch of $f$ can be selected. If in this example we had a fourth root instead of a square root it would be necessary to resort to the cross cut. (Think about it!)
Best Answer
Multivalued analytic functions can be made (and have been made) a rigorous notion. This notion is sometimes useful. But modern textbooks prefer not to use it, because it is hard to deal with rigorously. (What is a sum or product of multivalued functions?)
There are several substitutes:
To use only single valued branches. For this you need to restrict the region (usually by making some "branch cuts"). This is the way most elementary textbook take.
To use sheafs (which can be considered a rigorous framework into which multivalued functions fit, though there are alternative approaches). This is the approach used in the standard graduate textbook of Ahlfors.
To translate everything to the language of functions on Riemann surfaces. This is perhaps the most useful approach, at least in one complex variable, which was proposed by Riemann.