Let's suppress some dimensions to simplify:
$$\Delta s^2 = -(c\Delta t)^2 + \Delta x^2 $$
This quantity $\Delta s^2$ is preserved by changes of reference frame, just as in Galilean physics the quantity $\Delta r^2 = \Delta x^2 + \Delta y^2$ is preserved by rotations.
Notice it is also the equation of a hyperbola. Thus, the effect of a frame shift is to slide events around on hyperbolae of constant $\Delta s^2$.
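As a quick numerical sanity check of the invariance claim, here is a minimal sketch. It assumes the standard 1+1 dimensional Lorentz boost formula, which is not written out above, and the names `boost` and `interval_sq` and the sample event are just illustrative choices.

```python
import math

def boost(ct, x, beta):
    """Standard Lorentz boost along x with beta = v/c; returns (ct', x')."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

def interval_sq(ct, x):
    """Delta s^2 = -(c dt)^2 + dx^2, in the sign convention used above."""
    return -ct**2 + x**2

# Arbitrary sample event, given as (c*dt, dx) relative to the origin.
event = (3.0, 1.0)

for beta in (0.0, 0.3, 0.6, 0.9):
    ct_b, x_b = boost(*event, beta)
    # The coordinates change with beta, but ds^2 should not.
    print(f"beta={beta:.1f}  ct'={ct_b:+.4f}  x'={x_b:+.4f}  "
          f"ds^2={interval_sq(ct_b, x_b):+.4f}")
```

The coordinates move along the hyperbola as `beta` changes, while the printed `ds^2` stays the same up to rounding.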
Here's a helpful image from Wikipedia (attribution below):
Ignore the vectors and just look at the hyperbolae. Events on a given hyperbola must, under a given frame boost, remain on that hyperbola.
Now you might notice those hyperbolae seem to come in two classes, those on the top and those on the bottom. The "v=c" hyperbolae - the straight lines - divide the two. Events on those are said to be "lightlike (or null) separated from the origin". Notice that for these, $\Delta s^2$ is just zero.
The hyperbolae in the purple regions are said to be timelike separated from the origin. This is because no matter how much they slide around on their hyperbolae, their ordering compared to the origin never changes. Any events in the purple regions which occur before (after) the origin will occur before (after) the origin to all observers. Thus, this set of events - plus the null events - is said to be causally connected with the origin. The fact that the ordering of these events with the origin in time is fixed motivates the term.
The hyperbolae in the white regions do not have this property. Some observers think they happened before O, while some think they happened after. It had therefore better be true that nothing about O depend logically on happening after (or before) these events! Otherwise we could break logic by running really fast.
However, notice that it is not possible to slide the white-region events from one side of the origin to the other in space: an event to the right of the origin stays to the right for every observer. This makes the separation more like our normal idea of "distance", so we say the events are spacelike separated.
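The three classes and their ordering properties can be checked numerically. This is only a sketch under the same assumption as before (the standard 1+1 dimensional boost); `classify`, `signs_under_boosts`, and the sample events are hypothetical names and values chosen for illustration.

```python
import math

def boost(ct, x, beta):
    """Standard Lorentz boost along x with beta = v/c; returns (ct', x')."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

def classify(ct, x):
    """Classify the separation from the origin using ds^2 = -(ct)^2 + x^2."""
    s2 = -ct**2 + x**2
    if s2 < 0:
        return "timelike"
    if s2 > 0:
        return "spacelike"
    return "lightlike"

def signs_under_boosts(ct, x, betas):
    """Collect whether ct' > 0 and whether x' > 0 over a range of boosts."""
    t_signs, x_signs = set(), set()
    for beta in betas:
        ct_b, x_b = boost(ct, x, beta)
        t_signs.add(ct_b > 0)
        x_signs.add(x_b > 0)
    return t_signs, x_signs

betas = [i / 10 for i in range(-9, 10)]  # beta from -0.9 to 0.9

for ct, x in [(2.0, 1.0), (1.0, 2.0), (1.0, 1.0)]:
    t_signs, x_signs = signs_under_boosts(ct, x, betas)
    print(classify(ct, x), "time signs:", t_signs, "space signs:", x_signs)
```

For the timelike sample only the time ordering is fixed, and for the spacelike sample only the spatial side is fixed, which is the asymmetry described above.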
Image attribution:
"Minkowski lightcone lorentztransform" by Maschen - Own work. Licensed under Public Domain via Commons
Here's a geometric interpretation... (admittedly a top down approach).
(The essence of the idea is inspired by the Bondi k-calculus and
by the "product of times" formula seen in A.A. Robb's "Optical geometry of motion"
[see also Geroch's "General Relativity from A to B"].)
update
(in case this is part of the question)
Why is the invariant of this form $S^2=\Delta t^2-\Delta x^2$?
A good motivation is a radar measurement of an event $P=(t_P,x_P)$
not on your worldline.
Suppose you are an inertial observer.
To measure event $P$,
imagine sending a light signal to $P$ and waiting for its echo, and
noting the time on your wristwatch when you sent it $t_{send}$ and
when you receive it $t_{rec}$. From those two times, you would assign
event $P$ the following coordinates:
- time coordinate $t_P=\frac{1}{2}(t_{rec}+t_{send})$ [the midway time during the round trip]
- spatial coordinate $x_P=\frac{1}{2}(t_{rec}-t_{send})$ [half of the roundtrip time (multiplied by c)]
Note that $t_{rec}=(t_P+x_P)$ and $t_{send}=(t_P-x_P)$.
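The radar assignment above is easy to write down as code. This is a minimal sketch in units with $c=1$; the function names and the sample wristwatch readings are made up for illustration.

```python
def radar_coordinates(t_send, t_rec):
    """Assign (t_P, x_P) to an event from the emission and echo times (c = 1)."""
    t_p = 0.5 * (t_rec + t_send)  # midway time of the round trip
    x_p = 0.5 * (t_rec - t_send)  # half the round-trip time
    return t_p, x_p

def radar_times(t_p, x_p):
    """Inverse relation: recover (t_send, t_rec) from the coordinates."""
    return t_p - x_p, t_p + x_p

# Hypothetical wristwatch readings.
t_send, t_rec = 2.0, 8.0
t_p, x_p = radar_coordinates(t_send, t_rec)
print("t_P =", t_p, " x_P =", x_p)
print("recovered (t_send, t_rec) =", radar_times(t_p, x_p))
```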
Consider another inertial observer who met you when your wristwatch
read zero and they set their wristwatch to zero. They would make
analogous measurements of event $P$.
Thus, note that
$t'_{rec}=(t'_P+x'_P)$ and $t'_{send}=(t'_P-x'_P)$.
Taking an image from Bondi's "Relativity and Common Sense"
It turns out that, for events joined by a future-directed light signal,
$t'_{send}=K t_{send}$, where $K$ is a proportionality constant
that depends on the relative velocity of the observers,
and $t_{rec}=K t'_{rec}$, with the same proportionality constant.
(This is actually the Lorentz Boost transformation.)
So, while $t_P\neq t'_P$ and $x_P\neq x'_P$ in general,
the factors of $K$ cancel in the product of the two times:
$$t_{rec}t_{send}=(K t'_{rec})\left(\frac{t'_{send}}{K}\right)=t'_{rec}t'_{send}.$$
(This is the product of times formula [seen in Robb and in Geroch].)
Expressing this back in terms of the $t_P$s and $x_P$s,
this says
that $$({t_P}^2-{x_P}^2)=({t'_P}^2-{x'_P}^2).$$
(This is the invariance of the square-interval.)
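Here is a small numerical sketch of the argument, using only the two relations $t'_{send}=K t_{send}$ and $t_{rec}=K t'_{rec}$ from above. The value `K = 2.0` and the sample times are arbitrary. The comparison velocity $v=(K^2-1)/(K^2+1)$ is the standard relation between the Doppler factor and the relative velocity, which is an addition here and is not derived in this answer.

```python
def radar_coordinates(t_send, t_rec):
    """Radar assignment of (t, x) from emission and echo times (c = 1)."""
    return 0.5 * (t_rec + t_send), 0.5 * (t_rec - t_send)

def other_observer_times(t_send, t_rec, K):
    """Times on the second observer's watch: t'_send = K t_send, t_rec = K t'_rec."""
    return K * t_send, t_rec / K

K = 2.0                    # arbitrary sample proportionality constant
t_send, t_rec = 1.0, 9.0   # arbitrary sample readings on your watch

t, x = radar_coordinates(t_send, t_rec)
tp_send, tp_rec = other_observer_times(t_send, t_rec, K)
tp, xp = radar_coordinates(tp_send, tp_rec)

print("product of times:", t_send * t_rec, "vs", tp_send * tp_rec)
print("t^2 - x^2:       ", t**2 - x**2, "vs", tp**2 - xp**2)

# Cross-check against the standard boost, assuming v = (K^2 - 1)/(K^2 + 1).
v = (K**2 - 1) / (K**2 + 1)
gamma = 1.0 / (1.0 - v**2) ** 0.5
print("boost gives:     ", gamma * (t - v * x), gamma * (x - v * t))
print("radar gives:     ", tp, xp)
```

The individual coordinates differ between the two observers, but both the product of times and $t^2-x^2$ agree.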
Let's consider the (1+1)-dim case and write $$S^2=\Delta t^2-\Delta x^2=(\Delta t+\Delta x)(\Delta t-\Delta x)=\Delta u \Delta v.$$
Here $u=t+x$ and $v=t-x$ are
called (up to sign and scaling conventions) "light-cone coordinates", since these are coordinates with axes along the light-cone.
Written in this way, $S^2=\Delta u \Delta v$ looks like the "area of a diamond" (a parallelogram whose sides are parallel to the light cone). One corner could be taken to be the origin, and the opposite corner traces out a hyperbola (the Minkowski-circle, the curve of constant interval from the origin) as you apply a Lorentz boost.
(Recall that for a point $(x,y)$ along the hyperbola $xy=1$, a rectangle with corners at $(0,0)$ and at $(x,y)$, with sides parallel to the x- and y-axes, has area 1.)
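To see the diamond keep its area while changing shape, here is a minimal sketch. It assumes the standard boost with $c=1$; the names `light_cone` and `diamond_area` and the sample event are illustrative only, and `beta` is used for the velocity so that it does not clash with the light-cone coordinate $v$.

```python
import math

def boost(t, x, beta):
    """Standard Lorentz boost along x with c = 1; returns (t', x')."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (t - beta * x), gamma * (x - beta * t)

def light_cone(t, x):
    """Light-cone coordinates u = t + x, v = t - x."""
    return t + x, t - x

def diamond_area(t, x):
    """Area u*v of the diamond with corners at the origin and at (t, x)."""
    u, v = light_cone(t, x)
    return u * v

t, x = 5.0, 3.0  # arbitrary sample event, timelike from the origin
for beta in (0.0, 0.3, 0.6, -0.6):
    tb, xb = boost(t, x, beta)
    u, v = light_cone(tb, xb)
    # One side of the diamond stretches while the other shrinks.
    print(f"beta={beta:+.1f}  u'={u:.4f}  v'={v:.4f}  area={u * v:.4f}")
```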
Some important features of the Lorentz boost-transformation in (1+1)-Minkowski spacetime:
- It has a determinant of 1. So, it preserves areas.
- It has eigenvectors along the light-cone (this is the invariance of the speed of light).
- The eigenvalues are $k$ and $1/k$ (since the product is equal to the determinant).
(It turns out that the eigenvalues are the Doppler factors.)
So, under a boost-transformation
these diamonds transform (are reshaped) into other diamonds with the same area.
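The listed features can be verified directly on the boost matrix. This sketch assumes the standard matrix form in $(t,x)$ coordinates with $c=1$ and the standard Doppler factor $k=\sqrt{(1+\beta)/(1-\beta)}$, neither of which is written out above; the helper names are hypothetical.

```python
import math

def boost_matrix(beta):
    """Standard 2x2 boost acting on column vectors (t, x), with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -gamma * beta],
            [-gamma * beta, gamma]]

def apply(m, vec):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0] * vec[0] + m[0][1] * vec[1],
            m[1][0] * vec[0] + m[1][1] * vec[1])

def det(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

beta = 0.6
m = boost_matrix(beta)
k = math.sqrt((1 + beta) / (1 - beta))  # Doppler factor (assumed standard form)

print("det =", det(m))                          # area-preserving if equal to 1
print("k =", k, " 1/k =", 1 / k)
print("boost of (1,  1):", apply(m, (1.0, 1.0)))   # stays along (1, 1), scaled by 1/k
print("boost of (1, -1):", apply(m, (1.0, -1.0)))  # stays along (1, -1), scaled by k
```

The two light-cone directions are only rescaled, by $1/k$ and $k$, so their product is the determinant.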
Check out my visualization:
https://www.geogebra.org/m/Jq4jDMRW
I exploited this "area of a diamond" to visualize proper time along an inertial worldline and developed a method to do calculations on "rotated graph paper".
Check out my visualization:
https://www.geogebra.org/m/HYD7hB9v
The idea is written up here:
"Relativity on rotated graph paper"
American Journal of Physics 84, 344 (2016);
https://doi.org/10.1119/1.4943251
Best Answer
You have two great answers, but you might find it interesting to know that it was once common for spacetime in SR to be described with an imaginary time axis. That allowed people to treat it as a straightforward Cartesian arrangement, where a length was calculated by the usual Pythagorean method of taking the square root of the sum of the squares of the component displacements along the four orthogonal axes. The fact that the time axis was $iT$ meant that when you squared the displacement along the time axis you automatically got $-T^2$.
The idea of an imaginary time axis also made the Lorentz transformation look like a straightforward rotation in a 4D space, so some people thought SR would be easier to grasp if described in that way. However, it turns out that using an imaginary time axis only works straightforwardly for SR, and causes all kinds of complications in GR, so it dropped out of fashion.
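For the curious, here is a rough sketch of how that trick works in 1+1 dimensions with $c=1$. It applies the ordinary 2D rotation formula to the pair $(x, iT)$ with an imaginary angle $i\varphi$. Choosing $\varphi=\operatorname{artanh}(v)$ (usually called the rapidity) is an assumption added here, not something stated above, and the function name and sample values are illustrative.

```python
import cmath
import math

def rotate(x, w, theta):
    """Ordinary 2D rotation formula, allowing complex coordinates and angle."""
    c, s = cmath.cos(theta), cmath.sin(theta)
    return x * c + w * s, -x * s + w * c

t, x = 5.0, 3.0          # arbitrary sample event
phi = math.atanh(0.6)    # assumed relation: phi = artanh(v) for v = 0.6

w = 1j * t                           # imaginary time coordinate iT
x_rot, w_rot = rotate(x, w, 1j * phi)
t_rot = (w_rot / 1j).real            # read the new time back off the iT axis

print("rotated x:", x_rot.real, " rotated t:", t_rot)

# The "Pythagorean" sum of squares is unchanged and equals x^2 - t^2.
print("before:", (x**2 + w**2).real, " after:", (x_rot**2 + w_rot**2).real)
```

The rotated values match what the usual boost with $v=0.6$ gives for the same event, and the sum of squares reproduces the interval with its minus sign.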