In Quantum Mechanics, the Schrödinger equation is just the statement that energy is the generator of time evolution. In the QM framework this is written as
$$H|\psi(t)\rangle=i\hbar\dfrac{d|\psi(t)\rangle}{dt}.$$
Now, in the position representation, with basis states $|\mathbf{r}\rangle$, we can form the wavefunction $\Psi(\mathbf{r},t)=\langle \mathbf{r}|\psi(t)\rangle$ and this becomes
$$\langle \mathbf{r}|H|\psi(t)\rangle=i\hbar \dfrac{\partial\Psi}{\partial t}.$$
The usual Schrödinger equation is found when we replace $H$ by the quantized classical Hamiltonian:
$$H=\dfrac{P^2}{2m}+V.$$
The problem is that the equation you get for $\Psi(\mathbf{r},t)$ is not Lorentz invariant. That is no surprise, since we used the nonrelativistic energy when we quantized.
The canonical way to fix this is to quantize the relativistic energy-momentum relation instead,
$$E^2=p^2+m^2,$$
in units where $c=1$. To quantize this we insist that energy is the generator of time translations. This suggests that $E\mapsto i\hbar \partial_t$ while we insist that $p$ is the generator of spatial translations so that $p\mapsto -i\hbar \nabla$. This leads to
$$-\hbar^2\dfrac{\partial^2\Psi}{\partial t^2}=-\hbar^2\nabla^2\Psi+m^2\Psi,$$
or, also choosing units where $\hbar =1$ and writing $\square=\partial_t^2-\nabla^2$,
$$(\square+m^2)\Psi=0.$$
Here, $\Psi$ is a wave function, that is, a map $\Psi:\mathbb{R}^3\times \mathbb{R}\to \mathbb{C}$. So, despite the strange terminology, $\Psi$ is a classical field.
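As a quick consistency check (a sketch in the same units $\hbar=c=1$, with $\square=\partial_t^2-\nabla^2$ and a plane wave chosen purely for illustration), take $\Psi(\mathbf{r},t)=e^{i(\mathbf{p}\cdot\mathbf{r}-Et)}$. Then
$$\partial_t^2\Psi=-E^2\Psi,\qquad \nabla^2\Psi=-p^2\Psi,\qquad (\square+m^2)\Psi=(-E^2+p^2+m^2)\Psi,$$
so the plane wave solves the equation exactly when $E^2=p^2+m^2$, which is the relation we started from. Note that both signs $E=\pm\sqrt{p^2+m^2}$ are allowed.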
So for $(1)$, we just quantized the energy-momentum relation by requiring that the same relation hold in the quantum version, and by imposing that energy is the generator of time translations and momentum the generator of spatial translations.
Now for $(2)$: the Klein-Gordon equation is a wave-function equation. You are just rewriting Schrödinger's equation with a particular Hamiltonian. In the same way, $\Psi$ is a classical field. It is a classical field because it is not operator-valued, whereas a quantum field is an operator-valued field. Turning it into a quantum field, that is, quantizing this field, is another story.
The Schrödinger equation is nonrelativistic and, for a free particle, is derived from the Hamiltonian
\begin{equation}
H\boldsymbol{=} \dfrac{p^2}{2m}
\tag{K-01}\label{eqK-01}
\end{equation}
by the transcription
\begin{equation}
H\boldsymbol{\longrightarrow} i\hbar\dfrac{\partial}{\partial t}\quad \text{and}\quad \mathbf{p}\boldsymbol{\longrightarrow} \boldsymbol{-}i\hbar\boldsymbol{\nabla}
\tag{K-02}\label{eqK-02}
\end{equation}
so that
\begin{equation}
i\hbar \dfrac{\partial \psi}{\partial t}\boldsymbol{+}\dfrac{\hbar^2}{2m}\nabla^2\psi\boldsymbol{=} 0
\tag{K-03}\label{eqK-03}
\end{equation}
As a first attempt at a relativistic quantum mechanical equation, we use the fact that, according to special relativity, the total energy $\;E\;$ and the momenta $\;(p_x,p_y,p_z)\;$ transform as the components of a contravariant four-vector
\begin{equation}
p^\mu\boldsymbol{=}\left(p^0,p^1,p^2,p^3\right)\boldsymbol{=}\left(\dfrac{E}{c},p_x,p_y,p_z\right)
\tag{K-04}\label{eqK-04}
\end{equation}
of invariant length
\begin{equation}
\sum\limits_{\mu\boldsymbol{=}0}^{3}p_{\mu} p^{\mu}\boldsymbol{\equiv}p_{\mu} p^{\mu}\boldsymbol{=}\dfrac{E^2}{c^2}\boldsymbol{-}\mathbf{p}\boldsymbol{\cdot}\mathbf{p}\boldsymbol{\equiv}m^2c^2\tag{K-05}\label{eqK-05}
\end{equation}
where $\;m\;$ is the rest mass of the particle and $\;c\;$ the velocity of light in vacuum.
Following this it is natural to take as the Hamiltonian of a relativistic free particle
\begin{equation}
H\boldsymbol{=}\sqrt{p^{2}c^2\boldsymbol{+}m^2c^4}
\tag{K-06}\label{eqK-06}
\end{equation}
and to write for a relativistic quantum analogue of \eqref{eqK-03}
\begin{equation}
i\hbar \dfrac{\partial \psi}{\partial t}\boldsymbol{=}\sqrt{\boldsymbol{-}\hbar^2c^2 \nabla^{2}\boldsymbol{+}m^2c^4}\,\psi
\tag{K-07}\label{eqK-07}
\end{equation}
Facing the problem of interpreting the square-root operator on the right of eq. \eqref{eqK-07}, we simplify the mathematics by removing it: we apply the operators on both sides twice, so that
\begin{equation}
\left[\dfrac{1}{c^2}\dfrac{\partial^2}{\partial t^2}\boldsymbol{-}\nabla^{2}\boldsymbol{+}\left(\dfrac{mc}{\hbar}\vphantom{\dfrac{\partial^2 \psi}{\partial t^2}}\right)^2\right]\psi\boldsymbol{=}0
\tag{K-08}\label{eqK-08}
\end{equation}
or, in a form recognizable as the classical wave equation with an additional mass term,
\begin{equation}
\left[\square\boldsymbol{+}\left(\dfrac{mc}{\hbar}\right)^2\right]\psi\boldsymbol{=}0
\tag{K-09}\label{eqK-09}
\end{equation}
where${}^{(1)}$
\begin{equation}
\square\boldsymbol{\equiv}\dfrac{1}{c^2}\dfrac{\partial^2}{\partial t^2}\boldsymbol{-}\nabla^{2}\boldsymbol{=}\dfrac{\partial}{\partial x_\mu}\dfrac{\partial}{\partial x^\mu}
\tag{K-10}\label{eqK-10}
\end{equation}
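The step from \eqref{eqK-07} to \eqref{eqK-08} can be spelled out as follows. This is a sketch that assumes the square-root operator is time independent, so that it commutes with $\partial/\partial t$, and that applying it twice gives the operator under the root. Acting with $i\hbar\,\partial/\partial t$ on both sides of \eqref{eqK-07} and using \eqref{eqK-07} once more,
\begin{equation}
\boldsymbol{-}\hbar^2\dfrac{\partial^2 \psi}{\partial t^2}\boldsymbol{=}\sqrt{\boldsymbol{-}\hbar^2c^2 \nabla^{2}\boldsymbol{+}m^2c^4}\,\left(i\hbar \dfrac{\partial \psi}{\partial t}\right)\boldsymbol{=}\left(\boldsymbol{-}\hbar^2c^2 \nabla^{2}\boldsymbol{+}m^2c^4\right)\psi
\nonumber
\end{equation}
and dividing by $\;\hbar^2c^2\;$ gives \eqref{eqK-08}. Squaring also admits the solutions of $\;i\hbar\,\partial \psi/\partial t\boldsymbol{=}\boldsymbol{-}\sqrt{\boldsymbol{-}\hbar^2c^2 \nabla^{2}\boldsymbol{+}m^2c^4}\,\psi$, so \eqref{eqK-08} has negative-energy solutions that \eqref{eqK-07} does not.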
Equation \eqref{eqK-09} is the Klein-Gordon equation for a free particle. With its complex conjugate we have
\begin{align}
& \dfrac{1}{c^2}\dfrac{\partial^2 \psi\hphantom{^{\boldsymbol{*}}}}{\partial t^2}\boldsymbol{-}\nabla^{2}\psi\hphantom{^{\boldsymbol{*}}}\boldsymbol{+}\left(\dfrac{mc}{\hbar}\vphantom{\dfrac{\partial^2 \psi}{\partial t^2}}\right)^2\psi\hphantom{^{\boldsymbol{*}}}\boldsymbol{=} 0
\tag{K-11.1}\label{eqK-11.1}\\
&\dfrac{1}{c^2}\dfrac{\partial^2 \psi^{\boldsymbol{*}}}{\partial t^2}\boldsymbol{-}\nabla^{2}\psi^{\boldsymbol{*}}\boldsymbol{+}\left(\dfrac{mc}{\hbar}\vphantom{\dfrac{\partial^2 \psi}{\partial t^2}}\right)^2\psi^{\boldsymbol{*}}\boldsymbol{=} 0
\tag{K-11.2}\label{eqK-11.2}
\end{align}
Multiplying \eqref{eqK-11.1} by $\;\psi^{\boldsymbol{*}}\;$ and \eqref{eqK-11.2} by $\;\psi\;$ and subtracting side by side, the mass terms cancel and we have${}^{(2)}$
\begin{align}
\dfrac{1}{c^2}\left(\psi^{\boldsymbol{*}}\dfrac{\partial^2 \psi}{\partial t^2}\boldsymbol{-}\psi\dfrac{\partial^2 \psi^{\boldsymbol{*}}}{\partial t^2}\right)\boldsymbol{-}\left(\psi^{\boldsymbol{*}}\nabla^{2}\psi\boldsymbol{-}\psi\nabla^{2}\psi^{\boldsymbol{*}}\vphantom{\dfrac{\partial^2 \psi}{\partial t^2}}\right)&\boldsymbol{=} 0\quad \boldsymbol{\Longrightarrow}
\nonumber\\
\dfrac{1}{c^2}\dfrac{\partial}{\partial t}\left(\psi^{\boldsymbol{*}}\dfrac{\partial \psi}{\partial t}\boldsymbol{-}\psi\dfrac{\partial \psi^{\boldsymbol{*}}}{\partial t}\right)\boldsymbol{+}\boldsymbol{\nabla \cdot}\left(\psi\boldsymbol{\nabla }\psi^{\boldsymbol{*}}\boldsymbol{-}\psi^{\boldsymbol{*}}\boldsymbol{\nabla }\psi\vphantom{\dfrac{\partial^2 \psi}{\partial t^2}}\right)&\boldsymbol{=} 0
\tag{K-12}\label{eqK-12}
\end{align}
We multiply the above equation by $\;i\hbar/2m\;$, on the one hand to obtain real quantities and on the other hand to obtain the same expression for the probability current density vector as the one from the Schrödinger equation
\begin{equation}
\dfrac{\partial}{\partial t}\left[\dfrac{i\hbar}{2mc^2}\left(\psi^{\boldsymbol{*}}\dfrac{\partial \psi}{\partial t}\boldsymbol{-}\psi\dfrac{\partial \psi^{\boldsymbol{*}}}{\partial t}\right)\right]\boldsymbol{+}\boldsymbol{\nabla \cdot}\left[\dfrac{i\hbar}{2m}\left(\psi\boldsymbol{\nabla }\psi^{\boldsymbol{*}}\boldsymbol{-}\psi^{\boldsymbol{*}}\boldsymbol{\nabla }\psi\vphantom{\dfrac{\partial^2 \psi}{\partial t^2}}\right)\right]\boldsymbol{=} 0
\tag{K-13}\label{eqK-13}
\end{equation}
so
\begin{equation}
\dfrac{\partial \varrho}{\partial t}\boldsymbol{+}\boldsymbol{\nabla \cdot}\boldsymbol{S}\boldsymbol{=} 0
\tag{K-14}\label{eqK-14}
\end{equation}
where
\begin{equation}
\boxed{\:\:\varrho\boldsymbol{\equiv}\dfrac{i\hbar}{2mc^2}\left(\psi^{\boldsymbol{*}}\dfrac{\partial \psi}{\partial t}\boldsymbol{-}\psi\dfrac{\partial \psi^{\boldsymbol{*}}}{\partial t}\right)\:\:}\quad \text{and} \quad \boxed{\:\:\boldsymbol{S}\boldsymbol{\equiv}\dfrac{i\hbar}{2m}\left(\psi\boldsymbol{\nabla }\psi^{\boldsymbol{*}}\boldsymbol{-}\psi^{\boldsymbol{*}}\boldsymbol{\nabla }\psi\vphantom{\dfrac{\partial^2 \psi}{\partial t^2}}\right)\:\:}
\tag{K-15}\label{eqK-15}
\end{equation}
We would like to interpret $\dfrac{i\hbar}{2mc^2}\left(\psi^{\boldsymbol{*}}\dfrac{\partial \psi}{\partial t}\boldsymbol{-}\psi\dfrac{\partial \psi^{\boldsymbol{*}}}{\partial t}\right)$ as a probability density $\varrho$. However, this is impossible, since it is not a positive definite expression.
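A plane wave makes the sign problem explicit. As an illustration (the amplitude $\;A\;$ and the plane-wave form are assumptions made for this example), take $\;\psi\boldsymbol{=}A\,e^{\,i(\mathbf{p}\boldsymbol{\cdot}\mathbf{x}\boldsymbol{-}Et)/\hbar}\;$ with $\;E\boldsymbol{=}\pm\sqrt{p^{2}c^2\boldsymbol{+}m^2c^4}$, which satisfies \eqref{eqK-09}. Substituting into \eqref{eqK-15} gives
\begin{equation}
\varrho\boldsymbol{=}\dfrac{E}{mc^2}\,\vert A\vert^2 \quad \text{and} \quad \boldsymbol{S}\boldsymbol{=}\dfrac{\mathbf{p}}{m}\,\vert A\vert^2
\nonumber
\end{equation}
so $\;\varrho\;$ is negative on the negative-energy branch. On the positive branch, in the nonrelativistic limit $\;E\approx mc^2$, it reduces to the Schrödinger density $\;\vert\psi\vert^2$.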
(1)
We define
\begin{align}
\blacktriangleright x^\mu\boldsymbol{=}\left(ct,\mathbf{x}\right)&\blacktriangleright \nabla^\mu\boldsymbol{=}\partial^\mu\boldsymbol{=}\dfrac{\partial}{\partial x_\mu}\boldsymbol{=}\left(\dfrac{1}{c}\dfrac{\partial}{\partial t},\boldsymbol{-}\boldsymbol{\nabla}\right)
\nonumber\\
&\blacktriangleright \nabla_\mu\boldsymbol{=}\partial_\mu\boldsymbol{=}\dfrac{\partial}{\partial x^\mu}\boldsymbol{=}\left(\dfrac{1}{c}\dfrac{\partial}{\partial t},\boldsymbol{+}\boldsymbol{\nabla}\right)\blacktriangleright\square \boldsymbol{=}\nabla^\mu\nabla_\mu \boldsymbol{=}\partial^\mu\partial_\mu \boldsymbol{=}\dfrac{\partial}{\partial x_\mu}\dfrac{\partial}{\partial x^\mu}
\nonumber
\end{align}
(2)
If $\;\psi\;$ and $\;\mathbf{a}\;$ are scalar and vector functions in $\;\mathbb{R}^{3}$ then
\begin{equation}
\boldsymbol{\nabla \cdot}\left(\psi\mathbf{a}\right)\boldsymbol{=}\mathbf{a}\boldsymbol{\cdot}\boldsymbol{\nabla}\psi\boldsymbol{+}\psi\boldsymbol{\nabla \cdot}\mathbf{a}
\nonumber
\end{equation}
Best Answer
Once you start thinking about relativity, gauge fields, QFT, etc., it's easy to forget that the massless KG equation is actually just a fancy name for one of the simplest and most common equations in physics: $$ (\partial_t^2 - \partial_x^2) \, \varphi = 0 , $$ the wave equation!
The most familiar example is waves on a string. Here's the answer in that context:
$$ (\partial_t^2 - \partial_x^2 + m^2) \, \varphi = 0 $$
With $m=0$ you are talking about waves on a string, where each little string segment is coupled only to its neighbors. (We call this the "wave equation".)
With $m\neq 0$ each little string segment has a harmonic restoring force back to its equilibrium displacement, in addition to neighbor coupling. (I'd call this the "wave equation with dispersion").
The value of $m$ tells you the strength of the harmonic restoring force at each point, relative to the strength of neighbor coupling.
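One way to make the string picture concrete is to discretize it. The sketch below is an illustration under assumptions that are not in the answer itself: the string is replaced by a ring of segments with a hypothetical spacing `a` and periodic boundary conditions, and the helper names `chain_acceleration` and `mode_frequency_sq` are made up for the example. It checks that a cosine mode oscillates with $\omega^2 = m^2 + (4/a^2)\sin^2(ka/2)$, which approaches $k^2 + m^2$ when $ka \ll 1$.

```python
import math

def chain_acceleration(phi, a, m):
    """Acceleration of each segment: neighbor coupling plus restoring force.

    Discretizes phi_tt = phi_xx - m^2 phi on a periodic ring with spacing a.
    """
    n = len(phi)
    return [
        (phi[(i + 1) % n] - 2.0 * phi[i] + phi[i - 1]) / a**2 - m**2 * phi[i]
        for i in range(n)
    ]

def mode_frequency_sq(k, a, m):
    """Squared frequency of a lattice mode with wavenumber k."""
    return m**2 + (4.0 / a**2) * math.sin(0.5 * k * a) ** 2

if __name__ == "__main__":
    n, a, m = 64, 0.1, 1.0             # hypothetical lattice parameters
    k = 2.0 * math.pi * 3 / (n * a)    # third mode that fits on the ring
    phi = [math.cos(k * i * a) for i in range(n)]
    acc = chain_acceleration(phi, a, m)
    w2 = mode_frequency_sq(k, a, m)
    # For a normal mode, acceleration = -omega^2 * displacement at every site.
    worst = max(abs(acc[i] + w2 * phi[i]) for i in range(n))
    print("largest residual:", worst)
    print("lattice omega^2:", w2, "continuum k^2 + m^2:", k * k + m * m)
```

Setting `m = 0.0` removes the restoring force and leaves only the neighbor coupling, which is the plain wave equation on the lattice.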
Okay, so why "massive" and "massless"? A few reasons.
1. Look at the dispersion relation $\omega = \sqrt{k^2 + m^2}$. In quantum mechanics $\omega \sim E$ and $k \sim p$, roughly speaking. Translating, the dispersion relation looks like $E = \sqrt{p^2 + m^2}$ which is the relativistic energy for a particle with rest mass $m$.
2. Normalized wavepackets have a minimum total energy $m$. (This might not strictly be true but the idea is right. Didn't feel like working out proof. The point is that in Fourier space (at a fixed time) you're summing up energies related to $\omega(k) \geq m$.)
3. Group velocity of all wavepackets is $c$ (of course $c=1$ here) if $m=0$. If $m>0$ all wavepackets have group velocity less than $c$. In the massive case $m>0$, low energy normalized wavepackets just sit still (all "rest mass" energy, no kinetic energy), whereas very energetic normalized wavepackets move almost at $c$ (high kinetic energy).
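The group-velocity statement follows from $v_g = d\omega/dk = k/\sqrt{k^2+m^2}$. Here is a minimal sketch in units where $c=1$; the function names are hypothetical and chosen only for this example, and the finite-difference version is just a cross-check of the analytic formula.

```python
import math

def omega(k, m):
    """Klein-Gordon dispersion relation, in units where c = hbar = 1."""
    return math.sqrt(k * k + m * m)

def group_velocity(k, m):
    """Analytic group velocity d(omega)/dk = k / omega (needs omega != 0)."""
    return k / omega(k, m)

def group_velocity_numeric(k, m, dk=1e-6):
    """Central finite-difference estimate of d(omega)/dk."""
    return (omega(k + dk, m) - omega(k - dk, m)) / (2.0 * dk)

if __name__ == "__main__":
    for m in (0.0, 1.0):
        for k in (0.01, 1.0, 100.0):
            print(f"m={m}, k={k}: v_g={group_velocity(k, m):.6f}")
```

For `m = 0.0` every `k > 0` gives a group velocity of 1. For `m = 1.0` the value is close to 0 at small `k` and approaches 1 from below at large `k`, matching the "sits still" and "moves almost at $c$" regimes described above.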
When you go quantum, the properties 2 and 3 of classical wavepackets basically translate to the corresponding properties of quantum excitations.
So basically the answer to your second question is: Because the KG dispersion relation corresponds to the relativistic energy equation for a particle of rest mass $m$, and the associated wavepacket dynamics agrees with the analogy as well.
I'm sure there are many more ways to think about this, some mathematically more rigorous, but I think they're all fundamentally related to that basic fact and the properties above.