If at every time $t$, $\phi(\mathbf{x},t)$ is a nice enough function that it has a Fourier transform, then $$\phi(\mathbf{x},t)=\int_{-\infty}^{\infty}\frac{d^{3}k}{(2\pi)^{3}}\widetilde{\phi}(\mathbf{k},t)e^{i\mathbf{k}\cdot\mathbf{x}},$$ where $\widetilde{\phi}(\mathbf{k},t)$ is just the Fourier coefficient at that time $t.$
But now you ask that the whole function $\phi(\mathbf{x},t)$ (at every time) be a solution to the Klein-Gordon equation, which means the functions at different times need to have the right derivatives. If at every time there is a Fourier transform, then there are Fourier coefficients at every time. So if the wave evolves a certain way in time, then the values of the Fourier coefficients at different times need to be just right to let the wave have the right temporal derivatives.
OK, so the Klein-Gordon equation is second order in time, so we can take the initial function $\phi(\mathbf{x},t=0)$ and its Fourier transform, call it $\theta(\mathbf{k}),$ and we can take the initial temporal derivative $\partial_t \phi(\mathbf{x},t)\big\vert_{t=0}$ and its Fourier transform, call it $\omega(\mathbf{k}).$ Then we know the initial Fourier coefficients and we know their initial time derivatives, and the second derivative is fixed by requiring that the Klein-Gordon equation hold, so
$\widetilde{\phi}(\mathbf{k},t)=\theta(\mathbf{k})\cos(E_kt)
+ \frac{1}{E_k}\omega(\mathbf{k})\sin(E_kt).$
Why? Because it has the right initial values and the right initial temporal derivative and it satisfies the Klein-Gordon equation when $E_k=\sqrt{m^2+k^2}.$
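To spell out that last step (a short sketch, writing $k=|\mathbf{k}|$ and assuming the derivatives can be taken under the integral sign): substituting the Fourier representation into the Klein-Gordon equation gives

$$\left(\partial_t^2-\nabla^2+m^2\right)\phi=\int\frac{d^{3}k}{(2\pi)^{3}}\left[\partial_t^2\widetilde{\phi}(\mathbf{k},t)+\left(k^2+m^2\right)\widetilde{\phi}(\mathbf{k},t)\right]e^{i\mathbf{k}\cdot\mathbf{x}}=0,$$

so each coefficient obeys a harmonic oscillator equation on its own,

$$\partial_t^2\widetilde{\phi}(\mathbf{k},t)=-E_k^2\,\widetilde{\phi}(\mathbf{k},t),\qquad \widetilde{\phi}(\mathbf{k},0)=\theta(\mathbf{k}),\qquad \partial_t\widetilde{\phi}(\mathbf{k},t)\big\vert_{t=0}=\omega(\mathbf{k}).$$

The cosine-plus-sine combination above is the solution of that oscillator equation with these initial data.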
So at each time we have Fourier coefficients of our wave. They are designed so that the function satisfies the Klein-Gordon equation when the Fourier coefficients evolve in time according to a second order equation.
I may have misunderstood your question. But the idea is that from the initial wave and the initial temporal derivative of the wave you get enough initial conditions to know the initial Fourier coefficients and the initial time rate of change of the Fourier coefficients, which is all the freedom you have. The rest is determined by the requirement that the Fourier coefficients evolve in time in a certain way so that the wave evolves in time in a certain way.
edit to respond to comments
At each time you get a Fourier transform. And you then ask yourself how those Fourier coefficients depend (in time) on each other. In order for the wave to evolve in time by a second order equation, the Fourier transform needs to evolve in time by a second order equation. But the transform gets to do a pointwise evolution that is simpler and that's why we do it.
When you take the Fourier transform at each time, you get a Fourier transform at each time, so you get a $\widetilde{\phi}(\mathbf{k},t)$ that is itself a function of time. So it has a partial derivative $\partial_t \widetilde{\phi}(\mathbf{k},t)\neq 0.$
edit to further respond to comments
Let's say we want to solve $$\left(\partial_t^2-\nabla^2+m^2\right)\phi=0.$$ We start by noting that $\left(A_ke^{i\mathbf{k}\cdot\mathbf{x}}+B_ke^{-i\mathbf{k}\cdot\mathbf{x}}\right)\cos(t\sqrt{m^2+k^2})$ and $\frac{\left(C_ke^{i\mathbf{k}\cdot\mathbf{x}}+D_ke^{-i\mathbf{k}\cdot\mathbf{x}}\right)}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2})$ are both solutions for any $\mathbf{k},$ any $A_k, B_k$ and any $C_k, D_k$.
Great.
And the first solution has zero time rate of change at $t=0$ and the second has zero value at $t=0.$ So if our initial wave was $\phi(\mathbf{x},t=0)=3\cos(3x)$ and had an initial time rate of change equal to $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}=4\cos(2y)$ then we know exactly what the solution to $\left(\partial_t^2-\nabla^2+m^2\right)\phi=0$ is: $$\phi(\mathbf{x},t)=3\cos(3x)\cos(t\sqrt{m^2+3^2})+\frac{4}{\sqrt{m^2+2^2}}\cos(2y)\sin(t\sqrt{m^2+2^2}).$$
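If you want to convince yourself numerically, here is a minimal sketch in plain Python that checks this closed-form solution with finite differences. The mass value `M = 1.5`, the step sizes, and the helper names (`phi`, `kg_residual`, `dt_phi`) are assumptions made for illustration only. Since the solution does not depend on $z$, only the $x$ and $y$ derivatives enter the Laplacian.

```python
import math

M = 1.5  # assumed mass, chosen arbitrarily for illustration


def phi(x, y, t, m=M):
    """Closed-form solution from the text for phi(t=0) = 3cos(3x), d_t phi(t=0) = 4cos(2y)."""
    e3 = math.sqrt(m**2 + 3**2)
    e2 = math.sqrt(m**2 + 2**2)
    return 3 * math.cos(3 * x) * math.cos(e3 * t) + 4 / e2 * math.cos(2 * y) * math.sin(e2 * t)


def second_diff(f, h):
    """Central second difference of a one-argument function, evaluated at zero offset."""
    return (f(h) - 2 * f(0.0) + f(-h)) / h**2


def kg_residual(x, y, t, m=M, h=1e-3):
    """Finite-difference estimate of (d_t^2 - laplacian + m^2) phi at one point."""
    d_tt = second_diff(lambda s: phi(x, y, t + s, m), h)
    d_xx = second_diff(lambda s: phi(x + s, y, t, m), h)
    d_yy = second_diff(lambda s: phi(x, y + s, t, m), h)
    return d_tt - d_xx - d_yy + m**2 * phi(x, y, t, m)


def dt_phi(x, y, t, m=M, h=1e-5):
    """Central-difference estimate of the time derivative."""
    return (phi(x, y, t + h, m) - phi(x, y, t - h, m)) / (2 * h)


# Sample point chosen arbitrarily; the residual should be small (finite-difference error only).
print(kg_residual(0.3, -1.1, 0.8))
print(phi(0.3, -1.1, 0.0) - 3 * math.cos(3 * 0.3))
print(dt_phi(0.3, -1.1, 0.0) - 4 * math.cos(2 * -1.1))
```

All three printed numbers should be close to zero, up to discretization and rounding error.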
And any finite linear combination of $e^{\pm i\mathbf k \cdot \mathbf x}$ for the initial value $\phi(\mathbf{x},t=0)$ is as easily manageable by having a finite linear combination of solutions like $e^{\pm i\mathbf{k}\cdot\mathbf{x}}\cos(t\sqrt{m^2+k^2}).$ And similarly if the initial time rate of change $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}$ is a finite linear combination of $e^{\pm i\mathbf k \cdot \mathbf x}$ then we add terms that are a finite linear combination of $\frac{e^{\pm i\mathbf{k}\cdot\mathbf{x}}}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2}).$
All we are doing is taking solutions and adding them up in combinations that give us the right initial values and the right initial time rates of change. And it is super easy if the initial conditions happen to be a finite linear combination of sines and cosines.
But wait. What if instead of being a finite linear combination of terms like $e^{\pm i\mathbf k \cdot \mathbf x}$ the initial condition was just a function that has a Fourier transform? Then you can try the same thing. Write your initial value $\phi(\mathbf{x},t=0)$ and your initial time rate of change $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}$ as inverse Fourier transforms, that is, as superpositions of spatial sines and cosines. Then replace a spatial $\sin(\mathbf k \cdot \mathbf x)$ with the function $\sin(\mathbf k \cdot \mathbf x)\cos(t\sqrt{m^2+k^2})$ and replace the spatial function $\cos(\mathbf k \cdot \mathbf x)$ with the function $\cos(\mathbf k \cdot \mathbf x)\cos(t\sqrt{m^2+k^2}).$ Why? Because each of those satisfies the Klein-Gordon equation. And so for $t=0$ the inverse Fourier transform of those will be the initial value of the wave $\phi(\mathbf{x},t=0).$ So you are taking a $t=0$ spatial Fourier transform of $\phi(\mathbf{x},t=0)$ then replacing every spatial Fourier component $e^{\pm i \mathbf k \cdot \mathbf x}$ with $e^{\pm i \mathbf k \cdot \mathbf x}\cos(t\sqrt{m^2+k^2}).$ This solves the Klein-Gordon equation, has the right initial values and has $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}=0.$
Next, write your initial time rate of change $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}$ as an inverse Fourier transform, again a superposition of spatial sines and cosines. Then replace a spatial $\sin(\mathbf k \cdot \mathbf x)$ with the function $\frac{\sin(\mathbf k \cdot \mathbf x)}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2})$ and replace the spatial function $\cos(\mathbf k \cdot \mathbf x)$ with the function $\frac{\cos(\mathbf k \cdot \mathbf x)}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2}).$ Why? Because each of those satisfies the Klein-Gordon equation. And when you take the inverse spatial Fourier transform you get a function that has value zero at $t=0$ and has an initial time rate of change that equals the prescribed $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}$ of the wave.
Why have two solutions? Because this second one also solves the Klein-Gordon equation and has an initial value of zero but has an initial time rate of change that equals $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}.$ And the first one solves the Klein-Gordon equation, has an initial value that equals $\phi(\mathbf{x},t=0)$ and has a zero initial time rate of change.
So if you add those two solutions together you get a function that (1) solves the Klein-Gordon equation (2) has the right initial value and (3) has the right initial time rate of change. That's what you wanted all along.
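The recipe of adding the two pieces can be sketched in code. This is only an illustration under assumptions that are not in the discussion above: one spatial dimension, a periodic box of length $2\pi$ (so the wavenumbers are integers), a naive $O(N^2)$ discrete Fourier transform in place of the continuous one, and $m>0$ so that dividing by $E_k$ is safe. The names `dft`, `idft` and `evolve` are made up for this sketch.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform, normalized so coefficients multiply e^{ikx}."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft above: sum of coefficient times e^{ikx} at each grid point."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n))
            for j in range(n)]


def evolve(phi0, pi0, t, m):
    """Return phi(x_j, t) from samples of phi(x, 0) and d_t phi(x, 0) on x_j = 2*pi*j/n."""
    n = len(phi0)
    theta = dft(phi0)  # Fourier coefficients of the initial value
    omega = dft(pi0)   # Fourier coefficients of the initial time derivative
    evolved = []
    for idx in range(n):
        k = idx if idx < n // 2 else idx - n  # signed integer wavenumber
        e_k = math.sqrt(m * m + k * k)        # assumes m > 0, so e_k is never zero
        evolved.append(theta[idx] * math.cos(e_k * t) + omega[idx] / e_k * math.sin(e_k * t))
    return [z.real for z in idft(evolved)]


# One-dimensional analogue of the example: phi(x,0) = 3cos(3x), d_t phi(x,0) = 4cos(2x).
n, m, t = 16, 1.5, 0.7
xs = [2 * math.pi * j / n for j in range(n)]
numeric = evolve([3 * math.cos(3 * x) for x in xs], [4 * math.cos(2 * x) for x in xs], t, m)
exact = [3 * math.cos(3 * x) * math.cos(t * math.sqrt(m**2 + 9))
         + 4 / math.sqrt(m**2 + 4) * math.cos(2 * x) * math.sin(t * math.sqrt(m**2 + 4))
         for x in xs]
print(max(abs(a - b) for a, b in zip(numeric, exact)))
```

The printed number is the largest pointwise difference between the transformed-and-evolved field and the closed form; it should be at the level of rounding error, because the initial data here is a finite combination of modes that the grid resolves exactly.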
If you understand that when the initial wave was $\phi(\mathbf{x},t=0)=3\cos(3x)$ and had an initial time rate of change equal to $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}=4\cos(2y)$ then the solution to $\left(\partial_t^2-\nabla^2+m^2\right)\phi=0$ is: $$\phi(\mathbf{x},t)=3\cos(3x)\cos(t\sqrt{m^2+3^2})+\frac{4}{\sqrt{m^2+2^2}}\cos(2y)\sin(t\sqrt{m^2+2^2}).$$ If you understand that, then everything else is the same idea applied to finite and "infinite" linear combinations of $e^{\pm i \mathbf k \cdot \mathbf x}$ for the initial conditions.
Once you start thinking about relativity, gauge fields, qft, etc, it's easy to forget that the massless KG equation is actually just a fancy name for one of the simplest and most common equations in physics:
$$ (\partial_t^2 - \partial_x^2) \, \varphi = 0 , $$
the wave equation!
The most familiar example is waves on a string. Here's the answer in that context:
$$ (\partial_t^2 - \partial_x^2 + m^2) \, \varphi = 0 $$
With $m=0$ you are talking about waves on a string, where each little string segment is coupled only to its neighbors. (We call this the "wave equation".)
With $m\neq 0$ each little string segment has a harmonic restoring force back to its equilibrium displacement, in addition to neighbor coupling. (I'd call this the "wave equation with dispersion").
The value of $m$ tells you the strength of the harmonic restoring force at each point, relative to the strength of neighbor coupling.
Okay, so why "massive" and "massless"? A few reasons.
1. Look at the dispersion relation $\omega = \sqrt{k^2 + m^2}$.
   In quantum mechanics $\omega \sim E$ and $k \sim p$, roughly speaking. Translating, the dispersion relation looks like $E = \sqrt{p^2 + m^2}$ which is the relativistic energy for a particle with rest mass $m$.
2. Normalized wavepackets have a minimum total energy $m$. (This might not strictly be true but the idea is right. Didn't feel like working out proof. The point is that in Fourier space (at a fixed time) you're summing up energies related to $\omega(k) \geq m$.)
3. Group velocity of all wavepackets is $c$ (of course $c=1$ here) if $m=0$. If $m>0$ all wavepackets have group velocity less than $c$. In the massive case $m>0$, low energy normalized wavepackets just sit still (all "rest mass" energy, no kinetic energy), whereas very energetic normalized wavepackets move almost at $c$ (high kinetic energy).
When you go quantum, the properties 2 and 3 of classical wavepackets basically translate to the corresponding properties of quantum excitations.
So basically the answer to your second question is: Because the KG dispersion relation corresponds to the relativistic energy equation for a particle of rest mass $m$, and the associated wavepacket dynamics agrees with the analogy as well.
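To make the group velocity statement concrete, here is a small sketch that evaluates $v_g = d\omega/dk = k/\sqrt{k^2+m^2}$ for the dispersion relation $\omega=\sqrt{k^2+m^2}$, in units with $c=1$. The function names and sample values are mine, picked only for illustration; the case $k=0$ with $m=0$ is excluded because the ratio is undefined there.

```python
import math


def omega(k, m):
    """Klein-Gordon dispersion relation, omega = sqrt(k^2 + m^2)."""
    return math.sqrt(k * k + m * m)


def group_velocity(k, m):
    """d(omega)/dk = k / sqrt(k^2 + m^2); undefined for k = 0 and m = 0 together."""
    return k / omega(k, m)


# Massless case: every wavenumber travels at c = 1.
print(group_velocity(2.0, 0.0))

# Massive case: slow for small k (low energy), approaching c = 1 for large k.
for k in (1e-3, 1.0, 1e3):
    print(k, omega(k, 1.0), group_velocity(k, 1.0))
```

For $m>0$ the frequency never drops below $m$ and the group velocity stays strictly below 1, which is the classical picture behind calling $m$ a rest mass.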
I'm sure there are many more ways to think about this, some mathematically more rigorous, but I think they're all fundamentally related to that basic fact and the properties above.
Best Answer
It is easy actually. To be precise $\Sigma$ is a smooth spacelike Cauchy surface of the spacetime and the considered space $V$ of solutions of the KG equations is made of solutions smooth and with compactly supported Cauchy data on $\Sigma$. Under these hypotheses, the relation between Cauchy data on $\Sigma$, $(f|_\Sigma, n_\Sigma \cdot\nabla f|_\Sigma)$ and corresponding solutions $f$ of the KG equation is one-to-one. In other words the Cauchy problem is well posed.
If $(f,h)=0$ for every $h$ then both $f$ and its derivative normal to $\Sigma$ are zero on $\Sigma$. This easily follows from the very form of the symplectic form and from the fact that we can choose the Cauchy data of $h$ arbitrarily and there is a corresponding $h$.
Hence, again in view of the well posedness of the Cauchy problem, $f$ must be the zero solution of the KG equation.
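To make the middle step explicit, here is a sketch assuming the symplectic form is the standard one for the KG equation (the overall sign and normalization are conventions and may differ from the ones in the question):

$$(f,h)=\int_\Sigma\left(f\,n_\Sigma\cdot\nabla h-h\,n_\Sigma\cdot\nabla f\right)d\Sigma .$$

Choosing $h$ with Cauchy data $(h|_\Sigma,\,n_\Sigma\cdot\nabla h|_\Sigma)=(0,g)$ for an arbitrary smooth compactly supported $g$ gives

$$\int_\Sigma f\,g\,d\Sigma=0\quad\text{for all } g\qquad\Longrightarrow\qquad f|_\Sigma=0,$$

and choosing Cauchy data $(g,0)$ instead gives

$$\int_\Sigma g\,n_\Sigma\cdot\nabla f\,d\Sigma=0\quad\text{for all } g\qquad\Longrightarrow\qquad n_\Sigma\cdot\nabla f|_\Sigma=0 .$$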