If at every time $t$, $\phi(\mathbf{x},t)$ is a nice enough function that it has a Fourier transform, then $$\phi(\mathbf{x},t)=\int_{-\infty}^{\infty}\frac{d^{3}k}{(2\pi)^{3}}\widetilde{\phi}(\mathbf{k},t)e^{i\mathbf{k}\cdot\mathbf{x}},$$ where $\widetilde{\phi}(\mathbf{k},t)$ is just the Fourier coefficient at that time $t.$
But now you ask that the whole function $\phi(\mathbf{x},t)$ (at every time) be a solution to the Klein-Gordon equation, which means the function at different times needs to have the right derivatives. If at every time there is a Fourier transform, then there are Fourier coefficients at every time. So if the wave evolves a certain way in time, then the values of the Fourier coefficients at different times need to be just right to let the wave have the right temporal derivatives.
OK, so the Klein-Gordon equation is second order in time. So we can take the initial function $\phi(\mathbf{x},t=0)$ and its Fourier transform, call it $\theta(\mathbf{k})$, and we can take the initial temporal derivative $\partial_t \phi(\mathbf{x},t)\big\vert_{t=0}$ and its Fourier transform, call it $\omega(\mathbf{k}).$ Then we know the initial Fourier coefficients and we know their initial time derivatives, and the second derivative is fixed by requiring that the Klein-Gordon equation be satisfied, so
$$\widetilde{\phi}(\mathbf{k},t)=\theta(\mathbf{k})\cos(E_kt)+\frac{1}{E_k}\omega(\mathbf{k})\sin(E_kt).$$
Why? Because it has the right initial values and the right initial temporal derivative and it satisfies the Klein-Gordon equation when $E_k=\sqrt{m^2+k^2}.$
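Those three claims are easy to check numerically for a single mode. The following is only a minimal sketch: the values of `theta`, `omega`, `k` and `m` are made-up sample numbers (taken real for simplicity), and the derivatives are estimated with central differences rather than computed exactly.

```python
import math

def mode_coefficient(theta, omega, k, m, t):
    """Fourier coefficient at time t: theta*cos(E t) + (omega/E)*sin(E t)."""
    E = math.sqrt(m * m + k * k)
    return theta * math.cos(E * t) + (omega / E) * math.sin(E * t)

def second_time_derivative(f, t, h=1e-4):
    """Central-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

# Hypothetical sample values, chosen only for illustration.
theta, omega, k, m = 1.5, -0.7, 2.0, 1.0
E = math.sqrt(m * m + k * k)
f = lambda t: mode_coefficient(theta, omega, k, m, t)

t = 0.8
# Klein-Gordon for one mode reads f'' + E^2 f = 0, so this should be close to 0.
residual = second_time_derivative(f, t) + E * E * f(t)
# The value at t = 0 should be theta.
initial_value = f(0.0)
# The time derivative at t = 0 should be close to omega.
initial_rate = (f(1e-6) - f(-1e-6)) / 2e-6
```

The factor $1/E_k$ in front of the sine is what makes the initial rate come out as $\omega(\mathbf{k})$ rather than $E_k\,\omega(\mathbf{k})$.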
So at each time we have Fourier coefficients of our wave. They are designed so that the function satisfies the Klein-Gordon equation, which it does when each Fourier coefficient evolves in time according to a second order equation.
I may have misunderstood your question. But the idea is that from the initial wave and the initial temporal derivative of the wave you get enough initial conditions to know the initial Fourier coefficients and the initial time rate of change of the Fourier coefficients, which is all the freedom you have. The rest is determined: the Fourier coefficients have to evolve temporally a certain way so that the wave evolves a certain way temporally.
edit to respond to comments
At each time you get a Fourier transform. And you then ask yourself how those Fourier coefficients depend (in time) on each other. In order for the wave to evolve in time by a second order equation, the Fourier transform needs to evolve in time by a second order equation. But the transform gets to do a pointwise evolution that is simpler and that's why we do it.
When you take the Fourier transform at each time, you get a $\widetilde{\phi}(\mathbf{k},t)$ that is itself a function of time. So in general it has a nonzero partial derivative, $\partial_t \widetilde{\phi}(\mathbf{k},t)\neq 0.$
edit to further respond to comments
Let's say we want to solve $$\left(\partial_t^2-\nabla^2+m^2\right)\phi=0.$$ We start by noting that $\left(A_ke^{i\mathbf{k}\cdot\mathbf{x}}+B_ke^{-i\mathbf{k}\cdot\mathbf{x}}\right)\cos(t\sqrt{m^2+k^2})$ and $\frac{\left(C_ke^{i\mathbf{k}\cdot\mathbf{x}}+D_ke^{-i\mathbf{k}\cdot\mathbf{x}}\right)}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2})$ are both solutions for any $\mathbf{k},$ any $A_k, B_k$ and any $C_k, D_k$.
Great.
And the first solution has zero time rate of change at $t=0$ and the second has zero value at $t=0.$ So if our initial wave was $\phi(\mathbf{x},t=0)=3\cos(3x)$ and had an initial time rate of change equal to $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}=4\cos(2y)$ then we know exactly what the solution to $\left(\partial_t^2-\nabla^2+m^2\right)\phi=0$ is: $$\phi(\mathbf{x},t)=3\cos(3x)\cos(t\sqrt{m^2+3^2})+\frac{4}{\sqrt{m^2+2^2}}\cos(2y)\sin(t\sqrt{m^2+2^2}).$$
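If you want to convince yourself of this without doing the derivatives by hand, here is a rough numerical check. It is just a sketch: the mass `M = 1.0` and the sample point are arbitrary choices of mine, and the derivatives are finite-difference estimates, so the residual is only approximately zero.

```python
import math

M = 1.0  # hypothetical mass value, chosen only for illustration

def phi(x, y, t, m=M):
    """The explicit solution from the text (it does not depend on z)."""
    e3 = math.sqrt(m * m + 3 ** 2)
    e2 = math.sqrt(m * m + 2 ** 2)
    return (3.0 * math.cos(3.0 * x) * math.cos(e3 * t)
            + 4.0 / e2 * math.cos(2.0 * y) * math.sin(e2 * t))

def second_diff(f, s, h=1e-3):
    """Central-difference estimate of the second derivative of f at s."""
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / (h * h)

def kg_residual(x, y, t, m=M):
    """(d_t^2 - d_x^2 - d_y^2 + m^2) phi, estimated numerically."""
    d2t = second_diff(lambda s: phi(x, y, s, m), t)
    d2x = second_diff(lambda s: phi(s, y, t, m), x)
    d2y = second_diff(lambda s: phi(x, s, t, m), y)
    return d2t - d2x - d2y + m * m * phi(x, y, t, m)

def initial_rate(x, y, m=M, h=1e-6):
    """Central-difference estimate of d_t phi at t = 0."""
    return (phi(x, y, h, m) - phi(x, y, -h, m)) / (2.0 * h)

# Arbitrary sample point.
x0, y0, t0 = 0.3, -1.1, 0.7
residual = kg_residual(x0, y0, t0)
```

At $t=0$ the function reduces to $3\cos(3x)$ and `initial_rate` comes out close to $4\cos(2y)$, which are the two initial conditions we asked for.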
And any finite linear combination of $e^{\pm i\mathbf k \cdot \mathbf x}$ for the initial value $\phi(\mathbf{x},t=0)$ is just as easy to handle with a finite linear combination of solutions like $e^{\pm i\mathbf{k}\cdot\mathbf{x}}\cos(t\sqrt{m^2+k^2}).$ And similarly if the initial time rate of change $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}$ is a finite linear combination of $e^{\pm i\mathbf k \cdot \mathbf x}$ then we add terms that are a finite linear combination of $\frac{e^{\pm i\mathbf{k}\cdot\mathbf{x}}}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2}).$
All we are doing is taking solutions and adding them up in combinations that give us the right initial values and the right initial time rates of change. And it is super easy if the initial conditions happen to be a finite linear combination of sines and cosines.
But wait. What if instead of being a finite linear combination of terms like $e^{\pm i\mathbf k \cdot \mathbf x}$ the initial condition was just a function that has a Fourier transform? Then you can try the same thing. Write your initial value $\phi(\mathbf{x},t=0)$ and your initial time rate of change $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}$ as inverse Fourier transforms, that is, as superpositions of spatial sines and cosines. Start with the initial value: replace each spatial $\sin(\mathbf k \cdot \mathbf x)$ with the function $\sin(\mathbf k \cdot \mathbf x)\cos(t\sqrt{m^2+k^2})$ and replace each spatial $\cos(\mathbf k \cdot \mathbf x)$ with the function $\cos(\mathbf k \cdot \mathbf x)\cos(t\sqrt{m^2+k^2}).$ Why? Because each of those satisfies the Klein-Gordon equation, and at $t=0$ the superposition reduces to the initial value of the wave $\phi(\mathbf{x},t=0).$ So you are taking a $t=0$ spatial Fourier transform of $\phi(\mathbf{x},t=0)$ and then replacing every spatial Fourier component $e^{\pm i \mathbf k \cdot \mathbf x}$ with $e^{\pm i \mathbf k \cdot \mathbf x}\cos(t\sqrt{m^2+k^2}).$ This solves the Klein-Gordon equation, has the right initial values and has $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}=0.$
Next, take your initial time rate of change $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}$, written as an inverse Fourier transform of spatial sines and cosines. Then replace each spatial $\sin(\mathbf k \cdot \mathbf x)$ with the function $\frac{\sin(\mathbf k \cdot \mathbf x)}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2})$ and replace each spatial $\cos(\mathbf k \cdot \mathbf x)$ with the function $\frac{\cos(\mathbf k \cdot \mathbf x)}{\sqrt{m^2+k^2}}\sin(t\sqrt{m^2+k^2}).$ Why? Because each of those satisfies the Klein-Gordon equation. And when you take the inverse spatial Fourier transform you get a function that has value zero at $t=0$ and an initial time rate of change equal to the desired $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}.$
Why have two solutions? Because this second one also solves the Klein-Gordon equation and has an initial value of zero but has an initial time rate of change that equals $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}.$ And the first one solves the Klein-Gordon equation, has an initial value that equals $\phi(\mathbf{x},t=0)$ and has a zero initial time rate of change.
So if you add those two solutions together you get a function that (1) solves the Klein-Gordon equation (2) has the right initial value and (3) has the right initial time rate of change. That's what you wanted all along.
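Here is the whole recipe as a small program, in case that makes it more concrete. This is a sketch under assumptions of my own: one spatial dimension, a periodic box of length `length` sampled at `n` points, a naive discrete Fourier transform standing in for the continuous one, and $m>0$ so that $E_k$ never vanishes. The function names are made up for this example.

```python
import cmath
import math

def dft(values):
    """Naive discrete Fourier transform (O(N^2), fine for small N)."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * q * j / n) for j in range(n))
            for q in range(n)]

def inverse_dft(coeffs):
    """Inverse of dft above."""
    n = len(coeffs)
    return [sum(coeffs[q] * cmath.exp(2j * math.pi * q * j / n) for q in range(n)) / n
            for j in range(n)]

def solve_klein_gordon_1d(phi0, pi0, length, m, t):
    """Evolve sampled initial data (phi0, pi0) on a periodic box to time t."""
    n = len(phi0)
    theta = dft(phi0)   # Fourier coefficients of the initial value
    omega = dft(pi0)    # Fourier coefficients of the initial time derivative
    evolved = []
    for q in range(n):
        # Map the DFT index to a signed wavenumber.
        q_signed = q if q <= n // 2 else q - n
        k = 2.0 * math.pi * q_signed / length
        e_k = math.sqrt(m * m + k * k)
        # Cosine piece carries the initial value, sine piece the initial rate.
        evolved.append(theta[q] * math.cos(e_k * t)
                       + omega[q] / e_k * math.sin(e_k * t))
    # The data are real, so the imaginary parts are only rounding noise.
    return [value.real for value in inverse_dft(evolved)]

# One-dimensional variant of the example in the text (cos(2x) instead of cos(2y)).
n, length, m, t = 32, 2.0 * math.pi, 1.0, 0.9
xs = [length * j / n for j in range(n)]
phi0 = [3.0 * math.cos(3.0 * x) for x in xs]
pi0 = [4.0 * math.cos(2.0 * x) for x in xs]
numeric = solve_klein_gordon_1d(phi0, pi0, length, m, t)
exact = [3.0 * math.cos(3.0 * x) * math.cos(t * math.sqrt(m * m + 9.0))
         + 4.0 / math.sqrt(m * m + 4.0) * math.cos(2.0 * x)
         * math.sin(t * math.sqrt(m * m + 4.0))
         for x in xs]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
```

For initial data made of a few sines and cosines, the result agrees with the closed-form answer up to rounding. For more general initial data the same three steps apply: transform, multiply each mode by its cosine or sine factor, transform back.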
Suppose you understand that when the initial wave was $\phi(\mathbf{x},t=0)=3\cos(3x)$ and had an initial time rate of change equal to $\partial_t\phi(\mathbf{x},t)\big\vert_{t=0}=4\cos(2y)$ then the solution to $\left(\partial_t^2-\nabla^2+m^2\right)\phi=0$ is: $$\phi(\mathbf{x},t)=3\cos(3x)\cos(t\sqrt{m^2+3^2})+\frac{4}{\sqrt{m^2+2^2}}\cos(2y)\sin(t\sqrt{m^2+2^2}).$$ If you understand that, then everything else is the same idea applied to finite and "infinite" linear combinations of $e^{\pm i \mathbf k \cdot \mathbf x}$ for the initial conditions.
This was (is) one of my biggest bugbears whilst learning QFT. The reasoning behind using a Fock space is actually really simple and intuitive for a scalar field (provided you are comfortable with standard QM) but it's always masked by the horrible concept of 'canonical quantisation'.
Take the Fourier transform of the Klein-Gordon equation:
$$
\partial_t^2\hat{\phi} = -\omega_{k}^2\hat{\phi}
$$
this is the classical equation of motion for a harmonic oscillator. There's one of these for every possible momentum (this corresponds to $\phi$ being a field over space transforming into $\hat{\phi}$ which is a field over momentum).
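To spell the step out, here is a sketch assuming the convention $\phi(\mathbf{x},t)=\int\frac{d^3k}{(2\pi)^3}\hat{\phi}(\mathbf{k},t)e^{i\mathbf{k}\cdot\mathbf{x}}$ and the identification $\omega_k^2=\mathbf{k}^2+m^2$ (neither is stated explicitly above, so treat both as my assumptions). Substituting into the Klein-Gordon equation gives

$$
0=\left(\partial_t^2-\nabla^2+m^2\right)\phi(\mathbf{x},t)=\int\frac{d^3k}{(2\pi)^3}\left[\partial_t^2\hat{\phi}(\mathbf{k},t)+\left(\mathbf{k}^2+m^2\right)\hat{\phi}(\mathbf{k},t)\right]e^{i\mathbf{k}\cdot\mathbf{x}},
$$

because each $\nabla^2$ acting on $e^{i\mathbf{k}\cdot\mathbf{x}}$ pulls down a factor of $-\mathbf{k}^2$. The plane waves are linearly independent, so the bracket has to vanish for every $\mathbf{k}$ separately, which is the oscillator equation with frequency $\omega_k$.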
It's harder to transform the Lagrangian (since it includes terms like $\left(\frac{\partial\phi}{\partial t}\right)^2$), but one can assume it similarly describes $\hat{\phi}$ as an infinite spectrum of harmonic oscillators. Given this, it's reasonable to assume that the quantum Lagrangian/Hamiltonian for $\hat{\phi}$ similarly corresponds to an infinite spectrum of quantum harmonic oscillators. This we do know the form of:
$$
H = \int \frac{d^{3}p}{2E_{p}}a_{p}^{\dagger}a_{p},
$$
where I've dropped the zero point energy because you can (it corresponds to reordering the fields, which is an ambiguity that exists in going to the non-commuting quantum case from the commuting classical case) and it avoids the usual infinite-energy problems.
Now we just claim that $\phi$ has the same quantum Lagrangian as in the classical case and that the above Hamiltonian is the Fourier transform (of the Legendre transform) of the Lagrangian. If you work through you find that you get out the canonical form of the field. David Tong does this on page 24 of his notes, though he does it by essentially proposing the canonical form as an ansatz.
Then you just use your infinite set of annihilation and creation operators that arose naturally to generate the infinite set of QHO number states (one for each momentum). This is identical to the Fock space generated by the momentum operator, so you just treat it as a Fock space.
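For a single momentum mode, the number states can be built explicitly. The sketch below is only an illustration under assumptions of my own: one mode, a number basis truncated to `dim` states, and plain nested lists for the matrices. The full Fock space would be a product of one such ladder per momentum.

```python
import math

def ladder_matrices(dim):
    """Annihilation and creation matrices in a number basis truncated to dim states."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)  # a|n> = sqrt(n)|n-1>
    # The entries are real, so the adjoint is just the transpose.
    a_dag = [[a[c][r] for c in range(dim)] for r in range(dim)]
    return a, a_dag

def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def apply(op, state):
    """Apply a matrix to a state vector."""
    return [sum(op[i][j] * state[j] for j in range(len(state))) for i in range(len(op))]

dim = 6
a, a_dag = ladder_matrices(dim)
number_op = matmul(a_dag, a)

vacuum = [1.0] + [0.0] * (dim - 1)
# Build |3> = (a_dag)^3 |0> / sqrt(3!).
state = vacuum
for _ in range(3):
    state = apply(a_dag, state)
state = [amp / math.sqrt(math.factorial(3)) for amp in state]
# The number operator should return 3 times the state.
occupation = apply(number_op, state)
```

The annihilation operator kills the vacuum, and `number_op` is diagonal with entries $0,1,2,\dots$ up to the truncation, which is the ladder of number states for that mode.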
Best Answer
Hint:
1. $\phi(x,t)$ at different times are not independent.
2. $\int{d^4p\,\delta(p^2-m^2)}=\int{d^4p\frac{\delta(p^0-E_p)}{2p^0}}$ when the integration is restricted to $p^0>0$. The left side of this equation is Lorentz invariant.
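The second hint follows from the standard rule $\delta(f(x))=\sum_i\delta(x-x_i)/|f'(x_i)|$, summed over the simple roots $x_i$ of $f$. As a sketch, with $E_p=\sqrt{\mathbf{p}^2+m^2}$:

$$
\delta(p^2-m^2)=\delta\left((p^0)^2-E_p^2\right)=\frac{1}{2E_p}\left[\delta(p^0-E_p)+\delta(p^0+E_p)\right].
$$

Keeping only the positive-energy root $p^0=E_p$ leaves the first term, and there $2E_p=2p^0$.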
This time your question is much clearer.
If $\phi(x)$ is an arbitrary function of $x$, there's nothing confusing. If $\phi(x)$ is constrained by the Klein-Gordon equation, we have
$0=(\square+m^2)\phi(x)=\int{\frac{d^4p}{(2\pi)^4}(m^2-p^2)\phi(p)e^{-ip\cdot x}}$.
Since $e^{-ip\cdot x}$s are linearly independent, $\phi(p)$ must vanish everywhere except on the mass shell $p^2=m^2$. Then the most general form of $\phi(p)$ should be
$\phi(p)=\frac{2\pi}{\sqrt{2E_{\mathbf p}}}[\delta(p^0-E_{\mathbf p})a_{\mathbf p}+\delta(p^0+E_{\mathbf p})b_{\mathbf{-p}}^{\dagger}]$ .
Thus
$\phi(x)=\int{\frac{d^4p}{(2\pi)^4}\phi(p)e^{-ip\cdot x}}=\int{\frac{d^3\mathbf p}{(2\pi)^3}\frac{1}{\sqrt{2E_{\mathbf p}}}[a_{\mathbf p}e^{-iE_{\mathbf p}t}+b_{\mathbf{-p}}^{\dagger}e^{iE_{\mathbf p}t}]e^{i\mathbf{p\cdot x}}}=\int{\frac{d^3\mathbf p}{(2\pi)^3}\frac{1}{\sqrt{2E_{\mathbf p}}}[a_{\mathbf p}e^{-ip\cdot x}+b_{\mathbf{p}}^{\dagger}e^{ip\cdot x}]}$.
Obviously this is just the last equation in your question.
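In case the last equality is not obvious: it comes from relabelling $\mathbf p\to-\mathbf p$ in the $b^{\dagger}$ term only. A sketch of that step, using $E_{-\mathbf p}=E_{\mathbf p}$, the invariance of the measure $d^3\mathbf p$ under the relabelling, and $p\cdot x=E_{\mathbf p}t-\mathbf{p\cdot x}$ on the mass shell:

$$
\int\frac{d^3\mathbf p}{(2\pi)^3}\frac{1}{\sqrt{2E_{\mathbf p}}}\,b_{-\mathbf p}^{\dagger}\,e^{iE_{\mathbf p}t}e^{i\mathbf{p\cdot x}}=\int\frac{d^3\mathbf p}{(2\pi)^3}\frac{1}{\sqrt{2E_{\mathbf p}}}\,b_{\mathbf p}^{\dagger}\,e^{iE_{\mathbf p}t}e^{-i\mathbf{p\cdot x}}=\int\frac{d^3\mathbf p}{(2\pi)^3}\frac{1}{\sqrt{2E_{\mathbf p}}}\,b_{\mathbf p}^{\dagger}\,e^{ip\cdot x}.
$$

The $a_{\mathbf p}$ term needs no relabelling, since $e^{-iE_{\mathbf p}t}e^{i\mathbf{p\cdot x}}=e^{-ip\cdot x}$ already.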
Then the inverse Fourier transforms are
$\phi(p)=\int{d^4x\phi(x)e^{ip\cdot x}}$,
and
$\phi(\mathbf p,t)\equiv \frac{1}{\sqrt{2E_{\mathbf p}}}[a_{\mathbf p}e^{-iE_{\mathbf p}t}+b_{\mathbf{-p}}^{\dagger}e^{iE_{\mathbf p}t}]=\int{d^3\mathbf x\phi(x)e^{-i\mathbf{p\cdot x}}}$.
Due to the character limit on comments, I add my comments below.
The first identity in the last line is the definition of $\phi(\mathbf p,t)$. The second identity in it is the inverse 3-dimensional Fourier transform of $\phi(x)=\int{\frac{d^3\mathbf p}{(2\pi)^3}\frac{1}{\sqrt{2E_{\mathbf p}}}[a_{\mathbf p}e^{-iE_{\mathbf p}t}+b_{\mathbf{-p}}^{\dagger}e^{iE_{\mathbf p}t}]e^{i\mathbf{p\cdot x}}}$. Direct comparison of $\phi(\mathbf p,t)$ and the general form of $\phi(p)$ shows that $\phi(p)$ contains additional delta functions, while $\phi(\mathbf p,t)$ is free of delta functions. Besides, since $\phi(p)$ is the 4-dimensional Fourier transform of $\phi(x)$, it is not a function of $t$. I don't think that $\phi(p)$ can be understood as "a particle with 4-momentum $p$". It only makes sense mathematically. The square root is just a matter of convention which can be absorbed by $a_{\mathbf p}$ and $b_{\mathbf p}$ (see Peskin, p. 21).