I have a slightly different perspective from the other two answers which provides a more elementary motivation. Suppose you know nothing about renormalizability or energy-momentum relations and all you know is that a Lagrangian density is a function of fields and their derivatives that transforms as a scalar under Poincaré transformations.
You can motivate the Klein-Gordon equation by asking what is the simplest Lagrangian you can write down for a scalar field that transforms as a scalar and provides a positive-definite Hamiltonian.
Since we're dealing with scalar fields any polynomial function of the fields $\phi$ will satisfy the correct Lorentz transformation property. So you could write down a term like $a\phi+b\phi^2$ with real constants $a$ and $b$. Now we also want to include derivatives $\partial_\mu\phi$. In order to satisfy the correct Lorentz transformation properties we need to contract this with a term $\partial^\mu\phi$.
So the simplest Lagrangian we can write down is $\mathcal{L}=c\partial_\mu\phi\partial^\mu\phi+a\phi+b\phi^2$ from which we obtain a Hamiltonian
$\mathcal{H}=\frac{\pi^2}{4c}+c\partial_i\phi\partial_i\phi-a\phi-b\phi^2$
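For readers who want the intermediate step, here is a sketch of the Legendre transform behind this expression. It assumes the mostly-minus metric signature $(+,-,-,-)$, which the answer does not state but which the signs in the Hamiltonian imply, and it writes $\dot\phi$ for $\partial_0\phi$:
$$\begin{aligned}
\mathcal{L} &= c\left(\dot\phi^2-\partial_i\phi\,\partial_i\phi\right)+a\phi+b\phi^2,\\
\pi &= \frac{\partial\mathcal{L}}{\partial\dot\phi}=2c\,\dot\phi \quad\Rightarrow\quad \dot\phi=\frac{\pi}{2c},\\
\mathcal{H} &= \pi\dot\phi-\mathcal{L}=\frac{\pi^2}{2c}-\frac{\pi^2}{4c}+c\,\partial_i\phi\,\partial_i\phi-a\phi-b\phi^2
=\frac{\pi^2}{4c}+c\,\partial_i\phi\,\partial_i\phi-a\phi-b\phi^2.
\end{aligned}$$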
The $a\phi$ term is not nice: it is linear in $\phi$ and can take either sign, so it ruins the positive-definiteness of the Hamiltonian. Set $a=0$. In natural units in four spacetime dimensions, the scalar field and the derivative $\partial_\mu$ each have dimension $[mass]^1$ and the Lagrangian density has dimension $[mass]^4$, so $c$ should be dimensionless and $b$ should have dimension $[mass]^2$. Write $b=-m^2$, where $m$ has units of mass and the minus sign is there to make the Hamiltonian positive-definite (given $c>0$).
So we've reduced our Hamiltonian to
$\mathcal{H}=\frac{\pi^2}{4c}+c\partial_i\phi\partial_i\phi+m^2\phi^2$
Setting $c=1/2$ and rescaling $m^2\rightarrow m^2/2$ makes the coefficients of all terms the same.
Hence the Lagrangian density is $\mathcal{L}=\frac{1}{2}\partial_\mu\phi\partial^\mu\phi-\frac{1}{2}m^2\phi^2$
Things you could try and argue against this being the simplest scalar field Lagrangian:
- If the point is simplicity, why not just ignore the derivative terms and write a Lagrangian for a scalar field as $\mathcal{L}=-m^2\phi^2$? Because if you ignore the derivative terms the field equation is $\phi=0$, and who cares about that? Ignoring the derivatives results in a non-dynamical field. So the Klein-Gordon Lagrangian is the simplest you can write down where something actually happens.
Of course, you get a simpler valid Lagrangian by setting $m=0$, but this isn't done as books want to show the energy-momentum relation in a general setting when you quantize the field. However, you can start with the massless case in 5 dimensions and perform dimensional reduction to obtain the massive case in 4 dimensions.
- Why ignore the possibility of field-derivative interaction terms? You can do this, but the goal is simplicity, and the simplest term coupling the field to its derivatives, transforming correctly and yielding a positive definite Hamiltonian is $\phi\phi\partial_\mu\phi\partial^\mu\phi$, which is much more complicated than our other terms.
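The dimensional-reduction remark above can be made concrete with a short sketch. It assumes the mostly-minus signature, a fifth coordinate $y$ (a label introduced here for illustration), and a five-dimensional field $\Phi$ that depends on $y$ through a single mode $\cos(my)$. None of these choices is stated in the answer.
$$\begin{aligned}
\partial_M\partial^M\Phi &= \left(\partial_\mu\partial^\mu-\partial_y^2\right)\Phi=0,\qquad \Phi(x,y)=\phi(x)\cos(my),\\
\partial_y^2\Phi &= -m^2\Phi \quad\Rightarrow\quad \left(\partial_\mu\partial^\mu+m^2\right)\phi(x)=0.
\end{aligned}$$
Under these assumptions, the momentum along the extra dimension plays the role of the four-dimensional mass.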
Best Answer
This is not the action. The action $S$ is the time integral of the Lagrangian $L$, i.e.
$$S=\int L dt $$
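For a field theory the Lagrangian $L$ is in turn the spatial integral of the Lagrangian density $\mathcal{L}$. This is a standard relation, added here because the answer does not spell it out:
$$L=\int d^3x\,\mathcal{L},\qquad S=\int dt\,L=\int d^4x\,\mathcal{L}$$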
The equations of motion for the field $\phi$ are given by the Euler-Lagrange equations for fields (summation over $\mu$ is implicit)
$$\partial_{\mu}\left(\frac{\partial \mathcal{L}}{\partial(\partial_{\mu}\phi)}\right)-\frac{\partial \mathcal{L}}{\partial \phi}=0 $$
which in this case gives the Klein-Gordon equation
$$(\partial_{\mu}\partial^{\mu}+m^2)\phi=0 $$
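As a quick check of that last step, assume the Lagrangian density in question is the standard Klein-Gordon one, $\mathcal{L}=\frac{1}{2}\partial_\mu\phi\partial^\mu\phi-\frac{1}{2}m^2\phi^2$ (this answer does not restate it). The two pieces of the Euler-Lagrange equation are then
$$\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\phi)}=\partial^{\mu}\phi,\qquad \frac{\partial\mathcal{L}}{\partial\phi}=-m^2\phi \quad\Rightarrow\quad \partial_{\mu}\partial^{\mu}\phi+m^2\phi=0 $$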