I have a slightly different perspective from the other two answers which provides a more elementary motivation. Suppose you know nothing about renormalizability or energy-momentum relations and all you know is that a Lagrangian density is a function of fields and their derivatives that transforms as a scalar under Poincaré transformations.

You can motivate the Klein-Gordon equation by asking what is the simplest Lagrangian you can write down for a scalar field that transforms as a scalar and provides a positive-definite Hamiltonian.

Since we're dealing with scalar fields any polynomial function of the fields $\phi$ will satisfy the correct Lorentz transformation property. So you could write down a term like $a\phi+b\phi^2$ with real constants $a$ and $b$. Now we also want to include derivatives $\partial_\mu\phi$. In order to satisfy the correct Lorentz transformation properties we need to contract this with a term $\partial^\mu\phi$.

So the simplest Lagrangian we can write down is $\mathcal{L}=c\partial_\mu\phi\partial^\mu\phi+a\phi+b\phi^2$ from which we obtain a Hamiltonian

$\mathcal{H}=\frac{\pi^2}{4c}+c\partial_i\phi\partial_i\phi-a\phi-b\phi^2$
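For completeness, here is a sketch of the Legendre transform behind this expression. It assumes the mostly-minus metric signature $(+,-,-,-)$, which is not stated explicitly but is the choice consistent with the signs used here:

$$\mathcal{L}=c\left(\dot\phi^2-\partial_i\phi\,\partial_i\phi\right)+a\phi+b\phi^2,\qquad \pi=\frac{\partial\mathcal{L}}{\partial\dot\phi}=2c\dot\phi,$$

$$\mathcal{H}=\pi\dot\phi-\mathcal{L}=\frac{\pi^2}{2c}-\frac{\pi^2}{4c}+c\,\partial_i\phi\,\partial_i\phi-a\phi-b\phi^2=\frac{\pi^2}{4c}+c\,\partial_i\phi\,\partial_i\phi-a\phi-b\phi^2.$$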

The $a\phi$ term is not nice since it ruins the positive-definiteness of the Hamiltonian, so set $a=0$. In natural units in four dimensions, a scalar field has dimension $[mass]$, its derivative $\partial_\mu\phi$ has dimension $[mass]^2$, and the Lagrangian density has dimension $[mass]^4$, so $c$ should be dimensionless and $b$ should be $b=-m^2$, where $m$ has units of mass and the minus sign is there to make the Hamiltonian positive-definite.

So we've reduced our Hamiltonian to

$\mathcal{H}=\frac{\pi^2}{4c}+c\partial_i\phi\partial_i\phi+m^2\phi^2$

Setting $c=1/2$ and rescaling $m^2\rightarrow m^2/2$ makes the coefficients of all terms equal to $1/2$.

Hence the Lagrangian density is $\mathcal{L}=\frac{1}{2}\partial_\mu\phi\partial^\mu\phi-\frac{1}{2}m^2\phi^2$.
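As a quick check that this really is the Klein-Gordon Lagrangian, the Euler-Lagrange equation can be worked out directly (a sketch, again assuming the $(+,-,-,-)$ signature):

$$\frac{\partial\mathcal{L}}{\partial\phi}-\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}=-m^2\phi-\partial_\mu\partial^\mu\phi=0\quad\Longrightarrow\quad\left(\partial_\mu\partial^\mu+m^2\right)\phi=0.$$

Inserting a plane wave $\phi\propto e^{-i(Et-\mathbf{p}\cdot\mathbf{x})}$ gives $-E^2+\mathbf{p}^2+m^2=0$, i.e. $E^2=\mathbf{p}^2+m^2$.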

Things you could try to argue against this being the simplest scalar field Lagrangian:

- If the point is simplicity, why not just ignore the derivative terms and write a Lagrangian for a scalar field as $\mathcal{L}=-m^2\phi^2$? Because if you ignore the derivative terms the field equation is $\phi=0$, and who cares about that? Ignoring the derivatives results in a non-dynamical field. So the Klein-Gordon Lagrangian is the simplest you can write down where something actually happens.

Of course, you get a simpler valid Lagrangian by setting $m=0$, but textbooks usually keep the mass term because they want to show the energy-momentum relation in a general setting when you quantize the field. However, you can start with the massless case in 5 dimensions and perform dimensional reduction to obtain the massive case in 4 dimensions.

- Why ignore the possibility of field-derivative interaction terms? You can do this, but the goal is simplicity, and the simplest term coupling the field to its derivatives, transforming correctly and yielding a positive definite Hamiltonian is $\phi\phi\partial_\mu\phi\partial^\mu\phi$, which is much more complicated than our other terms.
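The dimensional reduction mentioned in the first point can be sketched as follows. The flat extra spatial coordinate $y$ and the single-mode ansatz $\Phi(x,y)=\phi(x)\,e^{imy}$ are illustrative assumptions, not something spelled out above:

$$\partial_M\partial^M\Phi=\left(\partial_\mu\partial^\mu-\partial_y^2\right)\Phi=0,\qquad \Phi(x,y)=\phi(x)\,e^{imy}\quad\Longrightarrow\quad\left(\partial_\mu\partial^\mu+m^2\right)\phi=0.$$

The momentum along the extra direction plays the role of the four-dimensional mass.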

Think of a two-dimensional horizontal elastic sheet in three dimensions. Suppose we specify the height everywhere with $\phi(x,y)$.

Then if the sheet moves up and down there is kinetic energy. This kinetic energy is proportional to $\dot{\phi}^2$ because $\dot{\phi}$ is telling you the velocity of that point on the sheet.

Also, suppose we want to pull one point on the sheet up while keeping a neighboring point fixed. This will cost elastic potential energy. The amount of potential energy is proportional to $(\nabla\phi)^2$.

Thus we get a contribution to the Lagrangian of the form $(\partial_t \phi)^2 - (\nabla\phi)^2 = \partial_\mu \phi \partial^\mu \phi$. I googled and found a pdf (titled week 7 lecture: concepts of QFT, by someone named Andrew Forrester) which does this in more detail. I haven't read it though, so I can't vouch for it too much. You would only need to read the second and third pages.
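To make the elastic-sheet picture concrete, here is a minimal sketch of the potential-energy term on a grid. The unit grid spacing, the nearest-neighbour springs, and the `stiffness` parameter are illustrative assumptions, and `elastic_energy` is a hypothetical helper, not something from the lecture notes mentioned above.

```python
def elastic_energy(heights, stiffness=1.0):
    """Discrete stand-in for (stiffness / 2) * sum of (grad phi)^2 on a unit-spaced grid."""
    rows, cols = len(heights), len(heights[0])
    energy = 0.0
    for i in range(rows):
        for j in range(cols):
            # Count each nearest-neighbour bond once: right and down neighbours only.
            if j + 1 < cols:
                energy += 0.5 * stiffness * (heights[i][j + 1] - heights[i][j]) ** 2
            if i + 1 < rows:
                energy += 0.5 * stiffness * (heights[i + 1][j] - heights[i][j]) ** 2
    return energy


# A flat sheet, the same sheet with one point pulled up, and that sheet lifted rigidly.
flat = [[0.0] * 3 for _ in range(3)]
bump = [row[:] for row in flat]
bump[1][1] = 1.0
shifted = [[h + 5.0 for h in row] for row in bump]

if __name__ == "__main__":
    print(elastic_energy(flat), elastic_energy(bump), elastic_energy(shifted))
```

Pulling one point up relative to its neighbours costs energy, while lifting the whole sheet by a constant costs nothing, which is why this term depends on $\nabla\phi$ rather than on $\phi$ itself.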

## Best Answer

In field theory, it is assumed that the action is dimensionless and so is the speed of light ($\hbar=c=1$), so we can write everything in terms of the mass unit. From the speed of light being unitless we get $[L]= [T]$, and from the action being unitless we get $[M]=[L]^{-1}$. In these units, the Lagrangian density has the dimension $[\mathcal{L}] = [M]^4$. From the mass term in the Lagrangian density, we can find the units of the field $\phi$: $$[\phi] = [M].$$ To check that the first term is consistent, it should also have dimension $[M]^4$: $$[\partial_\mu \phi] = \frac{[\phi]}{[x]} = \frac{[M]}{[L]} = [M]^2.$$ Remember that the partial derivative has $\partial x^\mu$ in the denominator, which has the dimension of length. So, finally, the dimension of the first term is $[(\partial_\mu\phi)^2] = [M]^4$, which is the correct dimension since $[\mathcal{L}] =[M]^4$.
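The same bookkeeping can be written as a small script. This is only a sketch: the function names are hypothetical, and the generalization to `d` spacetime dimensions is an assumption that goes beyond the four-dimensional case discussed here.

```python
from fractions import Fraction


def field_dimension(d=4):
    """Mass dimension of a scalar field in d spacetime dimensions (hbar = c = 1)."""
    # Requiring the kinetic term (d phi)^2 to have dimension d gives 2*[phi] + 2 = d.
    return Fraction(d - 2, 2)


def term_dimension(n_fields, n_derivatives, d=4):
    """Mass dimension of a monomial with n_fields powers of phi and n_derivatives derivatives."""
    # Each derivative contributes one power of mass, since [d/dx] = 1/[L] = [M].
    return n_fields * field_dimension(d) + n_derivatives


def coupling_dimension(n_fields, n_derivatives, d=4):
    """Mass dimension the coefficient must carry so the full term has dimension d."""
    return d - term_dimension(n_fields, n_derivatives, d)


if __name__ == "__main__":
    print("phi:", field_dimension())
    print("coefficient of (d phi)^2:", coupling_dimension(2, 2))
    print("coefficient of phi^2:", coupling_dimension(2, 0))
    print("coefficient of phi^2 (d phi)^2:", coupling_dimension(4, 2))
```

In four dimensions the kinetic coefficient comes out dimensionless and the $\phi^2$ coefficient has dimension $[M]^2$, matching the mass term $m^2\phi^2$.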

So, there are two misconceptions in your assumptions: $\phi$ is not unitless, and the Lagrangian density is not written in SI units. It is written in what are called natural units, with $\hbar = c = 1$.