I think Hawking, Ellis "The Large Scale Structure of Space-Time" 3.4 would be an interesting read for you.
From this book:
With Ricciscalar $R$, cosmological constant $\lambda$ and matter Lagrangian $L_m$
$$
I=\int_M (A (R - 2 \lambda ) + L_m)
$$
One might ask whether varying an action derived from some other
scalar combination of the metric and curvature tensors might not give
a reasonable alternative set of equations. However the curvature scalar
is the only such scalar linear in second derivatives of the metric tensor;
so only in this case can one transform away a surface integral and be
left with an equation involving only second derivatives of the metric.
If one tried any other scalar such as $R_{ab}R^{ab}$ or $R_{abcd}R^{abcd}$ one would
obtain an equation involving fourth derivatives of the metric tensor.
This would seem objectionable, as all other equations of physics are
first or second order. If the field equations were fourth order, it would
be necessary to specify not only the initial values of the metric and its
first derivatives, but also the second and third derivatives, in order to
determine the evolution of the metric.
Another nice read is Carroll "Spacetime and Geometry: An Introduction to General Relativity" 4.8. Alternative Theories. There is an shorter version online for free in Chapter 4 here
Once the geometric context of gravity is clear (which is really kind of independent of the specific field equations), one can reason as follows.
The field equation must reduce to the field equation of Newtonian gravity, which is $$ \nabla^2\phi=\kappa\rho, $$ where $\rho$ is the density of the source ($\kappa$ is an irrelevant constant).
We know from special relativity, that mass/energy density is by itself not a scalar quantity, but (in $c=1$ units) is the $00$ component of the stress-energy tensor, $\rho=T^{00}$.
If we want a geometric description and the equivalence principle, then particles moving under the effects of solely gravity must obey the geodesic equation, which is $$ \frac{d^2\gamma^\mu}{d\tau^2}=-\Gamma^\mu_{\kappa\lambda}\frac{d\gamma^\kappa}{d\tau}\frac{d\gamma^\lambda}{d\tau}. $$
In the nearly flat spacetime ($g_{\mu\nu}=\eta_{\mu\nu}+h_{\mu\nu}$) and slow particle ($d\gamma^0/d\tau$ dominates over the spatial components) we get $$ \frac{d^2\gamma^\mu}{d\tau^2}=-\Gamma^\mu_{00}\left(\frac{d\gamma^0}{d\tau}\right)^2, $$ where in this limit the connection is $\Gamma^\mu_{00}=\frac{1}{2}\eta^{\mu\rho}(2\partial_0h_{0\rho}-\partial_\rho h_{00})$. We further assume that the metric doesn't change in time (Newtonian gravitation has no time evolution), then we get $$ \frac{d^2\gamma^\mu}{d\tau^2}=\frac{1}{2}\left(\frac{d\gamma^0}{d\tau}\right)^2\partial^\mu h_{00}. $$
Due to staticness, the 0 component of this equation is just $d^2\gamma^0/d\tau^2=0$, which means that $d\gamma^0/d\tau$ is constant, so we can divide by its square (and use that $\gamma^0$ is basically the $t$ coordinate time), we get $$ \frac{d^2\gamma^i}{dt^2}=\frac{1}{2}\partial_i h_{00}, $$ which is the equation of motion in a Newtonian gravitational field if $h_{00}=-2\phi$.
So going back to the Poisson equation $\nabla^2\phi=\kappa\rho$, we know that $\rho=T^{00}$, and $\phi$ is related to $h_{00}$ and as such to $g_{00}$, and second derivatives of $\phi$ appear, so the Newtonian equation has the form $$ K^{00}=\kappa T^{00}, $$ but we want a tensorial equation, so the full gravitational equation should be $$ K^{\mu\nu}=\kappa T^{\mu\nu}, $$ where $K^{\mu\nu}$ is some kind of tensor field constructed out of the metric's second derivatives.
But we know from differential geometry, that all tensor fields depending on the metric's second derivatives are derived from $ R^\kappa_{\ \lambda\mu\nu} $, the curvature tensor.
There are not many second-rank tensors that can be made of this, one of them is $R^{\mu\nu}$ the Ricci tensor. Here we could also look at the Ricci tensor's Newtonian limit to see how it is related to $h_{00}$ but I don't want to do that here.
Unfortunately $K^{\mu\nu}$ cannot be proportional to $R^{\mu\nu}$ because the stress energy tensor must satisfy $\nabla_\nu T^{\mu\nu}=0$, but we have $$ \nabla_\nu R^{\mu\nu}=\frac{1}{2}\nabla^\mu R $$ from the Bianchi identity, so by rearranging we get $$ \nabla_\nu R^{\mu\nu}=\frac{1}{2}g^{\mu\nu}\nabla_\nu R \\ \nabla_\nu\left(R^{\mu\nu}-\frac{1}{2}g^{\mu\nu}R\right)=\nabla_\nu G^{\mu\nu}=0, $$ eg. the Einstein tensor $G^{\mu\nu}$ is automatically divergenceless. So we should have $$ G^{\mu\nu}\sim T^{\mu\nu}, $$ which is the Einstein field equation.
Best Answer
First, we notice that the paths traced by particles through spacetime under the influence of a gravitational field seem to depend only on their positions and velocities, i.e. they are independent of any identifiable charge or composition of the particles. It is almost as if the particles were moving along tracks carved into some curved surface.
Although even Galileo noticed this, we happen to be passingly familiar with differential geometry to the point where we can actually make something of it. We realize that, if this observation really held, it would be possible to describe the curvature mathematically by assigning to spacetime a nonzero Riemann curvature tensor $R^a_{bcd}$. This object describes the rotation of a vector when dragged along a closed loop through the spacetime.
What should we couple $R^a_{bcd}$ to? Well, gravity is obviously sourced by mass somehow, but we know because we invented special relativity that mass and energy are intimately connected. We also know we can uniquely describe the energies and pressures of classical matter fields in a nicely coordinate-independent way by their stress-energy tensor $T_{ab}$. So, making the leap that perhaps pressure can source gravity as well, we write down
$$R^a_{bcd} = -\kappa T_{ab}.$$
Unfortunately this equation makes no sense because the RHS and LHS have a different number of indices. No matter - we can just contract some of the indices on the left hand side. This gives us the Ricci tensor $$R_{ab} \equiv R^j_{ajb}$$ so we write $$R_{ab} = -\kappa T_{ab}.$$
This equation predicts the perihelion precession of Mercury, so we are very happy. Unfortunately it has a rather serious flaw: $T_{ab}$ had better have identically zero divergence if energy is at least locally conserved - i.e. if energy does not simply appear out of nowhere in small regions of spacetime. But $R_{ab}$ unfortunately can have nonzero divergence.
Again, no problem - we can simply subtract the divergence: $$R_{ab} - \frac{1}{2}g_{ab}R = -\kappa T_{ab}$$ where $$R\equiv R^a_a$$ is the so-called Ricci scalar. Actually this is a bit restrictive: the divergence is determined only up to an arbitrary constant $\Lambda$. Thus: $$R_{ab} + \left(\Lambda - \frac{1}{2}R\right)g_{ab} = -\kappa T_{ab}.$$
What about the free parameter $\kappa$? This we can't fix with the theory, so it will have to be measured, and then set to enforce agreement with experiment. In particular, in the limit where the curvature is very weak, we should be able to reproduce Newtonian gravity. It turns out to do this we have to set $\kappa = 8\pi$, so that we end up with
$$R_{ab} + \left(\Lambda - \frac{1}{2}R\right)g_{ab} = -8\pi T_{ab}.$$