If this were computer science, we might say $\psi$ takes a $d$-tuple of reals ($r$) and another real ($t$) and returns a complex number with the attached unit of $L^{-d/2}$ in $d$ dimensions (with $L$ being the unit of length).1
If you want any more of an interpretation, well then you've already given it: $\psi(r,t)$ is the thing such that $\int_R\ \lvert \psi(r,t) \rvert^2\ \mathrm{d}V$ is the probability of the particle being observed in the region $R$ at time $t$. You can loosely think of it as a "square root" of a probability distribution.
The reason the "square root" interpretation is not quite right, and probably the reason you aren't satisfied with the $\int_R\ \lvert \psi(r,t) \rvert^2\ \mathrm{d}V$ definition, is that any particular instance of $\psi(r,t)$ carries extraneous information beyond what is needed to fully specify the physics. In particular, if we have $\psi_1$ describing a situation, then the wavefunction defined by $\psi_2(r,t) = \mathrm{e}^{i\phi} \psi_1(r,t)$ gives identical physics for any real phase $\phi$.
So the return value of the wavefunction itself is not a physical observable -- one always takes a square magnitude or does some other such thing that projects many mathematically distinct functions onto the same physical state. Even once you've taken the square magnitude, $\lvert \psi(r,t) \rvert^2$ arguably isn't directly observable, as all we can measure is $\int_R\ \lvert \psi(r,t) \rvert^2\ \mathrm{d}V$ (though admittedly for arbitrary regions $R$).
1You can check that $-d/2$ is necessarily the exponent. We need some unit such that squaring it and multiplying by the $d$-dimensional volume becomes a probability (i.e. is unitless). That is, we are solving $X^2 L^d = 1$, from which we conclude $X = L^{-d/2}$.
The main reason is that the probability density, constructed as you indicated, leads to a CONSERVED overall probability over all space.
In most QM books it is shown that the modulus square of the wave function, integrated over all space, does not depend on time. It can then be normalized to 1. The derivation of the conservation of the probability over all space being conserved is from the Schrodinger equation. I've not used Griffith but if it's a standard book it should have that derivation. The 'trick' is to find a current j in terms of the wave function and its complex conjugate such that the divergence of j plus the partial of the probability density with time is zero. By Gauss's theorem then the derivative wrt time of the integral of the density over space is zero.
You can then use the prob. density to get the average (I.e., expectation) values of dynamical valuable, or observables, like position and momentum, and get the Newton equations for those expectation values, and appropriate conserved quantities.
You can in more advanced books get the current density j and probability density from the Lagrangian for the quantum theory (Schrodinger or later relativistic quantum field theories) In a continuity of probability equation that conserves the probability integrated over the appropriate space.
There are also hysical reasons such as the similarity with the energy density in electromagnetism. It is also the the wave variables, E and B, squared, and then the energy is conserved when integrated over space.
The square root of the modulus square has been shown to not obey a conservation equation, so that was ruled out.
You can see a little of the math and essentially the same answer, stated with some differences, at Born Interpretation of Wave Function
Best Answer
Well there are multiple reasons, but a very important one is that it can be proven (from the Schrödinger equation) that
$$\frac{\mathrm d}{\mathrm dt}\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=0$$
so that, if at any moment in time we have $\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=1$, this will remain true at any other time.
On the other hand, the derivative of the integral of $|\psi|$ is not time independent, so a consistent normalization is not possible
We need that the integral to be time independent, because otherwise a probabilistic interpretation wouldn't be possible. We need that the probability of finding the particle somewhere has to be $1$. If we used $|\psi|$ as a probability distribution, and at any point in time had we that the integral equals $1$, this will change over time, what wouldn't make any sense. On the other hand, as I already stated, we can think of $|\psi|^2$ as a probability just because its integral is time-independent. So if at any point in time we have that the integral of $|\psi|^2$ equals $1$, this will remain true at any other point in time.
Also, this has nothing to do with the Schrödinger equation being of 2nd order: Dirac equation is a 1st order equation and (in some sense), the probability distribution is still $\psi^\dagger\psi$.
Edit: there is another explanation that might be more "physical", closer to our intuition. You probably know about the double slit-experiment, a standard way of introducing QM. When learning about such experiment, we are given two scenarios: first, think of the double slit being hit by light. We know from optics about the phenomenon of interference: the electromagnetic field is radiated from each slit, thus interfering when reaching the screen. The interference pattern is easily understood, mathematically, when we think of the electric field as a wave propagating through space. We know that the intensity observed at the screen is the modulus squared of $\boldsymbol E$, where $\boldsymbol E=\boldsymbol E_1+\boldsymbol E_2$. When calculating the modulus squared, we get the expected interference (crossed) term. The observed intensity is just $I(x)=|\boldsymbol E(x)|^2$.
On the other hand, if we think of the experiment when using electrons, we know that the interference pattern is still produced, so by being inspired from classical electrodynamics, we think of another wave propagating through space, such that its modulus squared gives the intensity on the screen, i.e., the modulus squared of the wave function is like the intensity of the light: where it is high, there is a high chance of finding an electron. In this way, we can think of $|\psi|^2$ as a probability distribution, in the same way we can think of $|\boldsymbol E|^2$ as a probability distribution of the photon. There is actually a lot from QM taken from classical electromagnetism.
For the record, I must say that this analogy between the electric field and the wave-function is rather limited, and should not be pushed too far: it will lead to incorrect conclusions. The electric field is not the wave function of the photon.