Well there are multiple reasons, but a very important one is that it can be proven (from the Schrödinger equation) that
$$\frac{\mathrm d}{\mathrm dt}\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=0$$
so that, if at any moment in time we have $\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=1$, this will remain true at any other time.
On the other hand, the derivative of the integral of $|\psi|$ is not time independent, so a consistent normalization is not possible
We need that the integral to be time independent, because otherwise a probabilistic interpretation wouldn't be possible. We need that the probability of finding the particle somewhere has to be $1$. If we used $|\psi|$ as a probability distribution, and at any point in time had we that the integral equals $1$, this will change over time, what wouldn't make any sense. On the other hand, as I already stated, we can think of $|\psi|^2$ as a probability just because its integral is time-independent. So if at any point in time we have that the integral of $|\psi|^2$ equals $1$, this will remain true at any other point in time.
Also, this has nothing to do with the Schrödinger equation being of 2nd order: Dirac equation is a 1st order equation and (in some sense), the probability distribution is still $\psi^\dagger\psi$.
Edit: there is another explanation that might be more "physical", closer to our intuition. You probably know about the double slit-experiment, a standard way of introducing QM. When learning about such experiment, we are given two scenarios: first, think of the double slit being hit by light. We know from optics about the phenomenon of interference: the electromagnetic field is radiated from each slit, thus interfering when reaching the screen. The interference pattern is easily understood, mathematically, when we think of the electric field as a wave propagating through space. We know that the intensity observed at the screen is the modulus squared of $\boldsymbol E$, where $\boldsymbol E=\boldsymbol E_1+\boldsymbol E_2$. When calculating the modulus squared, we get the expected interference (crossed) term. The observed intensity is just $I(x)=|\boldsymbol E(x)|^2$.
On the other hand, if we think of the experiment when using electrons, we know that the interference pattern is still produced, so by being inspired from classical electrodynamics, we think of another wave propagating through space, such that its modulus squared gives the intensity on the screen, i.e., the modulus squared of the wave function is like the intensity of the light: where it is high, there is a high chance of finding an electron. In this way, we can think of $|\psi|^2$ as a probability distribution, in the same way we can think of $|\boldsymbol E|^2$ as a probability distribution of the photon. There is actually a lot from QM taken from classical electromagnetism.
For the record, I must say that this analogy between the electric field and the wave-function is rather limited, and should not be pushed too far: it will lead to incorrect conclusions. The electric field is not the wave function of the photon.
Any equation claiming to describe fundamental process has to have time. We know that in nature system changes with time. They are not static. For example, absorption of a photon by a hydrogen atom is a dynamic process and we need to have a theory which can predict this dynamical process in time. Since Schrodinger equation is supposed to be an equation describing nature, time needs to get involved somewhere.
The question then is why did Schrodinger choose this particular equation? There is no rigorous argument to that. If you read Schrodinger's original paper, you will see that Schrodinger used Hamilton Jacobi Equation, $H(q,\frac{\partial S}{\partial q})-\frac{\partial S}{\partial t}=0$. He replaced action $S$ by, $$\psi=e^{KS},$$
where $K$ is some constant with the dimension of action (seems familiar?). He then postulates that integral of the left-hand side of above equation should be extremized. This leads to Schrodinger Equation. The factor of time derivative in Schrodinger equation comes from the term $\frac{\partial S}{\partial t}$ in Hamilton Jacobi Equation.
Above procedure may seem, and is, ad-hoc. But he was able to calculate energy spectrum of Hydrogen atom successfully with it. He also wrote a paper in which he sort of gives motivation for this procedure. He used the geometrical formulation of Hamilton Jacobi Equation and argued that quantum physics differ from classical in the same way wave optics differ from geometric optics.
In short, we require wave function to have time dependence if it has to describe nature. The way this dependence was introduced by Schrodinger is little bit shaky but not entirely bogus. I would suggest you to read his original papers if you want to understand his motivation.
Best Answer
If this were computer science, we might say $\psi$ takes a $d$-tuple of reals ($r$) and another real ($t$) and returns a complex number with the attached unit of $L^{-d/2}$ in $d$ dimensions (with $L$ being the unit of length).1
If you want any more of an interpretation, well then you've already given it: $\psi(r,t)$ is the thing such that $\int_R\ \lvert \psi(r,t) \rvert^2\ \mathrm{d}V$ is the probability of the particle being observed in the region $R$ at time $t$. You can loosely think of it as a "square root" of a probability distribution.
The reason the "square root" interpretation is not quite right, and probably the reason you aren't satisfied with the $\int_R\ \lvert \psi(r,t) \rvert^2\ \mathrm{d}V$ definition, is that any particular instance of $\psi(r,t)$ carries extraneous information beyond what is needed to fully specify the physics. In particular, if we have $\psi_1$ describing a situation, then the wavefunction defined by $\psi_2(r,t) = \mathrm{e}^{i\phi} \psi_1(r,t)$ gives identical physics for any real phase $\phi$.
So the return value of the wavefunction itself is not a physical observable -- one always takes a square magnitude or does some other such thing that projects many mathematically distinct functions onto the same physical state. Even once you've taken the square magnitude, $\lvert \psi(r,t) \rvert^2$ arguably isn't directly observable, as all we can measure is $\int_R\ \lvert \psi(r,t) \rvert^2\ \mathrm{d}V$ (though admittedly for arbitrary regions $R$).
1You can check that $-d/2$ is necessarily the exponent. We need some unit such that squaring it and multiplying by the $d$-dimensional volume becomes a probability (i.e. is unitless). That is, we are solving $X^2 L^d = 1$, from which we conclude $X = L^{-d/2}$.