Well there are multiple reasons, but a very important one is that it can be proven (from the Schrödinger equation) that
$$\frac{\mathrm d}{\mathrm dt}\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=0$$
so that, if at any moment in time we have $\int \mathrm d\boldsymbol x\ |\psi(\boldsymbol x,t)|^2=1$, this will remain true at any other time.
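For reference, here is a sketch of that proof, assuming a single particle of mass $m$ in a real potential $V$ and a wave function that vanishes fast enough at infinity (these assumptions, and the symbol $\boldsymbol j$, are mine). Using $i\hbar\,\partial_t\psi=-\frac{\hbar^2}{2m}\nabla^2\psi+V\psi$ and its complex conjugate, the potential terms cancel and
$$\frac{\partial}{\partial t}|\psi|^2=\psi^*\partial_t\psi+\psi\,\partial_t\psi^*=\frac{i\hbar}{2m}\left(\psi^*\nabla^2\psi-\psi\nabla^2\psi^*\right)=-\nabla\cdot\boldsymbol j,\qquad \boldsymbol j=\frac{\hbar}{2mi}\left(\psi^*\nabla\psi-\psi\nabla\psi^*\right)$$
Integrating over all space turns the divergence into a surface term at infinity, which vanishes, so the integral of $|\psi|^2$ does not change in time.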
On the other hand, the integral of $|\psi|$ is in general not time independent, so a consistent normalization is not possible.
We need the integral to be time independent, because otherwise a probabilistic interpretation wouldn't be possible: the probability of finding the particle somewhere has to be $1$. If we used $|\psi|$ as a probability distribution and its integral equaled $1$ at some point in time, this would change over time, which wouldn't make any sense. On the other hand, as I already stated, we can think of $|\psi|^2$ as a probability distribution because its integral is time independent: if the integral of $|\psi|^2$ equals $1$ at any point in time, this will remain true at any other point in time.
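A quick numerical check can make this concrete. The following is only a minimal sketch under assumptions of my own: a free particle in one dimension, units with $\hbar=m=1$, a Gaussian initial packet, and exact evolution in Fourier space with NumPy's FFT. The function names `evolve_free` and `integral` are hypothetical helpers, not part of any library.

```python
import numpy as np

def evolve_free(psi0, x, t, hbar=1.0, m=1.0):
    """Evolve a free-particle wave function by multiplying each Fourier
    mode by its phase exp(-i*hbar*k^2*t/(2m)). Assumes a uniform grid."""
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(x.size, d=dx)  # angular wavenumbers
    phase = np.exp(-1j * hbar * k**2 * t / (2 * m))
    return np.fft.ifft(phase * np.fft.fft(psi0))

def integral(f, x):
    """Riemann-sum approximation of the integral of f over the grid."""
    return float(np.sum(f) * (x[1] - x[0]))

# Assumed setup: wide box so the packet never reaches the edges.
x = np.linspace(-40.0, 40.0, 4096, endpoint=False)
sigma0 = 1.0
psi0 = ((2 * np.pi * sigma0**2) ** (-0.25)
        * np.exp(-x**2 / (4 * sigma0**2))).astype(complex)

psi_t = evolve_free(psi0, x, t=2.0)

l2_0 = integral(np.abs(psi0) ** 2, x)   # integral of |psi|^2 at t = 0
l2_t = integral(np.abs(psi_t) ** 2, x)  # integral of |psi|^2 at t = 2
l1_0 = integral(np.abs(psi0), x)        # integral of |psi| at t = 0
l1_t = integral(np.abs(psi_t), x)       # integral of |psi| at t = 2

print("integral of |psi|^2:", l2_0, "->", l2_t)
print("integral of |psi|  :", l1_0, "->", l1_t)
```

In this setup the first pair of numbers should agree, while the second should grow as the packet spreads, which is exactly why $|\psi|$ cannot be normalized once and for all.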
Also, this has nothing to do with the Schrödinger equation being of 2nd order: the Dirac equation is a 1st-order equation and (in some sense) the probability distribution is still $\psi^\dagger\psi$.
Edit: there is another explanation that might be more "physical", closer to our intuition. You probably know about the double-slit experiment, a standard way of introducing QM. When learning about this experiment, we are given two scenarios. First, think of the double slit being hit by light. We know from optics about the phenomenon of interference: the electromagnetic field is radiated from each slit, and the two contributions interfere when they reach the screen. The interference pattern is easily understood, mathematically, when we think of the electric field as a wave propagating through space. We know that the intensity observed at the screen is the modulus squared of $\boldsymbol E$, where $\boldsymbol E=\boldsymbol E_1+\boldsymbol E_2$. When calculating the modulus squared, we get the expected interference (cross) term. The observed intensity is just $I(x)=|\boldsymbol E(x)|^2$.
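Written out, the expansion of the modulus squared is (for complex field amplitudes; for real fields the cross term is simply $2\,\boldsymbol E_1\cdot\boldsymbol E_2$)
$$|\boldsymbol E_1+\boldsymbol E_2|^2=|\boldsymbol E_1|^2+|\boldsymbol E_2|^2+2\,\mathrm{Re}\left(\boldsymbol E_1^*\cdot\boldsymbol E_2\right)$$
The last term is the interference term: it can take either sign depending on the relative phase of the two contributions, which is what produces bright and dark fringes.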
Second, if we think of the same experiment with electrons, we know that the interference pattern is still produced. So, taking inspiration from classical electrodynamics, we think of another wave propagating through space, such that its modulus squared gives the intensity on the screen. That is, the modulus squared of the wave function is like the intensity of the light: where it is high, there is a high chance of finding an electron. In this way, we can think of $|\psi|^2$ as a probability distribution, in the same way we can think of $|\boldsymbol E|^2$ as a probability distribution of the photon. There is actually a lot in QM that is taken from classical electromagnetism.
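To see the difference between adding amplitudes and adding intensities, here is a small illustrative sketch. Everything in it is an assumption of mine rather than a model of a real experiment: each slit is treated as a point source emitting a unit-amplitude wave $e^{ikr}$, and the slit separation `d`, screen distance `L` and `wavelength` are arbitrary values. The function `two_slit_intensity` is a hypothetical helper.

```python
import numpy as np

def two_slit_intensity(x, d=1.0, L=100.0, wavelength=0.5):
    """Return (coherent, incoherent) intensities on a screen at distance L.

    coherent   = |psi1 + psi2|^2      (amplitudes added, then squared)
    incoherent = |psi1|^2 + |psi2|^2  (intensities added, no interference)
    """
    k = 2 * np.pi / wavelength
    r1 = np.hypot(L, x - d / 2)  # path length from slit 1 to screen point x
    r2 = np.hypot(L, x + d / 2)  # path length from slit 2 to screen point x
    psi1 = np.exp(1j * k * r1)   # unit-amplitude wave from slit 1
    psi2 = np.exp(1j * k * r2)   # unit-amplitude wave from slit 2
    coherent = np.abs(psi1 + psi2) ** 2
    incoherent = np.abs(psi1) ** 2 + np.abs(psi2) ** 2
    return coherent, incoherent

x = np.linspace(-100.0, 100.0, 2001)  # screen positions; x = 0 at index 1000
coherent, incoherent = two_slit_intensity(x)

print("coherent intensity range  :", coherent.min(), coherent.max())
print("incoherent intensity range:", incoherent.min(), incoherent.max())
```

The incoherent sum is flat, while the coherent one oscillates between nearly zero and twice the incoherent value. Only the squared modulus of the summed amplitude reproduces fringes.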
For the record, I must say that this analogy between the electric field and the wave-function is rather limited, and should not be pushed too far: it will lead to incorrect conclusions. The electric field is not the wave function of the photon.
There is overlap with other questions linked in the comments, but perhaps the focus of this question is different enough to merit a separate answer. There are at least two distinct but equivalent formalisms of QFT: the canonical approach and the path integral approach. Although they are equivalent mathematically and in their experimental predictions, they provide very different ways of thinking about QFT phenomena. The one most suited for your question is the path integral approach.
In the path integral approach, to describe an experiment we start with the field in one configuration and then we work out the amplitude for the field to evolve to another definite configuration that represents a possible measurement in the experiment. So in the two-slit case we can start with a plane wave in front of the two slits, representing the experiment starting with an electron of a particular momentum. Then our final configuration will be a delta function at the screen, representing the electron measured at that point at some later specified time. We can work out the probability for this to occur by evaluating the amplitude for the field to evolve between the initial and final configuration in all possible ways. We then sum these amplitudes and take the squared modulus of the sum in the usual QM way.
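Schematically, and only up to normalization, the recipe just described can be written as
$$P(\phi_i\to\phi_f)\propto\left|\int_{\phi(t_i)=\phi_i}^{\phi(t_f)=\phi_f}\mathcal D\phi\ e^{iS[\phi]/\hbar}\right|^2$$
The notation here is my own shorthand: $\phi_i$ and $\phi_f$ are the initial and final field configurations, $S[\phi]$ is the action of a field history $\phi$, and the integral sums the amplitude $e^{iS/\hbar}$ over all histories connecting the two configurations.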
So in this approach there are no particles, just excitations in the field.
Best Answer
What I guessed in the comments was true. Schrödinger mentioned in his 1926 paper (see below) that "the real continuous partition of the charge is a sort of mean$\dots$".
So he got the right equation but interpreted it wrongly. He believed that in reality the electron has a continuous charge distribution. But he did mention that "no very definite experimental results can be brought forward in favour of his hypothesis".
The following is a relevant excerpt from Schrödinger's 1926 paper "An Undulatory Theory of the Mechanics of Atoms and Molecules" in The Physical Review (Vol. 28, No. 6, p. 1067):