I really appreciate this question. You are perfectly right and your confusion is understandable. Sadly, the physics world is somewhat sloppy in their use of notation at times.
Of course, when writing $P\psi(x)$ one does not intent to apply the operator to a scalar, but the $P$ is applied to the ket vector $\psi$.
But now comes the major source of confusion, IMO. While to you the wave function really is just
the coefficients of a state vector when the vector is written in a particular basis
and the state vector is element of some abstract Hilbert space, many people will argue that the Hilbert space is actually a function space. Which one precisely depends on the system under consideration but specifically for a free particle it'll be $L^2(\mathbb R,\mathbb C)$, i.e. the space of square-integrable function from $\mathbb R$ to $\mathbb C$. So in this space, the actual function $\psi$ is the vector. That space is linear (a.k.a. a vector space) and it also has a scalar product, defined via the integral. It is also complete which means that one can insert identities as you did above. That's actually what a Hilbert space is (to a mathematician): a complete vector space with a scalar product.
When taking on this view, it is legitimate to apply an operator to a function. Indeed, only now does the identification $P=-i\hbar\partial_x$ make any sense. It is a differential operator which can be applied to functions.
The advantage of the Dirac bra-ket-notation is that it allows you to step back from any concrete realizations of the underlying Hilbert space. This has some advantages:
- The analogy to linear algebra is more pronounced.
- Many general ideas which arise in quantum theory can be formulated independent of the system under investigation. In particular, the formalism is the same for infinite-dimensional Hilbert space like the $L^2$ and finite-dimensional Hilbert spaces like $\mathbb C^2$, the spin-1/2 space.
- Change of basis is very transparent. Specifically the connection of position and momentum space via the Fourier transform comes about naturally.
So why should we bother working in function spaces?
- It leads to differential equations which have a well-understood theory.
- There are some mathematical subtleties$^1$ involved when dealing with infinite-dimensional spaces. Those can best be understood in function spaces.
- Often we need to leave the blessed world of Hilbert spaces and go beyond that. The space is then no longer complete. Delta functions are such an example. They are certainly not square integrable, hell, the square of a delta function is not even defined. In won't go into depth here. The phrase you need to search for is Gelfand space triplet.
$^1$ For an overview of what can go terribly wrong in quantum mechanics upon getting to comfortable with Dirac's notation, I recommend this excellent article: F. Gieres, Mathematical surprises and Dirac's formalism in quantum mechanics, arXiv:quant-ph/9907069
Edit in response to @Jyothi's comment:
The action of the momentum operator as a derivative may be written as $\langle x|\hat P|\psi\rangle = -i\hbar\frac{\partial}{\partial x}\langle x|\psi\rangle$.
Now, consider the following calculation of the action of the operator product $\hat P\hat X$:
$$
\langle x|\hat P\hat X|\psi\rangle = \int\mathrm dx'\,\langle x|\hat P\hat X|x'\rangle\langle x'|\psi\rangle = \int\mathrm dx'\,x'\langle x|\hat P|x'\rangle\langle x'|\psi\rangle\\
= -i\hbar\int\mathrm dx'\,x'\psi(x')\frac{\partial}{\partial x}\delta(x-x')\\
= +i\hbar\int\mathrm dx'\,x'\psi(x')\frac{\partial}{\partial x'}\delta(x-x')\\
=-i\hbar\int\mathrm dx'\,\delta(x-x')\frac{\partial}{\partial x'}(x'\psi(x'))\\
=-i\hbar\frac{\partial}{\partial x}(x\psi(x))\\
= -i\hbar\Bigl(\psi(x)+x\frac{\partial}{\partial x}\psi(x)\Bigr)\\
=-i\hbar\langle x|\psi\rangle + x\langle x|\hat P|\psi\rangle\\
=-i\hbar\langle x|\psi\rangle + \langle x|\hat X\hat P|\psi\rangle\\
=\langle x|(-i\hbar+\hat X\hat P)|\psi\rangle.
$$
In the third line, I rewrote a derivative for $x$ as an derivative for $x'$, yielding a minus sign. In the step after, integration by parts was used, arguing that the boundary terms vanish at $\pm\infty$. By assuming this holds for general $|\psi\rangle$, one can equate the operators on the LHS and RHS, giving $\hat P\hat X=-i\hbar+\hat X\hat P$ or $[\hat X,\hat P]=i\hbar$.
This "calculation" is severely handwavy, none of it is mathematically rigorous. For example, I'm taking derivatives of $\delta$ which is not defined. But I believe the gist of this calculation could be made rigorous in a distributional sense. Also, in my mind the commutator is actually more fundamental and this, if anything, shows that the wave function representation is compatible with that.
Since you want a bit of mathematical rigor:
A quantum state is a self-adjoint positive trace class operator on a Hilbert space with trace 1. This is called density matrix $\rho$. In its simplest form, given $\psi\in \mathscr{H}$, $\rho$ is the orthogonal projector on the subspace spanned by $\psi$. Let $E_\rho(\cdot):D_\rho\subset\mathcal{A}(\mathscr{H})\to \mathbb{R}$ be the map defined as:
$$E_\rho(A)=\mathrm{Tr}(A\rho)\; ,$$
where $\mathcal{A}(\mathscr{H})$ is the space of self-adjoint operators, $\mathrm{Tr}$ is the trace on $\mathscr{H}$ and
$$D_\rho=\{A\in \mathcal{A}(\mathscr{H})\; ,\; \mathrm{Tr}\lvert A\rho\rvert<+\infty\}\; .$$
The map $E_\rho(\cdot)$ has all the properties of an expectation in probability theory. I don't know if it is possible to characterize the measure $\mu$ associated to it (maybe by means of the projection valued measures associated to $\rho$ by the spectral theorem, but it is not straightforward at least for me).
Best Answer
You need to apply the operator first and then evaluate the integral:
$⟨P⟩_ψ = i\hbar\int{\psi^*(x)\frac{d\psi(x)}{dx}dx}$