I really appreciate this question. You are perfectly right and your confusion is understandable. Sadly, the physics world is somewhat sloppy in their use of notation at times.
Of course, when writing $P\psi(x)$ one does not intent to apply the operator to a scalar, but the $P$ is applied to the ket vector $\psi$.
But now comes the major source of confusion, IMO. While to you the wave function really is just
the coefficients of a state vector when the vector is written in a particular basis
and the state vector is element of some abstract Hilbert space, many people will argue that the Hilbert space is actually a function space. Which one precisely depends on the system under consideration but specifically for a free particle it'll be $L^2(\mathbb R,\mathbb C)$, i.e. the space of square-integrable function from $\mathbb R$ to $\mathbb C$. So in this space, the actual function $\psi$ is the vector. That space is linear (a.k.a. a vector space) and it also has a scalar product, defined via the integral. It is also complete which means that one can insert identities as you did above. That's actually what a Hilbert space is (to a mathematician): a complete vector space with a scalar product.
When taking on this view, it is legitimate to apply an operator to a function. Indeed, only now does the identification $P=-i\hbar\partial_x$ make any sense. It is a differential operator which can be applied to functions.
The advantage of the Dirac bra-ket-notation is that it allows you to step back from any concrete realizations of the underlying Hilbert space. This has some advantages:
- The analogy to linear algebra is more pronounced.
- Many general ideas which arise in quantum theory can be formulated independent of the system under investigation. In particular, the formalism is the same for infinite-dimensional Hilbert space like the $L^2$ and finite-dimensional Hilbert spaces like $\mathbb C^2$, the spin-1/2 space.
- Change of basis is very transparent. Specifically the connection of position and momentum space via the Fourier transform comes about naturally.
So why should we bother working in function spaces?
- It leads to differential equations which have a well-understood theory.
- There are some mathematical subtleties$^1$ involved when dealing with infinite-dimensional spaces. Those can best be understood in function spaces.
- Often we need to leave the blessed world of Hilbert spaces and go beyond that. The space is then no longer complete. Delta functions are such an example. They are certainly not square integrable, hell, the square of a delta function is not even defined. In won't go into depth here. The phrase you need to search for is Gelfand space triplet.
$^1$ For an overview of what can go terribly wrong in quantum mechanics upon getting to comfortable with Dirac's notation, I recommend this excellent article: F. Gieres, Mathematical surprises and Dirac's formalism in quantum mechanics, arXiv:quant-ph/9907069
Edit in response to @Jyothi's comment:
The action of the momentum operator as a derivative may be written as $\langle x|\hat P|\psi\rangle = -i\hbar\frac{\partial}{\partial x}\langle x|\psi\rangle$.
Now, consider the following calculation of the action of the operator product $\hat P\hat X$:
$$
\langle x|\hat P\hat X|\psi\rangle = \int\mathrm dx'\,\langle x|\hat P\hat X|x'\rangle\langle x'|\psi\rangle = \int\mathrm dx'\,x'\langle x|\hat P|x'\rangle\langle x'|\psi\rangle\\
= -i\hbar\int\mathrm dx'\,x'\psi(x')\frac{\partial}{\partial x}\delta(x-x')\\
= +i\hbar\int\mathrm dx'\,x'\psi(x')\frac{\partial}{\partial x'}\delta(x-x')\\
=-i\hbar\int\mathrm dx'\,\delta(x-x')\frac{\partial}{\partial x'}(x'\psi(x'))\\
=-i\hbar\frac{\partial}{\partial x}(x\psi(x))\\
= -i\hbar\Bigl(\psi(x)+x\frac{\partial}{\partial x}\psi(x)\Bigr)\\
=-i\hbar\langle x|\psi\rangle + x\langle x|\hat P|\psi\rangle\\
=-i\hbar\langle x|\psi\rangle + \langle x|\hat X\hat P|\psi\rangle\\
=\langle x|(-i\hbar+\hat X\hat P)|\psi\rangle.
$$
In the third line, I rewrote a derivative for $x$ as an derivative for $x'$, yielding a minus sign. In the step after, integration by parts was used, arguing that the boundary terms vanish at $\pm\infty$. By assuming this holds for general $|\psi\rangle$, one can equate the operators on the LHS and RHS, giving $\hat P\hat X=-i\hbar+\hat X\hat P$ or $[\hat X,\hat P]=i\hbar$.
This "calculation" is severely handwavy, none of it is mathematically rigorous. For example, I'm taking derivatives of $\delta$ which is not defined. But I believe the gist of this calculation could be made rigorous in a distributional sense. Also, in my mind the commutator is actually more fundamental and this, if anything, shows that the wave function representation is compatible with that.
Were you bothered by having the differential $dx$ and the ket $|x\rangle$ in the 1D example? If not, what makes this different?
Let's expand out your formula a little more.
$$
\begin{array}{rcl}
\langle \phi | \phi \rangle &=& \int d^3\vec{r}\langle\phi|\vec{r}\rangle\langle \vec{r}|\phi\rangle \\
&=& \int d^3 \vec{r} \phi^*(\vec{r})\phi(\vec{r})\\
&=& \int d^3 \vec{r} |\phi(\vec{r})|^2
\end{array}
$$
So this is clearly just the integral over all space of the probability density, as desired.
Best Answer
This is a very interesting question. I don't know if there is a general and definitive answer, but I'll try to make some comments. I apologize if this ends up rambling; I'm finding this out as I write this answer.
Operators have dimensions, since their eigenvalues are physical quantities. For bras and kets it gets more complicated. First, you cannot in general say that they are dimensionless. To see why, consider a state with a certain position $|x\rangle$. Since $\langle x | x' \rangle = \delta(x-x')$ and the Dirac delta has the inverse dimension of its argument, it must be that $[ \langle x | ] \times [ | x \rangle ] = 1/L$. A similar relationship holds for momentum eigenstates. Of course, there are higher powers of $L$ in higher dimensions.
However, consider an operator with discrete spectrum, such as the energy in an atom or something like that. Then the appropriate equation is $\langle m | n \rangle = \delta_{mn}$, and since this delta is dimensionless, bras and kets must have inverse dimensions. This gets even weirder when you consider that the Hamiltonian for a hydrogen atom has both discrete and continuous eigenvalues, so the relationship between the bras and the kets' dimensions will be different depending on the energy (or whatever physical quantity is appropiate).
We have the equation $\langle x | p \rangle = \frac1{\sqrt{2\pi\hbar}} \exp(ipx/\hbar)$. I at first thought that this combined with $[\langle x |] \times [| p \rangle ] = [\langle p |] \times [| x \rangle ]$ would allow us to find the dimensions of $|x\rangle$ (and everything else), but it turns out that the normalization conditions of $|x\rangle$ and $|p\rangle$ force the dimensions of $\langle x | p \rangle$ to come out right. We can find that $[|p\rangle] = \sqrt{T/M} [|x \rangle]$, but we can't go any further. Similar relationships will apply for the eigenstates of your favourite operator.
Any given ket is a linear combination of eigenkets, but again there are subtleties depending on whether the spectrum is discrete or continuous. Suppose we have two observables $O_1$ and $O_2$ with discrete spectrum and eigenstates $|n\rangle_1$ and $|n\rangle_2$. Any state $|\psi\rangle$ can be expressed as a dimensionless linear combination of the eigenstates (dimensionless because since $\langle n | n \rangle = 1$, the squares of the coefficientes make up probabilities): $|\psi\rangle = \sum_n a_n |n\rangle_1 = \sum_n b_n |n\rangle_2$. This implies that the eigenkets of all observables with discrete spectrum have the same dimensions, and likewise for the eigenbras.
It gets trickier for observables with continuous spectrum such as $x$ and $p$, because of the integration measure. We have $|\psi\rangle = \int f(x) |x\rangle\ dx = \int g(p) |p\rangle\ dp$. $\langle \psi | \psi \rangle = 1$ implies $\int |f(x)|^2\ dx = 1$, so that $[f] = 1/\sqrt{L}$ and likewise $[g] = \sqrt{T/ML}$. This should be no surprise since $f$ and $g$ are Fourier transforms of each other, with an $1/\sqrt{\hbar}$ thrown in. From this we can deduce $[|p\rangle] = \sqrt{T/M} [|x \rangle]$, which we already knew, and $\sqrt{L} [|x \rangle] = [|n \rangle]$.
The conclusion seems to be the following. All eigenkets with discrete eigenvalues must have the same dimensions, but it looks like that dimension is arbitrary (so you could take them to be dimensionless). Furthermore, normalized states have that same dimension. Eigenstates with continuous spectrum are more complicated; if we have an observable $A$ (with continuous eigenvalues) with eigenvalues $a$, then we can use the fact that $|\psi\rangle$ can be written either as an integral over eigenstates of $A$ or as a sum over discrete eigenstates to find that $\sqrt{[a]} [|a\rangle] = [|n\rangle]$, where $|n\rangle$ is some discrete eigenket. So once you fix the dimensions of one ket, you fix the dimensions of every other ket.