I really appreciate this question. You are perfectly right and your confusion is understandable. Sadly, the physics world is somewhat sloppy in their use of notation at times.
Of course, when writing $P\psi(x)$ one does not intent to apply the operator to a scalar, but the $P$ is applied to the ket vector $\psi$.
But now comes the major source of confusion, IMO. While to you the wave function really is just
the coefficients of a state vector when the vector is written in a particular basis
and the state vector is element of some abstract Hilbert space, many people will argue that the Hilbert space is actually a function space. Which one precisely depends on the system under consideration but specifically for a free particle it'll be $L^2(\mathbb R,\mathbb C)$, i.e. the space of square-integrable function from $\mathbb R$ to $\mathbb C$. So in this space, the actual function $\psi$ is the vector. That space is linear (a.k.a. a vector space) and it also has a scalar product, defined via the integral. It is also complete which means that one can insert identities as you did above. That's actually what a Hilbert space is (to a mathematician): a complete vector space with a scalar product.
When taking on this view, it is legitimate to apply an operator to a function. Indeed, only now does the identification $P=-i\hbar\partial_x$ make any sense. It is a differential operator which can be applied to functions.
The advantage of the Dirac bra-ket-notation is that it allows you to step back from any concrete realizations of the underlying Hilbert space. This has some advantages:
- The analogy to linear algebra is more pronounced.
- Many general ideas which arise in quantum theory can be formulated independent of the system under investigation. In particular, the formalism is the same for infinite-dimensional Hilbert space like the $L^2$ and finite-dimensional Hilbert spaces like $\mathbb C^2$, the spin-1/2 space.
- Change of basis is very transparent. Specifically the connection of position and momentum space via the Fourier transform comes about naturally.
So why should we bother working in function spaces?
- It leads to differential equations which have a well-understood theory.
- There are some mathematical subtleties$^1$ involved when dealing with infinite-dimensional spaces. Those can best be understood in function spaces.
- Often we need to leave the blessed world of Hilbert spaces and go beyond that. The space is then no longer complete. Delta functions are such an example. They are certainly not square integrable, hell, the square of a delta function is not even defined. In won't go into depth here. The phrase you need to search for is Gelfand space triplet.
$^1$ For an overview of what can go terribly wrong in quantum mechanics upon getting to comfortable with Dirac's notation, I recommend this excellent article: F. Gieres, Mathematical surprises and Dirac's formalism in quantum mechanics, arXiv:quant-ph/9907069
Edit in response to @Jyothi's comment:
The action of the momentum operator as a derivative may be written as $\langle x|\hat P|\psi\rangle = -i\hbar\frac{\partial}{\partial x}\langle x|\psi\rangle$.
Now, consider the following calculation of the action of the operator product $\hat P\hat X$:
$$
\langle x|\hat P\hat X|\psi\rangle = \int\mathrm dx'\,\langle x|\hat P\hat X|x'\rangle\langle x'|\psi\rangle = \int\mathrm dx'\,x'\langle x|\hat P|x'\rangle\langle x'|\psi\rangle\\
= -i\hbar\int\mathrm dx'\,x'\psi(x')\frac{\partial}{\partial x}\delta(x-x')\\
= +i\hbar\int\mathrm dx'\,x'\psi(x')\frac{\partial}{\partial x'}\delta(x-x')\\
=-i\hbar\int\mathrm dx'\,\delta(x-x')\frac{\partial}{\partial x'}(x'\psi(x'))\\
=-i\hbar\frac{\partial}{\partial x}(x\psi(x))\\
= -i\hbar\Bigl(\psi(x)+x\frac{\partial}{\partial x}\psi(x)\Bigr)\\
=-i\hbar\langle x|\psi\rangle + x\langle x|\hat P|\psi\rangle\\
=-i\hbar\langle x|\psi\rangle + \langle x|\hat X\hat P|\psi\rangle\\
=\langle x|(-i\hbar+\hat X\hat P)|\psi\rangle.
$$
In the third line, I rewrote a derivative for $x$ as an derivative for $x'$, yielding a minus sign. In the step after, integration by parts was used, arguing that the boundary terms vanish at $\pm\infty$. By assuming this holds for general $|\psi\rangle$, one can equate the operators on the LHS and RHS, giving $\hat P\hat X=-i\hbar+\hat X\hat P$ or $[\hat X,\hat P]=i\hbar$.
This "calculation" is severely handwavy, none of it is mathematically rigorous. For example, I'm taking derivatives of $\delta$ which is not defined. But I believe the gist of this calculation could be made rigorous in a distributional sense. Also, in my mind the commutator is actually more fundamental and this, if anything, shows that the wave function representation is compatible with that.
There is no eigenvector corresponding to continuous spectrum. The formalism of Gel'fand triples does not give much help in solving your doubts either, and it has very few applications in my experience. One reason is that those "generalized eigenvectors" are not in the Hilbert space but on a bigger space, and what you can do with them is not much on a rigorous standpoint (for example, they can act- by topological duality -only on a subset of the vectors of the Hilbert space; they cannot be multiplied; they cannot act on other generalized eigenvectors;...).
The spectral theorem is, instead, valid for any self-adjoint operator. Even more importantly, there is a one-to-one correspondence between $A\in SelfAdjoint(\mathscr{H})$ and special families of projections called spectral families (or projection-valued measures) $\{P_\Omega\}_{\Omega\in Borel(\mathbb{R})}\subset \mathcal{L}(\mathscr{H})$.
These projectors are the generalization of $\lvert \psi_n\rangle\langle\psi_n\rvert$, projecting on the subspace with eigenvalue $n$ (supposed to be of multiplicity one); so they project, roughly speaking, on the subspace of vectors where the operator take values only in the $\Omega$ portion of the spectrum. In fact, if we denote by $dP(\lambda)$ the measure with respect to the spectral family $\{P_\Omega\}_{\Omega\in Borel(\mathbb{R})}$ (that can be rigorously defined), the associated operator can be written as:
$$A=\int_\mathbb{R}\lambda dP(\lambda)\; ;$$
and the associated decomposition of the identity may be written as:
$$1=\int_\mathbb{R} dP(\lambda)=P_{\mathbb{R}}\; .$$
The last equation is the correct mathematical way of writing the second equation of the OP.
The other equations instead, do not have a rigorous counterpart since there are no eigenvectors of operators with continuous spectra (of course you may write $\psi=\int_{\mathbb{R}}dP(\lambda)(\psi)$, but that is not so useful).
Best Answer
I'll start with your first expression: As you correctly noticed, $| \xi \rangle\langle \xi |$ can be seen as an element of $\mathcal L(\mathcal H)$, therefore you are integrating an $\mathcal L(\mathcal H)$-valued function which gives a result $$ \int | \xi \rangle\langle \xi | \, \mathrm d\xi \in \mathcal L(\mathcal H) . $$
Your second expression is interesting: It can be understood in two ways. These two ways are equivalent because the tensor product is associative.
You can see it as $$ \left( \int \mathrm dx | x \rangle\langle x | \right) | \alpha \rangle , $$ which is obviously an element of $\mathcal H$ according to what I said about your first expression.
Alternatively, you can see it as an integral of a $\mathcal H$-valued function: $$ \int | x \rangle\langle x | \alpha \rangle \, \mathrm dx = \int \langle x | \alpha \rangle \, | x \rangle \, \mathrm dx . $$
Finally, I should mention that these integrals are still not well defined from a strict mathematical point of view, as soon as there is continuous spectrum. The correct way to go about is to view the whole expression $| x \rangle \langle x | \, \mathrm dx$ as a projection-valued measure (Wikipedia). That subject is too complex to explain in just one post here, though.