It is true that a lot of quantum mechanics can be taught and understood without much knowledge of the mathematical foundations, and usually it is. Since QM is a mandatory class at many faculties, one that future experimental physicists have to attend as well, this makes sense. But for future theoretical and mathematical physicists, it may pay off to learn a little bit about the math, too.
A little anecdote: John von Neumann once said to Werner Heisenberg that mathematicians should be grateful for QM, because it led to the invention of a lot of beautiful mathematics, but that mathematicians repaid this by clarifying, e.g., the difference between a selfadjoint and a symmetric operator. Heisenberg asked: "What is the difference?"
Suppose you want to calculate exp(A). Why don't you define exp(A):=1+A+1/2 A^2 + ... and require convergence with respect to the operator norm?
That's correct. The benefit of the spectral theorem is that you can define f(A) for any selfadjoint (or, more generally, normal) operator A and any bounded Borel function f. This comes in handy in many proofs in operator theory.
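To see the difference in the simplest possible setting, here is a minimal finite-dimensional sketch in Python with NumPy. The matrix `A` and the helper names `apply_function` and `exp_power_series` are my own illustrative choices. For a Hermitian matrix both constructions of exp(A) agree, but the spectral one also handles functions that have no power series at all, such as an indicator function.

```python
import numpy as np

def apply_function(A, f):
    """Return f(A) for a Hermitian matrix A via its spectral decomposition."""
    eigenvalues, U = np.linalg.eigh(A)         # A = U diag(eigenvalues) U^dagger
    return (U * f(eigenvalues)) @ U.conj().T   # U diag(f(eigenvalues)) U^dagger

def exp_power_series(A, terms=30):
    """Truncated series 1 + A + A^2/2! + ... (fine for a bounded A)."""
    result = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ A / k                    # term is now A^k / k!
        result = result + term
    return result

# A small symmetric test matrix (arbitrary choice for illustration).
A = np.array([[2.0, 1.0], [1.0, -1.0]])

exp_spectral = apply_function(A, np.exp)
exp_series = exp_power_series(A)

# The indicator function of [0, infinity) is bounded and Borel, but not analytic.
# Applied to A it gives the spectral projection onto the non-negative eigenvalues.
P_plus = apply_function(A, lambda x: (x >= 0).astype(float))
```

In infinite dimensions the power series is only available for bounded A, whereas the spectral construction carries over to unbounded selfadjoint operators.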
In addition to that, I've heard that the spectral theorem gives a full description of all self-adjoint operators. Now why is that the case? I mean, okay, there's a one-to-one correspondence between self-adjoint operators and spectral measures...
That's correct, too. Spectral measures are much, much simpler objects than selfadjoint operators, that's why. Furthermore, you can use the spectral theorem to prove that every selfadjoint operator is unitarily equivalent to a multiplication operator (multiply f(x) by x). From an abstract viewpoint, this is a very satisfactory characterization. It does not help much for concrete calculations in QM, though.
BTW: On a more advanced level, you'll need to understand the spectral theorem to understand what a mass gap is in Yang-Mills theory (Millennium Problem).
Hint: In QFT in Minkowski spacetime, one usually assumes that there is a continuous representation of the Poincaré group, and in particular of the commutative subgroup of translations, on the Hilbert space that contains all physical states. The operators that form the representation have a common spectral measure; this is an application of the SNAG theorem. The support of this spectral measure is bounded away from zero; that's the definition of the mass gap.
Although we can define the momentum as a self-adjoint operator in $L^2[0,1]$ as you proposed, I think it's rather artificial to think about it as having relation to momentum in the case of $L^2(\Bbb{R})$. Realize that the operator $p_1$ with domain
$D(p_1)=\{\psi\in\mathcal{H}^1[0,1]\,|\,\psi(1)=\psi(0)\}$, is related to spatial translations via the unitary group $U(t)=\exp(-itp_1)$, whose action is
$$(U(t)\psi)(x)=\psi[x-t\pmod{1}],$$
so it's about a particle on a torus, not in an infinite square well. Different values of $\alpha$ just give different phases to the wavefunction when it reaches the border and goes to the other side.
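The translation formula can be checked numerically. The following is only a sketch under my own assumptions (a uniform grid on $[0,1)$, an arbitrary smooth periodic test function, and the hypothetical helper `translate`). It applies $\exp(-itp_1)$ mode by mode in the Fourier basis, where $p_1$ has eigenvalues $2\pi n$, and compares the result with a cyclic shift of the samples.

```python
import numpy as np

N = 64                                   # number of grid points on [0, 1) (arbitrary)
x = np.arange(N) / N
psi = np.exp(np.cos(2 * np.pi * x))      # smooth periodic test function (arbitrary)

# Eigenvalues of p_1 = -i d/dx with periodic boundary conditions: 2*pi*n, n integer.
n = np.fft.fftfreq(N, d=1.0 / N)         # integer mode numbers in FFT ordering
eigenvalues = 2 * np.pi * n

def translate(psi, t):
    """Apply U(t) = exp(-i t p_1): multiply each Fourier mode by exp(-i t lambda_n)."""
    return np.fft.ifft(np.exp(-1j * t * eigenvalues) * np.fft.fft(psi))

shift = 5                                # translate by 5 grid steps, i.e. t = 5/N
translated = translate(psi, shift / N)

# A cyclic shift of the samples is the discrete version of psi[x - t (mod 1)].
wrapped = np.roll(psi, shift)
```

Whatever leaves the interval on the right re-enters on the left, which is exactly the behaviour of a particle on a circle.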
So, in my opinion, these operators are not actually related to the momentum as usually conceived. The idea of an infinite square well does not allow spatial translations, so there's no self-adjoint operator associated to a unitary translation group in this case. This happens, for example, in the case of a particle on the positive real line $\Bbb{R}_+$. In this case, the space $L^2[0,\infty)$ allows only translations to the right, not to the left, so you cannot have a self-adjoint operator associated to a unitary group of translations. In this case, the operator $p=-i\dfrac{d}{dx}$ has no self-adjoint extensions, for any initial domain, although it is symmetric. For a particle in a box, we can reason in the same way. There's no operator associated to spatial translations, because there are no spatial translations allowed.
It's also important to note that the Hamiltonian $H$ in this case is given by the Friedrichs extension of $$p_0^2=-\frac{d^2}{dx^2},\qquad
\mathcal{D}(p_0^2)=\{\psi\in\mathcal{H}^2[0,1]\,|\,\psi(0)=\psi'(0)=0=\psi'(1)=\psi(1)\}.$$
$H$ cannot be the square of any $p_\alpha$, since the domains do not match.
Edit: As pointed out by @jjcale, one way to define the momentum in this case would be $p=\sqrt{H}$, but clearly the action of $p$ can't be a derivative, because it has the same eigenfunctions as $H$, which are of the form $\psi_k(x)=\sin \pi kx$. This illustrates the fact that it's not related to spatial translations, as stated above.
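To make that concrete, here is a quick check, assuming as above that $H=-d^2/dx^2$ with Dirichlet boundary conditions on $[0,1]$. With $\psi_k(x)=\sin \pi kx$,
$$H\psi_k=\pi^2k^2\,\psi_k\quad\Longrightarrow\quad\sqrt{H}\,\psi_k=\pi k\,\psi_k,\qquad\text{whereas}\qquad -i\frac{d\psi_k}{dx}=-i\pi k\cos \pi kx,$$
and $\cos \pi kx$ is not a multiple of $\sin \pi kx$. So $\sqrt{H}$ does not act as $-i\,d/dx$ even on the eigenfunctions.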
Edit 2: There is a proof that the Friedrichs extension is the one with Dirichlet boundary conditions in Reed and Simon's Vol. II, Section X.3.
The domain of $p_\alpha^2$ defined by the spectral theorem is indeed $\{\psi\in\mathcal{D}(p_\alpha): p_\alpha\psi\in\mathcal{D}(p_\alpha)\}$. To see this, note that, since the spectrum is pure point in this case, the spectral theorem gives
$$p_\alpha=\sum_{n\in \Bbb{Z}}\lambda_{\alpha,n}P_n,$$
where $\lambda_{\alpha,n}$ are the eigenvalues associated with the normalized eigenvectors $\psi_n=\psi_{\alpha,n}$ (dropping the index $\alpha$ for brevity), and $P_n=\psi_n\langle\psi_n,\cdot\rangle$ are the projections onto the eigenspaces. The domain $\mathcal{D}(p_\alpha)$ is then given by the vectors $\xi$ such that
$$\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2\|P_n\xi\|^2=\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2|\langle\psi_n,\xi\rangle|^2<+\infty$$
Also, $\xi\in\mathcal{D}(p_\alpha^2)$ iff
$$\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^4\|P_n\xi\|^2=\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^4|\langle\psi_n,\xi\rangle|^2<+\infty$$
But then $p_\alpha\xi$ is such that
$$\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2\|P_np_\alpha\xi\|^2=\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2|\langle\psi_n,p_\alpha\xi\rangle|^2=
\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2|\langle p_\alpha\psi_n,\xi\rangle|^2=
\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^4|\langle\psi_n,\xi\rangle|^2<+\infty$$
So, $\mathcal{D}(p_\alpha^2)= \{\psi: p_\alpha\psi\in\mathcal{D}(p_\alpha)\}$.
Best Answer
There may be some problems in properly defining the derivative for arbitrary unbounded operators. This is because, as far as I know, there is no suitable definition of a topology on the set of unbounded operators.
If we restrict to closed operators (such as self-adjoint operators) acting on a Hilbert space $H$, then it is possible to define a metric. The set of closed operators becomes then a (non-complete) metric space $\mathcal{C}(H)$. Before discussing (briefly) what the metric is, let me remark that $\mathcal{C}(H)$ is not a linear space, for in general it is not possible to sum two closed unbounded operators. The distance between closed operators $T$ and $S$ is defined, roughly speaking, as the gap between the graphs $G(T)$ and $G(S)$. The graph of an operator is a closed linear manifold in $H\times H$ defined by $$G(T)=\{(\varphi,\psi) \in H\times H \;, \; \varphi\in D(T)\; , \; \psi=T\varphi \}\; .$$ For all the details of the definition, see e.g. Kato's book of 1966 on perturbation theory of linear operators.
In $\mathcal{C}(H)$, we have thus a notion of convergence $T_n\to T$. Convergence in this sense (called by Kato "generalized sense") extends roughly speaking the convergence in norm of bounded operators. If the resolvent set $\varrho(T)$ of $T$ is not empty, the generalized convergence is equivalent to convergence in the norm resolvent sense, i.e. it is equivalent to the convergence in norm of the resolvents (as bounded operators): $$T_n\to_\mathrm{gen} T \Leftrightarrow (T_n-z)^{-1}\to_{\mathrm{norm}} (T-z)^{-1}\; ,\; \forall z\in \varrho(T)\; .$$ More precisely, there exists an $n^*\in \mathbb{N}$ such that $z\in \varrho(T_n)$ for any $n\geq n^*$, and the convergence of resolvents holds. Of course convergence in the generalized sense is equivalent to convergence in norm if the operators are bounded.
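A toy illustration of norm resolvent convergence, under my own assumptions (the operators and the helper `resolvent_gap` are hypothetical choices, not taken from Kato): let $T$ be multiplication by $x$ on $L^2(\mathbb{R})$ and $T_n$ multiplication by $(1+1/n)x$. Each difference $T_n-T$ is unbounded, yet the resolvents at $z=i$ converge in norm. Since the norm of a multiplication operator is the supremum of the multiplier, the sketch below approximates that supremum on a finite grid.

```python
import numpy as np

def resolvent_gap(n, z=1j, x_max=1e3, points=200_001):
    """Approximate || (T_n - z)^{-1} - (T - z)^{-1} || on a finite grid.

    T is multiplication by x and T_n is multiplication by (1 + 1/n) x, so the
    operator norm is the supremum over x of the difference of the multipliers.
    """
    x = np.linspace(-x_max, x_max, points)
    r_n = 1.0 / ((1.0 + 1.0 / n) * x - z)   # multiplier of (T_n - z)^{-1}
    r = 1.0 / (x - z)                       # multiplier of (T - z)^{-1}
    return np.max(np.abs(r_n - r))

# The gap shrinks as n grows, even though T_n - T = x/n is unbounded for every n.
gaps = {n: resolvent_gap(n) for n in (1, 10, 100)}
```

An elementary estimate gives the bound $1/(2n)$ for the exact supremum at $z=i$, so the grid values should stay below it.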
Nevertheless, one still has a problem in defining the derivative, since, as I remarked before, it is not in general possible to sum two closed operators and obtain another closed operator. It is possible to give abstract conditions on (densely defined) $T$ and $S$ for them to densely define a closed operator $T+S$, see this paper. However, as you may notice, things are getting messier and messier. Anyway, let $T_0\in \mathcal{C}(H)$ be a fixed densely defined closed operator. We denote by $\mathcal{C}_{T_0}(H)$ the set $$\mathcal{C}_{T_0}(H)=\{T\in \mathcal{C}(H), T-T_0\in \mathcal{C}(H)\}\; .$$ Note that $\mathcal{C}_{T_0}(H)$ may well be empty. Nevertheless, let now $\alpha:\mathbb{R}\to \mathcal{C}_{T_0}(H)$ for some $T_0$, and $\alpha(x)=T_0$. Then the derivative $\alpha'(x)$ can be defined in the usual way, since $h^{-1}\Bigl(\alpha(x+h)-\alpha(x)\Bigr)$ is a closed operator: $$\alpha'(x)=\lim_{h\to 0}h^{-1}\Bigl(\alpha(x+h)-\alpha(x)\Bigr)\; ;$$ where the limit is understood in the generalized sense (provided it exists). However, we are still not assured that the derivative makes sense at another point $x'\neq x$ if $\alpha(x')\neq T_0$!
As a matter of fact, I actually never saw this construction applied in any concrete physical or mathematical problem, and maybe it is never used.
As a final remark, derivatives of functions with values in the continuous (bounded) linear operators are used very often. In this case, the derivative can be understood in any topology on the bounded operators, such as the norm topology (which would be equivalent to the construction above, and which the OP already noted), but also in the strong topology or in the weak one. As a matter of fact, derivatives may sometimes exist in the strong or weak sense, but not in the norm sense.
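A simple example of that last situation (a sketch; the specific family is my own choice): let $\alpha(t)$ be multiplication by $\sin(tx)/x$ on $L^2(\Bbb{R})$, a bounded operator with $\|\alpha(t)\|\leq |t|$ and $\alpha(0)=0$. For every $\psi\in L^2(\Bbb{R})$, dominated convergence gives
$$\lim_{h\to 0}\Bigl\|h^{-1}\bigl(\alpha(h)-\alpha(0)\bigr)\psi-\psi\Bigr\|^2=\lim_{h\to 0}\int_{\Bbb{R}}\Bigl|\frac{\sin hx}{hx}-1\Bigr|^2|\psi(x)|^2\,dx=0\; ,$$
so the strong derivative at $t=0$ exists and equals the identity. In norm, however,
$$\Bigl\|h^{-1}\bigl(\alpha(h)-\alpha(0)\bigr)-1\Bigr\|=\sup_{x\in\Bbb{R}}\Bigl|\frac{\sin hx}{hx}-1\Bigr|\geq 1\quad\text{for all } h\neq 0$$
(take $x=\pi/h$). Since a norm derivative would have to coincide with the strong one, $\alpha$ is not differentiable in norm at $t=0$.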