Let me first say that I think Tobias Kienzler has done a great job of discussing the intuition behind your question in going from finite to infinite dimensions.
I'll instead attempt to address the mathematical content of Jackson's statements. My basic claim will be that
Whether you are working in finite or infinite dimension, writing the Schrodinger equation in a specific basis only involves making definitions.
To see this clearly without having to worry about possible mathematical subtleties, let's first consider
Finite dimension
In this case, we can be certain that there exists an orthonormal basis $\{|n\rangle\}_{n=1,\dots,N}$ for the Hilbert space $\mathcal H$. Now for any state $|\psi(t)\rangle$ we define the so-called matrix elements of the state and Hamiltonian as follows:
\begin{align}
\psi_n(t) = \langle n|\psi(t)\rangle, \qquad H_{mn} = \langle m|H|n\rangle
\end{align}
Now take the inner product of both sides of the Schrodinger equation with $\langle m|$, and use linearity of the inner product and derivative to write
\begin{align}
\langle m|\frac{d}{dt}|\psi(t)\rangle=\frac{d}{dt}\langle m|\psi(t)\rangle=\frac{d\psi_m}{dt}(t)
\end{align}
Because $\{|n\rangle\}$ is an orthonormal basis, we have the resolution of the identity
\begin{align}
I = \sum_{m=1}^N|m\rangle\langle m|
\end{align}
so that, after taking the inner product with $\langle m|$, the right-hand side of the Schrodinger equation can be written as follows:
\begin{align}
\langle m|H|\psi(t)\rangle
= \sum_{n=1}^N\langle m|H|n\rangle\langle n|\psi(t)\rangle
= \sum_{n=1}^N H_{mn}\psi_n(t)
\end{align}
Putting this all together (and relabeling the indices) gives the Schrodinger equation in the $\{|n\rangle\}$ basis:
\begin{align}
\frac{d\psi_n}{dt}(t) = \sum_{m=1}^NH_{nm}\psi_m(t)
\end{align}
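As a quick numerical sanity check of the step that inserts the resolution of the identity, here is a minimal NumPy sketch. It assumes a hypothetical random Hermitian $H$ on $\mathbb C^4$ and a hypothetical random orthonormal basis (the columns of a unitary matrix `B` obtained from a QR decomposition); none of these choices come from the argument above. It checks that $\sum_n|n\rangle\langle n|=I$ and that $\langle m|H|\psi\rangle$ agrees with $\sum_n H_{mn}\psi_n$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# A hypothetical Hermitian operator H on C^N, written in the standard basis.
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (A + A.conj().T) / 2

# A hypothetical orthonormal basis {|n>}: the columns of a unitary matrix B.
B, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))

# Matrix elements H_mn = <m|H|n> and components psi_n = <n|psi>.
H_basis = B.conj().T @ H @ B
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi_coeffs = B.conj().T @ psi

# Resolution of the identity: sum over n of |n><n|.
# np.outer does not conjugate, so the bra is conjugated explicitly.
identity = sum(np.outer(B[:, n], B[:, n].conj()) for n in range(N))

# <m|H|psi> computed directly, versus sum_n H_mn psi_n.
lhs = B.conj().T @ (H @ psi)
rhs = H_basis @ psi_coeffs
print(np.allclose(identity, np.eye(N)), np.allclose(lhs, rhs))  # prints: True True
```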
Infinite dimension
With infinitely many dimensions, we can choose to write the Schrodinger equation either in a discrete (countable) basis for the Hilbert space $\mathcal H$, which always exists because the Hilbert spaces of quantum mechanics are separable (they all possess a countable orthonormal basis), or in a continuous "basis" such as the position "basis". I put "basis" in quotes here because the position eigenstates $|x\rangle$ are not actually elements of the Hilbert space: their position-space wavefunctions are Dirac delta functions, which are not square-integrable.
In the case of a countable orthonormal basis, the computation performed above for writing the Schrodinger equation in a basis goes through in precisely the same way, with $N$ replaced by $\infty$ everywhere.
In the case of the "basis" $\{|x\rangle\}_{x\in\mathbb R}$, the computation above goes through in almost exactly the same way (as your question essentially shows), except that the definitions we made at the beginning change slightly. In particular, we define functions $\psi:\mathbb R^2\to\mathbb C$ and $h:\mathbb R^2\to\mathbb C$ by
\begin{align}
\psi(x,t) = \langle x|\psi(t)\rangle, \qquad h(x,x') = \langle x|H|x'\rangle
\end{align}
Then the position space representation of the Schrodinger equation follows by taking the inner product of both sides of the equation with $\langle x|$ and using the resolution of the identity
\begin{align}
I = \int_{-\infty}^\infty dx'\, |x'\rangle\langle x'|
\end{align}
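Carrying out these two steps explicitly, exactly as in the finite-dimensional case, gives the continuum analogue of the matrix equation. This is a formal computation; the caveats about what $|x\rangle$ means still apply:
\begin{align}
\frac{\partial\psi}{\partial t}(x,t)
= \langle x|H|\psi(t)\rangle
= \int_{-\infty}^\infty dx'\,\langle x|H|x'\rangle\langle x'|\psi(t)\rangle
= \int_{-\infty}^\infty dx'\,h(x,x')\,\psi(x',t)
\end{align}
The sum $\sum_m H_{nm}\psi_m(t)$ is replaced by an integral against the kernel $h(x,x')$.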
The only real mathematical subtleties you have to worry about in this case are exactly what sorts of objects the symbols $|x\rangle$ represent (since they are not in the Hilbert space) and in what sense one can write a resolution of the identity for such objects. But once you have taken care of these issues, the conversion of the Schrodinger equation into its expression in a particular "representation" is just a matter of making the appropriate definitions.
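One way to get a feel for why $|x\rangle$ is not a Hilbert-space vector is a grid discretization. The NumPy sketch below is only a heuristic analogy, not the rigorous treatment: on a grid of spacing $\Delta x$ over a hypothetical interval $[0,1)$, take $|x_j\rangle = e_j/\sqrt{\Delta x}$ (the helper `position_kets` is illustrative). Then $\sum_j\Delta x\,|x_j\rangle\langle x_j|$ reproduces the identity, mimicking $\int dx'\,|x'\rangle\langle x'|$, but $\langle x_j|x_j\rangle=1/\Delta x$ grows without bound as the grid is refined.

```python
import numpy as np

def position_kets(num_points: int, length: float = 1.0):
    """Grid analogue of the position 'basis' on a hypothetical interval [0, length).

    Column j is |x_j> = e_j / sqrt(dx), normalised so that sum_j dx |x_j><x_j| = I.
    """
    dx = length / num_points
    kets = np.eye(num_points) / np.sqrt(dx)
    return kets, dx

for n in (10, 100, 1000):
    kets, dx = position_kets(n)
    resolution = dx * kets @ kets.T    # discrete version of the integral of |x'><x'| dx'
    norm_sq = kets[:, 0] @ kets[:, 0]  # <x_0|x_0> = 1/dx, which blows up as dx -> 0
    print(n, np.allclose(resolution, np.eye(n)), norm_sq)
```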
Although we can define the momentum as a self-adjoint operator on $L^2[0,1]$ as you proposed, I think it's rather artificial to think of it as related to the momentum on $L^2(\Bbb{R})$. Note that the operator $p_1$ with domain
$D(p_1)=\{\psi\in\mathcal{H}^1[0,1]\,|\,\psi(1)=\psi(0)\}$ is related to spatial translations via the unitary group $U(t)=\exp(-itp_1)$, whose action is
$$(U(t)\psi)(x)=\psi[x-t\pmod{1}],$$
so it describes a particle on a circle (a one-dimensional torus), not in an infinite square well. Different values of $\alpha$ just give different phases to the wavefunction when it reaches the boundary and reappears on the other side.
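The statement that $U(t)=\exp(-itp_1)$ acts as translation modulo 1 can be checked numerically. This NumPy sketch assumes a hypothetical test function (a trigonometric polynomial, so a uniform grid represents it exactly) and uses the eigenbasis $e^{2\pi i n x}$ of $p_1$, with eigenvalues $2\pi n$, via the FFT. It multiplies each Fourier coefficient by $e^{-it\cdot 2\pi n}$ and compares the result with $\psi[x-t\pmod 1]$.

```python
import numpy as np

M = 32
x = np.arange(M) / M  # uniform periodic grid on [0, 1)

def psi(x):
    """Hypothetical test function: a trigonometric polynomial, exactly representable on the grid."""
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x) + 0.3j * np.sin(4 * np.pi * x)

def translate(values, t):
    """Apply U(t) = exp(-i t p_1) using the eigenbasis e^{2 pi i n x} of p_1."""
    n = np.fft.fftfreq(len(values), d=1.0 / len(values))  # integer mode numbers n
    eigenvalues = 2 * np.pi * n                            # p_1 e^{2 pi i n x} = 2 pi n e^{2 pi i n x}
    coeffs = np.fft.fft(values)
    return np.fft.ifft(np.exp(-1j * t * eigenvalues) * coeffs)

t = 0.7
shifted = translate(psi(x), t)
expected = psi((x - t) % 1.0)          # psi[x - t (mod 1)]
print(np.allclose(shifted, expected))  # prints: True
```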
So, in my opinion, these operators are not actually related to the momentum as usually conceived. An infinite square well does not allow spatial translations, so there is no self-adjoint operator associated with a unitary group of translations in this case. The same happens, for example, for a particle on the positive half-line $\Bbb{R}_+$. There, the space $L^2[0,\infty)$ allows only translations to the right, not to the left, so you cannot have a self-adjoint operator associated with a unitary group of translations. Correspondingly, the operator $p=-i\dfrac{d}{dx}$ is symmetric but has no self-adjoint extensions, whatever initial domain one starts from. For a particle in a box we can argue in the same way: there is no operator associated with spatial translations because no spatial translations are allowed.
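The half-line claim can be made concrete with von Neumann's criterion (a closed symmetric operator has self-adjoint extensions if and only if its deficiency indices are equal), which is not spelled out above. As a sketch, take the initial domain $C_c^\infty(0,\infty)$, so that $p^*=-i\dfrac{d}{dx}$ on $\mathcal H^1[0,\infty)$ with no boundary condition. Then
$$(p^*-i)\psi=0\iff\psi'=-\psi\iff\psi=c\,e^{-x},\qquad (p^*+i)\psi=0\iff\psi'=\psi\iff\psi=c\,e^{x}.$$
Only $e^{-x}$ is square-integrable on $[0,\infty)$, so the deficiency indices are $(n_+,n_-)=(1,0)$. Since they are unequal, $p$ has no self-adjoint extension.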
It's also important to note that the Hamiltonian $H$ in this case is given by the Friedrichs extension of
$$\begin{aligned}
p_0^2&=-\frac{d^2}{dx^2},\\
\mathcal{D}(p_0^2)&=\{\psi\in\mathcal{H}^2[0,1]\,|\,\psi(0)=\psi'(0)=0=\psi'(1)=\psi(1)\}.
\end{aligned}$$
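$H$ cannot be the square of any $p_\alpha$, since the domains do not match: every $\psi\in\mathcal D(p_\alpha^2)$ must have both $\psi$ and $\psi'$ in $\mathcal D(p_\alpha)$, whereas $\mathcal D(H)$ is characterized by the Dirichlet conditions $\psi(0)=\psi(1)=0$.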
Edit: As pointed out by @jjcale, one could instead take the momentum in this case to be $p=\sqrt{H}$, but clearly the action of this $p$ can't be a derivative, because it has the same eigenfunctions as $H$, namely $\psi_k(x)=\sin \pi kx$ for $k=1,2,\dots$, and the derivative of $\sin\pi kx$ is not a multiple of $\sin\pi kx$. This illustrates the fact that it's not related to spatial translations, as stated above.
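These spectral claims can be checked on a grid. This NumPy sketch assumes the standard second-order finite-difference approximation of $-d^2/dx^2$ with Dirichlet conditions (a discretization of $H$ chosen for illustration, not part of the original argument, with a hypothetical grid size `M`). It checks that the lowest eigenvalues are close to $(\pi k)^2$, that the eigenvectors are the sampled $\sin \pi k x$, and that $p=\sqrt H$, built from the same eigenvectors via the spectral theorem, shares those eigenvectors.

```python
import numpy as np

M = 400                       # number of interior grid points (illustrative choice)
h = 1.0 / (M + 1)
x = np.arange(1, M + 1) * h   # interior points; psi(0) = psi(1) = 0 is built in

# Second-order finite-difference matrix for -d^2/dx^2 with Dirichlet conditions.
H = (2 * np.eye(M) - np.eye(M, k=1) - np.eye(M, k=-1)) / h ** 2

eigvals, eigvecs = np.linalg.eigh(H)      # ascending eigenvalues, orthonormal eigenvectors
k = np.arange(1, 6)
ratios = eigvals[:5] / (np.pi * k) ** 2   # each close to 1

# p = sqrt(H) via the spectral theorem: same eigenvectors, eigenvalues sqrt(lambda_k).
p = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T

overlaps = []
for kk in k:
    s = np.sin(np.pi * kk * x)            # sampled sin(pi k x)
    v = eigvecs[:, kk - 1]
    overlaps.append(abs(v @ s) / np.linalg.norm(s))           # close to 1
    assert np.allclose(p @ s, np.sqrt(eigvals[kk - 1]) * s)  # eigenvector of p too

print(ratios)
print(overlaps)
```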
Edit 2: There is a proof that the Friedrichs extension is the one with Dirichlet boundary conditions in Reed and Simon, Vol. II, Section X.3.
The domain of $p_\alpha^2$ defined by the spectral theorem is indeed $\{\psi\in\mathcal{D}(p_\alpha): p_\alpha\psi\in\mathcal{D}(p_\alpha)\}$. To see this, note that since the spectrum of $p_\alpha$ is purely point, the spectral theorem gives
$$p_\alpha=\sum_{n\in \Bbb{Z}}\lambda_{\alpha,n}P_n,$$
where $\lambda_{\alpha,n}$ are the eigenvalues associated with the normalized eigenvectors $\psi_{\alpha,n}$ (abbreviated $\psi_n$ below), and $P_n=\psi_n\langle\psi_n,\cdot\rangle$ are the projections onto the corresponding eigenspaces. The domain $\mathcal{D}(p_\alpha)$ is then given by the vectors $\xi$ such that
$$\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2\|P_n\xi\|^2=\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2|\langle\psi_n,\xi\rangle|^2<+\infty$$
Also, $\xi\in\mathcal{D}(p_\alpha^2)$ iff
$$\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^4\|P_n\xi\|^2=\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^4|\langle\psi_n,\xi\rangle|^2<+\infty$$
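But then, for $\xi\in\mathcal D(p_\alpha)$, using the self-adjointness of $p_\alpha$ in the second equality, $p_\alpha\xi$ satisfies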
$$\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2\|P_np_\alpha\xi\|^2=\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2|\langle\psi_n,p_\alpha\xi\rangle|^2=
\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^2|\langle p_\alpha\psi_n,\xi\rangle|^2=
\sum_{n\in \Bbb{Z}}|\lambda_{\alpha,n}|^4|\langle\psi_n,\xi\rangle|^2<+\infty$$
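Since these sums are equal, $p_\alpha\xi\in\mathcal D(p_\alpha)$ exactly when $\xi\in\mathcal D(p_\alpha^2)$. So $\mathcal{D}(p_\alpha^2)= \{\psi\in\mathcal D(p_\alpha): p_\alpha\psi\in\mathcal{D}(p_\alpha)\}$.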
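As a concrete illustration of the criterion for $\mathcal D(p_\alpha)$ (an example not in the original), take the periodic operator $p_1$, whose normalized eigenvectors and eigenvalues are
$$\psi_{1,n}(x)=e^{2\pi i n x},\qquad \lambda_{1,n}=2\pi n,\qquad n\in\Bbb Z.$$
For $\xi(x)=x$, integrating by parts gives $|\langle\psi_{1,n},\xi\rangle|=\dfrac{1}{2\pi|n|}$ for $n\neq0$, so
$$\sum_{n\in\Bbb Z}|\lambda_{1,n}|^2|\langle\psi_{1,n},\xi\rangle|^2=\sum_{n\neq0}(2\pi n)^2\,\frac{1}{(2\pi n)^2}=\sum_{n\neq0}1=+\infty.$$
Hence $\xi\notin\mathcal D(p_1)$, consistent with $\xi(1)=1\neq0=\xi(0)$.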
$\psi$ is a curve through $\mathcal H:=L^2(\mathbb R^{Nd})$, not through $\mathbb R^{Nd}$ itself. That is, for each $t\in \mathbb R$ we have that $\psi(t)\in L^2(\mathbb R^{Nd})$ is loosely$^\ddagger$ a square-integrable function.
You must be careful to distinguish between $\psi$, which is a curve through $\mathcal H$, and $\psi(t)$, which is the element of $\mathcal H$ the curve passes through at time $t$. $\psi$ is a function of one variable which eats a time $t$ and spits out the vector $\psi(t)$, which is itself a square-integrable function. It wouldn't make any sense for $\psi$ itself to accept a position as an input; what would that even mean? Instead, it is $\psi(t)$, often interpreted as a function of position (i.e. the position-space wavefunction), that can take a position as an input variable.
To make this explicit, we might use notation like $\big[\psi(t)\big](\mathbf x)$, which makes it clear that $t\in \mathbb R$ is a number which we plug into $\psi$ to get a function $\psi(t)\in L^2(\mathbb R^{Nd})$, and $\mathbf x\in \mathbb R^{Nd}$ is a vector we plug into $\psi(t)$ to get a number $\big[\psi(t)\big](\mathbf x)\in \mathbb C$. This notation is the stuff of nightmares, so I personally prefer the symbol $\psi_t(\mathbf x)$ instead.
In most of the pedagogical literature, authors tend to sweep this discussion under the rug and simply write $\psi(t,\mathbf x)$; however, my view is that this obscures the distinction between space and time which occurs in non-relativistic quantum mechanics, and leads to deep misunderstandings.
When we write down the Schrodinger equation, we need to interpret the terms carefully. In my preferred notation:
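$$i \hbar\,\psi'_t = H\big(\psi_t\big)$$
Here $\psi'_t\in\mathcal H$ is the derivative of the curve $\psi$ at time $t$ (the limit of the difference quotients $(\psi_{t+\epsilon}-\psi_t)/\epsilon$, taken in $\mathcal H$), and $H\big(\psi_t\big)$ is the Hamiltonian applied to the vector $\psi_t$.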
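To make the distinction between $\psi$, $\psi(t)$ and $\big[\psi(t)\big](\mathbf x)$ concrete, here is a small NumPy sketch in one dimension ($Nd=1$). It is only an illustration under stated assumptions: units with $\hbar=1$, a hypothetical example curve (the harmonic-oscillator ground state $\phi(x)=\pi^{-1/4}e^{-x^2/2}$ multiplied by $e^{-it/2}$), a truncated grid on $[-10,10]$, and a finite-difference Hamiltonian $H=-\tfrac12\tfrac{d^2}{dx^2}+\tfrac12x^2$. The function `psi` eats a time and returns a function of position, and $\psi'_t$ is computed as a difference quotient of whole vectors, so the check of $i\psi'_t=H(\psi_t)$ is an equation between elements of a discretized $\mathcal H$.

```python
import numpy as np
from typing import Callable

# An element psi_t of L^2, represented (loosely) as a function of position.
Wavefunction = Callable[[np.ndarray], np.ndarray]

def psi(t: float) -> Wavefunction:
    """The curve t -> psi_t: plugging in a time returns a whole wavefunction.

    Hypothetical example, units hbar = m = omega = 1: the harmonic-oscillator
    ground state phi(x) = pi**(-1/4) exp(-x**2/2), which has energy E = 1/2.
    """
    def psi_t(x: np.ndarray) -> np.ndarray:
        return np.pi ** -0.25 * np.exp(-x ** 2 / 2) * np.exp(-0.5j * t)
    return psi_t

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def apply_H(values: np.ndarray) -> np.ndarray:
    """H = -(1/2) d^2/dx^2 + x^2/2, second derivative by central differences."""
    kinetic = np.zeros_like(values)
    kinetic[1:-1] = -0.5 * (values[2:] - 2 * values[1:-1] + values[:-2]) / dx ** 2
    return kinetic + 0.5 * x ** 2 * values

t, dt = 0.3, 1e-5
psi_t = psi(t)(x)                                # [psi(t)](x) on the grid
norm = np.sqrt(np.sum(np.abs(psi_t) ** 2) * dx)  # ||psi_t||, approximately 1

# psi'_t: derivative of the curve, a difference quotient of whole vectors.
dpsi_dt = (psi(t + dt)(x) - psi(t - dt)(x)) / (2 * dt)
residual = np.max(np.abs(1j * dpsi_dt - apply_H(psi_t)))
print(norm, residual)
```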
$^\ddagger$Really an equivalence class of functions, where we identify two functions $f$ and $g$ as the same element of $L^2(\mathbb R^{Nd})$ if $\int \mathrm d^{Nd}x |f(x)-g(x)|^2 = 0$.