Hamilton shows that this is a correct representation in the book, but the approach may seem a bit counterintuitive. Let me therefore first give a high-level answer that motivates his modeling choice and then elaborate a bit on his derivation.
Motivation:
As should become clear from reading Chapter 13, there are many ways to write a dynamic model in state space form. We should therefore ask why Hamilton chose this particular representation. The reason is that this representation keeps the dimensionality of the state vector low. Intuitively, you would think (or at least I would) that the state vector for an ARMA($p$,$q$) needs to be at least of dimension $p+q$: after all, just from observing, say, $y_{t-1}$, we cannot infer the value of $\epsilon_{t-1}$. Yet he shows that we can define the state-space representation in a clever way that leaves the state vector of dimension at most $r = \max\{p, q + 1\}$. Keeping the state dimensionality low matters for the computational implementation, e.g. for the Kalman filter recursions. It turns out that his state-space representation also offers a nice interpretation of an ARMA process: the unobserved state follows an AR($p$), while the MA($q$) part arises because the observation is a moving average of the current and lagged states.
Derivation:
Now for the derivation. First note that, using lag operator notation, the ARMA($p$,$q$) is defined as:
$$
(1-\phi_1L - \ldots - \phi_rL^r)(y_t - \mu) =(1 + \theta_1L + \ldots + \theta_{r-1}L^{r-1})\epsilon_t
$$
where we let $\phi_j = 0$ for $j>p$ and $\theta_j = 0$ for $j>q$; since $r \ge q+1$, the term $\theta_r$ would be zero anyway and is omitted. So all we need to show is that his state and observation equations imply the equation above. Let the state vector be
$$
\mathbf{\xi}_t = (\xi_{1,t}, \xi_{2,t},\ldots,\xi_{r,t})^\top
$$
Now look at the state equation. You can check that equations $2$ to $r$ simply shift the entries: $\xi_{i,t+1} = \xi_{i-1,t}$ for $i = 2, \ldots, r$, so each entry moves down one position as time advances, and $\xi_{r,t}$ is discarded from the state vector at $t+1$. The first equation, defining $\xi_{1,t+1}$, is therefore the relevant one. Writing it out:
$$
\xi_{1,t+1} = \phi_1 \xi_{1,t} + \phi_2 \xi_{2,t} + \ldots + \phi_r \xi_{r,t} + \epsilon_{t+1}
$$
Since the second element of $\mathbf{\xi}_{t}$ is the first element of $\mathbf{\xi}_{t-1}$, the third element of $\mathbf{\xi}_{t}$ is the first element of $\mathbf{\xi}_{t-2}$, and so on, we can rewrite this using lag operator notation and move the lag polynomial to the left-hand side (equation 13.1.24 in Hamilton):
$$
(1-\phi_1L - \ldots - \phi_rL^r)\xi_{1,t+1} = \epsilon_{t+1}
$$
So the hidden state follows an autoregressive process. Similarly, the observation equation is
$$
y_t = \mu + \xi_{1,t} + \theta_1\xi_{2,t} + \ldots + \theta_{r-1}\xi_{r,t}
$$
or
$$
y_t - \mu = (1 + \theta_1L + \ldots + \theta_{r-1}L^{r-1})\xi_{1,t}
$$
This does not look much like an ARMA so far, but now comes the nice part: multiply the last equation by $(1-\phi_1L - \ldots - \phi_rL^r)$:
$$
(1-\phi_1L - \ldots - \phi_rL^r)(y_t - \mu) = (1 + \theta_1L + \ldots + \theta_{r-1}L^{r-1})(1-\phi_1L - \ldots - \phi_rL^r)\xi_{1,t}
$$
But from the state equation (lagged by one period), we have $(1-\phi_1L - \ldots - \phi_rL^r)\xi_{1,t} = \epsilon_{t}$! So the above is equivalent to
$$
(1-\phi_1L - \ldots - \phi_rL^r)(y_t - \mu) = (1 + \theta_1L + \ldots + \theta_{r-1}L^{r-1})\epsilon_{t}
$$
which is exactly what we needed to show! So the state-observation system correctly represents the ARMA(p,q). I was really just paraphrasing Hamilton, but I hope that this is useful anyway.
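To make the equivalence concrete, here is a minimal numerical sketch (not from Hamilton; the parameter values are arbitrary) that simulates an ARMA(2,1) through the state and observation equations above and checks that it reproduces the direct ARMA recursion driven by the same shocks:

```python
import numpy as np

rng = np.random.default_rng(0)
phi1, phi2, theta1, mu = 0.5, -0.3, 0.4, 2.0   # illustrative ARMA(2,1) values
T = 200
eps = rng.standard_normal(T)                   # shocks; pre-sample shocks are zero

# State-space simulation with r = max(p, q+1) = 2
F = np.array([[phi1, phi2],
              [1.0,  0.0]])
xi = np.zeros(2)
y_state = np.empty(T)
for t in range(T):
    xi = F @ xi + np.array([eps[t], 0.0])      # state equation
    y_state[t] = mu + xi[0] + theta1 * xi[1]   # observation equation

# Direct ARMA(2,1) recursion with the same shocks
y_direct = np.empty(T)
dlag1 = dlag2 = elag = 0.0                     # pre-sample deviations and shock
for t in range(T):
    dev = phi1 * dlag1 + phi2 * dlag2 + eps[t] + theta1 * elag
    y_direct[t] = mu + dev
    dlag2, dlag1, elag = dlag1, dev, eps[t]

assert np.allclose(y_state, y_direct)          # the two paths coincide
```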
State space form originated in control theory. There, the usual representation is similar to that shown in your first two equations, with additional terms for controller outputs that are inputs to the process. The equation propagating the state X from the previous state is called the state equation (your second equation), and the relationship of the measurement Y to the state is called the measurement equation (your first equation).

(There are differences from the above in how the equations are typically written. By convention, V is the measurement noise and W is the process noise, the opposite of what's shown above. And the matrix shown as G above is, by convention, called H. But none of that matters other than avoiding confusion in discussions with others.)

X is an internal representation; Y is what is measured. One typical goal, as in the case of Kalman filtering, is to take the measurements Y and estimate the state X. The emphasis on estimating the state X comes from the fact that, with the state equation, predictions about the future can be made, and hence predictions of Y follow as well.
The system representation does not change when the system happens to achieve a steady state.
At steady state, by definition, the state X is not changing over time. That implies that F cannot be changing over time either, because if F were to change, then X would have to change by the state equation. Subscripts referencing time are not relevant when solving for steady state solutions. So, at steady state, the state equation just reduces to
X = FX
(At a steady state, the process noise would have to be zero as well, otherwise by the state equation, process noise would change X).
For a useful steady state solution, you normally have another set of inputs, as is typical in process control. Otherwise the only solution is X = 0, unless 1 is an eigenvalue of F (e.g. F is the identity matrix).
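As a sketch of that point, with an input term the state equation becomes X = FX + Bu, and the steady state is the solution of (I - F)X = Bu. The F, B, and u below are illustrative, not from the question:

```python
import numpy as np

# Illustrative stable system with a constant input u
F = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0],
              [0.5]])
u = np.array([2.0])

# Steady state of X = F X + B u  <=>  (I - F) X_ss = B u
X_ss = np.linalg.solve(np.eye(2) - F, B @ u)

# X_ss really is a fixed point of the state equation
assert np.allclose(F @ X_ss + B @ u, X_ss)
```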
Normal definitions of a steady state system would include that the entire system is not changing, so that the measurement matrix (G in your notation) should be constant as well. I suppose that if you're differentiating a steady state system from a steady state X, then you could allow the measurement matrix to change over time. But your first equation still wouldn't change.
If I'm understanding your third equation, it says that Y at time t depends on X at time t, given the X values up to time t-1. The reference to t-1 is not needed there, whether at steady state or not (with a similar comment about t+1|t and t|t-1 in the fourth equation). In state space representation, X always incorporates all previously known information --- the "t-1" is implied. The state equation also allows you to predict forward in time as far as you want, in terms of expected value (that is, assuming no process noise). Similarly, if F is invertible, you can reconstruct previous states (again assuming no process noise). These issues are central to state space representation and its applications: for estimation, for instance, you never need to look further back in time once you have an estimate of X at some time t.
So, in short, the third and fourth equations aren't needed, and the t|t-1 references aren't needed, if you're just talking about state space representation.
Sometimes you see references like the t|t-1 when looking at particular estimation solutions. For instance, Kalman filtering has two steps: a prediction from the previous state just using the state equation (ignoring measurements), and then a correction step that accounts for the measurements, balancing the measurement noise against the process noise. When differentiating between those two steps, sometimes that sort of notation is used to distinguish the intermediate prediction from the final result.
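A minimal sketch of those two steps, using the conventional F, H, Q, R names mentioned above (the values and dimensions here are purely illustrative):

```python
import numpy as np

# Kalman filter for x_t = F x_{t-1} + w,  y_t = H x_t + v
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])                   # state transition
H = np.array([[1.0, 0.0]])                   # measurement matrix
Q = 0.01 * np.eye(2)                         # process noise covariance
R = np.array([[0.25]])                       # measurement noise covariance

x = np.zeros(2)                              # x_{0|0}
P = np.eye(2)                                # P_{0|0}
for y in [1.1, 2.0, 2.9]:                    # a few measurements
    # Prediction step: x_{t|t-1}, P_{t|t-1} (state equation only, no measurement)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Correction step: fold in y_t to get x_{t|t}, P_{t|t}
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x_pred + K @ (y - H @ x_pred)        # balance process vs measurement noise
    P = (np.eye(2) - K @ H) @ P_pred
```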
But all of that is a separate issue from making a steady state assumption.
One way to do it is to define the state vector as
$$ \xi_t = \begin{pmatrix} y_t \\ y_{t-1} \\ w_{t} \\ w_{t-1} \\ 1 \\ \end{pmatrix} $$
The measurement equation is just
$$ y_t = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \end{pmatrix} \, \xi_t $$
i.e. there is no noise term. The state transition equation is then
$$ \underbrace{\begin{pmatrix} y_t \\ y_{t-1} \\ w_{t} \\ w_{t-1} \\ 1 \\ \end{pmatrix}}_{\xi_t} = \begin{pmatrix} \alpha_1 & \alpha_2 & \theta_1 & \theta_2 & \beta_0+\beta_1 x_{t-1} \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ \end{pmatrix} \underbrace{\begin{pmatrix} y_{t-1} \\ y_{t-2} \\ w_{t-1} \\ w_{t-2} \\ 1 \\ \end{pmatrix}}_{\xi_{t-1}} + \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \\ \end{pmatrix} w_t $$
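As a quick sanity check (with illustrative parameter values, assuming $w_t$ is the model's white-noise shock and $x_t$ an exogenous regressor), iterating this transition equation with the time-varying top row and reading off the first state element reproduces the direct recursion $y_t = \beta_0 + \beta_1 x_{t-1} + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \theta_1 w_{t-1} + \theta_2 w_{t-2} + w_t$:

```python
import numpy as np

rng = np.random.default_rng(1)
a1, a2, th1, th2, b0, b1 = 0.4, 0.2, 0.3, 0.1, 1.0, 0.5  # illustrative values
T = 100
x = rng.standard_normal(T)      # exogenous regressor x_t
w = rng.standard_normal(T)      # shocks w_t

# State-space simulation: xi_t = (y_t, y_{t-1}, w_t, w_{t-1}, 1)'
xi = np.zeros(5)
xi[4] = 1.0                     # constant term in the state
g = np.array([1.0, 0.0, 1.0, 0.0, 0.0])   # loading on w_t
y_ss = np.zeros(T)
for t in range(1, T):
    F = np.array([[a1, a2, th1, th2, b0 + b1 * x[t - 1]],
                  [1,  0,  0,   0,   0],
                  [0,  0,  0,   0,   0],
                  [0,  0,  1,   0,   0],
                  [0,  0,  0,   0,   1]], dtype=float)
    xi = F @ xi + g * w[t]
    y_ss[t] = xi[0]             # measurement picks the first element

# Direct recursion with the same shocks and zero pre-sample values
y_dir = np.zeros(T)
ylag1 = ylag2 = wlag1 = wlag2 = 0.0
for t in range(1, T):
    y_dir[t] = (b0 + b1 * x[t - 1] + a1 * ylag1 + a2 * ylag2
                + th1 * wlag1 + th2 * wlag2 + w[t])
    ylag2, ylag1 = ylag1, y_dir[t]
    wlag2, wlag1 = wlag1, w[t]

assert np.allclose(y_ss[1:], y_dir[1:])   # the two simulations agree
```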