For a single-input system the transformation that yields the controller canonical form is
$$T=\left[\matrix{q\\qA\\ \vdots\\qA^{n-1}}\right]$$
where $q$ is the last row of the inverse of the controllability matrix, i.e.
$$\mathcal{C}^{-1}=\left[\matrix{X\\ \hline q}\right]$$
This property ensures that
$$qA^{i-1}b=\begin{cases}0,\quad i=1,\cdots,n-1\\ 1,\quad i=n \end{cases}$$
which can be used along with the Cayley-Hamilton theorem to prove that
$$Tb=\left[\matrix{qb \\ \vdots \\ qA^{n-2}b \\qA^{n-1}b}\right]=\left[\matrix{0 \\ \vdots \\ 0 \\1}\right]=\bar{B}$$
$$TA=\left[\matrix{qA \\ \vdots \\ qA^{n-1} \\qA^{n}}\right]=\left[\matrix{qA \\ \vdots \\ qA^{n-1} \\-q\sum_{i=1}^{n}a_{n-i+1}A^{i-1}}\right]=\left[\matrix{0 & 1 & 0& \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\0 & 0 & 0 & \cdots & 1\\-a_n & -a_{n-1} & -a_{n-2}& \cdots & -a_1}\right]\left[\matrix{q \\ qA\\ \vdots \\ qA^{n-2} \\qA^{n-1}}\right]=\bar{A}T$$
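As a sanity check, the construction above can be carried out numerically. The sketch below uses numpy and a hypothetical third-order single-input system (all numbers are chosen only for illustration): it extracts $q$ from $\mathcal{C}^{-1}$, stacks $q, qA, \dots, qA^{n-1}$ into $T$, and verifies that $Tb = \bar{B}$ and $TAT^{-1} = \bar{A}$.

```python
import numpy as np

# Hypothetical 3rd-order single-input system (values for illustration only).
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-6.0, -11.0, -6.0]])
b = np.array([[1.0], [0.0], [1.0]])
n = A.shape[0]

# Controllability matrix C = [b, Ab, ..., A^{n-1} b]
C = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
assert np.linalg.matrix_rank(C) == n, "system must be controllable"

# q is the last row of C^{-1}
q = np.linalg.inv(C)[-1, :]

# T stacks q, qA, ..., qA^{n-1}
T = np.vstack([q @ np.linalg.matrix_power(A, i) for i in range(n)])

Tb = T @ b                       # should be e_n = (0, ..., 0, 1)^T
Abar = T @ A @ np.linalg.inv(T)  # should be in controller canonical form
print(np.round(Tb, 10))
print(np.round(Abar, 10))
```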
For the multiple-input case, $B\in\mathbb{R}^{n\times m}$, the situation is more involved. The computation uses the so-called controllability indices $\mu_1,\mu_2,\cdots,\mu_m$, and $\bar{B}$ is of the form
$$\bar{B}=\left[\matrix{0 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\0 & 0 & 0 & \cdots & 0\\ 1 & * & * & \cdots & *\\\hline 0 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\0 & 0 & 0 & \cdots & 0\\ 0 & 1 & * & \cdots & *\\\hline \vdots & \vdots & \vdots & \ddots & \vdots\\\hline 0 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\0 & 0 & 0 & \cdots & 0\\ 0 & 0 & 0 & \cdots & 1}\right] $$
where $*$ denotes a possibly nonzero element. The $m$ nonzero rows of $\bar{B}$ are rows $\mu_1,\ \mu_1+\mu_2,\ \cdots,\ \mu_1+\mu_2+\cdots+\mu_m$. For more details, consult Antsaklis and Michel, "A Linear Systems Primer".
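The controllability indices themselves can be computed by scanning the columns of $[B,\ AB,\ A^2B,\ \cdots]$ from left to right and keeping each column that is linearly independent of those already kept; $\mu_j$ is then the number of kept columns originating from the $j$-th input. A minimal sketch with a hypothetical two-input system (all numbers invented for illustration):

```python
import numpy as np

# Hypothetical two-input system (values for illustration only).
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-1.0, -2.0, -3.0, -4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
n, m = B.shape

# Scan the columns of [B, AB, A^2 B, ...] left to right and keep each
# column that is linearly independent of those already kept.
selected = []                 # list of (power i, input j) pairs
basis = np.zeros((n, 0))
for i in range(n):
    AiB = np.linalg.matrix_power(A, i) @ B
    for j in range(m):
        cand = np.hstack([basis, AiB[:, [j]]])
        if np.linalg.matrix_rank(cand) > basis.shape[1]:
            basis = cand
            selected.append((i, j))

# mu_j = number of selected columns that came from input j
mu = [sum(1 for (_, jj) in selected if jj == col) for col in range(m)]
print("controllability indices:", mu)
```

For a controllable system the indices sum to $n$.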
If your system is of the form
$$
\dot{x} = A\,x + B\,u + f \\
y = C\,x + D\,u + g \tag{1}
$$
with $f$ and $g$ constant vectors, then you can do a coordinate transformation $z=x+\alpha$ and $v=u+\beta$ with $\alpha$ and $\beta$ constant vectors which satisfies
$$
\begin{bmatrix}
A & B \\ C & D
\end{bmatrix}
\begin{bmatrix}
\alpha \\ \beta
\end{bmatrix} =
\begin{bmatrix}
f \\ g
\end{bmatrix}. \tag{2}
$$
Equation $(2)$ can always be solved if the matrix on its left-hand side has full row rank; when it does not, a solution exists if and only if the vector $(f,g)$ lies in the column span of that matrix. After this transformation the dynamics are simply
$$
\dot{z} = A\,z + B\,v \\
y = C\,z + D\,v. \tag{3}
$$
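Solving $(2)$ numerically is a single linear solve. A minimal numpy sketch, with all system matrices hypothetical and chosen only for illustration:

```python
import numpy as np

# Illustrative affine system (all values hypothetical).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
f = np.array([[1.0], [2.0]])
g = np.array([[3.0]])

# Stack the block matrix of (2) and solve for (alpha, beta).
M = np.block([[A, B], [C, D]])        # (n+p) x (n+m)
rhs = np.vstack([f, g])
sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)

n = A.shape[0]
alpha, beta = sol[:n], sol[n:]

# A solution exists iff (f, g) lies in the column span of M,
# i.e. the least-squares solution satisfies (2) exactly.
assert np.allclose(M @ sol, rhs), "(f, g) not in the span: no exact solution"
print("alpha =", alpha.ravel(), "beta =", beta.ravel())
```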
However, if it is required that $u=f(y)$, with $f(0)=0$ and $f(-y)=-f(y)$, then this transformation only works if the value found for $\beta$ is zero. If $\beta\neq0$, or if the system of equations in $(2)$ is not solvable, you could resort to extending the state space by one state $\xi$ whose time derivative is always zero. If the initial condition of $\xi$ is one, then the following extended state-space model yields the same dynamics as $(1)$:
$$
\begin{bmatrix}
\dot{x} \\ \dot{\xi}
\end{bmatrix} =
\begin{bmatrix}
A & f \\ 0 & 0
\end{bmatrix}
\begin{bmatrix}
x \\ \xi
\end{bmatrix} +
\begin{bmatrix}
B \\ 0
\end{bmatrix} u \\
y =
\begin{bmatrix}
C & g
\end{bmatrix}
\begin{bmatrix}
x \\ \xi
\end{bmatrix} + D\,u. \tag{4}
$$
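The augmented model $(4)$ is easy to assemble and check numerically. The sketch below uses hypothetical matrices and verifies that, with $\xi=1$, the extended model reproduces the affine dynamics of $(1)$:

```python
import numpy as np

# Illustrative affine system (all values hypothetical); build the
# extended model of (4).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
f = np.array([[1.0], [2.0]])
g = np.array([[3.0]])
n = A.shape[0]

Ae = np.block([[A, f], [np.zeros((1, n)), np.zeros((1, 1))]])
Be = np.vstack([B, np.zeros((1, B.shape[1]))])
Ce = np.hstack([C, g])

# With xi(0) = 1 and xi' = 0, xi stays 1, so
# [x; xi]' = Ae [x; xi] + Be u reproduces x' = A x + B u + f,
# and y = Ce [x; xi] + D u reproduces y = C x + D u + g.
x = np.array([[0.5], [-1.0]])
u = np.array([[2.0]])
xe = np.vstack([x, [[1.0]]])

assert np.allclose(Ae @ xe + Be @ u,
                   np.vstack([A @ x + B @ u + f, [[0.0]]]))
assert np.allclose(Ce @ xe + D @ u, C @ x + D @ u + g)
```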
Best Answer
Let's start by moving the summation terms over to the right-hand side:
$y(k) = -\sum_{i=1}^n a_i y(k-i) + \sum_{i=1}^n b_i u(k-i)$
Now, let's define $\mathbf{x_k} = \left(\begin{array}{c} y(k-1) \\ y(k-2) \\ \vdots \\ y(k-n) \end{array}\right)$, and $\mathbf{u_k} = \left(\begin{array}{c} u(k-1) \\ u(k-2) \\ \vdots \\ u(k-n) \end{array}\right)$.
Then, $\mathbf{x_{k+1}} = \mathbf{Ax_k}+\mathbf{Bu_k}$, where
$$\mathbf{A} = \left(\begin{array}{cccc} -a_1 & -a_2 & \cdots & -a_n \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{array}\right), \qquad \mathbf{B} = \left(\begin{array}{cccc} b_1 & b_2 & \cdots & b_n \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{array}\right).$$
The first row of the update reproduces the difference equation, and the remaining rows simply shift the past outputs down by one step.
Finally, $y(k)$ is the first entry of $\mathbf{x_{k+1}}$, i.e. $y(k) = \left(1\ 0\ \cdots\ 0\right)\mathbf{x_{k+1}}$.
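This realization can be checked by simulation: build the companion-form matrices (first row carries the coefficients, the rows below shift the past outputs), run the recursion, and compare against the difference equation directly. All coefficients and the input sequence below are hypothetical:

```python
import numpy as np

# Illustrative 2nd-order difference equation (coefficients hypothetical):
# y(k) = -a1 y(k-1) - a2 y(k-2) + b1 u(k-1) + b2 u(k-2)
a = np.array([0.5, -0.25])
b = np.array([1.0, 2.0])
n = len(a)

# Companion-form realization with x_k = (y(k-1), ..., y(k-n))^T
A = np.vstack([-a, np.eye(n)[:-1]])        # first row -a_i, shift rows below
B = np.vstack([b, np.zeros((n - 1, n))])   # first row b_i, zeros below

rng = np.random.default_rng(0)
u = rng.standard_normal(20)

# Direct simulation of the difference equation (zero initial conditions)
y = np.zeros(len(u))
for k in range(len(u)):
    for i in range(1, n + 1):
        if k - i >= 0:
            y[k] += -a[i - 1] * y[k - i] + b[i - 1] * u[k - i]

# State-space simulation: x_{k+1} = A x_k + B u_k, y(k) = first entry
x = np.zeros(n)
y_ss = np.zeros(len(u))
for k in range(len(u)):
    uk = np.array([u[k - i] if k - i >= 0 else 0.0 for i in range(1, n + 1)])
    x = A @ x + B @ uk
    y_ss[k] = x[0]

assert np.allclose(y, y_ss)
```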