There are different approaches to initializing the Q matrix.
First of all, let's clarify the initialization example in your question.
\begin{equation}
Q = q_{0}\Delta t
\begin{bmatrix}
\Delta t^{2}/3 & \Delta t/2 \\
\Delta t/2 & 1
\end{bmatrix}
\end{equation}
This approach is well explained in Kalman and Bayesian Filters in Python (Chapter 7.3.1 Continuous White Noise Model).
It works for systems whose state vectors contain successive derivatives, such as position, velocity, acceleration, jerk... The last element in such a sequence is assumed to stay constant during the prediction step. The system noise is modelled as continuous white noise $Q_c$ with spectral density $q_0$, which compensates for this modelling assumption (that's why the spectral density is applied only to the bottom-right element). For a state vector containing position and velocity it looks like
\begin{equation}
Q_c =
\begin{bmatrix}
0 & 0\\
0 & 1\\
\end{bmatrix}
q_{0}
\end{equation}
To obtain the amount of noise for a discrete time step $\Delta t$, the continuous noise must be integrated over the interval $[0, \Delta t]$:
\begin{equation}
Q=\int_{0}^{\Delta t} F Q_c F^{T} dt = q_0
\begin{bmatrix}
\Delta t^{3}/3 & \Delta t^{2}/2\\
\Delta t^{2}/2 & \Delta t\\
\end{bmatrix}
\end{equation}
where $F$ is the state transition matrix (inside the integral it is evaluated at the integration variable $t$ rather than $\Delta t$):
\begin{equation}
F =
\begin{bmatrix}
1 & \Delta t\\
0 & 1\\
\end{bmatrix}
\end{equation}
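The integral above can be checked symbolically, for example with SymPy (a sketch of my own, not part of the original answer; variable names are mine):

```python
import sympy as sp

t, dt, q0 = sp.symbols('t dt q_0', positive=True)

# Continuous white noise enters only through the highest derivative (velocity)
Qc = q0 * sp.Matrix([[0, 0],
                     [0, 1]])

# State transition matrix of the constant-velocity model, evaluated at time t
F = sp.Matrix([[1, t],
               [0, 1]])

# Integrate F * Qc * F^T over one discrete time step [0, dt]
Q = sp.integrate(F * Qc * F.T, (t, 0, dt))

# Q now equals q0 * [[dt^3/3, dt^2/2], [dt^2/2, dt]], matching the equation above
print(Q)
```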
To use this approach, you need to make sure your state elements are in the right order.
If your state vector is
\begin{equation}
x =
\begin{bmatrix}
x \\
vx \\
y \\
vy
\end{bmatrix}
\end{equation}
then Q would be
\begin{equation}
Q = q_{0}
\begin{bmatrix}
\Delta t^{3}/3 & \Delta t^{2}/2 & 0 & 0\\
\Delta t^{2}/2 & \Delta t & 0 & 0\\
0 & 0 & \Delta t^{3}/3 & \Delta t^{2}/2\\
0 & 0 & \Delta t^{2}/2 & \Delta t
\end{bmatrix}
\end{equation}
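As a sketch (assuming NumPy/SciPy; the function and variable names are mine), the block-diagonal Q for the $[x, v_x, y, v_y]$ state can be assembled from one 2x2 block per axis:

```python
import numpy as np
from scipy.linalg import block_diag

def q_continuous_white_noise(dt, q0):
    """Discrete Q for one [position, velocity] pair, derived from
    continuous white noise with spectral density q0."""
    return q0 * np.array([[dt**3 / 3, dt**2 / 2],
                          [dt**2 / 2, dt]])

dt, q0 = 0.1, 0.05  # illustrative values
# One block per axis, in the same order as the state vector [x, vx, y, vy]
Q = block_diag(q_continuous_white_noise(dt, q0),
               q_continuous_white_noise(dt, q0))
print(Q.shape)  # (4, 4)
```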
So you need to order the elements of Q to match your state vector.
Regarding the spectral density: its exact value is usually unknown, so it is used as a tuning parameter until the filter's performance meets your expectations.
Another approach
Another, easier intuition when dealing with Q is to interpret its elements as the uncertainty added to your state elements during the prediction step. For each element of the state vector, ask yourself: what is the biggest prediction error to expect when using the transition matrix F? Assuming the error is normally distributed, you can apply the 3-sigma rule: sigma squared gives the corresponding diagonal element of the Q matrix.
\begin{equation}
Q_{ii} = \left(\max(\varepsilon_{i})/3\right)^{2}
\end{equation}
The better the model, the smaller the prediction error, and the smaller the Q-element.
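This heuristic can be sketched in a few lines (the error bounds below are illustrative guesses, not values from the answer):

```python
import numpy as np

def q_diag_from_max_errors(max_errors):
    """Diagonal Q from the largest expected prediction error per state
    element, using the 3-sigma rule: sigma = max_error / 3, Q_ii = sigma**2."""
    sigmas = np.asarray(max_errors) / 3.0
    return np.diag(sigmas**2)

# Illustrative guesses: at most 0.3 m position error and 0.9 m/s
# velocity error per prediction step
Q = q_diag_from_max_errors([0.3, 0.9])
print(np.diag(Q))  # [0.01 0.09]
```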
Best Answer
The biggest issue with dealing with rotation in an EKF is the fact that (as far as I know) there is no sensible way to define a rotation in a vector space. Euler angles, quaternions, and rotation matrices are not vectors; instead they form groups.
This is a problem for the EKF because it assumes that all state variables are vectors. Think, for example, how you would form a covariance matrix about a quaternion. A covariance matrix is defined as:
$$ E[(x-\hat{x})(x-\hat{x})^\top]$$
But, in the case of a quaternion, what does $(q-\hat{q})$ even mean? It's no longer a unit quaternion. The same goes for rotation matrices. Euler angles are a little more subtle, but similar logic applies (if you subtract two "vectors" of Euler angles, what does the result really mean?)
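A quick numeric illustration of that point (a sketch using plain NumPy, with quaternions stored as $[w, x, y, z]$ 4-vectors): the difference of two unit quaternions is generally nowhere near unit length.

```python
import numpy as np

def quat_from_angle(theta, axis):
    """Unit quaternion [w, x, y, z] for a rotation of theta about a unit axis."""
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    return np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * axis])

q     = quat_from_angle(0.50, [0, 0, 1])  # "true" orientation
q_hat = quat_from_angle(0.45, [0, 0, 1])  # estimated orientation

diff = q - q_hat
print(np.linalg.norm(q), np.linalg.norm(q_hat))  # both exactly 1.0
print(np.linalg.norm(diff))  # ~0.025 here: not a unit quaternion at all
```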
Given a linearization, Euler angles actually work okay. Here's an example of a quadrotor EKF using Euler angles as the orientation states: Quadrotor Dynamics and Control - Beard. Obviously, Euler angles suffer from gimbal lock, which this source doesn't address, and they are computationally expensive due to all that trigonometry, but it's meant as an introductory Kalman filter implementation. If you're just getting started, it might make sense to start here.
However, to get around these problems, some smart people in spacecraft attitude estimation worked on this and came up with the "Multiplicative" EKF. In the MEKF, you can think of the covariance as being defined with respect to an "error vector" about some reference quaternion state. You propagate the quaternion using standard quaternion dynamics, but then update it and define the covariance using a 3-vector parameterized as a Gibbs vector (Rodrigues parameters): Multiplicative vs Additive Filtering for Spacecraft Attitude Determination - Markley. The Gibbs vector actually is a vector, but it has a singularity, so it's only good for small rotations. Therefore you need a reference quaternion, and you define small "errors" relative to that reference with a vector. The "multiplicative" term comes from the fact that you end up "multiplying" this error vector into the reference quaternion, rather than adding it with simple vector addition like you would all the other states.
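A minimal sketch of the multiplicative idea (my own illustrative code, not Markley's formulation): the filter's small error state is a 3-vector, which is converted to an error quaternion and multiplied into the reference quaternion instead of being added.

```python
import numpy as np

def quat_mult(a, b):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def gibbs_to_quat(g):
    """Unit error quaternion from a (small) Gibbs vector g = tan(theta/2) * axis."""
    g = np.asarray(g, dtype=float)
    return np.concatenate([[1.0], g]) / np.sqrt(1.0 + g @ g)

# Reference quaternion (identity orientation here, for simplicity)
q_ref = np.array([1.0, 0.0, 0.0, 0.0])

# Small error 3-vector produced by the filter update
delta_g = np.array([0.01, -0.02, 0.005])

# "Multiplicative" update: fold the error into the reference quaternion;
# the error state is then reset to zero for the next cycle
q_ref = quat_mult(gibbs_to_quat(delta_g), q_ref)
print(np.linalg.norm(q_ref))  # stays a unit quaternion
```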
This practice of dealing with non-vector components of filter states has recently been formalized a bit better, with new notation that makes it more natural and cleans up the syntax significantly: Integrating Generic Sensor Fusion Algorithms with Sound State Representations through Encapsulation of Manifolds - Hertzberg et al. This avoids the use of error state variables and lets you "pretend" you are dealing with vector states rather than these group objects (such as quaternions). The paper I linked has an example UKF implementation using quaternions, which could serve as a decent basis for your implementation.