The biggest issue with handling rotation in an EKF is that (as far as I know) there is no sensible way to represent rotation in a vector space. Euler angles, quaternions, and rotation matrices are not vectors; they instead form groups.
This is a problem for the EKF because it assumes that all state variables live in a vector space. Think, for example, about how you would form a covariance matrix over a quaternion. A covariance matrix is defined as:
$$ E[(x-\hat{x})(x-\hat{x})^\top]$$
But in the case of a quaternion, what does $(q-\hat{q})$ even mean? It's no longer a unit quaternion. The same goes for rotation matrices. Euler angles are a little more subtle, but similar logic applies: if you subtract two "vectors" of Euler angles, what does the result really mean?
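A quick numerical illustration of this point (my own example in NumPy, not from the original discussion): the component-wise difference of two unit quaternions is not itself a unit quaternion.

```python
import numpy as np

# Two unit quaternions (scalar-first [w, x, y, z]) representing
# rotations about the z-axis by ~11.5 deg and ~5.7 deg.
q     = np.array([np.cos(0.10), 0.0, 0.0, np.sin(0.10)])
q_hat = np.array([np.cos(0.05), 0.0, 0.0, np.sin(0.05)])

diff = q - q_hat  # what an additive EKF residual would compute

print(np.linalg.norm(q))      # 1.0 -- a valid unit quaternion
print(np.linalg.norm(q_hat))  # 1.0 -- a valid unit quaternion
print(np.linalg.norm(diff))   # ~0.05 -- not a unit quaternion at all
```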
Given a linearization, Euler angles actually work okay. Here's an example of a quadrotor EKF using Euler angles as the orientation states: Quadrotor Dynamics and Control - Beard. Obviously, Euler angles suffer from gimbal lock, which this source doesn't address, and they are computationally expensive because of all the trigonometry, but it's meant as an introductory Kalman filter implementation. If you're just getting started, it might make sense to start here.
However, to get around these problems, some smart people in spacecraft attitude estimation worked on this and came up with the "Multiplicative" EKF. In the MEKF, you can think of the covariance as being defined with respect to an "error vector" about some reference quaternion state. You propagate the quaternion using standard quaternion dynamics, but then update the quaternion and define the covariance using a 3-vector parameterized as a Gibbs vector or Rodrigues parameters: Multiplicative vs Additive Filtering for Spacecraft Attitude Determination - Markley. The Gibbs vector actually is a vector, but it has a singularity, so it's only good for small rotations. Therefore you need a reference quaternion, and you define small "errors" relative to that reference with a vector. The "multiplicative" term comes from the fact that you end up "multiplying" this error vector into the reference quaternion, rather than adding it with simple vector addition as you would all the other states.
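As a sketch of what "multiplying the error in" looks like in code (the helper names are mine, and a scalar-first [w, x, y, z] quaternion convention is assumed): the small error vector becomes an error quaternion that is composed with the reference by quaternion multiplication, so the result stays on the unit sphere.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions, scalar-first [w, x, y, z]."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def error_to_quat(e):
    """Small 3-vector error -> unit error quaternion (small-angle only)."""
    dq = np.concatenate(([1.0], e / 2.0))
    return dq / np.linalg.norm(dq)

q_ref = np.array([1.0, 0.0, 0.0, 0.0])   # reference quaternion
e_hat = np.array([0.01, -0.02, 0.005])   # small estimated error vector

# Multiplicative update: compose, don't add.
q_new = quat_mul(q_ref, error_to_quat(e_hat))
print(np.linalg.norm(q_new))  # stays 1 (up to floating point)
```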
This practice of dealing with non-vector components of filter states has recently been formalized with some new notation that makes it more natural and cleans up the syntax significantly: Integrating Generic Sensor Fusion Algorithms with Sound State Representations through Encapsulation of Manifolds - Hertzberg et al. This avoids explicit error-state variables, and allows you to "pretend" that you are dealing with vector states rather than these group objects (such as quaternions). The paper has an example UKF implementation using quaternions, which could serve as a decent basis for your implementation.
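The core of that notation is a pair of operators, often written $\boxplus$/$\boxminus$, that let generic filter code perturb and difference manifold states as if they were vectors. A minimal sketch for unit quaternions, using the usual rotation-vector exponential/logarithm maps (my own implementation, not code from the paper):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product, scalar-first [w, x, y, z]."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_exp(v):
    """Rotation vector in R^3 -> unit quaternion."""
    th = np.linalg.norm(v)
    if th < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(th / 2.0)], (v / th) * np.sin(th / 2.0)))

def quat_log(q):
    """Unit quaternion -> rotation vector in R^3."""
    s = np.linalg.norm(q[1:])
    if s < 1e-12:
        return np.zeros(3)
    return 2.0 * np.arctan2(s, q[0]) * q[1:] / s

def boxplus(q, delta):
    """q [+] delta: perturb a quaternion state by a 3-vector."""
    return quat_mul(q, quat_exp(delta))

def boxminus(q1, q2):
    """q1 [-] q2: 3-vector 'difference' between two quaternion states."""
    return quat_log(quat_mul(quat_conj(q2), q1))

q = quat_exp(np.array([0.0, 0.0, 0.3]))
d = np.array([0.05, -0.02, 0.01])
print(boxminus(boxplus(q, d), q))  # recovers d (up to floating point)
```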
Best Answer
Unit quaternions are great for parameterizing rotation in 3-D space, but trying to estimate them directly in a conventional Kalman filter setting can be tricky. This is because unit quaternions are constrained to live on the unit sphere in 4-D space ($S^3 \subset \mathbb{R}^4$), so their probability density function (pdf) is restricted to the surface of that sphere. If one uses a Gaussian distribution to parameterize the pdf (as is done in a Kalman filter), the expectation conditioned on the measurements will lie inside the unit sphere and hence, by definition, will not be a unit quaternion. In addition, the covariance matrix will shrink in the directions orthogonal to the surface of the unit sphere, which leads to a singular covariance matrix after several updates. This conceptual problem is explained in more detail in the references linked below.

In order to circumvent this estimation problem, a common engineering practice is to represent the true orientation ($\pmb{q}$) as a small deviation from a reference orientation ($\bar{\pmb{q}}$) as:
$$ \pmb{q} = \bar{\pmb{q}} \oplus \pmb{\delta} (\pmb{e}) $$
The deviation $\pmb{\delta} \in S^3$ can be approximately parameterized by an error vector $\pmb{e} \in \mathbb{R}^3$ as:
$$ \pmb{\delta} \approx \begin{bmatrix} 1 & \frac{\pmb{e}}{2}\end{bmatrix}^T $$
For small orientation deviations, this approximation is good up to second order. The idea then is to compute an estimate of the error vector $\hat{\pmb{e}}$ within the Kalman filter, while simultaneously and separately propagating the reference quaternion through numerical integration of:
$$\dot{\bar{\pmb{q}}} = \frac{1}{2} \cdot \bar{\pmb{q}} \oplus \begin{bmatrix} 0 \\ \bar{\pmb{\omega}} \end{bmatrix} $$
If we assume that the reference angular velocity ($\bar{\pmb{\omega}}$) remains constant over the sample interval, this differential equation has the discrete-time equivalent:
$$ \bar{\pmb{q}}_k = \bar{\pmb{q}}_{k-1} \oplus \begin{bmatrix} \cos\left(||\bar{\pmb{\omega}}_{k-1}|| \frac{\Delta t}{2}\right) \\ \frac{\bar{\pmb{\omega}}_{k-1}}{||\bar{\pmb{\omega}}_{k-1}||} \sin\left(||\bar{\pmb{\omega}}_{k-1}||\frac{\Delta t}{2}\right) \end{bmatrix} $$
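A direct transcription of this discrete step (the variable names are mine; scalar-first [w, x, y, z] quaternions; a guard is added for the $\pmb{\omega}/||\pmb{\omega}||$ division when the rotation rate is near zero):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product, scalar-first [w, x, y, z]."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def propagate_reference(q_prev, omega, dt):
    """One discrete propagation step of the reference quaternion."""
    w = np.linalg.norm(omega)
    if w < 1e-12:               # avoid dividing by ||omega|| at zero rate
        return q_prev
    half = w * dt / 2.0
    dq = np.concatenate(([np.cos(half)], (omega / w) * np.sin(half)))
    return quat_mul(q_prev, dq)

q = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(100):            # repeated steps stay on the unit sphere
    q = propagate_reference(q, np.array([0.0, 0.0, 1.0]), 0.01)
print(np.linalg.norm(q))        # still 1 (up to floating point)
```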
The propagation dynamics of the error state can be shown to be approximately linear and are given by:
$$\dot{\pmb{e}} = \pmb{F}\pmb{e} + \pmb{G}\pmb{\eta}$$
where,
$\pmb{\eta} = \pmb{\omega} - \bar{\pmb{\omega}}$ is the angular velocity error, assumed to be a white-noise process with spectral density matrix $\pmb{Q}$
$\pmb{F} = - \left[ \bar{\pmb{\omega}} \times \right]$
$\pmb{G} = \pmb{I}$
Derivations of the propagation dynamics and the matrices $\pmb{F}$ and $\pmb{G}$ can be found in the references given below.
The covariance propagation equation is:
$$\dot{\pmb{P}}_e = \pmb{F}\pmb{P}_e + \pmb{P}_e\pmb{F}^T + \pmb{G}\pmb{Q}\pmb{G}^T$$
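A sketch of this propagation with a simple forward-Euler integration step (the $\Delta t$ and $\pmb{Q}$ values below are placeholders, not from the answer):

```python
import numpy as np

def skew(w):
    """Cross-product matrix [w x], so that skew(w) @ v == np.cross(w, v)."""
    return np.array([[ 0.0,  -w[2],  w[1]],
                     [ w[2],  0.0,  -w[0]],
                     [-w[1],  w[0],  0.0]])

def propagate_covariance(P, omega_bar, Q, dt):
    """Euler-integrate P_dot = F P + P F^T + G Q G^T with F = -[w x], G = I."""
    F = -skew(omega_bar)
    G = np.eye(3)
    P_dot = F @ P + P @ F.T + G @ Q @ G.T
    return P + P_dot * dt

P = np.eye(3) * 1e-4            # initial error-state covariance
Q = np.eye(3) * 1e-6            # spectral density of the gyro noise
omega_bar = np.array([0.0, 0.0, 0.1])
for _ in range(100):
    P = propagate_covariance(P, omega_bar, Q, 0.01)
print(np.allclose(P, P.T))      # True: P stays symmetric
```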
It is also worth noting that when $\pmb{e} = \pmb{0}$, then $\pmb{\delta} (\pmb{e})$ is the identity quaternion. Thus, after each measurement update, the error vector $\pmb{e}$ can be reset to zero by updating the reference quaternion as:
$$\bar{\pmb{q}}^+_k = \bar{\pmb{q}}^-_k \oplus \pmb{\delta} (\hat{\pmb{e}}_k)$$
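In code, this reset step might look like the following sketch (my own helper names; it builds the small-angle error quaternion from the formula above and renormalizes to fight numerical drift):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product, scalar-first [w, x, y, z]."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def reset_error(q_ref, e_hat):
    """Fold the estimated error vector into the reference quaternion
    and zero the error state for the next filter cycle."""
    dq = np.concatenate(([1.0], e_hat / 2.0))
    dq /= np.linalg.norm(dq)           # keep the error quaternion unit
    q_new = quat_mul(q_ref, dq)
    q_new /= np.linalg.norm(q_new)     # renormalize against round-off
    return q_new, np.zeros(3)

q_ref = np.array([1.0, 0.0, 0.0, 0.0])
e_hat = np.array([0.02, 0.0, -0.01])   # error estimate from the KF update
q_ref, e = reset_error(q_ref, e_hat)
print(np.linalg.norm(q_ref), e)        # unit norm, error reset to zero
```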
Hope this helps!
References: