In practise: I'd give Extended Kalman filter a try at first and validate if the filter works well enough for the given problem. If it doesn't, I'd try Unscented Kalman filter or particle filters.
Extended Kalman filter solves the nonlinear estimation problem by linearising state and/or measurement equations and applying the standard Kalman filter formulas to the resulting linear estimation problem.
The linearisation yields to approximation errors which the filter doesn't take into account in the prediction/update steps. Therefore Extended Kalman filter's error estimates tend to underestimate state uncertainties.
In comparison, Unscented Kalman filter picks so called sigma point samples from the filtering distribution and propagates/updates them through the (nonlinear) state and measurement models. The resulting weighted set of sigma points represents now the updated filtering distribution, which, is then approximated as a moment matched Gaussian distribution. This results state estimates which represent the state uncertainty better than the estimates obtained from the EKF with an increased computational cost.
The biggest issue with dealing with rotation in an EKF is the fact that (so far as I know) there is no sensible way to define the rotation in a vector space. Euler angles, Quaternions and rotation matrices are not vectors, but instead form groups.
This is a problem for the EKF because it assumes that all state variables are vectors. Think, for example, how you would form a covariance matrix about a quaternion. A covariance matrix is defined as:
$$ E[(x-\hat{x})(x-\hat{x})^\top]$$
But, in the case of a quaternion, what does $(q-\hat{q})$ even mean? It's no longer a unit quaternion. Same thing goes for rotation matrices. Euler angles are a little more subtle, but similar logic applies (if you subtract two "vectors" of euler angles, what does the result really mean?)
Given a linearization, euler angles actually work okay. Here's an example of a quadrotor EKF using euler angles as the orientation states Quadrotor Dynamics and Control -Beard. Obviously, euler angles have issues with gimbal lock that this source doesn't address, and euler angles are extremely computationally inefficient due to all that trigonometry, but it's meant as an introductory kalman filter implementation. If you're just getting started, it might make sense to start here.
However, to get around these problems, some smart people in spacecraft attitude estimation worked on this and came up with the "Multiplicative" EKF. In the MEKF, you can think of the covariance being defined with respect to an "error vector" about some quaternion reference state. You propagate the quaternion using standard quaternion dynamics, but then update the quaternion and define the covariance using a 3-vector parameterized as a gibbs vector or rodriguez parameters. Multiplicative vs Additive Filtering for spacecraft attitude determination - Markley. The gibbs vector actually is a vector, but it has a singularity, so it's only good for small rotations. Therefore you need a reference quaternion, and you define small "errors" to that reference with a vector. The "multiplicative" term comes from the fact that you end up "multiplying" this error vector into the reference quaternion, rather than add it with simple vector addition like you would all the other states.
This practice of dealing with non-vector components of filter states has recently been a little bit better formalized with some new notation that makes it more natural and cleans up the syntax significantly Integrating Generic Sensor Fusion Algorithms with Sound State Representations through Encapsulation of Manifolds -Hertzberg et al.. This avoids the use of error state variables, and allows you to "pretend" like you are dealing with vector states rather than these group objects (such as quaternions). The paper I linked has an example UKF implementation using quaternions, which could serve as a decent basis for your implementation.
Best Answer
I suspect that the paper might have a typo in those equations. Namely, instead of $\mathcal{A}_{x_r}^{\circ}$ and $\mathcal{Q}_{x_r}^{\circ}$ I think one is supposed to use $\mathcal{A}_{v_r}^{\circ}$ and $\mathcal{Q}_{v_r}^{\circ}$ respectively. However, when using this and substituting $\tau_r = \mathcal{P}_{v_r}\!(x_r)$ in $x_r = \mathcal{U}_{v_r}\!(\tau_r)$ I get that
$$ \mathcal{U}_{v_r}\!(\tau_r) = x_r \frac{|v_r^\top x_r|}{v_r^\top x_r} = \pm x_r. $$
This would also make more sense for the notation of $\mathcal{U}_{v_r}\!(\tau_r)$, which otherwise would also be directly a function of $x_r$, instead of just $v_r$ and $\tau_r$. It can be noted that negating a unit quaternion represents the same attitude. Therefore, this possible change in sign might not matter for the rest of the algorithm.
This sign issue doesn't happen when one keeps using $\mathcal{A}_{x_r}^{\circ}$ and $\mathcal{Q}_{x_r}^{\circ}$, but that would indeed yield $\tau_r = 0$ which is quite a boring projection. However, in order to be completely sure about whether this is indeed a typo you could try to contact one of the authors.