Solved – Kalman filter vs Kalman Smoother for beta calculations

I am trying to calculate the beta of two timeseries by setting up a state-space model, calculating its covariances via the EM algorithm and finally running the kalman filter/smoother. From what I have read, I understand that using the kalman smoother might more sense for what I am trying to achieve since I am in a post-processing environment and not really looking for any real-time predictions.

However, when comparing the two approaches with a a few examples, there are cases where the smoother does seem to improve the beta numbers whereby it feels similar to removing the noise of the kalman filter values via a spline interpolation (see the first picture below) and other cases where the 'interpolation' is quite broad and seems to be 'hiding' a few of the underlying dynamics of the kalman filter results (please see the second picture for example)

Any thoughts on which is the best approach?

EDIT:

adding the difference between the values of the smoother and the filter, as expected the difference goes towards zero in the most recent observations and is actually zero in the latest one:

… -0.03943203, -0.01329412, -0.011849 , -0.01031422,
-0.01596532, -0.01822451, -0.00513093, 0.00208434, 0.00244347,
-0.0020279 , -0.00991458, -0.0046458 , 0. ])

Best Answer

They are not really different approaches in that they are solutions to different problems: one computes the sequence of filtering distributions $p(\beta_t|Y_{1:t})$, and the other the distributions based on all observations $p(\beta_t|Y_{1:T})$, for $t =1,...,T$.

The smoother doesn't "hide underlying dynamics" but rather adjusts its state estimate (with respect to the filter) to reflect the fact that new data has been observed; what "looked like" an increase in $\beta$ at time $t$ is now, on the basis of more accumulated evidence, believed to have been mostly observation noise and a much smaller move in $\beta$.

Which algorithm you should use depends on what you need it for. If you are really looking at this data purely retrospectively as you mention, then smoother is what you want. If you want to build a trading algorithm based on $\beta_t$ then you obviously need to use the filtered estimate in your backtesting because the smoothed one will not be available when you actually use the strategy (it depends on future data).

EDIT to answer additional questions from a comment:

Are you suggesting taking the output of the filter, and then forecasting the beta by fitting some model to it? You are already specifying a model for beta in your state space model (this is often a random walk but does not need to be), and you can forecast directly from this model. A two-step approximation where you first filter with simple linear dynamics and then forecast the filter output with a more complicated model (something that could not have been cast to linear state space form, say) could "work" in some cases I guess, but it would be at the very least logically inconsistent. If you went that route it would definitely need to be done on the filter output and tested for accuracy (the most meaningful way to do so being choosing the model which optimizes something tangible like the P&L on a trading strategy based on your beta forecast, since you can't actually observe the "true" beta).
Both of your beta estimates are implicitly conditional on the dynamics you have assumed for beta. If the dynamics of the filtered/smoothed estimates don't "make sense" then I would perhaps reconsider the state dynamics in the model. Additionally, you have to take into account uncertainty: the fact that the smoother beta increases monotonically for a period does not mean that the "true" beta necessarily did. It is simply your best guess based on the data that you have, which does not actually include beta. You should plot confidence bands so you can see what the smoother is really "saying" about beta. If the data is very noisy and doesn't contain a lot of information about beta, it would not make any sense for the smoother estimate to be "bumpy": that information is just not there.

Best Answer

Related Solutions

Related Question