Solved – How to model a biased coin with time-varying bias

bayesian · kalman filter · time series

Models of biased coins typically have one parameter $\theta = P(\text{Head} | \theta)$.
One way to estimate $\theta$ from a series of draws is to place a beta prior on it and compute the posterior distribution under a binomial likelihood.
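For reference, a minimal sketch of that conjugate update in Python (the uniform $\text{Beta}(1,1)$ prior and the toy data are illustrative choices):

```python
# Conjugate beta-binomial update: prior Beta(a, b), h heads out of n draws
# give posterior Beta(a + h, b + n - h). Values below are illustrative.
from scipy import stats

a, b = 1.0, 1.0                     # uniform Beta(1, 1) prior on theta
draws = [1, 0, 1, 1, 1, 0]          # H = 1, T = 0
h, n = sum(draws), len(draws)

posterior = stats.beta(a + h, b + (n - h))
print(posterior.mean())             # posterior mean estimate of theta
```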

In my setting, because of some weird physical process, the coin's properties are slowly changing, and $\theta$ becomes a function of time $t$.
My data are an ordered sequence of draws, i.e. $\{H,T,H,H,H,T,\dots\}$. I can consider that I have exactly one draw for each $t$ on a discrete, regular time grid.

How would you model this? I'm thinking of something like a Kalman filter adapted to the fact that the hidden variable is $\theta$, while keeping the binomial likelihood. What could I use to model $P(\theta(t+1) \mid \theta(t))$ to keep inference tractable?

Edit following answers (thanks!): I would like to model $\theta(t)$ as a first-order Markov chain, as is done in HMMs or Kalman filters. The only assumption I can make is that $\theta(t)$ is smooth. I could write $\theta(t+1) = \theta(t) + \epsilon$ with $\epsilon$ a small Gaussian noise (the Kalman filter idea), but this would break the requirement that $\theta$ remain in $[0,1]$. Following the idea from @J Dav, I could use a probit function to map the real line to $[0,1]$, but I have the intuition that this would not yield an analytical solution. A beta distribution with mean $\theta(t)$ and a wider variance could do the trick.
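For intuition, here is a minimal sketch of the probit idea (the step size $\tau$ and the horizon are illustrative, not values from the problem): run the Gaussian random walk on the real line and map it through the standard normal CDF, so the bias stays in $[0,1]$ by construction.

```python
# Sketch: Gaussian random walk on the real line, squashed through Phi so
# that theta(t) is always in [0, 1]. tau (step size) is an illustrative value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
T, tau = 200, 0.05
eta = np.cumsum(rng.normal(0.0, tau, size=T))  # unconstrained random walk
theta = norm.cdf(eta)                          # smooth bias in [0, 1]
x = (rng.random(T) < theta).astype(int)        # one coin draw per time step
```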

I'm asking this question since I have the feeling that this problem is so simple that it must have been studied before.

Best Answer

I doubt you can come up with a model that has an analytic solution, but inference can still be made tractable with the right tools, since the dependency structure of your model is simple. As a machine learning researcher, I would use the following model, as inference in it can be made quite efficient with the technique of Expectation Propagation (EP):

Let $X(t)$ be the outcome of the $t$-th trial. Let us define the time-varying parameter

$\eta(t+1) \sim \mathcal{N}(\eta(t), \tau^2)$ for $t \geq 0$.

To link $\eta(t)$ with $X(t)$, introduce latent variables

$Y(t) \sim \mathcal{N}(\eta(t), \beta^2)$,

and model $X(t)$ to be

$X(t) = 1$ if $Y(t) \geq 0$, and $X(t) = 0$ otherwise. You can actually ignore the $Y(t)$'s and marginalize them out to just say $\mathbb{P}[X(t)=1] = \Phi(\eta(t)/\beta)$ (with $\Phi$ the CDF of the standard normal), but introducing the latent variables makes inference easy. Also note that in your original parametrization, $\theta(t) = \Phi(\eta(t)/\beta)$.
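For concreteness, a minimal generative sketch of this model (the values of $\tau$, $\beta$, and the initial state are illustrative):

```python
# Generative model: eta is a Gaussian random walk, Y(t) ~ N(eta(t), beta^2),
# and X(t) = 1 iff Y(t) >= 0. All numerical settings are illustrative.
import numpy as np

rng = np.random.default_rng(1)
T, tau, beta, eta0 = 300, 0.05, 1.0, 0.0

eta = eta0 + np.cumsum(rng.normal(0.0, tau, size=T))
y = rng.normal(eta, beta)           # latent Gaussians
x = (y >= 0).astype(int)            # observed coin flips
# Marginalizing Y out recovers P[X(t) = 1] = Phi(eta(t) / beta).
```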

If you are interested in implementing the inference algorithm, take a look at this paper. They use a very similar model, so you can easily adapt the algorithm. To understand EP, you may find the following page useful. If you are interested in pursuing this approach, let me know; I can provide more detailed advice on how to implement the inference algorithm.
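As a baseline before implementing EP, you can also discretize $\eta$ and run an exact forward filter on the grid. To be clear, this is not EP, just a brute-force sanity check for the same model, and the grid bounds and default parameters below are illustrative choices.

```python
# Grid-based forward filter for the model above (a sanity-check alternative
# to EP, not EP itself). Discretize eta, propagate with the N(eta(t), tau^2)
# random-walk kernel, update with the probit likelihood Phi(eta / beta).
import numpy as np
from scipy.stats import norm

def grid_filter(x, tau=0.05, beta=1.0, n_grid=400):
    """Return E[theta(t) | x(1:t)] for each t, filtering eta on a grid."""
    grid = np.linspace(-4.0, 4.0, n_grid)          # illustrative grid bounds
    trans = norm.pdf(grid[:, None], loc=grid[None, :], scale=tau)
    trans /= trans.sum(axis=0, keepdims=True)      # column-stochastic kernel
    p_head = norm.cdf(grid / beta)                 # theta = Phi(eta / beta)
    belief = np.full(n_grid, 1.0 / n_grid)         # flat prior on eta(0)
    theta_mean = []
    for xt in x:
        belief *= p_head if xt == 1 else 1.0 - p_head  # measurement update
        belief /= belief.sum()
        theta_mean.append(float(p_head @ belief))      # posterior E[theta(t)]
        belief = trans @ belief                        # time update to t + 1
    return np.array(theta_mean)
```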