It is hard to tell whether your problem is due to mistakes in your implementation or to the estimation problem itself, since we have no working code of yours to inspect. I implemented a working example for you -- see the code below. I have also analyzed the problem, as described below.
The code implements Metropolis-Hastings with a Brownian-motion proposal. I tinkered quite some time with the initialization and the step size; the solution is very much dependent on these values. In the code they are called u and proposal_step_size. The values currently in the code come from a kind of manual annealing burn-in: I started with a wild guess for u and a large proposal_step_size. At the end of that run I wrote down the final u, restarted the whole process at that u with a smaller proposal_step_size, and repeated again and again.
In the early runs, the acceptance rate is about 0.5%. By the end, it is about 50%. Reducing the step size further makes the samples autocorrelated, so I stopped there with the parameters in the code below.
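That restart procedure can also be written as a loop. The sketch below is my own illustration, not the code from the answer: run_chain and the toy log_target are hypothetical stand-ins, and the halving factor 0.5 is just one possible decay choice.

```python
import numpy as np

def run_chain(u0, step_size, n_steps, log_target, rng):
    """One random-walk Metropolis run; returns samples and the acceptance rate."""
    u, ll = u0, log_target(u0)
    us = np.empty(n_steps)
    accepts = 0
    for i in range(n_steps):
        u_new = u + rng.standard_normal() * step_size
        ll_new = log_target(u_new)
        if np.log(rng.random()) < ll_new - ll:  # symmetric proposal, so no correction term
            u, ll = u_new, ll_new
            accepts += 1
        us[i] = u
    return us, accepts / n_steps

rng = np.random.default_rng(0)
log_target = lambda u: -0.5 * (u - 0.01) ** 2 / 1e-6  # toy stand-in for the sharp posterior

u, step_size = 0.05, 0.01
for stage in range(5):
    us, rate = run_chain(u, step_size, 500, log_target, rng)
    u = us[-1]        # restart from where the previous run ended
    step_size *= 0.5  # shrink the proposal for the next stage
```

Each stage starts from the end point of the previous one, so later stages explore an ever smaller neighbourhood with an ever finer proposal.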
This kind of annealing scheme is needed because of the nasty periodic structure of the likelihood, with its sharp local maxima. I plotted the log likelihood and the histogram of samples to illustrate this. The histogram is extremely narrow and quite hard to see with these axis limits, but quite nice to work with interactively.
If the step size is too large, the chain proposes into low-likelihood regions too often and gets rejected. If the step size is too small, the chain gets stuck in one of the small "bumps". The periodicity of the plot is due to aliasing.
Run the code with different initializations and zoom around, and I think you will get a nice feel for it. :)
I also played around with different noise levels, sample sizes, prior parameters, etc. The conclusions change with those, of course. But in the low-noise setting I have in the code below, the prior does not matter at all.
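A rough way to see why the prior is negligible in this low-noise setting: compare how much the log prior and the log likelihood change for the same move in u. The snippet below is a toy replication of the setup (50 points, additive constants dropped), not the exact code from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(50) * 0.1
u_true, noise_variance, lamda = 0.01, 0.01, 1.0
z = np.exp(u_true * 1j * t * 4 / lamda) + (
    rng.standard_normal(50) + 1j * rng.standard_normal(50)
) * np.sqrt(noise_variance / 2)

def log_lik(u):
    n = z - np.exp(u * 1j * t * 4 / lamda)
    return -(np.abs(n) ** 2).sum() / noise_variance  # up to an additive constant

log_prior = lambda u: -0.5 * u ** 2  # N(0, 1) prior, up to a constant

dp = log_prior(0.02) - log_prior(0.01)  # tiny: the prior is nearly flat here
dl = log_lik(0.02) - log_lik(0.01)      # large: the likelihood dominates
```

With a unit-variance prior and a likelihood this concentrated, dp is on the order of 1e-4 while dl is on the order of tens, so the posterior shape is essentially the likelihood shape.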
N.B. The plot shows the log likelihood, since we are working with really small likelihoods. The use of logarithms can be crucial for numerical stability.
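A quick illustration of why the logarithms matter: at log likelihoods around -2500, the likelihoods themselves underflow to exactly zero in double precision, so the acceptance ratio can only be computed as a difference of logs.

```python
import numpy as np

ll_a, ll_b = -2500.0, -2498.0   # two log likelihoods of the magnitude seen here
# In linear space both likelihoods underflow to exactly 0.0 in float64,
# so their ratio would be 0/0 and the accept step breaks down
lik_a, lik_b = np.exp(ll_a), np.exp(ll_b)
# In log space the acceptance ratio is perfectly well behaved: exp(2)
ratio = np.exp(ll_b - ll_a)
```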
The code also produces a second diagnostic plot that tells us that the burn-in period is long enough, that the autocorrelation between samples (after burn-in) is low, and that synthetic data generated from the posterior mean parameter looks quite okay.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(999)
#
# Generate data for the observed time indices
#
dt = 0.1
M = 5
N = 100
u_true = 0.01 # the true u-value, to be estimated
lamda = 1 # assumed known
noise_variance = 0.01 # assumed known
t = np.array([N*k*dt + i*dt for i in range(5) for k in range(10)])
n_data = len(t)
x = np.exp(u_true*1j*t*4/lamda)
n = (
    np.random.standard_normal(size=n_data) +
    1j*np.random.standard_normal(size=n_data)
)*np.sqrt(noise_variance)/np.sqrt(2)
z = x + n
#
# Configure and Initialize Metropolis-Hastings
#
sigma2_u = 1 # the variance of the gaussian prior
mu_u = 0 # the mean of the gaussian prior
proposal_step_size = 0.0001 # adjusted in manual "annealing" process
u = 0.014 # initialization
get_proposal = lambda u: u+np.random.standard_normal()*proposal_step_size # proposal is brownian motion
burn_in_samples = 500
mcmc_steps = burn_in_samples + 2000
def log_proposal_density_ratio(u_new, u):
    """The proposal density is symmetric, so the ratio is always 1, and log(1) = 0"""
    return 0
def log_likeliehood(u):
    """Compute the log joint density of this parameter and the data,
    as in
        log(p(z,u)) = log(p(u)) + log(p(z|u))
    """
    x = np.exp(u*1j*t*4/lamda)  # the predicted data sequence under this u
    n = z - x  # the implied noise, assuming this predicted sequence
    n_real_std = np.real(n) * np.sqrt(2) / np.sqrt(noise_variance)  # standard normal under the model assumptions
    n_imag_std = np.imag(n) * np.sqrt(2) / np.sqrt(noise_variance)
    ll1 = -0.5*np.log(2*np.pi) - 0.5*n_real_std**2  # vector of log likelihoods per observation
    ll2 = -0.5*np.log(2*np.pi) - 0.5*n_imag_std**2
    log_p_z_given_u = ll1.sum() + ll2.sum()
    log_p_u = -0.5*np.log(2*np.pi*sigma2_u) - 0.5*(u-mu_u)**2/sigma2_u
    return log_p_u + log_p_z_given_u
#
# Run Metropolis-Hastings Algorithm
#
ll = log_likeliehood(u)
us = np.zeros(mcmc_steps)
did_accept = np.zeros(mcmc_steps)
for step in range(mcmc_steps):
    u_new = get_proposal(u)
    ll_new = log_likeliehood(u_new)
    accept_ratio = np.exp(ll_new - ll + log_proposal_density_ratio(u_new, u))
    rand = np.random.random()
    if rand < accept_ratio:
        u = u_new
        ll = ll_new
        did_accept[step] = 1
    else:
        did_accept[step] = 0
    us[step] = u
#
# Do diagnostics on the results
#
print(f"After burn in, {did_accept[burn_in_samples:].mean():.1%} of the proposals were accepted.")
print(f"There are {len(np.unique(us[burn_in_samples:]))} unique values in the kept samples")
fig,axs = plt.subplots(3,1,figsize=plt.figaspect(1))
axs=axs.flatten()
variation = us[burn_in_samples:]-us[burn_in_samples:].mean()
variation /= np.linalg.norm(variation)
corrs = np.correlate(variation,variation,mode='same')
axs[0].plot(corrs[len(corrs)//2:],label='correlogram of samples')
axs[0].legend()
axs[1].plot(np.arange(burn_in_samples),us[:burn_in_samples],color='C3',label='burn in')
axs[1].plot(np.arange(burn_in_samples,mcmc_steps),us[burn_in_samples:],color='C2',label='kept samples')
axs[1].legend()
axs[2].scatter(t,np.real(x),label='real(x)')
axs[2].scatter(t,np.real(z),label='real(z)')
axs[2].scatter(t,np.real(np.exp(us[burn_in_samples:].mean()*1j*t*4/lamda)),label='real(estimated x)')
axs[2].legend()
#
# Explanation plot
#
urange = np.linspace(-0.05,0.2,500)
lls = np.array([log_likeliehood(u) for u in urange])
fig,axs = plt.subplots(2,1,sharex=True)
axs=axs.flatten()
axs[0].hist(us[burn_in_samples:],label='MCMC samples (after burn in)',alpha=0.7)
axs[0].hist(us[:burn_in_samples],label='MCMC samples (burn in)',alpha=0.5)
axs[0].axvline(us[burn_in_samples:].mean(),label=f'MCMC mean ={us[burn_in_samples:].mean():.3f}',color='C1',linestyle='dotted')
axs[0].axvline(u_true,label=f'true u = {u_true:.3f}',color='C2',linestyle='dotted')
axs[0].legend()
axs[1].set_ylabel("Log Likelihood")
axs[1].set_xlabel("u")
axs[1].plot(urange,lls)
plt.show()
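The autocorrelation shown in the correlogram can also be summarized as a single number, an effective sample size. This is a minimal sketch of one common estimator (the truncation at the first non-positive autocorrelation is a simple choice, not the only one):

```python
import numpy as np

def effective_sample_size(samples):
    """Crude ESS: N / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = samples - samples.mean()
    acf = np.correlate(x, x, mode='full')[len(x) - 1:]
    acf = acf / acf[0]  # normalize so acf[0] == 1
    # Truncate at the first non-positive autocorrelation (simple rule)
    cutoff = np.argmax(acf <= 0) if np.any(acf <= 0) else len(acf)
    return len(x) / (1 + 2 * acf[1:cutoff].sum())

# For i.i.d. draws the estimate should be close to the number of samples;
# for a sticky MCMC chain it will be much smaller
iid = np.random.default_rng(1).standard_normal(2000)
ess = effective_sample_size(iid)
```

Applied to us[burn_in_samples:] from the code above, a low ESS would confirm that the step size has become too small.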
In summary:
- I can't tell from your description whether you had any problems with the implementation. From the discussion with Sextus Empiricus, it seems there might be a misunderstanding about how to compute the likelihood.
- Metropolis-Hastings can be used on this problem to generate samples from the posterior, if an appropriate step size is used and the chain is initialized around a suitable local optimum.
- The use of logarithms may be critical, as the log likelihoods can be as small as -2500.
- For my number of data points and noise level, the prior didn't have any noticeable effect. It may matter in your case.
Best Answer
First, the acceptance rate for random walk Metropolis-Hastings and MALA should be lower than 50%, as discussed in answers to earlier X Validated questions.
Second, tuning $\varepsilon$ according to earlier runs invalidates the Markov justification of MCMC algorithms unless the tuning is controlled, for instance under diminishing adaptation. Entries on X Validated with the keywords adaptive and metropolis provide some references on the topic.
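A diminishing-adaptation variant of the step-size tuning might look like the sketch below. This is my own illustration on a toy target, not code from either answer; the target acceptance rate 0.44 (a common choice for one-dimensional targets) and the t**-0.6 decay are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
log_target = lambda u: -0.5 * u ** 2  # toy standard-normal target

u, ll = 3.0, log_target(3.0)
log_step = 0.0          # adapt log(step size); start at step size 1.0
target_rate = 0.44      # common target acceptance rate in one dimension
samples = np.empty(5000)
for t in range(1, 5001):
    u_new = u + rng.standard_normal() * np.exp(log_step)
    ll_new = log_target(u_new)
    alpha = min(1.0, np.exp(ll_new - ll))
    if rng.random() < alpha:
        u, ll = u_new, ll_new
    # Diminishing adaptation: the update size decays like t**-0.6, so the
    # transition kernel stabilizes over time instead of being re-tuned by hand
    log_step += t ** -0.6 * (alpha - target_rate)
    samples[t - 1] = u
```

Because the adaptation shrinks with t, the chain's kernel converges, which is the condition that keeps the ergodicity argument intact, unlike restarting the chain by hand with new step sizes.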