From a previous post :
First to explain the MH algorithm, it's used to approximate
numerically a target distribution, in this case $p(\theta|D)$. At
each stage of the algorithm:
- A value $\theta_{proposed}$ is proposed using the jumping or proposal distribution.
- An acceptance ratio is calculated, equal to $\frac{p(\theta_{proposed}|D)}{p(\theta_{current}|D)}$. Because we
cannot calculate $p(\theta|D)$ directly, we leverage the proportional
expression of Bayes rule and calculate this quotient using the
products of likelihood and prior corresponding to
$\theta_{proposed}, \theta_{current}$. That is, the acceptance ratio
is:$$\frac{p(D|\theta_{proposed})p(\theta_{proposed})}{p(D|\theta_{current})p(\theta_{current})}$$
- If this ratio exceeds one—intuitively, if the proposed parameter value is more likely given data and prior—we accept this proposal. If
not, we accept it with probability equal to the ratio.As more and more values are sampled in this way, the trace of $\theta$
values more and more closely approximates the true distribution of
$\theta$.
Question 1) In my case, I have 6 parameters to estimate : Are $p(\theta_{\text{proposed}})$ and
$p(\theta_{\text{current}})$ different ? especially if I am using uniform law.
Question 2) If yes, I could simply write :
$$\text{Acceptance}_{\text{Ratio}} = \dfrac{p(D|\theta_{\text{proposed}})}{p(D|\theta_{\text{current}})}$$
Question 3) If I use a normal distribution with variance and mean for each of 6 parametric, have I still the symetry of proposal ?
Question 4) How to express this acceptance ratio as a function of Likehood function or cost function used to find the 6 parameters ?
My cost function is equal to :
$$J(\theta)=(y-H\,\theta)(y- H\,\theta)^T$$
with $d = H\,\theta$ where $d$ is the response of system.
UPDATE 1: Here is below the Metropolis-Hastings I am using :
%%%%%% MCMC - Metropolis Algorithm %%%%%%%
% Declare parameters of array
nb = 1e5;
% Metropolis loop
% Index of main loop
i = 1;
% Reject number
j = 0;
% Accept number
k = 0;
% Starting values
l1 = [10, 3];
l2 = [10, 3];
l3 = [10, 3];
l4 = [10, 3];
l5 = [10, 3];
l6 = [10, 3];
% Initial param array
paramsArray = zeros(nb,6);
paramsArray(1,1:6) = [l1(1) + l1(2)*rand(1), l2(1) + l2(2)*rand(1), ...
l3(1) + l3(2)*rand(1), l4(1) + l4(2)*rand(1), ...
l5(1) + l5(2)*rand(1), l6(1) + l6(2)*rand(1)];
% Init parameters vector
pStart = paramsArray(1,1:6)';
% Standard deviation vectors for proposal distributon
gamMatrix = diag([l1(2), l2(2), l3(2), l4(2), l5(2), l6(2)]);
% Starting prior distribution f(p|d)
w_x = Crit_J(pStart,D)*exp(-((pStart(1)-l1(1))^2/(2*l1(2)^2)+(pStart(2)-l2(1))^2/(2*l2(2)^2)+ ...
(pStart(3)-l3(1))^2/(2*l3(2)^2)+(pStart(4)-l4(1))^2/(2*l4(2)^2)+ ...
(pStart(5)-l5(1))^2/(2*l5(2)^2)+(pStart(6)-l6(1))^2/(2*l6(2)^2)));
while (i <= nb)
if (i == 1)
% First random
ptest = pStart;
else
% Other random
ptest = abs(pStart + gamMatrix*randn(6,1));
end
% Upper term for acceptance ratio
w_y = Crit_J(ptest,D)*exp(-((ptest(1)-l1(1))^2/(2*l1(2)^2)+(ptest(2)-l2(1))^2/(2*l2(2)^2)+ ...
(ptest(3)-l3(1))^2/(2*l3(2)^2)+(ptest(4)-l4(1))^2/(2*l4(2)^2)+ ...
(ptest(5)-l5(1))^2/(2*l5(2)^2)+(ptest(6)-l6(1))^2/(2*l6(2)^2)));
% Generate u uniformly
u = rand(1);
% Ratio acceptance
log_prob = log(w_y/w_x);
% Test acceptation
if (log(u) < log_prob)
% Assing new paramsArray
paramsArray(i,1:6) = ptest(1:6);
w_x = Crit_J(paramsArray(i,1:6),D)*exp(-((ptest(1)-l1(1))^2/(2*l1(2)^2)+(ptest(2)-l2(1))^2/(2*l2(2)^2)+ ...
(ptest(3)-l3(1))^2/(2*l3(2)^2)+(ptest(4)-l4(1))^2/(2*l4(2)^2)+ ...
(ptest(5)-l5(1))^2/(2*l5(2)^2)+(ptest(6)-l6(1))^2/(2*l6(2)^2)));
i = i+1;
k = k+1;
else
% Assing to previous
if (i ~= 1)
paramsArray(i,1:6) = paramsArray(i-1,1:6);
i = i+1;
j = j+1;
end
end
end
disp('acceptationt : ratio');
disp(k/nb)
disp('reject : ratio');
disp(j/nb)
% Display mean of different parameters
disp('Parameters with Metropolis-Hastings :');
mean(paramsArray(:,1))
mean(paramsArray(:,2))
mean(paramsArray(:,3))
mean(paramsArray(:,4))
mean(paramsArray(:,5))
mean(paramsArray(:,6))
and my cost function (assimilated to Likelihood function) :
% Function of cost
function cost = Crit_J(p,D)
% Compute the model corresponding to parameters p
[R,C] = size(D);
[Cols,Rows] = meshgrid(1:C,1:R);
% Model
Model = (1+((Rows-p(3)).^2+(Cols-p(4)).^2)/p(5)^2).^(-p(6));
model = Model(:);
d = D(:);
% Introduce H matrix
H = [ model, ones(length(model),1)];
% Compute the cost function : taking absolute value
cost = abs((d-H*[p(1),p(2)]')'*(d-H*[p(1),p(2)]'));
end
If you could see the error …
Best Answer
The problem with your description of the Metropolis-Hastings algorithm is that your notation does not distinguish between the probability densities in the actual problem you are trying to solve, and the proposal density used in the algorithm. Your notation also fails to capture the fact that we are trying to simulate from the posterior distribution, but we only have a kernel of this distribution. A better description of the algorithm, which makes these distinctions in the notation, is as follows:
Now that we have a clearer explanation of the actual workings of the algorithm, I will try to answer your specific questions. (For consistency, I will translate your questions into notation that is consistent with my explanation of the algorithm.) Your Question 4 is unclear to me (there is no cost function in the algorithm so I don't know what you're referring to here), but I will answer the other three questions.
In the case where you use a proposal distribution that is symmetric (in the sense described above) the two proposal densities (with the argument and conditioning parameter switched) will be the same. Symmetry occurs in the case where you use the uniform proposal density that is centred around the conditioning value:
$$g(\theta' | \theta) \propto \mathbb{I}(|\theta' - \theta| \leqslant t).$$
In this case, switching the terms in the proposal density does not alter the value (i.e., they are not different). If you are using a uniform proposal density that is not centred around the conditioning value then this will not hold.
Assuming your uniform proposal distribution is centred around the conditioning parameter (and thus symmetric in the above sense), yes you can.
Symmetry occurs if you use a normal distribution with mean equal to the conditioning parameter and variance independent of this parameter:
$$g(\theta' | \theta) = \text{N}(\theta' |\theta, \Sigma) \propto \exp \Big( -\frac{1}{2} (\theta' - \theta)^\text{T} \Sigma^{-1} (\theta' - \theta) \Big).$$