There seem to be some misconceptions about what the Metropolis-Hastings (MH) algorithm is in your description of it.
First of all, one has to understand that MH is a sampling algorithm. As stated in Wikipedia:

> In statistics and in statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult.
In order to implement the MH algorithm you need a proposal density or jumping distribution $Q(\cdot\vert\cdot)$, from which it is easy to sample. If you want to sample from a distribution $f(\cdot)$, the MH algorithm can be implemented as follows:
- Pick an initial random state $x_0$.
- Generate a candidate $x^{\star}$ from $Q(\cdot\vert x_0)$.
- Calculate the acceptance probability $\alpha=\min\{1, f(x^{\star})/f(x_0)\}$ (this simple form assumes a symmetric proposal $Q$; the general Hastings correction appears in the accepted answer below).
- With probability $\alpha$, accept $x^{\star}$ as the new state; otherwise keep $x_0$ as the new state.
- Take the new state as the starting point and repeat until you get the desired sample size.
Once you have the sample you still need to burn and thin it: because the sampler converges only asymptotically, you discard the first $N$ draws (burn-in), and because consecutive draws are dependent, you keep only every $k$-th iteration (thinning). Both steps appear in the sketch below.
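A minimal sketch of the whole procedure in R, assuming a standard normal target and a Gaussian random-walk proposal (both illustrative choices):

```r
# Metropolis sampler for an illustrative target: the standard normal,
# using a symmetric Gaussian random-walk proposal Q(. | x) = N(x, sd_prop^2).
f <- function(x) exp(-x^2 / 2)                         # unnormalised target density

mh_sample <- function(n_iter, x0 = 0, sd_prop = 1) {
  x <- numeric(n_iter)
  x[1] <- x0
  for (i in 2:n_iter) {
    x_star <- rnorm(1, mean = x[i - 1], sd = sd_prop)  # candidate from Q(. | x)
    alpha  <- min(1, f(x_star) / f(x[i - 1]))          # acceptance probability
    x[i]   <- if (runif(1) < alpha) x_star else x[i - 1]
  }
  x
}

out   <- mh_sample(50000)
N     <- 1000                                          # burn-in length
k     <- 10                                            # thinning interval
chain <- out[seq(N + 1, length(out), by = k)]
```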
An example in R can be found in the following link:
http://www.mas.ncl.ac.uk/~ndjw1/teaching/sim/metrop/metrop.html
This method is widely used in Bayesian statistics for sampling from the posterior distribution of the model parameters.
The example that you are using seems unclear to me, given that $f(x)=ax$ is not a density unless you restrict $x$ to a bounded set. My impression is that you are interested in fitting a straight line to a set of points, in which case I would recommend checking the use of the Metropolis-Hastings algorithm in the context of linear regression. The following reference presents some ideas on how MH can be used in this context (Example 6.8):
Robert & Casella (2010), Introducing Monte Carlo Methods with R, Ch. 6, "Metropolis–Hastings Algorithms"
There are also lots of questions on this site, with pointers to interesting references, discussing the meaning of the likelihood function.
Another pointer of possible interest is the R package mcmc, which implements the MH algorithm with Gaussian proposals in the function metrop().
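A short usage sketch, assuming the mcmc package is installed; the standard-normal log-density is again just an illustrative target:

```r
library(mcmc)

lupost <- function(x) -x^2 / 2          # log unnormalised target density
out <- metrop(lupost, initial = 0, nbatch = 10000, scale = 1)

out$accept                              # acceptance rate, useful for tuning 'scale'
draws <- out$batch                      # the sampled chain
```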
I don't have a great example off the top of my head, but MH is easy compared to direct sampling whenever a parameter's prior is not conjugate to its likelihood. In fact, this is the only reason I have ever seen MH preferred. A toy example: the data follow $p \sim \text{Beta}(\alpha, \beta)$ and you want independent $\text{Gamma}$ priors on $\alpha$ and $\beta$. This is not conjugate, so you would need to use MH for $\alpha$ and $\beta$; a sketch follows below.
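A hedged sketch of that toy example in R, with $\text{Gamma}(1,1)$ priors, simulated data, and a symmetric random-walk proposal (all of these are illustrative choices, not part of the original answer):

```r
set.seed(1)
p <- rbeta(100, shape1 = 2, shape2 = 5)          # simulated Beta data

log_post <- function(a, b) {
  if (a <= 0 || b <= 0) return(-Inf)             # the priors live on (0, Inf)
  sum(dbeta(p, a, b, log = TRUE)) +
    dgamma(a, shape = 1, rate = 1, log = TRUE) +
    dgamma(b, shape = 1, rate = 1, log = TRUE)
}

n_iter <- 10000
ab <- matrix(NA, n_iter, 2)
ab[1, ] <- c(1, 1)
for (i in 2:n_iter) {
  prop <- ab[i - 1, ] + rnorm(2, 0, 0.2)         # symmetric random-walk proposal
  log_alpha <- log_post(prop[1], prop[2]) -
               log_post(ab[i - 1, 1], ab[i - 1, 2])
  ab[i, ] <- if (log(runif(1)) < log_alpha) prop else ab[i - 1, ]
}
```

Working on the log scale avoids numerical underflow in the likelihood, and invalid (non-positive) proposals are rejected automatically because their log-posterior is $-\infty$.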
This presentation gives an example of a Poisson GLM which uses MH for drawing the GLM coefficients.
If you don't already know, it might be worth noting that direct sampling is just the special case of MH in which every drawn value is accepted. So whenever we can sample directly we should, to avoid having to tune a proposal distribution.
Best Answer
Let $q(x^*|x_{old})$ be your proposal (jumping) distribution and let $x^*$ be a value proposed from this distribution. Then the Metropolis-Hastings acceptance probability is $$ \rho = \min \left\{ 1, \frac{f(x^*)}{f(x_{old})} \frac{q(x_{old}|x^*)}{q(x^*|x_{old})} \right\} $$ and you set $x_{new} = x^*$ with probability $\rho$ and otherwise $x_{new} = x_{old}$.
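A minimal sketch of one such update in R; `f`, `q_sample`, and `q_density` are placeholder names, not from any package:

```r
# One generic Metropolis-Hastings update.
# f:          unnormalised target density
# q_sample:   function drawing x* from q(. | x_old)
# q_density:  function evaluating q(y | x)
mh_step <- function(x_old, f, q_sample, q_density) {
  x_star <- q_sample(x_old)
  rho <- min(1, f(x_star) / f(x_old) *
                q_density(x_old, x_star) / q_density(x_star, x_old))
  if (runif(1) < rho) x_star else x_old      # accept x* or keep the old state
}
```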
The Metropolis-Hastings algorithm is appealing because you have a wide range of flexibility in specifying $q$. For example, you can use $N(x_{old},\sigma^2)$, in which case you have to remember that the target $f$ for a non-negative parameter includes an indicator function requiring the parameter to be positive. Thus any negative value for $x^*$ is rejected, because $f(x^*)$ is zero, and you set $x_{new} = x_{old}$. This proposal is convenient because it is symmetric, i.e. $q(x_{old}|x^*)= q(x^*|x_{old})$ for all $x^*, x_{old}$, so you never need to calculate the last ratio in the acceptance probability.
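For instance (a self-contained sketch; the Gamma(2, 1) target is an illustrative choice):

```r
sigma <- 0.5
f <- function(x) if (x > 0) dgamma(x, 2, 1) else 0  # target with the indicator built in
x_old  <- 1
x_star <- rnorm(1, x_old, sigma)                    # symmetric N(x_old, sigma^2) proposal
rho    <- min(1, f(x_star) / f(x_old))              # the q-ratio cancels by symmetry
x_new  <- if (runif(1) < rho) x_star else x_old     # negative x* give rho = 0
```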
Alternatively (as suggested in the comments), you can use a truncated normal, e.g. $N(x_{old},\sigma^2)\mathrm{I}(x^*>0)$. But note that the normalizing constant of this truncated normal depends on $x_{old}$, so the proposal is not symmetric and you will need to calculate the last ratio in the acceptance probability.
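In this case the normalizing constant is $P(N(x_{old},\sigma^2) > 0) = \Phi(x_{old}/\sigma)$, so the correction ratio reduces to $\Phi(x_{old}/\sigma)/\Phi(x^*/\sigma)$. A sketch, with the same illustrative target as above:

```r
sigma <- 0.5
f <- function(x) if (x > 0) dgamma(x, 2, 1) else 0        # illustrative target
q <- function(y, x) dnorm(y, x, sigma) / pnorm(x / sigma) # truncated-normal density q(y | x)
x_old <- 1
repeat { x_star <- rnorm(1, x_old, sigma); if (x_star > 0) break }  # draw from the truncation
rho   <- min(1, f(x_star) / f(x_old) *
              q(x_old, x_star) / q(x_star, x_old))        # q-ratio = pnorm(x_old/sigma) / pnorm(x_star/sigma)
x_new <- if (runif(1) < rho) x_star else x_old
```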
The choice of setting any negative proposed value to zero doesn't gain you anything, since (I'm guessing) your data will most likely indicate that the parameter cannot actually be zero, and thus $f(x^*)$ will be zero when $x^*=0$. Even if this isn't the case, evaluating the proposal distribution will be a bit annoying, since it is now a mixture of a continuous distribution and a point mass at zero.
There are many other choices for the proposal distribution, including independence proposals, i.e. those that do not depend on $x_{old}$. You have not provided enough information for us to suggest a "better" proposal distribution, because this will depend on the target distribution. If the target does not have too much mass near zero, then the normal random-walk proposal $N(x_{old},\sigma^2)$ will likely work well, even though it will occasionally propose (and reject) a negative value.