Solved – How to use the Hessian matrix for maximum likelihood estimation

MATLAB · maximum likelihood · normal distribution

I am trying to teach myself maximum likelihood estimation using the Newton-Raphson method and related iterative methods. I don't understand the link between the Hessian, the expected value of the Hessian, the information matrix, and the expected information matrix.

Most resources say that you can update the parameter estimates using the inverse of the Hessian. (I have been referring to this pdf on maximum likelihood.)
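For reference, the two update rules being compared here can be written as follows (a sketch; ℓ is the log-likelihood, H its Hessian, and the information matrix is the expectation of −H):

```latex
% Newton-Raphson: step using the inverse of the Hessian
\theta^{(k+1)} = \theta^{(k)} - H\!\left(\theta^{(k)}\right)^{-1}\,\nabla\ell\!\left(\theta^{(k)}\right)

% Fisher scoring: replace -H by its expectation, the information matrix
\theta^{(k+1)} = \theta^{(k)} + \mathcal{I}\!\left(\theta^{(k)}\right)^{-1}\,\nabla\ell\!\left(\theta^{(k)}\right),
\qquad \mathcal{I}(\theta) = -\,\mathbb{E}\!\left[H(\theta)\right]
```

The second form matches the commented-out `Inverse_of_Expected_Info` update in the code below.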

I am testing the method by generating normally distributed data and trying to estimate the mean and variance. When I use the Hessian method, the variance estimate seems to diverge. If I use a closed-form expression for the expected information matrix, the iterative method works. I want to use this method for more complex likelihoods, so I figured it would be easier to stick with the Hessian. How do I compute the expected value of the Hessian?
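On the "expected value of the Hessian" part: for the normal model you can take the expectation analytically, and E[−H] is exactly the Fisher information that the closed-form code below uses. A minimal sketch (Python/NumPy here purely for illustration; the variable names are my own) showing that, evaluated at the MLE, the observed information (negative Hessian) and the expected information coincide for this model:

```python
import numpy as np

# Assumed setup mirroring the MATLAB script: N(mu=1, v=1) data.
rng = np.random.default_rng(0)
N = 500
data = 1.0 + rng.standard_normal(N)

m = data.mean()               # MLE of the mean
v = np.mean((data - m) ** 2)  # MLE of the variance (biased, as in the script)

# Observed information: negative Hessian of the log-likelihood at (m, v)
obs = np.array([
    [N / v,                   np.sum(data - m) / v**2],
    [np.sum(data - m) / v**2, np.sum((data - m)**2) / v**3 - N / (2 * v**2)],
])

# Expected information: E[-Hessian], which no longer depends on the data
exp_info = np.array([[N / v, 0.0],
                     [0.0,   N / (2 * v**2)]])

print(np.allclose(obs, exp_info))  # True: at the MLE the two coincide here
```

Away from the MLE the two matrices differ, which is precisely why Newton-Raphson and Fisher scoring behave differently during the iterations.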

Here is my Matlab code:

clear all; close all;
N = 500;
mu = 1; sig2 = 1;  % true mean and variance ('var' would shadow a builtin)
% Generate normally distributed data with mean mu and variance sig2
data = mu + sqrt(sig2)*randn(N,1);

Hess = zeros(2);                      % Hessian of the log-likelihood
Inverse_of_Expected_Info = zeros(2);  % inverse of the expected (Fisher) information
estim = [1.4; 1.4];                   % initial guess [mean; variance]

% Closed-form estimates
MLE_mean = mean(data)
MLE_var = sum((data-MLE_mean).^2)/N

for iter = 1:400
 m = estim(1);
 v = estim(2);

 dldm = (sum(data) - N*m)/v;              % partial derivative wrt mean

 dldv = (sum((data-m).^2)/v - N)/(2*v);   % partial derivative wrt variance

 grad = [dldm; dldv];                     % gradient of the log-likelihood

 Hess(1,1) = -N/v;
 Hess(2,2) = N/(2*v^2) - sum((data-m).^2)/(v^3); % uses sum((data-m).^2), not sum(data)-N*m

 Hess(1,2) = -(sum(data) - N*m)/(v^2);
 Hess(2,1) = Hess(1,2);

 Inverse_of_Expected_Info(1,1) = v/N;       % closed-form inverse of the expected
 Inverse_of_Expected_Info(2,2) = 2*(v^2)/N; % information (diagonal for this model)

 %estim = estim + Inverse_of_Expected_Info*grad; % Fisher scoring: this method works

 estim = estim - pinv(Hess)*grad;               % Newton-Raphson: this method doesn't work

end

Iterative_estimate_mean = estim(1)
Iterative_estimate_var = estim(2)

Best Answer

You are using a unit step length for your optimization; that's why your method diverges after 40-50 iterations. Here is the plot of your intermediate solutions: [plot of the intermediate iterates omitted]

You should perform a line search (e.g. backtracking) along your Newton directions.
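A minimal sketch of that suggestion, written in Python/NumPy for brevity (the function names, starting point, and tolerances are my own): damped Newton on the same normal log-likelihood, halving the step until it actually improves the objective and keeps the variance positive.

```python
import numpy as np

# Assumed setup mirroring the question: N(mu=1, v=1) data, start at [1.4, 1.4].
rng = np.random.default_rng(1)
N = 500
data = 1.0 + rng.standard_normal(N)

def loglik(theta):
    m, v = theta
    return -0.5 * N * np.log(2 * np.pi * v) - np.sum((data - m) ** 2) / (2 * v)

def grad(theta):
    m, v = theta
    return np.array([np.sum(data - m) / v,
                     (np.sum((data - m) ** 2) / v - N) / (2 * v)])

def hess(theta):
    m, v = theta
    return np.array([
        [-N / v,                   -np.sum(data - m) / v**2],
        [-np.sum(data - m) / v**2, N / (2 * v**2) - np.sum((data - m)**2) / v**3],
    ])

theta = np.array([1.4, 1.4])
for _ in range(50):
    g = grad(theta)
    step = np.linalg.solve(hess(theta), g)  # Newton update is theta - step
    t = 1.0
    # Backtrack: halve the step until the log-likelihood improves
    # and the candidate variance stays positive.
    while theta[1] - t * step[1] <= 0 or loglik(theta - t * step) < loglik(theta):
        t *= 0.5
        if t < 1e-10:
            break
    theta = theta - t * step

print(theta)  # close to the closed-form MLEs mean(data), sum((data-m)^2)/N
```

With the damping in place, a step that would overshoot into negative variance (the divergence seen in the question) is simply shortened until it is acceptable.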
