Solved – Maximum Likelihood Estimation of a dataset

mathematical-statistics, maximum-likelihood, normal-distribution, python

I am coding a maximum likelihood estimation for a given dataset (Data.csv). The goal is to estimate the mean and standard deviation (sigma). Note that the log of the dataset is well approximated by a normal distribution.

While working on the code, I have faced some issues that drive me crazy. The questions are listed below:

  • Maximizing the criterion function produces a positive log-likelihood.
    I expected the log-likelihood to be negative (see also the 3D plot).
    Does a positive log-likelihood make any sense at all?
  • Why is the method 'L-BFGS-B' (highly) sensitive to the initializations
    of mu_init and sig_init, while the method 'SLSQP' is not? Is there a
    logic behind this? Moreover, I noticed that 'L-BFGS-B' even produces
    inaccurate results when the initialization is far from the 'true' values.
  • Is there a way to calculate the Hessian when the method 'SLSQP' is
    employed?

  • I get values that are almost zero inside the log-likelihood
    computation; is there a way to deal with the resulting log(0) issue?

The csv-file containing the data can be found here: Data.csv

# ----------------------------------------------------------------------
# 0. Import the necessary libraries
# ----------------------------------------------------------------------
import numpy as np
import pandas as pd
import scipy.stats as sts
import matplotlib.pyplot as plt
import scipy.optimize as opt # minimizing procedure
import matplotlib
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
# ----------------------------------------------------------------------
# 1. Load data
# ----------------------------------------------------------------------
# Display settings
pd.set_option('display.width', 400) # display width in characters
pd.set_option('display.max_columns', 50) # display columns

# read csv data
df = pd.read_csv("Data.csv")
df.set_index('ID', inplace=True) # set ID as index

# Prepare the data
Data = df['Data']                                     # Load data
Data = Data.dropna()
Data = np.log(Data)                                   # Take log() transformation

# Sanity check: sample mean and std dev of the log data
print('mean Data (check)= ', Data.mean(), 'std dev. Data (check) = ', Data.std() )
# ----------------------------------------------------------------------
# 2. Define function that generates probabilities from a normal pdf
# ----------------------------------------------------------------------
def norm_pdf(xvals, mu, sigma):
    # Normal pdf evaluated at xvals
    pdf_vals = (1 / (sigma * np.sqrt(2 * np.pi)) *
                np.exp(-(xvals - mu) ** 2 / (2 * sigma ** 2)))
    return pdf_vals
# ----------------------------------------------------------------------
# 3. Define function that provides the log likelihood
# ----------------------------------------------------------------------
def log_lik_norm(xvals, mu, sigma):
    # Generates values from a normal distributed pdf
    pdf_vals = norm_pdf(xvals, mu, sigma)
    # Take log of normal distributed pdf values
    ln_pdf_vals = np.log(pdf_vals)
    # Summation of normal distributed pdf values (= loglikelihood)
    log_lik_val = ln_pdf_vals.sum()
    return log_lik_val
# ----------------------------------------------------------------------
# 4. Define criterion function
# ----------------------------------------------------------------------
def crit(params, *args): # Provides the negative log-likelihood
    # Parameters
    mu, sigma = params
    # Arguments: args is a tuple, so take the data out of it
    xvals = args[0]
    # Log-likelihood
    log_lik_val = log_lik_norm(xvals, mu, sigma)
    # Maximize the log-likelihood by minimizing its negative (computationally more stable)
    neg_log_lik_val = -log_lik_val
    return neg_log_lik_val
# ----------------------------------------------------------------------
# 6. Minimize the criterion function (= maximize the log-likelihood)
# ----------------------------------------------------------------------
# Initialization of parameters
mu_init = 3
sig_init = 1 # initialization is sensitive for 'L-BFGS-B' -> first guess: method='SLSQP'
params_init = np.array([mu_init, sig_init])
# Arguments
mle_args = (Data,) # note the trailing comma: args must be a tuple
# Minimizing procedure (constrained)
results_cstr = opt.minimize(crit, params_init, args=mle_args, method='SLSQP',
                            bounds=((None, None), (1e-10, None)))
print(results_cstr)
# Maximum Likelihood Estimators
mu_MLE, sig_MLE = results_cstr.x
print('mu_MLE=', mu_MLE, ' sig_MLE=', sig_MLE)
# ----------------------------------------------------------------------
# 7. Plot the results of MLE
# ----------------------------------------------------------------------
# Plot the MLE estimated distribution
dist_Data = np.linspace(0, 10, 500) # generates chunk of data for pdf
plt.plot(dist_Data, norm_pdf(dist_Data, mu_MLE, sig_MLE),
         linewidth=2, color='k', label=r'3: $\mu$={}, $\sigma$={}'.format(mu_MLE, sig_MLE))
plt.legend(loc='upper left')
#
plt.show()

# Print log-likelihoods for comparison
print('MLE log-likelihood 3: ', log_lik_norm(Data, mu_MLE, sig_MLE)) # at the MLE (maximized)

print('MLE log-likelihood check: ', log_lik_norm(Data, Data.mean(), Data.std())) # check against sample mean / std dev
# ----------------------------------------------------------------------
# 8. Plot log likelihood around M.L. Estimates for mu and sigma
# ----------------------------------------------------------------------
cmap1 = matplotlib.cm.get_cmap('summer')
#
mu_vals = np.linspace(0.75*mu_MLE, 1.25*mu_MLE, 50) # values around mu_MLE
sig_vals = np.linspace(0.75*sig_MLE, 1.25*sig_MLE, 50) # values around sig_MLE
lnlik_vals = np.zeros((50, 50))
for mu_ind in range(50):
    for sig_ind in range(50):
        lnlik_vals[mu_ind, sig_ind] = \
            log_lik_norm(Data, mu_vals[mu_ind],
                              sig_vals[sig_ind])

mu_mesh, sig_mesh = np.meshgrid(mu_vals, sig_vals)
#
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(sig_mesh, mu_mesh, lnlik_vals, rstride=8,
                cstride=1, cmap=cmap1)
ax.set_title('Log likelihood for values of mu and sigma')
ax.set_xlabel(r'$\sigma$')
ax.set_ylabel(r'$\mu$')
ax.set_zlabel(r'log likelihood')
plt.show()
# ----------------------------------------------------------------------
# 9. Variance-covariance matrix of ML estimates
# ----------------------------------------------------------------------
# Obtain the inverse Hessian from the minimizer
# NOTE: hess_inv is only returned by quasi-Newton methods ('BFGS' gives a dense array,
# 'L-BFGS-B' a linear operator with .todense()); 'SLSQP' does not provide it.
vcv_mle = results_cstr.hess_inv.todense()
# Standard error of mu
stderr_mu_mle = np.sqrt(vcv_mle[0,0])
# Standard error of sigma
stderr_sig_mle = np.sqrt(vcv_mle[1,1])
#
print('VCV(MLE) = ', vcv_mle)
print('Standard error for mu estimate = ', stderr_mu_mle)
print('Standard error for sigma estimate = ', stderr_sig_mle)

— EDIT —

I used autograd, following this reference: Autograd MLE

However, when I run the following piece of code, I get two errors as soon as Python tries to evaluate hessian_(theta_autograd):

  1. KeyError: <class 'pandas.core.series.Series'>
  2. TypeError: Can't differentiate w.r.t. type <class 'pandas.core.series.Series'>

The two errors are driving me crazy and I hope somebody might be of help.

The csv-data can be found here: csv-data

import warnings
warnings.filterwarnings('ignore')
import autograd.numpy as np
from autograd import grad, jacobian, hessian
from autograd.scipy.stats import norm
from scipy.optimize import minimize
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt

# ----------------------------------------------------------------------
# I. READ DATA
# ----------------------------------------------------------------------

# read csv data
df = pd.read_csv("Output.csv")
df.set_index('ID', inplace=True) # set ID as index

# Plot
plt.scatter(df.X, df.Y, color='red', marker='+')
plt.title('X vs Y', fontsize=15)
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

# ----------------------------------------------------------------------
# 0. Data x, y
# ----------------------------------------------------------------------

N = df.shape[0]             # number of observations
#
x_ = df.X                   # independent variable
x = np.c_[np.ones(N),x_]    # add constant to independent variables
y = df.Y                    # dependent variable

# ----------------------------------------------------------------------
# 1. Parameters MLE
# ----------------------------------------------------------------------

# number of parameters
K = 2
# Parameter values
beta = np.random.randn(K)

# ----------------------------------------------------------------------
# 2. Negative Loglikelihood
# ----------------------------------------------------------------------
def neg_loglike(theta):
    beta = theta[:-1]
    # transform theta[-1] so that sigma > 0
    sigma = np.exp(theta[-1])
    mu = np.dot(x,beta)
    ll = norm.logpdf(y,mu,sigma).sum()
    return -1 * ll

# jacobian of neg_loglike
jacobian_  = jacobian(neg_loglike)
# hessian of neg_loglike
hessian_ = hessian(neg_loglike)

# ----------------------------------------------------------------------
# 2. Minimizing Negative Loglikelihood
# ----------------------------------------------------------------------
# initialization
theta_start = np.append(np.zeros(beta.shape[0]),0.0)
# minimization sci-py criterion
res1 = minimize(neg_loglike, theta_start, method = 'BFGS', jac = jacobian_)
print(res1)
# estimated parameters
theta_autograd = res1.x

# # ----------------------------------------------------------------------
# # 3. Results
# # ----------------------------------------------------------------------
# # std errors, calculate the information matrix and using the autograd hessian
information1 = np.transpose(hessian_(theta_autograd)) # <-- GIVES ERROR
# # se1 = np.sqrt(np.diagonal(np.linalg.inv(information1)))
# # #
# # print('se1 :', se1)
#
# # Put Results in a DataFrame
# results_a = pd.DataFrame({'Parameter':theta_autograd,'Std Err':se1})
# names = ['beta_'+str(i) for i in range(K)]
# names.append('log(Sigma)')
# results_a['Variable'] = names
# results_a['Model'] = "MLE Autograd"
# #
# print(results_a)
# #

# Jacobian of ML estimation
print("jacobian of theta_autgrad", jacobian_(theta_autograd))

# Loglikelihood of ML estimation
print("ML log-likelihood = ", -1*neg_loglike(res1.x))


# ----------------------------------------------------------------------
# 4. Check with Statsmodels
# ----------------------------------------------------------------------
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())

— EDIT 2 —

def neg_loglike(theta):
    beta = theta[:-1]
    loc = np.dot(x, beta)
    scale = np.exp(theta[-1])
    # evaluate the log-pdf at the observed y (gumbel_r is assumed to be imported elsewhere)
    ll = gumbel_r.logpdf(y, loc, scale).sum()
    return -1 * ll

Best Answer

I can suggest a number of improvements that will make any minimization method more stable.

  1. Instead of
    # Generates values from a normal distributed pdf
    pdf_vals = norm_pdf(xvals, mu, sigma)
    # Take log of normal distributed pdf values
    ln_pdf_vals = np.log(pdf_vals)

you should analytically compute the log of the pdf. The reason is that, for parameter values far from the true ones, the exponent in the pdf becomes very large in magnitude, so the exponential term can underflow to zero. Taking the log of an underflowed number gives gibberish results. Instead, write code for the log of the pdf directly (see the sketch after this list).

  2. Currently, you are not giving the solver any gradient (Jacobian) information. This is extremely helpful for convergence stability and performance. Take a look at this blog post on how to automatically include gradients: https://rlhick.people.wm.edu/posts/mle-autograd.html Autograd can also compute Hessians exactly, as in the sketch below!

  3. Bounds, like the one on $\sigma$, can sometimes be tricky for inference. You can instead optimize over a new variable $\psi$ defined by $\sigma = \exp \psi$. Now $\psi$ can vary over any negative or positive number, but after exponentiating, $\sigma$ is still positive. This opens up other minimization routines to use; the sketch after this list combines this reparameterization with the two points above.
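
To make the three suggestions concrete, here is a minimal sketch that puts them together. It re-uses the Data.csv file and 'Data' column from your question; the function names (log_norm_pdf, neg_log_lik) are mine, and the derivative part assumes autograd is installed (pip install autograd). Treat it as an illustration of the idea, not a drop-in replacement for your script.

# ----------------------------------------------------------------------
# Sketch: analytic log-pdf + sigma = exp(psi) + autograd derivatives
# ----------------------------------------------------------------------
import numpy as np
import pandas as pd
import scipy.optimize as opt
import autograd.numpy as anp              # numpy wrapper that autograd can differentiate
from autograd import grad, hessian

# Same file/column as in the question; .values gives a plain numpy array,
# which is what autograd expects (not a pandas Series).
df = pd.read_csv("Data.csv")
xvals = np.log(df['Data'].dropna().values)

def log_norm_pdf(x, mu, sigma):
    # Log of the normal pdf written out analytically -- no exp() that can underflow
    return (-anp.log(sigma) - 0.5 * anp.log(2 * anp.pi)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def neg_log_lik(theta):
    # theta = (mu, psi) with sigma = exp(psi), so the problem is unconstrained
    mu, psi = theta[0], theta[1]
    sigma = anp.exp(psi)
    return -anp.sum(log_norm_pdf(xvals, mu, sigma))

theta0 = np.array([3.0, 0.0])             # psi = 0 corresponds to sigma = 1
res = opt.minimize(neg_log_lik, theta0, method='BFGS',
                   jac=grad(neg_log_lik)) # exact gradient from autograd
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print('mu_hat =', mu_hat, ' sigma_hat =', sigma_hat)

# Exact Hessian at the optimum -> observed information -> standard errors of (mu, psi).
# (Use the delta method, Var(sigma_hat) ~= sigma_hat**2 * Var(psi_hat), for sigma itself.)
H = hessian(neg_log_lik)(res.x)
vcv = np.linalg.inv(H)
print('std. errors (mu, psi):', np.sqrt(np.diag(vcv)))

Because $\sigma$ is parameterized as $\exp \psi$ there is no bound to pass to the solver, so plain 'BFGS' works and you get an (approximate) inverse Hessian from the optimizer as well as the exact one from autograd above.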
