Solved – Gradient descent for parameter optimization with parameters > 0

gaussian process, gradient descent, optimization

I want to apply gradient descent for parameter estimation in a Gaussian Process.
The parameters have to be > 0.
How can I prevent gradient descent from finding parameters that are below zero?

My code for gradient descent (in Octave) looks like this:

function re = get_max(theta_0, epsilon, f1, k, X, y, noise, maxit=100)
  % Gradient ascent on the marginal log likelihood:
  % f1 returns the gradient and epsilon is the step size.
  for i = 1:maxit
    direction = epsilon * f1(X, y, k, theta_0, noise);
    if (norm(direction) <= 1e-6)   % stop once the step is (almost) zero
      re = theta_0;
      return;
    endif
    theta_0 = theta_0 + direction; % step uphill
  endfor
  re = theta_0;                    % return the last iterate if maxit is reached
endfunction

X and y are the training data of the Gaussian process, and k is the covariance function.
theta_0 is the vector of parameters that I want to estimate, epsilon is the step size for gradient descent, and f1 is the derivative of the function to be optimized (here the marginal log likelihood).
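For illustration, a call looks like this; dlik_dtheta and sqexp_cov stand in for the actual derivative of the marginal log likelihood and the covariance function:

% Example call; dlik_dtheta and sqexp_cov are placeholders for the derivative
% of the marginal log likelihood and the covariance function.
theta_0 = [1; 1];    % initial guess for the parameters
epsilon = 0.01;      % step size
noise   = 0.1;       % observation noise level
theta_hat = get_max(theta_0, epsilon, @dlik_dtheta, @sqexp_cov, X, y, noise);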

My problem is that gradient descent often reaches parameters that are < 0, even though I am only searching for positive parameters. I cannot even continue gradient descent then, because the marginal log likelihood and its derivative cannot be evaluated for parameters < 0 (the covariance matrix is no longer positive definite).

So is there a good way to prevent gradient descent from finding parameters that are < 0?

Best Answer

Some options:

  1. Exponentiate the constrained variable: optimize over its logarithm instead, since exp() of any real input is positive. Just remember to transform the gradient with the chain rule (a sketch follows this list).

  2. Don't use plain gradient descent. An optimizer that's built to deal with bound constraints will respect them; L-BFGS-B is a strong, general-purpose choice (see the second sketch below).
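A minimal sketch of option 1 in Octave, reusing the get_max signature from the question; the only real changes are the exp() mapping and the chain-rule factor on the gradient (any names not in the question are assumptions):

function re = get_max_log(phi_0, epsilon, f1, k, X, y, noise, maxit=100)
  % Gradient ascent on phi = log(theta); theta = exp(phi) is positive for
  % every real phi, so the covariance matrix stays valid.
  for i = 1:maxit
    theta = exp(phi_0);
    % chain rule: dL/dphi = dL/dtheta .* dtheta/dphi = f1(...) .* theta
    direction = epsilon * (f1(X, y, k, theta, noise) .* theta);
    if (norm(direction) <= 1e-6)
      break;
    endif
    phi_0 = phi_0 + direction;
  endfor
  re = exp(phi_0);   % map back to the positive parameters
endfunction

Call it with phi_0 = log(theta_0); everything else stays as before.

For option 2, base Octave does not ship L-BFGS-B, but the built-in sqp solver also accepts bound constraints. A sketch, assuming neg_lik and neg_grad are wrappers you write that return the negative marginal log likelihood and its gradient (sqp minimizes):

lb = 1e-6 * ones(size(theta_0));   % keep every parameter strictly positive
ub = Inf(size(theta_0));
theta_hat = sqp(theta_0, {@neg_lik, @neg_grad}, [], [], lb, ub);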