Regression – Tuning the Hyperparameters in scikit-learn

gaussian-process, hyperparameter, regression, scikit-learn

I am trying to find the hyperparameters of a Gaussian process regression model using sklearn. The book (Rasmussen & Williams) says I should maximize the log marginal likelihood, given by $$\log p(\mathbf{y}|X,\mathbf{\theta})=-\frac{1}{2}\mathbf{y}^T K_y^{-1}\mathbf{y}-\frac{1}{2}\log\det(K_y)-\frac{n}{2}\log(2\pi)$$
So I start from an RBF kernel in sklearn with some initial parameters (can they be arbitrary, say both 1.0?) and then try to find the optimal $\theta$? What I don't understand about this approach is: should I maximize over my whole training set in bulk, or consider one point of my training set at a time and update the hyperparameters at each iteration? I apologise for the confused question, but can somebody explain how to start implementing this method?

Best Answer

The Gaussian process is a Bayesian model. Because it relies on Bayesian updating, it doesn't matter whether you process the data one sample at a time or all at once; the result is the same. There is no reason to tune the hyperparameters on a subsample of your data, other than using a held-out test set for validation.
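In practice, scikit-learn already does the batch maximization for you: `GaussianProcessRegressor.fit` optimizes the kernel hyperparameters over all training data at once by maximizing the log marginal likelihood, starting from whatever initial values you give the kernel. A minimal sketch on toy data (the data-generating function, sizes, and initial values of 1.0 are illustrative assumptions, not from the question):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D regression data (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 30)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.randn(30)

# Start from arbitrary hyperparameters (length_scale=1.0, noise_level=1.0);
# fit() maximizes the log marginal likelihood over ALL training points at once.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5).fit(X, y)

print(gpr.kernel_)                         # optimized hyperparameters
print(gpr.log_marginal_likelihood_value_)  # maximized objective

# Sanity check against the closed-form expression from Rasmussen & Williams:
# log p(y|X,theta) = -1/2 y^T K_y^{-1} y - 1/2 log|K_y| - n/2 log(2*pi),
# where K_y is the kernel matrix including the noise term.
K_y = gpr.kernel_(X)
n = len(y)
lml = (-0.5 * y @ np.linalg.solve(K_y, y)
       - 0.5 * np.linalg.slogdet(K_y)[1]
       - 0.5 * n * np.log(2 * np.pi))
print(lml)  # matches gpr.log_marginal_likelihood_value_ up to jitter
```

Note that there is no per-point weight update here: the optimizer (L-BFGS-B by default) works on the full marginal likelihood, and `n_restarts_optimizer` restarts from random initial hyperparameters to reduce the risk of landing in a local optimum.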
