On scikit-learn==0.14.1, $\theta_0$ can be a vector. The following code works for me:
import numpy as np
from sklearn.gaussian_process import GaussianProcess
from sklearn.datasets import make_regression

# make_regression() defaults to 100 features, so theta0 gets one entry per feature
X, y = make_regression()
bad_theta = np.abs(np.random.normal(0, 1, 100))
model = GaussianProcess(theta0=bad_theta)
model.fit(X, y)
You can also pass any correlation model (kernel) you want as the corr parameter. The following is the squared-exponential (radial basis function) correlation model that sklearn uses by default for Gaussian processes.
def squared_exponential(theta, d):
    """
    Squared exponential correlation model (Radial Basis Function).
    (Infinitely differentiable stochastic process, very smooth)::

                                              n
        theta, dx --> r(theta, dx) = exp( sum - theta_i * (dx_i)^2 )
                                         i = 1

    Parameters
    ----------
    theta : array_like
        An array with shape 1 (isotropic) or n (anisotropic) giving the
        autocorrelation parameter(s).

    dx : array_like
        An array with shape (n_eval, n_features) giving the componentwise
        distances between locations x and x' at which the correlation model
        should be evaluated.

    Returns
    -------
    r : array_like
        An array with shape (n_eval, ) containing the values of the
        autocorrelation model.
    """
    theta = np.asarray(theta, dtype=np.float)
    d = np.asarray(d, dtype=np.float)

    if d.ndim > 1:
        n_features = d.shape[1]
    else:
        n_features = 1

    if theta.size == 1:
        return np.exp(-theta[0] * np.sum(d ** 2, axis=1))
    elif theta.size != n_features:
        raise ValueError("Length of theta must be 1 or %s" % n_features)
    else:
        return np.exp(-np.sum(theta.reshape(1, n_features) * d ** 2, axis=1))
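To plug in a custom correlation model, pass the callable itself as corr. A minimal sketch, reusing X, y, and bad_theta from above (as far as I can tell, the 0.14.1 API accepts any callable with the (theta, d) signature shown):

# pass the function itself instead of a string name
model = GaussianProcess(corr=squared_exponential, theta0=bad_theta)
model.fit(X, y)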
It looks like you're doing something pretty interesting, btw.
A simple way to do this is to subtract each coefficient's "centering value" times its associated variable from the left-hand side. To go with your example,
$Y = \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4X_4 + e$
Assume the coefficients should be shrunk towards the centering values $(5, 1, -1, -5)$, respectively. Then:
$Y - 5X_1 -X_2 +X_3 +5X_4 = (\beta_1-5)X_1 + (\beta_2-1)X_2 + (\beta_3+1)X_3 + (\beta_4+5)X_4 + e$
and, redefining terms, you have:
$Y^* = \beta_1^*X_1 + \beta_2^*X_2 + \beta_3^*X_3 + \beta_4^*X_4 + e$
A standard ridge regression would shrink the $\beta_i^*$ towards 0, which is equivalent to shrinking the original $\beta_i$ towards the specified centering values. To see this, consider a fully shrunk $\beta_4^* = 0$: then $\beta_4 + 5 = 0$, and therefore $\beta_4 = -5$. Shrinkage accomplished!
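For concreteness, here is a minimal numpy sketch of that transformation (the simulated data, the centering vector c, and the penalty lam are all made up for illustration). It solves the ordinary ridge normal equations for the redefined coefficients and then undoes the redefinition:

import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([5.3, 0.8, -1.2, -4.7]) + rng.normal(scale=0.5, size=200)

c = np.array([5.0, 1.0, -1.0, -5.0])   # centering values for the coefficients
lam = 10.0                              # ridge penalty (illustrative)

# Y* = Y - Xc, then ordinary ridge on (X, Y*) shrinks beta* towards 0
y_star = y - X @ c
beta_star = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y_star)

# undo the redefinition: beta_i = beta_i* + c_i
beta_hat = beta_star + c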
Best Answer
Ridge regression looks like:
$$ \min_{\beta}||Y-X\beta||^2 + \lambda_1 ||\beta||^2 $$
If instead you want to compute
$$ \beta^* = \arg\min_{\beta}||Y-X\beta||^2 + \lambda_1 ||\beta - \beta_0||^2 $$
I guess you could just turn this into shrinking towards zero using the new variable
$$\theta = \beta - \beta_0.$$
So you'd solve:
$$ \theta^* := \arg\min_{\theta}||Y-X\beta_0-X \theta||^2 + \lambda_1 ||\theta||^2 $$
Then apply the change of variables again (i.e., $\beta^* := \theta^* + \beta_0$).
So to recap, if I have some black box function $\text{RidgeRegression}(Y,X, \lambda)$, I can use it to solve for an arbitrary prior $\beta_0$ simply by calling $\text{RidgeRegression}(Y-X\beta_0, X, \lambda)$.
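A hedged sketch of that recipe, using sklearn's Ridge as the black box (alpha, the simulated data, and the prior beta0 are illustrative; fit_intercept=False keeps the model in the exact form above):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([5.3, 0.8, -1.2, -4.7]) + rng.normal(scale=0.5, size=200)

beta0 = np.array([5.0, 1.0, -1.0, -5.0])   # the prior to shrink towards

# RidgeRegression(Y - X beta0, X, lambda): shrink theta = beta - beta0 towards 0
ridge = Ridge(alpha=10.0, fit_intercept=False).fit(X, y - X @ beta0)

# change of variables back: beta* = theta* + beta0
beta_hat = ridge.coef_ + beta0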