On scikit-learn==0.14.1, $\theta_0$ can be a vector. The following code works for me:
import numpy as np
from sklearn.gaussian_process import GaussianProcess
from sklearn.datasets import make_regression

# make_regression() defaults to 100 features, so theta0 gets one entry per feature
X, y = make_regression()
bad_theta = np.abs(np.random.normal(0, 1, 100))
model = GaussianProcess(theta0=bad_theta)
model.fit(X, y)
You can also pass any correlation model (kernel) you want as the corr parameter. The following is the squared-exponential (radial basis function) correlation model that sklearn uses by default for Gaussian processes.
def squared_exponential(theta, d):
    """
    Squared exponential correlation model (Radial Basis Function).
    (Infinitely differentiable stochastic process, very smooth)::

                                              n
        theta, dx --> r(theta, dx) = exp( sum - theta_i * (dx_i)^2 )
                                         i = 1

    Parameters
    ----------
    theta : array_like
        An array with shape 1 (isotropic) or n (anisotropic) giving the
        autocorrelation parameter(s).

    dx : array_like
        An array with shape (n_eval, n_features) giving the componentwise
        distances between locations x and x' at which the correlation model
        should be evaluated.

    Returns
    -------
    r : array_like
        An array with shape (n_eval, ) containing the values of the
        autocorrelation model.
    """
    theta = np.asarray(theta, dtype=np.float)
    d = np.asarray(d, dtype=np.float)

    if d.ndim > 1:
        n_features = d.shape[1]
    else:
        n_features = 1

    if theta.size == 1:
        return np.exp(-theta[0] * np.sum(d ** 2, axis=1))
    elif theta.size != n_features:
        raise ValueError("Length of theta must be 1 or %s" % n_features)
    else:
        return np.exp(-np.sum(theta.reshape(1, n_features) * d ** 2, axis=1))
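To plug in a custom correlation model, pass the callable itself as corr. A minimal sketch, reusing X, y, and bad_theta from above (as far as I can tell, the 0.14.1 API accepts any callable with the (theta, d) signature shown):

# pass the function itself instead of a string name
model = GaussianProcess(corr=squared_exponential, theta0=bad_theta)
model.fit(X, y)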
It looks like you're doing something pretty interesting, btw.
A simple way to do this is to subtract each coefficient's "centering value" times its associated variable from the left-hand side. To go with your example,
$Y = \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \beta_4X_4 + e$
Assume the coefficients should be shrunk towards the centering values $(5, 1, -1, -5)$, respectively. Then:
$Y - 5X_1 -X_2 +X_3 +5X_4 = (\beta_1-5)X_1 + (\beta_2-1)X_2 + (\beta_3+1)X_3 + (\beta_4+5)X_4 + e$
and, redefining terms, you have:
$Y^* = \beta_1^*X_1 + \beta_2^*X_2 + \beta_3^*X_3 + \beta_4^*X_4 + e$
A standard ridge regression would shrink the $\beta_i^*$ towards 0, which is equivalent to shrinking the original $\beta_i$ towards the specified centering values. To see this, consider a fully shrunk $\beta_4^* = 0$: then $\beta_4 + 5 = 0$, and therefore $\beta_4 = -5$. Shrinkage accomplished!
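For concreteness, here is a minimal numpy sketch of that transformation (the simulated data, the centering vector c, and the penalty lam are all made up for illustration). It solves the ordinary ridge normal equations for the redefined coefficients and then undoes the redefinition:

import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([5.3, 0.8, -1.2, -4.7]) + rng.normal(scale=0.5, size=200)

c = np.array([5.0, 1.0, -1.0, -5.0])   # centering values for the coefficients
lam = 10.0                              # ridge penalty (illustrative)

# Y* = Y - Xc, then ordinary ridge on (X, Y*) shrinks beta* towards 0
y_star = y - X @ c
beta_star = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y_star)

# undo the redefinition: beta_i = beta_i* + c_i
beta_hat = beta_star + c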
Best Answer
Ridge regression looks like:
$$ \min_{\beta}||Y-X\beta||^2 + \lambda_1 ||\beta||^2 $$
If instead you want to compute
$$ \beta^* = \arg\min_{\beta}||Y-X\beta||^2 + \lambda_1 ||\beta - \beta_0||^2 $$
I guess you could just turn this into shrinking towards zero using the new variable
$$\theta = \beta - \beta_0.$$
So you'd solve:
$$ \theta^* := \arg\min_{\theta}||Y-X\beta_0-X \theta||^2 + \lambda_1 ||\theta||^2 $$
Then apply the change of variables again (i.e., $\beta^* := \theta^* + \beta_0$).
So to recap, if I have some black box function $\text{RidgeRegression}(Y,X, \lambda)$, I can use it to solve for an arbitrary prior $\beta_0$ simply by calling $\text{RidgeRegression}(Y-X\beta_0, X, \lambda)$.
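A hedged sketch of that recipe, using sklearn's Ridge as the black box (alpha, the simulated data, and the prior beta0 are illustrative; fit_intercept=False keeps the model in the exact form above):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([5.3, 0.8, -1.2, -4.7]) + rng.normal(scale=0.5, size=200)

beta0 = np.array([5.0, 1.0, -1.0, -5.0])   # the prior to shrink towards

# RidgeRegression(Y - X beta0, X, lambda): shrink theta = beta - beta0 towards 0
ridge = Ridge(alpha=10.0, fit_intercept=False).fit(X, y - X @ beta0)

# change of variables back: beta* = theta* + beta0
beta_hat = ridge.coef_ + beta0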