The ordering of the classes in the value parameter should be deterministic and independent of the ordering of the samples in the training set. Either the unique string name or integer identifier for each class is stored in the classes_ attribute of the DecisionTreeClassifier instance after a call to the fit method.
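A minimal sketch of this behavior (the toy data is illustrative): after fit, classes_ holds the sorted unique labels, regardless of the order they appear in y.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training set with string labels (illustrative data)
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = ["dog", "cat", "dog", "cat"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# classes_ is sorted and independent of the order labels appear in y
print(clf.classes_)  # ['cat' 'dog']
```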
To get the accuracy of the prediction you can do:
print(accuracy_score(expected, y_1))
If you want a few more metrics, such as precision, recall, and f1-score, you can get a classification report:
print(classification_report(expected, y_1))
A confusion matrix shows, for each true label, how many samples were assigned to each predicted label. This will tell you whether your classifier confuses certain categories.
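For example, with sklearn.metrics.confusion_matrix (the label values are illustrative):

```python
from sklearn.metrics import confusion_matrix

# expected: true labels, y_1: predicted labels (illustrative values)
expected = [0, 0, 1, 1, 2, 2]
y_1 = [0, 1, 1, 1, 2, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(expected, y_1)
print(cm)
```

Off-diagonal counts reveal which categories get confused, e.g. the 1 in row 0, column 1 means one true-0 sample was predicted as class 1.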
The functions to get these metrics are independent of the classification model you are using (so you can easily test an SVM, for example).
You should use predict(), since this will give the labels of the classified samples. predict_proba will give the probability of a sample belonging to each category.
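The difference between the two (a minimal sketch on made-up, perfectly separable data, so the tree's leaf probabilities come out as 0 or 1):

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative, perfectly separable data
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# predict() returns one label per sample
labels = clf.predict([[0], [3]])
print(labels)  # [0 1]

# predict_proba() returns per-class probabilities;
# columns are ordered as in clf.classes_
probs = clf.predict_proba([[0], [3]])
print(probs)  # [[1. 0.] [0. 1.]]
```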
I recommend reading a few of the documentation pages:
Here is an example of the effect of varying eps and tol on LassoCV's MSE (using the diabetes dataset), for various $\alpha$'s. Note that this is the average MSE (each CV run will have a different MSE):

[plot: average MSE vs. $\alpha$ for different eps and tol settings]

It appears that eps has a significant impact for some penalty parameters, but with a large enough penalty it doesn't matter. tol doesn't seem to play a large role (at least as far as scikit-learn has implemented LassoCV). See below for code.
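The comparison above can be sketched along these lines (a minimal sketch, not the author's original code; the eps and tol values are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# eps controls the ratio alpha_min / alpha_max of the regularization path;
# tol is the stopping tolerance of the coordinate-descent solver.
for eps in (1e-3, 1e-1):
    model = LassoCV(eps=eps, tol=1e-4, cv=5, random_state=0).fit(X, y)
    # mse_path_ holds the per-fold MSE for every alpha on the path;
    # averaging over folds gives the curve plotted above
    avg_mse = model.mse_path_.mean(axis=1)
    print(f"eps={eps}: chosen alpha={model.alpha_:.4f}, "
          f"min avg MSE={avg_mse.min():.1f}")
```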