Solved – About learning curves in Machine Learning

deep learning, machine learning, python, r, supervised learning

I am new to machine learning. I completed Andrew Ng's course on Coursera (very good, by the way). This question is software-independent. When you draw a learning curve, do you plot the training error and the cross-validation (CV) error (using whatever metric you want, such as RMSE or $R^2$ for linear regression) as a function of the training set size? Or do you plot the training error and the test error as a function of the training set size? I have seen a lot of people plot learning curves with the test error, whereas in Andrew Ng's course the learning curves use the CV error.

As an example, I attached some curves I got a few months ago using Python.

Thanks a lot for the clarification. Best regards
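For reference, the CV-based variant from the course can be sketched in plain NumPy as below. This is a minimal sketch, assuming ordinary least-squares linear regression and RMSE as the metric; the helper names (fit_linear, rmse, cv_learning_curve) and the synthetic data are hypothetical. For each training set size m and each fold, the model is trained on the first m rows of the training folds and scored both on those rows and on the held-out fold. scikit-learn's sklearn.model_selection.learning_curve computes the same kind of curve with cross-validation.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def rmse(coef, X, y):
    # Root-mean-square error of the linear model on (X, y)
    A = np.column_stack([np.ones(len(X)), X])
    return np.sqrt(np.mean((A @ coef - y) ** 2))

def cv_learning_curve(X, y, train_sizes, k=5, seed=0):
    """For each size m, average over k folds: RMSE on the first m rows of the
    training folds (training error) and RMSE on the held-out fold (CV error)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    train_err, cv_err = [], []
    for m in train_sizes:
        tr_scores, cv_scores = [], []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])[:m]
            coef = fit_linear(X[train], y[train])
            tr_scores.append(rmse(coef, X[train], y[train]))
            cv_scores.append(rmse(coef, X[val], y[val]))
        train_err.append(np.mean(tr_scores))
        cv_err.append(np.mean(cv_scores))
    return np.array(train_err), np.array(cv_err)

# Usage on synthetic data: 3 informative features plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=200)
train_err, cv_err = cv_learning_curve(X, y, train_sizes=[8, 20, 50, 100, 160])
```

Plotting train_err and cv_err against the training sizes gives the curves from the course; the only difference from a test-set curve is where the evaluation data comes from.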

Best Answer

It shows the training error and the test error. No cross-validation is involved: usually we have one large, fixed test set, and we vary the size of the training set to produce the curve.
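The procedure described above (one large, fixed test set and growing training subsets) can be sketched as follows. This is an illustration only, assuming ordinary least squares and RMSE; the function name fixed_test_learning_curve and the synthetic data are hypothetical.

```python
import numpy as np

def fixed_test_learning_curve(X_train, y_train, X_test, y_test, train_sizes):
    """Train on the first m training rows for each m; score on one fixed test set."""
    def design(X):
        return np.column_stack([np.ones(len(X)), X])  # add an intercept column
    train_err, test_err = [], []
    for m in train_sizes:
        A = design(X_train[:m])
        coef, *_ = np.linalg.lstsq(A, y_train[:m], rcond=None)
        train_err.append(np.sqrt(np.mean((A @ coef - y_train[:m]) ** 2)))
        test_err.append(np.sqrt(np.mean((design(X_test) @ coef - y_test) ** 2)))
    return np.array(train_err), np.array(test_err)

# Usage on synthetic data: 3 features plus noise, 1000 held-out test rows
rng = np.random.default_rng(1)
w = np.array([1.0, -2.0, 0.5])
X_tr = rng.normal(size=(300, 3))
y_tr = X_tr @ w + rng.normal(scale=1.0, size=300)
X_te = rng.normal(size=(1000, 3))
y_te = X_te @ w + rng.normal(scale=1.0, size=1000)
tr, te = fixed_test_learning_curve(X_tr, y_tr, X_te, y_te, [8, 30, 100, 300])
```

Because the test set never changes, every point on the test curve is measured on the same data, which makes the points directly comparable.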

My answers here give more details:

How to know if a learning curve from SVM model suffers from bias or variance?

Related Solutions

Solved – Learning Curves Example

The training error is the error measured when evaluating an algorithm on the same data it was trained on. The training error curve slopes upward because, with very few training samples relative to the number of features, the model can overfit the training data and fit it almost perfectly. As the number of training examples increases, the model can no longer fit the data perfectly.
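The same effect shows up with a plain linear model. With p features plus an intercept, least squares can fit any m ≤ p + 1 points in general position exactly, so the training error is essentially zero until m exceeds p + 1 and then climbs toward the noise level. A NumPy sketch under those assumptions (synthetic Gaussian data in which only the first feature is informative; training_rmse is a hypothetical helper):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 20                                          # number of features
X = rng.normal(size=(200, p))
y = X[:, 0] + rng.normal(scale=1.0, size=200)   # only feature 0 matters; the rest is noise

def training_rmse(m):
    # Fit least squares (with intercept) on the first m rows and score on those same rows
    A = np.column_stack([np.ones(m), X[:m]])
    coef, *_ = np.linalg.lstsq(A, y[:m], rcond=None)
    return np.sqrt(np.mean((A @ coef - y[:m]) ** 2))

for m in [5, 10, 21, 40, 100, 200]:
    print(m, round(training_rmse(m), 3))
```

For m up to 21 (20 features plus the intercept) the printed training error is zero up to rounding; beyond that it rises as the model runs out of parameters to memorize the noise.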

Suppose you are classifying email as spam or not spam and you have only 4 features. Let's say the features indicate whether the email contains the words buy, deal, offer, or try. There are 2^4 = 16 possible feature vectors. If you have 10 training examples, they could all have a different combination of feature values, so a model trained on this data can fit every training example exactly and the training error will be 0. With 100 training examples this is no longer possible: by the pigeonhole principle, many examples must share the same feature vector, and whenever examples with the same feature vector have different labels, no model can classify all of them correctly, so the training error increases.
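The counting argument can be checked directly. The sketch below enumerates the 16 feature vectors and computes the lowest training error any classifier of those 4 features could reach: for each distinct feature vector, the best possible prediction is the majority label, and the minority examples are unavoidable mistakes. The helper min_training_error and the randomly labelled data are hypothetical; real spam labels are correlated with the features, but the pigeonhole argument does not depend on that.

```python
import random
from collections import Counter, defaultdict
from itertools import product

WORDS = ("buy", "deal", "offer", "try")
all_vectors = list(product([0, 1], repeat=len(WORDS)))  # every possible feature vector
print(len(all_vectors))  # 16

def min_training_error(examples):
    """Lowest training error reachable by any function of the feature vector:
    predicting the majority label per vector still misclassifies the minority."""
    labels_by_vector = defaultdict(Counter)
    for x, label in examples:
        labels_by_vector[x][label] += 1
    mistakes = sum(sum(c.values()) - max(c.values()) for c in labels_by_vector.values())
    return mistakes / len(examples)

# Hypothetical data: random feature vectors with random spam/ham labels
rng = random.Random(0)
small = [(v, rng.choice(["spam", "ham"])) for v in rng.sample(all_vectors, 10)]  # 10 distinct vectors
large = [(rng.choice(all_vectors), rng.choice(["spam", "ham"])) for _ in range(100)]
print(min_training_error(small))  # 0.0: every vector is unique, so it can be memorized
print(min_training_error(large))
```

With 10 distinct vectors the minimum is exactly zero; with 100 examples spread over only 16 vectors, repeated vectors with conflicting labels force a nonzero minimum.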

Solved – Plotting learning curves for any classification algorithm

In fact, you can define your own error function and pass it to scikit-learn's validation_curve() function through its scoring parameter, as follows:

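import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def rms_error(model, X, y):
    # A scoring callable receives (estimator, X, y); this one returns RMSE, so lower is better
    y_pred = model.predict(X)
    return np.sqrt(np.mean((y - y_pred) ** 2))

# X, y and degree (the polynomial degrees to try) come from your own data; the pipeline plays the role of PolynomialRegression()
val_train, val_test = validation_curve(make_pipeline(PolynomialFeatures(), LinearRegression()), X, y,
    param_name='polynomialfeatures__degree', param_range=degree, cv=7, scoring=rms_error)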
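To make explicit what validation_curve does with such a scorer, here is a plain-NumPy sketch of the same computation for a one-dimensional polynomial fit: for every degree and every CV split, fit on the training folds and evaluate RMSE on both the training folds and the held-out fold. It uses np.polyfit instead of the PolynomialRegression pipeline, the fold assignment is a random permutation (which may differ from scikit-learn's default splitting), and the data and function names are hypothetical. One caveat about the real API: scikit-learn assumes higher scores are better. That does not matter for plotting, but for tools such as GridSearchCV an error metric should be negated, or you can use the built-in 'neg_root_mean_squared_error' scoring string.

```python
import numpy as np

def rms_error(coef, x, y):
    # Same metric as above, but for a coefficient vector returned by np.polyfit
    return np.sqrt(np.mean((y - np.polyval(coef, x)) ** 2))

def polynomial_validation_curve(x, y, degrees, cv=7, seed=0):
    """Return (train, val) arrays of shape (len(degrees), cv), mirroring the layout
    of validation_curve's output: one row per degree, one column per fold."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), cv)
    train = np.empty((len(degrees), cv))
    val = np.empty((len(degrees), cv))
    for i, d in enumerate(degrees):
        for j in range(cv):
            tr = np.concatenate([folds[k] for k in range(cv) if k != j])
            coef = np.polyfit(x[tr], y[tr], d)
            train[i, j] = rms_error(coef, x[tr], y[tr])
            val[i, j] = rms_error(coef, x[folds[j]], y[folds[j]])
    return train, val

# Usage on synthetic data: a noisy sine curve
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=70)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=70)
degree = np.arange(0, 10)
val_train, val_test = polynomial_validation_curve(x, y, degree)
print(val_train.mean(axis=1))  # training RMSE per degree: can only stay the same or drop as degree grows
print(val_test.mean(axis=1))   # validation RMSE per degree
```

Averaging each row over the folds, as in the two print calls, gives the two curves that are usually plotted against the degree.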
