Solved – Observation weightage in a Machine Learning model

machine learning, weighted-data

Is there any way in R/Python to tell a model to put more emphasis on a specific subset of the data while still training on the whole data set?

For example, I have sales behavior data from 2011 to 2016 and I am predicting likelihood to buy in 2017. I want the model to put more emphasis on the 2015-2016 data (i.e. capture recent patterns that may not be very evident when you consider the whole data from 2011). I can always build a separate model for that time period, or add a year variable to capture the effect, but is there some way to tell the model to focus more on certain rows (x to y), that is, to give more weight to what it learns from this subset of the whole data?

Best Answer

One general approach is resampling: oversample the more important data or undersample the less important data.
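As a rough illustration of the oversampling idea, here is a minimal sketch using only the standard library. The helper name `oversample_recent`, the toy `(year, bought)` rows, and the factor of 3 are all assumptions made up for this example, not something from the question's data.

```python
import random

def oversample_recent(rows, is_recent, factor=3, seed=0):
    """Return a shuffled copy of rows in which every 'recent' row appears `factor` times.

    `is_recent` is a caller-supplied predicate; `factor` is an arbitrary choice.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        copies = factor if is_recent(row) else 1
        out.extend([row] * copies)
    rng.shuffle(out)  # avoid leaving the duplicates grouped together
    return out

# Toy rows of the form (year, bought); purely illustrative.
rows = [(2011, 0), (2012, 1), (2013, 0), (2014, 0), (2015, 1), (2016, 1)]
resampled = oversample_recent(rows, lambda r: r[0] >= 2015, factor=3)
```

The resampled rows can then be passed to any learner, including ones that have no notion of weights. If you evaluate on held-out data, do the resampling on the training split only, so duplicated rows do not end up in both splits.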

With respect to weights, it will depend on the algorithm.

In Python, it seems many algorithms in scikit-learn support this through a `sample_weight` argument to `fit`, for example SVMs, Stochastic Gradient Descent classifiers, and Random Forests, though unfortunately I can't find general documentation on this parameter.
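A minimal sketch of per-row weighting in scikit-learn, assuming a list of years aligned with the training rows. The helpers `recency_weights` and `fit_weighted`, the 2015 cutoff, and the weight of 3.0 are hypothetical choices for illustration; the only library feature relied on is the `sample_weight` argument of `RandomForestClassifier.fit`.

```python
def recency_weights(years, recent_from=2015, recent_weight=3.0):
    """Give rows from `recent_from` onward a larger weight; all other rows get 1.0."""
    return [recent_weight if y >= recent_from else 1.0 for y in years]

def fit_weighted(X, y, weights):
    """Fit a random forest in which each row counts in proportion to its weight."""
    # Imported here so the weight helper above can be used without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y, sample_weight=weights)
    return model

# One year per training row (toy values).
years = [2011, 2012, 2013, 2014, 2015, 2016]
weights = recency_weights(years)
```

With a feature matrix `X` and labels `y` in the same row order as `years`, the call would be `model = fit_weighted(X, y, weights)`. The weight value itself is a tuning choice; it may be worth comparing a few settings on a validation set drawn from the most recent period.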
