Solved – Checking the regression model’s performance

logistic, r

I am a beginner in R. I have a question about how to assess the performance of a linear regression model using validation data.
My approach was:

  1. Create training and validation data sets from the original data set.
    "train" is the name of my training data set and "valid" is the name of my validation data set. "category" is my target variable and "date_time" is my independent variable.

  2. Use the training data set to fit a regression model

    # Passing data= avoids attach(), which can silently mask other variables
    lreg <- lm(category ~ date_time, data = train)

  3. Predict on the validation data set using the model fitted on the training data set

    p <- predict(lreg, newdata = valid)

  4. Check the predictions by computing ACC and AUC.

    library(rminer)  # provides mmetric()

    mmetric(valid$category, p, "AUC")

    mmetric(valid$category, p, "ACC")

If AUC and ACC have small values, it means that the model fitted on the training data set is not good at making predictions.

Is my approach correct?

Thanks and regards!

Best Answer

You should take such a result with a pinch of salt. For instance, depending on how the data happen to be split, you can get a good test result despite a bad training result. It is therefore better to use a cross-validation-like approach: define multiple train-test splits and check whether the results are consistent across them.
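
As a rough illustration of the multiple-split idea, here is a minimal k-fold cross-validation sketch in base R. It is only a sketch under assumptions: the built-in mtcars data stands in for your data, the 0/1 column am plays the role of category, wt plays the role of date_time, a logistic model is used because the target looks like a class label, and k = 5 and the 0.5 cutoff are arbitrary choices.

```r
# Minimal k-fold cross-validation sketch in base R.
# Assumptions: mtcars stands in for the real data; `am` (0/1) is the target,
# `wt` the single predictor; k = 5 and the 0.5 cutoff are arbitrary.
set.seed(42)                       # make the fold assignment reproducible
dat <- mtcars
k <- 5

# Randomly assign every row to one of k folds of (nearly) equal size
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))

cv_acc <- sapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]        # fit on all folds except fold i
  valid <- dat[fold == i, ]        # evaluate on the held-out fold i

  fit <- glm(am ~ wt, data = train, family = binomial)

  # Predicted probability of class 1, thresholded into a class label
  prob <- predict(fit, newdata = valid, type = "response")
  pred <- as.integer(prob > 0.5)

  mean(pred == valid$am)           # accuracy on the held-out fold
})

print(cv_acc)                      # one accuracy value per fold
print(mean(cv_acc))                # average accuracy across folds
print(sd(cv_acc))                  # spread across folds
```

If the per-fold values differ a lot, a single train/validation split is probably not telling you much about the model.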

PS: if your output consists of classes, you may want to use classification, not regression.
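
To make the classification suggestion concrete, here is a hedged sketch that swaps lm() for a logistic regression via glm() and computes ACC and AUC on held-out data in base R. The data are an assumption: mtcars stands in for your data set, am (0/1) for category, and wt for date_time. The 70/30 stratified split and the 0.5 cutoff are arbitrary, and the AUC is computed with the rank (Mann-Whitney) formula rather than with mmetric().

```r
# Logistic regression as a classifier, evaluated on a held-out set.
# Assumptions: mtcars stands in for the real data; `am` (0/1) replaces
# `category`, `wt` replaces `date_time`; split ratio and cutoff are arbitrary.
set.seed(1)
dat <- mtcars

# Stratified 70/30 split so that both classes appear in the validation set
by_class <- split(seq_len(nrow(dat)), dat$am)
idx <- unlist(lapply(by_class, function(i) sample(i, size = round(0.7 * length(i)))))
train <- dat[idx, ]
valid <- dat[-idx, ]

# family = binomial makes glm() fit a logistic regression
fit <- glm(am ~ wt, data = train, family = binomial)

# Predicted probability of class 1 for each validation row
prob <- predict(fit, newdata = valid, type = "response")

# ACC: share of validation rows whose thresholded prediction matches the label
acc <- mean(as.integer(prob > 0.5) == valid$am)

# AUC via the rank (Mann-Whitney) formulation
n1 <- sum(valid$am == 1)
n0 <- sum(valid$am == 0)
auc <- (sum(rank(prob)[valid$am == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(acc)
print(auc)
```

With a numeric 0/1 target, lm() returns unbounded fitted values, whereas the logistic model returns probabilities between 0 and 1, which is what ACC and AUC are normally computed from.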
