I am new to classification and am stuck on a problem while implementing logistic regression:
My data set consists of about 300 measurements with 20 features. I fitted a logistic regression model using glmfit and obtained the predicted probabilities (Y). Next, I used the model output (Y) to generate an ROC curve, which gives me the sensitivity and specificity of the model/technique.
(1) I am using the entire data set for both training and testing. Is that correct? If not, how can I validate my model? Is there a way to tell whether I am overfitting by using all the features?
(2) I have tried to implement k-fold cross-validation (k = 10) by running logistic regression and computing the sensitivity/specificity on the test set 10 times. My concern is that I am creating a new model for each of the 10 training sets, so in the end I do not have a single classifier.
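For concreteness, the procedure described in (2) can be sketched roughly as follows. This is only an illustration in plain Python, not the glmfit code from the post: the data are synthetic (300 rows, but only 5 features to keep it fast), and `fit_logistic`, `cross_validate`, and `sens_spec` are hypothetical helper names using simple gradient descent and a 0.5 threshold. The last step, refitting once on all the data, is one common convention and an assumption here, not something stated above.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, n_iter=200):
    # Batch gradient descent on the log-loss; returns [bias, w1, ..., wd].
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(n_iter):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def sens_spec(y_true, probs, threshold=0.5):
    # Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    tp = fn = tn = fp = 0
    for yi, pi in zip(y_true, probs):
        pred = 1 if pi >= threshold else 0
        if yi == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return tp / (tp + fn), tn / (tn + fp)

def cross_validate(X, y, k=10, seed=0):
    # Each row is predicted exactly once, by a model that never saw it.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    out_of_fold = [None] * len(X)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, predict_proba(w, [X[i] for i in test_idx])):
            out_of_fold[i] = p
    return out_of_fold

# Synthetic stand-in data (assumption): 300 rows, 5 features, 2 of them pure noise.
rng = random.Random(42)
true_w = [1.5, -2.0, 0.0, 0.0, 0.5]
X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(300)]
y = [1 if rng.random() < sigmoid(sum(a * b for a, b in zip(true_w, xi))) else 0
     for xi in X]

oof = cross_validate(X, y, k=10)
cv_sens, cv_spec = sens_spec(y, oof)   # held-out performance estimate
final_w = fit_logistic(X, y)           # assumed convention: one refit on all rows
print(f"CV sensitivity={cv_sens:.2f}, specificity={cv_spec:.2f}")
```

In this sketch the 10 fold models are only used to produce held-out predictions and are then discarded; the pooled out-of-fold probabilities could also be fed into an ROC routine in place of the in-sample Y.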
Thanks,
Vikrant
Best Answer