I am using the SGD SVM from scikit-learn. I find that, unlike SVC, which has a support_ attribute storing the indices of the support vectors, SGDClassifier only gives me the weights of the decision boundary. Is there any way to identify the support vectors within each class while using an SGD support vector machine?
Solved – How to identify support vectors in SGD SVM
machine learning, scikit-learn, svm
Related Solutions
The proportion of support vectors is an upper bound on the leave-one-out cross-validation error (as the decision boundary is unaffected if you leave out a non-support vector), and thus provides an indication of the generalisation performance of the classifier. However, the bound isn't necessarily very tight (or even usefully tight), so you can have a model with lots of support vectors, but a low leave-one-out error (which appears to be the case here). There are tighter (approximate) bounds, such as the Span bound, which are more useful.
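In symbols, with $n$ training patterns and $\#\mathrm{SV}$ support vectors, the bound is simply

$$\mathrm{err}_{\mathrm{LOO}} \;\leq\; \frac{\#\mathrm{SV}}{n},$$

and since it can be loose, a high $\#\mathrm{SV}/n$ ratio does not force a high leave-one-out error.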
This commonly happens when you tune the hyper-parameters to optimise the CV error: you get a bland kernel and a small value of C (so margin violations are not penalised very much), in which case the margin becomes very wide and there are lots of support vectors. Essentially, both the kernel and regularisation parameters control capacity, and you can get a diagonal trough in the CV error as a function of the hyper-parameters, because their effects are correlated and different combinations of kernel parameter and regularisation give similarly good models.
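You can see this effect directly with SVC, which exposes the support-vector indices via support_; a minimal sketch on synthetic data (the dataset and C values are just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class problem, purely for illustration.
X, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)

# As C shrinks, margin violations are penalised less, the margin
# widens, and the number of support vectors grows.
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    print(f"C={C:>7}: {len(clf.support_)} support vectors")
```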
It is worth noting that as soon as you tune the hyper-parameters, e.g. via CV, the SVM no longer implements a structural risk minimisation approach: we are just tuning the hyper-parameters directly on the data, with no capacity control over the hyper-parameters themselves. Essentially, the performance estimates or bounds are biased, or invalidated outright, by their direct optimisation.
My advice would be not to worry about it and just be guided by the CV error (but remember that if you use CV to tune the model, you need nested CV to evaluate its performance). The sparsity of the SVM is a bonus, but I have found it doesn't generate sufficient sparsity to be really worthwhile (L1 regularisation provides greater sparsity). For small problems (e.g. 400 patterns) I use the LS-SVM, which is fully dense and generally performs similarly well.
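As a minimal sketch of what nested CV looks like in scikit-learn (the estimator, grid, and fold counts here are placeholders, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)

# Inner loop: tune C and gamma on each training fold.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
inner = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)

# Outer loop: estimate the performance of the whole tuning procedure,
# so the reported score is not biased by the hyper-parameter search.
scores = cross_val_score(inner, X, y, cv=5)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```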
Best Answer
You may have some misunderstanding of the SVM types. There is no "SGD SVM". See this post.
Difference between the types of SVM
Stochastic gradient descent (SGD) is an algorithm for training a model. According to the documentation, the SGD algorithm can be used to train many models; the SVM is just one special case. More information about SGD can be found here.
How could stochastic gradient descent save time comparing to standard gradient descent?
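For instance, in scikit-learn the same SGDClassifier fits different models depending only on its loss parameter (a minimal sketch; the "log_loss" name is used in recent releases, older ones call it "log"):

```python
from sklearn.linear_model import SGDClassifier

# The loss determines which model SGD is actually training:
svm_like = SGDClassifier(loss="hinge")   # a linear SVM
logreg = SGDClassifier(loss="log_loss")  # logistic regression
```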
Therefore, SGD will not affect what is inside the SVM. If you are using SVC (C-Support Vector Classification) and use SGD for learning, you should still have everything about SVC. The nonzero $\alpha_i$ will tell you where the "support vectors" are, as usual.
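That said, SGDClassifier solves the primal problem, so there is no $\alpha$ vector exposed; but for the hinge loss, the points with $y_i f(x_i) \leq 1$ lie on or inside the margin, and these are the support vectors in the SVC sense. A minimal sketch of recovering them per class (synthetic binary data, names chosen for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic binary problem, purely for illustration.
X, y = make_classification(n_samples=400, random_state=0)

# With the hinge loss, SGDClassifier is a linear SVM trained by SGD.
clf = SGDClassifier(loss="hinge", random_state=0).fit(X, y)

# Map labels to {-1, +1}; decision_function is positive for classes_[1].
y_signed = np.where(y == clf.classes_[1], 1.0, -1.0)
margins = y_signed * clf.decision_function(X)

# Points with y_i * f(x_i) <= 1 lie on or inside the margin;
# these are the points that would be support vectors in SVC.
support_idx = np.where(margins <= 1.0)[0]
print(f"{len(support_idx)} margin-active points out of {len(X)}")

# Split them per class, as the question asks.
for c in clf.classes_:
    idx_c = support_idx[y[support_idx] == c]
    print(f"class {c}: {len(idx_c)} margin-active points")
```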
In Python, it is possible that the object-oriented design hides some fields when you are using the SGD classifiers, but that would be a Stack Overflow question.