Solved – Machine learning classifiers big-O or complexity

algorithms, classification, machine learning, multiple-comparisons, time complexity

To evaluate the performance of a new classifier algorithm, I'm trying to compare its accuracy and its complexity (big-O for training and classifying). From Machine Learning: a review I got a complete list of supervised classifiers, an accuracy table comparing the algorithms, and 44 test problems from the UCI data repository. However, I can't find a review, paper, or website with the big-O for common classifiers like:

  • C4.5
  • RIPPER (I think this might not be possible, but who knows)
  • ANN with Back Propagation
  • Naive Bayesian
  • K-NN
  • SVM

If anyone has expressions for these classifiers, it would be very useful. Thank you.

Best Answer

Let $N$ = number of training examples, $d$ = dimensionality of the features and $c$ = number of classes.

Then the training complexities are:

  1. Naive Bayes is $O(Nd)$: all it needs to do is compute the frequency of every feature value $d_i$ for each class (see the sketch after this list).
  2. $k$-NN is in $\mathcal{O}(1)$ (some people even say there is no training step at all), although the space complexity of training is $\mathcal{O}(Nd)$ since you need to store the data, which also takes time.
  3. Nonlinear non-approximate SVM is $O(N^2)$ or $O(N^3)$ depending on the kernel. You can get an $O(N^3)$ down to $O(N^{2.3})$ with some tricks.
  4. Approximate SVM is $O(NR)$, where $R$ is the number of iterations.
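
To make the $O(Nd)$ claim for Naive Bayes training concrete, here is a minimal sketch (plain Python with hypothetical function and variable names, assuming categorical features): training is a single pass over the $N$ examples, touching each of the $d$ feature values once.

```python
from collections import defaultdict

def train_naive_bayes(X, y):
    """Count class priors and per-class feature-value frequencies.

    X: list of N examples, each a list of d categorical feature values.
    y: list of N class labels.
    One pass over N examples, d features each => O(Nd) time.
    """
    class_counts = defaultdict(int)        # one counter per class
    feature_counts = defaultdict(int)      # keyed by (class, feature index, value)
    for xi, yi in zip(X, y):               # N iterations
        class_counts[yi] += 1
        for j, value in enumerate(xi):     # d iterations per example
            feature_counts[(yi, j, value)] += 1
    return class_counts, feature_counts
```

At prediction time these counts are turned into (smoothed) probabilities, which is where the $\mathcal{O}(cd)$ testing cost below comes from.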

Testing complexities:

  1. Naive Bayes is in $\mathcal{O}(cd)$ since you have to retrieve $d$ feature values for each of the $c$ classes.
  2. $k$-NN is in $\mathcal{O}(Nd)$ since you have to compare the test point to every data point in your database (see the brute-force sketch below).
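
Here is a brute-force $k$-NN prediction sketch illustrating that $\mathcal{O}(Nd)$ testing cost (plain Python, hypothetical names; practical libraries often use KD-trees or ball trees to do better in low dimensions):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Brute-force k-NN: compare x_test to all N stored points.

    Each distance computation costs O(d), so one prediction is O(Nd)
    (plus O(N log N) here for the full sort; a heap would give O(N log k)).
    """
    distances = []
    for xi, yi in zip(X_train, y_train):    # N stored examples
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, x_test)))  # O(d)
        distances.append((dist, yi))
    distances.sort(key=lambda t: t[0])
    top_k_labels = [label for _, label in distances[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]
```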

Source: "Core Vector Machines: Fast SVM Training on Very Large Data Sets" - http://machinelearning.wustl.edu/mlpapers/paper_files/TsangKC05.pdf

Sorry I don't know about the others.