If I have a decision tree, say with two leaves, how do I calculate the total error? One can calculate the error for each leaf, but is the total error the sum of the errors or the product (or neither)?
Solved – How to calculate total error of decision tree
Related Solutions
Just a couple of remarks that may be helpful:
As far as I know, decision trees are not traditionally used for anomaly detection. Support Vector Machines, Artificial Neural Networks, Gaussian Mixture Models, and Bayesian Networks are the machine learning methodologies more commonly used for this purpose. You can have a look at this paper for further reading.
What you describe can be used to highlight the 'unlikely' cases in the leaf nodes. However, bear in mind that, depending on your feature dimensionality and training data size, you may end up with leaf nodes containing very few observations, e.g. 2 positives and 0 negatives. In that case, it is debatable whether it would be wise to label a 'negative' observation matching the variable combination of that particular leaf node as 'potentially fraudulent'.
Similarly, as you point out, labelling based on the dominant class of a leaf node becomes questionable when the class frequencies are close: if a leaf is 43% positive and 57% negative, it may not make much sense to infer that any observation falling into it is an anomaly (e.g. fraudulent).
Furthermore, the prior distribution of your class labels should also affect your decisions. For instance, if initially 90% of your observations are labelled as negative, any inferences you make from the posterior distributions should take this inherent bias into account (not only in detecting anomalies but also in evaluating the performance of your classifier in the first place).
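To make these checks concrete, here is a minimal sketch in R using the rpart package; the data frame `d` and the binary label column `y` are hypothetical placeholders, not names from the original question:

```r
## Minimal sketch (assumes a data frame `d` with a binary factor column `y`;
## both names are placeholders for illustration)
library(rpart)

fit <- rpart(y ~ ., data = d, method = "class")

# fit$where gives, for every training row, the leaf it falls into;
# cross-tabulating with the true labels shows the class counts per leaf,
# exposing leaves with very few observations or near-even splits
table(leaf = fit$where, class = d$y)

# the class prior -- if e.g. 90% of rows are negative, the per-leaf
# frequencies should be read against this baseline
prop.table(table(d$y))
```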
The total error is just information, either for you or for heuristic algorithms that only need to compare the current iteration's error with the error from the previous epoch. So you can compute the error however you wish.
But before doing that, you need to make sure you calculate it in the right way. For example, suppose you use the error function
$E = target - output$
and for example you have this data
$target = [1, 0, 1]$
$output = [0, 1, 1]$
The error you get is:
$E = [1, 0, 1] - [0, 1, 1] = [1, -1, 0]$
And if you then calculate the mean, you get:
$mean = \frac{1 + (-1) + 0}{3} = 0$
So your error is $0$, which is wrong (as a solution, you can take the absolute value of the error and then the mean). But in a real algorithm you will probably use cross entropy or squared error, which do not have this problem. The simple difference is used only in simple algorithms like the Perceptron.
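A quick numeric check of this cancellation effect, in R (the target/output values are exactly the ones from the example above):

```r
## Signed errors cancel; absolute or squared errors do not
target <- c(1, 0, 1)
output <- c(0, 1, 1)

mean(target - output)        # 0      -- the +1 and -1 cancel out
mean(abs(target - output))   # 0.667  -- mean absolute error
mean((target - output)^2)    # 0.667  -- mean squared error
```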
Also, if you use squared error on a huge data set, the error can be very large at the start, maybe $10000$ or $100000$, while after the $n$-th iteration it drops to something like $50$; your graph will then not be very informative, because the resulting curve will look like a step function.
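To illustrate the scaling problem (the error values below are invented purely for the sketch, and plotting on a log scale is one common remedy, not something taken from the original answer):

```r
## Invented error trajectory that drops by orders of magnitude
err <- c(100000, 9000, 700, 120, 60, 52, 50, 50)

plot(err, type = "l", xlab = "epoch", ylab = "error")  # looks like a step
plot(err, type = "l", log = "y",                       # a log scale keeps the
     xlab = "epoch", ylab = "error")                   # later epochs visible
```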
Best Answer
The total error will be the sum of the individual errors, divided by the total number of predictions.
Most likely the easiest way to do this will be to form a confusion matrix for your model. Any software that can fit decision trees should be able to produce a confusion matrix for you. Here is an example (coded in R) adapted from my answer here. The error is 1 − accuracy ($1 - 0.8077 = 0.1923$). To get the raw number of misclassifications, you can sum the off-diagonal elements of the confusion matrix ($0 + 15 = 15$).
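The original R code is not reproduced above; here is a minimal sketch whose confusion-matrix entries are chosen only so that the arithmetic matches the numbers quoted ($63/78 = 0.8077$, off-diagonal $0 + 15 = 15$) and are not the original data:

```r
## Hedged reconstruction: matrix values are illustrative, picked to be
## consistent with the accuracy and off-diagonal counts quoted above
conf_mat <- matrix(c(33, 15,
                      0, 30),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual    = c("neg", "pos"),
                                   predicted = c("neg", "pos")))

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)  # 63/78 = 0.8077
error    <- 1 - accuracy                         # 0.1923
n_wrong  <- sum(conf_mat) - sum(diag(conf_mat))  # off-diagonal sum: 15
```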