Solved – Minimum number of instances to create a decision tree

cart, weka

I'm fairly new to data mining, so apologies if my question is not very clear.

I'm working on a project that aims to mine data on students' interactions with an e-learning platform. I'm trying to generate decision trees from some data I collected (number of times a student uses a resource, number of activities done, number of quizzes taken, etc.). The trees I'm getting look interesting to me, and according to the cross-validation tests they are fairly accurate.

So my question is the following: I generated the tree with data collected over two months, and there were 44 students in the course. Is this enough data to trust the tree? I have only the 44 instances, which summarize the two months of interactions…

Thanks in advance for your help.

Best Answer

(I'm guessing that for each person and each variable you have a single value covering the 2 months, i.e., you don't have repeated measures.) 44 sounds like an awfully small number. Is it a random sample of a larger population? The answer to that, for one thing, would affect your confidence in the findings.

I won't say it's impossible to achieve useful, replicable results in this situation, but it seems unlikely. Data mining via decision trees is an opportunistic process and requires cross-validation, probably even more than other modelling procedures do.
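To give a rough feel for how little 44 instances pin down an accuracy estimate, here is a minimal Python sketch of a Wilson score interval for a proportion. The figure of 35 correct out of 44 is purely hypothetical (the question gives no accuracy number), and treating cross-validated predictions as independent trials is itself an approximation, so read the result as an optimistic lower bound on the uncertainty rather than an exact statement.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Approximate Wilson score interval for the proportion correct / n.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p_hat = correct / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)
    ) / denom
    return center - half_width, center + half_width


# Hypothetical numbers: 35 of 44 students classified correctly.
low_small, high_small = wilson_interval(35, 44)

# Same hypothetical accuracy with ten times as many students, for comparison.
low_large, high_large = wilson_interval(350, 440)

print(f"n = 44:  {low_small:.2f} to {high_small:.2f}")
print(f"n = 440: {low_large:.2f} to {high_large:.2f}")
```

With these made-up numbers the interval for 44 students spans more than twenty percentage points, while the same observed accuracy on 440 students gives a much narrower range. That is one concrete way to see why a tree that looks "kind of accurate" on 44 instances may still not be trustworthy.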

I'd also like to know, if you're getting statistically significant differences that form the basis for the branchings, what criterion are you using for significance?
