Solved – How to validate the association rules or results obtained from Market Basket Analysis? Train-test methodology

apriori, cross-validation, r

I have a large set of transactions, each containing a set of goods, and I want to do market basket analysis (affinity analysis) using Apriori. However, unlike traditional supervised machine learning algorithms such as linear regression, random forests, and gradient boosting, there does not appear to be a corresponding methodology for algorithms such as Apriori in which you split the data into a train set and a test set, fit on the train set, and validate on the test set. How do you know that your model is truly good? Are there other metrics that can be used to check that you are not overfitting and that the model is not biased?

Best Answer

Market basket analysis traditionally isn't predictive; it's inferential. It looks at past transactions to determine which items were bought together, and it assumes that the trends of the past will continue.

The reliability of the estimates obtained from market basket analysis boils down to sample size: how many transactions contain the base items, and how many contain the co-occurrences.

In theory, one could conduct statistical tests of significance (or construct confidence intervals) around all estimates of support, confidence, and lift to determine if the relationship is real.
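As a rough illustration of what such a check could look like, here is a minimal sketch for a single rule A -> B. It computes support, confidence, and lift, puts an approximate 95% Wilson score interval around the confidence, and runs a one-sided Fisher exact test of whether A and B co-occur more often than independence would predict. The choice of the Wilson interval and Fisher's test is an assumption made for illustration (other tests would also work), and the function names and toy transactions are hypothetical, not taken from any particular library.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

def fisher_one_sided(n_ab, n_a, n_b, n):
    """P(co-occurrence count >= n_ab) if A and B were independent.

    This is the upper tail of a hypergeometric distribution: out of n
    transactions, n_b contain B, and we look at the n_a that contain A.
    """
    total = math.comb(n, n_a)
    tail = sum(
        math.comb(n_b, k) * math.comb(n - n_b, n_a - k)
        for k in range(n_ab, min(n_a, n_b) + 1)
    )
    return tail / total

def rule_stats(transactions, a, b):
    """Support, confidence, lift, and significance for the rule a -> b."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    if n_a == 0 or n_b == 0:
        raise ValueError("both items must appear in at least one transaction")
    confidence = n_ab / n_a
    return {
        "support": n_ab / n,
        "confidence": confidence,
        "lift": confidence / (n_b / n),
        "confidence_ci": wilson_interval(n_ab, n_a),
        "p_value": fisher_one_sided(n_ab, n_a, n_b, n),
    }

# Toy data, made up for illustration only.
transactions = [
    {"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "butter"},
    {"bread"}, {"milk"}, {"milk", "eggs"}, {"eggs"},
    {"bread", "butter", "eggs"}, {"milk"}, {"eggs", "milk"},
]
stats = rule_stats(transactions, "bread", "butter")
for name, value in stats.items():
    print(name, value)
```

With only a handful of transactions the interval around the confidence is very wide, which is the sample-size point made above. When many rules are tested at once, some correction for multiple comparisons would also be needed.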

In practice, to make sure that enough data is available, the focus is usually on item sets that have both high support and high lift (not just high lift). A high lift computed from only a handful of co-occurrences can easily be noise, so requiring high support as well intuitively gives you the relationships that are most likely to be significant, even without statistical testing.
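The filtering described here can be sketched as follows for item pairs. The thresholds `min_support=0.2` and `min_lift=1.2` are arbitrary values chosen for illustration, and the function name and toy baskets are hypothetical. A real analysis would typically use an Apriori implementation rather than enumerating pairs by hand.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support=0.2, min_lift=1.2):
    """Keep item pairs that clear BOTH a support and a lift threshold."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    kept = []
    for (a, b), n_ab in pair_counts.items():
        support = n_ab / n
        # Lift: observed joint frequency over the frequency expected
        # if the two items were bought independently.
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        if support >= min_support and lift >= min_lift:
            kept.append(
                {"pair": (a, b), "count": n_ab, "support": support, "lift": lift}
            )
    # Rank by support first, then lift, so well-supported pairs come first.
    return sorted(kept, key=lambda r: (r["support"], r["lift"]), reverse=True)

# Toy data, made up for illustration only.
baskets = [
    {"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "butter"},
    {"bread"}, {"milk"}, {"milk", "eggs"}, {"eggs"},
    {"bread", "butter", "eggs"}, {"milk"}, {"caviar", "champagne"},
]
kept = frequent_pairs(baskets)
lift_only = frequent_pairs(baskets, min_support=0.0)
for row in kept:
    print(row)
```

In this toy data the caviar and champagne pair has by far the highest lift, but it rests on a single basket, so it only survives when the support threshold is dropped (`lift_only`). The bread and butter pair clears both thresholds.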
