Solved – Is DoE applicable to collecting data for a machine learning model?

experiment-design, machine-learning

I'm currently working on a machine learning model for a classification task in an engineering application. While working on this project I realized that the provided data is insufficient to get a robust classification running.

Now I'm planning to collect more data using DoE methods such as fractional factorial designs, to cover the whole plausible range of levels for the factors while keeping the number of experimental runs at a reasonable level.

While researching this, I found no evidence confirming that these methods are suitable for gathering training data for ML models. So I'm worried that I'm missing something and will end up with just another batch of insufficient or biased data.

Some figures: The DoE I'm thinking of consists of three continuous factors and five discrete factors with two or three levels.

Best Answer

Design of experiments (DoE) is most often used with regression-like (or ANOVA-like) models, so machine learning is a red herring here. If your intended model is regression-like (that includes classification; maybe you should look into logistic regression), then you can certainly use DoE. To say much more, we would need more details of your setup, but I would start by looking into fractional factorial designs.
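To give a rough idea of what a fractional factorial design looks like, here is a minimal Python sketch. It makes several assumptions that are not from the question: all five discrete factors are treated as two-level factors coded -1/+1, the generators D = AB and E = AC are just one common illustrative choice, and the function name is made up for this example. Three-level and continuous factors would likely need a mixed-level or response-surface design instead.

```python
from itertools import product


def fractional_factorial_2_5_2():
    """Return the 8 runs of a 2^(5-2) design as tuples (A, B, C, D, E).

    Levels are coded -1 (low) and +1 (high). A, B and C form a full
    2^3 factorial; D and E are derived from them via generators.
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        d = a * b  # assumed generator: D = AB
        e = a * c  # assumed generator: E = AC
        runs.append((a, b, c, d, e))
    return runs


design = fractional_factorial_2_5_2()
for run in design:
    print(run)
```

This gives 8 runs instead of the 32 a full two-level factorial in five factors would need. The price is that, with these generators, the design is resolution III: main effects are aliased with two-factor interactions, so it only makes sense if interactions can be assumed small or if follow-up runs are planned.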

Some similar posts here with answers are "Machine Learning for optimization of configuration file" and "Is factorial experiment used only for prediction (regression or classification)?". Beyond those, search this site.
