Solved – How does data augmentation reduce overfitting

classificationdata augmentationmachine learningoverfittingsampling

I'm trying to understant the benefit apported by the step of data augmentation in a classification algorithm.
I have a vector of hexadecimal strings and a column vector containing the label associated with the string in the same position. As an optional step in the classification algorithm, a data augmentation process is performed by subsetting the strings in pieces and replating the associated label for the number of split performed.

What are the benefit of this process?

Best Answer

Overfitting occurs when you have too few records relative to other parameters (e.g., predictors or features). I'm not familiar with your data, but it sounds like the subsetting is creating additional records.

Related Question