Solved – How to detect noisy datasets (bias and variance trade-off)

Tags: bias-variance tradeoff, machine learning, noise, variance

Studying the bias-variance trade-off:

expected loss = bias² + variance + noise

I understand that we minimize this quantity by finding the "best" balance between low bias/high variance and high bias/low variance. However, the noise term is beyond our control. So in a sense, if noise is large, then learning is pointless, right? Are there techniques for detecting when this might be the case?
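As a concrete illustration of the decomposition above, here is a minimal Monte Carlo sketch. Everything in it is an assumption chosen for the demo (the ground-truth function `f_true`, the noise level `sigma`, the polynomial degree, the sample sizes); the point is only that the noise term sets a floor on the achievable expected squared loss, no matter how the bias/variance balance is tuned.

```python
import numpy as np

# Monte Carlo illustration of: expected loss = bias^2 + variance + noise.
# All quantities below (f_true, sigma, degree, sizes) are assumptions for the demo.
rng = np.random.default_rng(0)

def f_true(x):
    return np.sin(2 * np.pi * x)          # assumed ground-truth function

sigma = 0.3                                # assumed noise std; noise term = sigma**2
x_test = 0.35                              # single test point, for clarity
n_train, n_repeats, degree = 30, 2000, 3   # assumed sizes / model complexity

preds = np.empty(n_repeats)
for i in range(n_repeats):
    x = rng.uniform(0, 1, n_train)
    y = f_true(x) + rng.normal(0, sigma, n_train)
    coefs = np.polyfit(x, y, degree)       # refit the model on each fresh training set
    preds[i] = np.polyval(coefs, x_test)

bias_sq = (preds.mean() - f_true(x_test)) ** 2   # squared bias at the test point
variance = preds.var()                           # variance of the prediction
noise = sigma ** 2                               # irreducible part

print(f"bias^2   = {bias_sq:.4f}")
print(f"variance = {variance:.4f}")
print(f"noise    = {noise:.4f}  <- lower bound on expected squared loss")
```

On real data f_true and sigma are of course not observable, so the decomposition cannot be computed directly; but the same logic is why a very flexible, well-tuned model's held-out error is often used as a rough upper bound on the noise floor.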

Best Answer

When the noise is "large", learning is not pointless, but it is "expensive" in some sense. Take the expression "the house always wins": it means the odds favor the casino over the gambler. Yet the odds can be very close to 1:1, tilted only slightly towards the house, e.g. by 0.5%. The sequence of outcomes is therefore extremely noisy, and yet casinos make a great deal of money in the long run. So the fact that the data is "noisy" does not, by itself, mean that learning will be pointless, useless, or unprofitable.
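A quick simulation of that point, with an assumed house edge of 0.5% and an assumed number of unit bets: each individual outcome is close to a coin flip (predicting it is nearly hopeless), yet the aggregate profit tracks the tiny expected edge very reliably.

```python
import numpy as np

# Sketch of "the house always wins": per-outcome noise is huge (edge of only 0.5%),
# but summed over many bets the expected value dominates.
# The edge and bet count below are assumptions for the demo.
rng = np.random.default_rng(42)

edge = 0.005                    # assumed house edge: expected house profit per unit bet
p_house = 0.5 + edge / 2        # P(house wins a single bet) = 0.5025
n_bets = 1_000_000              # number of unit bets

outcomes = rng.random(n_bets) < p_house                 # True -> house wins the bet
house_profit = np.where(outcomes, 1.0, -1.0).cumsum()   # running house profit

print(f"house win rate             : {outcomes.mean():.4f}  (barely above 0.5)")
print(f"house profit after all bets: {house_profit[-1]:.0f} units")
print(f"expected profit (n * edge) : {n_bets * edge:.0f} units")
```

The per-bet signal-to-noise ratio is tiny, so any model of single outcomes will look barely better than chance; the value comes from exploiting that small edge at scale.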