Solved – Binary classification vs. continuous output with neural networks

classification, continuous data, neural networks

The Wikipedia article on binary classification says:

Tests whose results are of continuous values, such as most blood
values, can artificially be made binary by defining a cutoff value,
with test results being designated as positive or negative depending
on whether the resultant value is higher or lower than the cutoff.

Is there any guidance on whether this is a desirable thing to do? I have data where the output value is continuous in the training set, and I'm interested in how strong the output variable is. Ideally, an accurate continuous prediction would be best, but I would also be satisfied with binary classification. My layman's assumption is that the binary classification task would be a little simpler. Should I prefer continuous output or binary classification?

Best Answer

Dichotomizing a continuous outcome is a bad idea. It increases both type I and type II error. It also invokes "magical thinking": the belief that something magical happens at the cutoff value. For example, with newborns, it is common to say babies under 2.5 kg are "low birth weight" and those above 2.5 kg are not. This treats a baby of 2.49 kg as being the same as one of 1.4 kg, but vastly different from a baby of 2.51 kg. Similarly, the 2.51 kg baby is treated just like a baby of 4.5 kg.
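To make the birth weight example concrete, here is a minimal Python sketch. The four weights and the 2.5 kg cutoff come from the paragraph above; the function and variable names are made up for illustration.

```python
# Birth weights in kg, taken from the example in the answer.
weights_kg = [1.4, 2.49, 2.51, 4.5]
CUTOFF_KG = 2.5  # the "low birth weight" cutoff from the answer


def is_low_birth_weight(weight_kg, cutoff=CUTOFF_KG):
    """Dichotomize a continuous weight into a yes/no label (hypothetical helper)."""
    return weight_kg < cutoff


labels = [is_low_birth_weight(w) for w in weights_kg]

for w, label in zip(weights_kg, labels):
    print(f"{w:.2f} kg -> low birth weight: {label}")

# Gap between two babies that receive the SAME label (2.49 kg vs 1.4 kg).
gap_same_label = weights_kg[1] - weights_kg[0]
# Gap between two babies that receive DIFFERENT labels (2.51 kg vs 2.49 kg).
gap_diff_label = weights_kg[2] - weights_kg[1]

print(f"gap hidden inside one class: {gap_same_label:.2f} kg")
print(f"gap that flips the class:    {gap_diff_label:.2f} kg")
```

The label throws away a difference of more than a kilogram within one class, while a difference of about 20 grams across the cutoff changes the class entirely.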

It is true that people sometimes need to make "yes/no" decisions based on the output of a statistical model. But the statistical model and its results should be a guide and a tool, not a straitjacket.
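One way to follow this advice is to train on the continuous target and apply a cutoff only when a decision is actually needed. The sketch below is an assumption-laden illustration, not the answerer's method: it uses a one-feature least-squares line as a stand-in for a neural network with a regression output, and the data, the cutoffs, and the names `fit_line`, `predict`, and `decide` are all invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, in pure Python."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up training data: one feature and a continuous outcome.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]

# The model is fit to the continuous outcome, not to a binarized copy of it.
intercept, slope = fit_line(xs, ys)


def predict(x):
    """Continuous prediction from the fitted line."""
    return intercept + slope * x


def decide(x, cutoff):
    """Apply a yes/no rule to the continuous prediction, only at decision time."""
    return predict(x) >= cutoff


new_x = 2.6
print(f"predicted value: {predict(new_x):.2f}")
print("meets cutoff 3.0:", decide(new_x, cutoff=3.0))
print("meets cutoff 2.5:", decide(new_x, cutoff=2.5))
```

Because the cutoff is an argument to `decide` rather than something baked into the training labels, it can be changed later without refitting, and the continuous prediction stays available to show how close a case is to the boundary.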
