Solved – How does one most easily overfit

overfitting

This is a weird question, I know.

I'm just a noob and trying to learn about different classifier options and how they work. So I'm asking the question:

Given a dataset of n1-dimensions and n2-observations where each observation can be classified into n3-buckets, which algorithm most efficiently (ideally with just one training iteration) produces a model (classification boundary) that would perfectly classify every observation in the dataset (completely overfit)?

In other words, how does one most easily overfit?

(Please don't lecture me on 'not overfitting'. This is just for theoretical educational purposes.)

I have a suspicion that the answer is something like this: "Well, if your number of dimensions is greater than your number of observations, use algorithm X to draw the boundary (or boundaries); otherwise use algorithm Y."

I also have a suspicion that the answer will say: "You can draw a smooth boundary, but that is more computationally expensive than drawing straight lines between all differently classified observations."

But that's as far as my intuition will guide me. Can you help?

I have a hand-drawn example of what I think I'm talking about in 2D with binary classification.

Basically, just split the difference, right? What algorithm does this efficiently for n-dimensions?


Best Answer

As long as all the observations are unique, K-nearest neighbors with K set to 1 (and any valid distance metric) gives a classifier that perfectly fits the training set, since the nearest neighbor of every training point is trivially itself. It's probably also the most efficient option, since no training at all is needed.
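To see this concretely, here is a minimal pure-Python sketch of 1-NN (no library assumed; the function name `nn1_predict` and the random toy data are my own invention for illustration). Because every training point is its own nearest neighbor at distance zero, training accuracy is 100% by construction:

```python
import math
import random

def nn1_predict(train_X, train_y, x):
    # 1-nearest-neighbor: return the label of the closest training point
    best_i = min(range(len(train_X)),
                 key=lambda i: math.dist(train_X[i], x))
    return train_y[best_i]

# Toy data: unique points with arbitrary labels (n1 = 2 dims, n3 = 2 classes)
random.seed(0)
X = [(random.random(), random.random()) for _ in range(50)]
y = [random.randrange(2) for _ in range(50)]

# Each training point's nearest neighbor is itself -> perfect training accuracy
accuracy = sum(nn1_predict(X, y, x) == label
               for x, label in zip(X, y)) / len(X)
print(accuracy)  # 1.0
```

Note there is no "fit" step at all: the model *is* the stored dataset, and all the work happens at prediction time.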

Is that the most efficient way to encode the boundary? Probably, right? Since we don't know whether the data is entirely random, using the data itself as the encoded model with the KNN algorithm is probably the best you can do in general. Right?

It's the most time-efficient, but not necessarily the most space-efficient.
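To make the space point concrete: the degenerate extreme of 1-NN on unique observations is just a lookup table keyed on the observations themselves (a sketch of my own, not from the answer; it assumes hashable feature tuples). It stores all n1 × n2 values verbatim, which is why it costs no training time but also compresses nothing:

```python
# Memorization as a lookup table: the training set itself is the model.
X = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
y = ["a", "b", "a"]
model = dict(zip(X, y))  # requires unique, hashable observations

# Perfect recall on the training data with zero training computation,
# but the model is exactly as large as the dataset it memorized.
print(all(model[x] == label for x, label in zip(X, y)))  # True
```

A fully grown decision tree, by contrast, can also reach 100% training accuracy on unique points, and may encode the boundary more compactly when the data has structure, at the cost of an actual training pass.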