I doubt that there is a clear and consistent distinction across statistically minded sciences and fields between regression and curve-fitting.
Regression without qualification implies linear regression and least-squares estimation. That doesn't rule out other or broader senses: indeed, once you allow logit, Poisson, negative binomial regression, and so forth, it gets harder to see what modelling is not regression in some sense.
Curve-fitting does literally suggest a curve that can be drawn on a plane, or at least in a low-dimensional space. Regression is not so bounded and can predict surfaces in spaces of several dimensions.
Curve-fitting may or may not use linear regression and/or least squares. It might refer to fitting a polynomial (power series) or a set of sine and cosine terms, or in some other way actually qualify as linear regression in the key sense of fitting a functional form linear in the parameters; a sketch of that idea follows. Conversely, curve-fitting by nonlinear regression is regression too.
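To make the linear-in-parameters point concrete, here is a minimal sketch (with made-up data): a cubic fit is curved in $x$ yet linear in its coefficients, so ordinary least squares handles it directly.

```python
import numpy as np

# Made-up data: a smooth quadratic signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Design matrix with columns 1, x, x^2, x^3: the model is linear in the
# four coefficients even though the fitted curve bends.
X = np.vander(x, N=4, increasing=True)
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
```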
The term curve-fitting could be used in a disparaging, derogatory, deprecatory or dismissive sense ("that's just curve fitting!") or (almost the complete opposite) it might refer to fitting a specific curve carefully chosen with specific physical (biological, economic, whatever) rationale or tailored to match particular kinds of initial or limiting behaviour (e.g. being always positive, bounded in one or both directions, monotone, with an inflexion, with a single turning point, oscillatory, etc.).
One of several fuzzy issues here is that the same functional form can be at best empirical in some circumstances and excellent theory in others. Newton taught that trajectories of projectiles can be parabolic, and so naturally fitted by quadratics, whereas a quadratic fitted to age dependency in the social sciences is often just a fudge that matches some curvature in the data. Exponential decay is a really good approximation for radioactive isotopes and a sometimes not too crazy guess for the way that land values decline with distance from a centre.
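For contrast, a minimal sketch (again with made-up data) of fitting exponential decay by nonlinear least squares, where the functional form itself carries the theory:

```python
import numpy as np
from scipy.optimize import curve_fit

# Exponential decay, y = a * exp(-b * x): nonlinear in the parameter b.
def decay(x, a, b):
    return a * np.exp(-b * x)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 40)
y = decay(x, 2.0, 0.7) + rng.normal(scale=0.05, size=x.size)

# Nonlinear least squares needs starting values; (1, 1) is a rough guess.
params, cov = curve_fit(decay, x, y, p0=(1.0, 1.0))
```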
Your example gets no explicit guesses from me. Much of the point here is that with a very small set of data and precisely no information on what the variables are or how they are expected to behave it could be irresponsible or foolish to suggest a model form. Perhaps the data should rise sharply from (0, 0) and then approach (1, 1), or perhaps something else. You tell us!
Note. Neither regression nor curve-fitting is limited to single predictors or single parameters (coefficients).
An ROC curve visualizes the performance of a single model at different configurations (= cutoffs), and hence the second option is the right one.
In the first option you are somehow plotting points from different models (the same learning approach with different hyperparameters), which is not related to ROC curves. In fact, which points of these different models would you even plot to make them somewhat calibrated and comparable? All points with $P(\hat{Y}=1) = 0.5$?
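To make the second option concrete, here is a minimal sketch (with an illustrative synthetic dataset) showing that `roc_curve` sweeps the cutoff over the scores of one fitted model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# roc_curve varies the cutoff over the observed scores; each (fpr, tpr)
# pair corresponds to one cutoff of the *same* fitted model.
fpr, tpr, thresholds = roc_curve(y_test, scores)
```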
I am answering this from a pragmatic perspective, simply by looking at code and deducing from examples. A more theoretical answer could be a great supplement.
Generally both can be used. The difference is well explained here.
Yet most relevant: not all algorithms offer both `predict_proba` and `decision_function`. To my knowledge, every classifier in sklearn allows `predict_proba`. For some - specifically SVC (Support Vector Classification) - both give exactly the same result. To check, I used this example and changed the code once to use `predict_proba` and once `decision_function`.
Specifically, I changed the line that computes the scores, running it once with each method.
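A minimal sketch of that comparison (an assumed setup with SVC on synthetic data, not the original example code):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# probability=True is required for predict_proba on SVC.
clf = SVC(probability=True, random_state=0).fit(X_train, y_train)

# Variant 1: signed distance to the separating hyperplane.
fpr_d, tpr_d, _ = roc_curve(y_test, clf.decision_function(X_test))
# Variant 2: estimated probability of the positive class.
fpr_p, tpr_p, _ = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])

plt.plot(fpr_d, tpr_d, label="decision_function")
plt.plot(fpr_p, tpr_p, linestyle="--", label="predict_proba")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```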
And both yield the exact same result: the two ROC plots are identical.
Yet this only holds for SVC, where the distance to the decision plane is used to compute the probability - therefore no difference in the ROC.
In another example, a specific line of code is relevant for this question; it chooses between the two methods, as sketched below.
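Presumably it is the familiar fallback pattern: prefer `decision_function` and fall back to `predict_proba` when a classifier lacks it. A runnable sketch of that pattern (my reconstruction, since the exact line is not quoted):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Prefer the decision function; fall back to the positive-class probability.
if hasattr(clf, "decision_function"):
    scores = clf.decision_function(X)
else:
    # RandomForestClassifier has no decision_function, so this branch runs.
    scores = clf.predict_proba(X)[:, 1]
```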
Therefore, deducing from the sklearn example, I would recommend you use `decision_function` wherever possible and, if not, use the probability provided by `predict_proba`.

Examples of algorithms which do not provide a `decision_function` in sklearn:

- `KNeighborsClassifier()`
- `RandomForestClassifier()`
- `GaussianNB()`
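A quick, illustrative check with `hasattr` confirms this:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Each line prints False: none of these expose decision_function.
for est in (KNeighborsClassifier(), RandomForestClassifier(), GaussianNB()):
    print(type(est).__name__, hasattr(est, "decision_function"))
```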