I work for a health care company on our member satisfaction team, where we constantly apply weights to match the sample to the populations of our service regions. This is very important for interpretable modeling that aims to explain the magnitude of relationships between variables. We also use a lot of ML for other tasks, but it sounds like you are asking whether this matters when using machine learning for prediction.
As you hinted, most machine learning techniques were developed not to explain relationships but for predictive purposes. While a representative sample is important, it may not be critical... until your performance tanks.
If an algorithm has enough samples to learn the types of respondents, it will be able to predict a new respondent's class (classification) or value (regression) well. For example, if you had a data set with four variables (height, weight, age, and sex), your algorithm of choice would learn certain types of person based on these characteristics. Say most people in the population are female, 5'4", 35 years old, and 130 pounds (not a fact, just roll with it), and we are trying to predict sex from the other three variables. Now say my sample has proportionally low representation of this demographic, yet still contains a high enough number (N) of this type of person. Our model has learned what that type of person looks like even though that type of person is under-represented in my sample. When our model sees a new person with those characteristics, it will have learned which label (sex) is most associated with that person. If our sample shows that those characteristics are more associated with females than males, and this matches the population, then all is well. The problem arises when the sample's outcome variable misrepresents the population badly enough that the model predicts a different class / value.
So when it comes down to it, testing your predictive ML model on representative data is where you find out whether you have a problem. However, I think it would be fairly rare to sample in such a biased way that prediction would suffer greatly. If accuracy / kappa statistic / AUC is low, or RMSE is high, when testing, then you might want to shave off the people who over-represent the demographics of interest, provided you have enough data.
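To illustrate that testing step, here is a minimal sketch on simulated toy data (my own illustration, not part of the original answer; it assumes scikit-learn and NumPy are available): train a classifier on a sample that under-represents the majority demographic, then check the metrics on a representative test set.

```python
# Minimal sketch: fit on a non-representative sample, evaluate on a
# representative hold-out set, and inspect accuracy / kappa / AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score

rng = np.random.default_rng(0)

def simulate(n, frac_female):
    """Toy population: females around 5'4" / 130 lb, males taller and heavier."""
    female = (rng.random(n) < frac_female).astype(int)
    height = np.where(female == 1, rng.normal(64, 2.5, n), rng.normal(70, 2.5, n))
    weight = np.where(female == 1, rng.normal(130, 15, n), rng.normal(180, 20, n))
    age = rng.normal(35, 10, n)
    return np.column_stack([height, weight, age]), female

# Training sample under-represents the majority demographic;
# the test set is drawn to match the population.
X_train, y_train = simulate(2000, frac_female=0.15)
X_test, y_test = simulate(2000, frac_female=0.60)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]

print("accuracy:", accuracy_score(y_test, pred))
print("kappa:   ", cohen_kappa_score(y_test, pred))
print("AUC:     ", roc_auc_score(y_test, prob))
# Noticeably worse numbers here than in cross-validation on the biased
# training sample itself would be the warning sign described above.
```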
These are not very strict terms and they are highly related. However:
- Loss function is usually a function defined on a data point, its prediction, and its label, and measures the penalty. For example:
  - square loss $l(f(x_i|\theta),y_i) = \left (f(x_i|\theta)-y_i \right )^2$, used in linear regression
  - hinge loss $l(f(x_i|\theta), y_i) = \max(0, 1-f(x_i|\theta)y_i)$, used in SVM
  - 0/1 loss $l(f(x_i|\theta), y_i) = 1 \iff f(x_i|\theta) \neq y_i$, used in theoretical analysis and in the definition of accuracy
- Cost function is usually more general. It might be a sum of loss functions over your training set plus some model complexity penalty (regularization). For example:
  - Mean Squared Error $MSE(\theta) = \frac{1}{N} \sum_{i=1}^N \left (f(x_i|\theta)-y_i \right )^2$
  - SVM cost function $SVM(\theta) = \|\theta\|^2 + C \sum_{i=1}^N \xi_i$ (there are additional constraints connecting $\xi_i$ with $C$ and with the training set)
- Objective function is the most general term for any function that you optimize during training. For example, the probability of generating the training set in the maximum likelihood approach is a well-defined objective function, but it is neither a loss function nor a cost function (although you could define an equivalent cost function). For example:
  - MLE is a type of objective function (which you maximize)
  - Divergence between classes can be an objective function, but it is hardly a cost function, unless you define something artificial, like 1-Divergence, and name it a cost
Long story short, I would say that:
A loss function is a part of a cost function which is a type of an objective function.
All that being said, these terms are far from strict, and depending on context, research group, or background, they can shift and be used with different meanings. The main (only?) common thread is that "loss" and "cost" functions are something one wants to minimise, while an objective function is something one wants to optimise (which can be either maximisation or minimisation).
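To make the distinction concrete, here is a small numeric sketch of my own (not from the answer above, and only one possible reading of the terms), using NumPy with the square loss, hinge loss, regularized MSE cost, and Gaussian log-likelihood objective mentioned above:

```python
# Loss: scores a single prediction/label pair.
# Cost: aggregates losses over the training set, plus a complexity penalty.
# Objective: any function optimized during training (here, one to maximize).
import numpy as np

def square_loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

def hinge_loss(score, y):          # y in {-1, +1}
    return np.maximum(0.0, 1.0 - score * y)

def mse_cost(theta, X, y, reg=0.1):
    """Mean of per-point square losses plus an L2 complexity penalty."""
    return np.mean(square_loss(X @ theta, y)) + reg * np.sum(theta ** 2)

def log_likelihood(theta, X, y, sigma=1.0):
    """Gaussian log-likelihood of the training set: an objective to maximize.
    Maximizing it corresponds to minimizing the unregularized MSE cost."""
    resid = y - X @ theta
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma ** 2) - np.sum(resid ** 2) / (2 * sigma ** 2)

# Toy regression data: y is roughly 2 * x plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)

theta = np.array([2.0])
print("loss on one point:   ", square_loss((X @ theta)[0], y[0]))
print("hinge loss example:  ", hinge_loss(0.3, +1))  # score 0.3 on a positive example
print("cost (minimize):     ", mse_cost(theta, X, y))
print("objective (maximize):", log_likelihood(theta, X, y))
```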
The term "machine learning" is somewhat a term of art, but it generally refers to the construction of algorithms that "learn through experience". The requirement of learning through experience necessitates data, and so machine learning is necessarily "data-driven" --- after all, if not from data, what else would it learn from?
When we refer to a "model" in statistics or machine learning, we really just mean a set of assumptions that describe the presumed probabilistic process for the data, and the logical consequences of the assumptions (e.g., resulting distributions of statistics, estimators, etc.). Even very broad forms of non-parametric models are considered "models", so it encompasses a lot. It is difficult to conceive of how you could generate a machine learning algorithm without some assumptions about the generative process for the data, and consequently, one can probably broadly use the term "modelling" for any machine learning process. One might quibble with this, since some machine learning algorithms are broad non-parametric methods, but even here we usually called these "models", and consequently, I think it is reasonable to say that machine learning methods are built on "models". Even such simple methods as least-squares estimation are built on underlying statistical models.
There may certainly be situations in machine learning where an algorithm is built, and even deployed, without regard to any underlying probabilistic assumptions, particularly if the algorithm is sufficiently adaptive (in the sense that most non-parametric models are). In this case one could argue that the algorithm is "model-free" insofar as it was created without regard to any model. Even then, and even if the algorithm works well in a wide class of situations, one will still tend to find that there are cases where it works well and cases where it works badly. Consequently, subsequent analysts will usually be able to figure out the kinds of assumptions required to ensure that the algorithm works well when deployed. In this way, the "modelling" gradually catches up to the initial "model-free" creation of the algorithm as we learn more about the situations where it works well or badly. So you could call some machine-learning algorithms "model-free" in one sense, but the modelling catches up in the end.
In view of these considerations, I think it is reasonable to say that all machine learning involves data-driven modelling. Of course, it is possible to do data-driven modelling without using a computer algorithm at all (e.g., calculation by pen and paper), and in these cases we would not usually call that "machine learning".