I see that both functions, the loss function and the decision function, are part of data mining methods such as Gradient Boosting Regressors. I also see that they are separate objects. What is the relationship between the two in general?
classification, data mining, decision-theory, regression
Jerome Friedman wrote a paper a while back, "Data Mining and Statistics: What's the Connection?", which I think you'll find interesting.
Data mining was a largely commercial concern, driven by business needs (coupled with the "need" for vendors to sell software and hardware systems to businesses). One thing Friedman noted was that all the "features" being hyped originated outside of statistics, from algorithms and methods like neural nets to GUI-driven data analysis, and none of the traditional statistical offerings (regression, hypothesis testing, etc.) seemed to be a part of any of these systems. "Our core methodology has largely been ignored." It was also sold as user-driven, along the lines of what you noted: here's my data, here's my "business question", give me an answer.
I think Friedman was trying to provoke. He didn't think data mining had serious intellectual underpinnings where methodology was concerned, but that this would change and statisticians ought to play a part rather than ignoring it.
My own impression is that this has more or less happened. The lines have been blurred. Statisticians now publish in data mining journals. Data miners these days seem to have some sort of statistical training. While data mining packages still don't hype generalized linear models, logistic regression is well known among the analysts -- in addition to clustering and neural nets. Optimal experimental design may not be part of the data mining core, but the software can be coaxed to spit out p-values. Progress!
There is considerable overlap among these areas, but some distinctions can be made. Of necessity, I will have to over-simplify some things or give short shrift to others, but I will do my best to give some sense of each.
Firstly, Artificial Intelligence is fairly distinct from the rest. AI is the study of how to create intelligent agents. In practice, it is about how to program a computer to behave and perform a task as an intelligent agent (say, a person) would. This does not have to involve learning or induction at all; it can just be a way to 'build a better mousetrap'. For example, AI applications have included programs that monitor and control ongoing processes (e.g., increase aspect A if it seems too low). Notice that AI can include darn near anything a machine does, so long as it doesn't do it 'stupidly'.
In practice, however, most tasks that require intelligence call for an ability to induce new knowledge from experience. Thus, a large area within AI is machine learning. A computer program is said to learn some task from experience if its performance at the task improves with experience, according to some performance measure. Machine learning involves the study of algorithms that can extract information automatically (i.e., without online human guidance). Some of these procedures include ideas derived directly from, or inspired by, classical statistics, but they don't have to. Like AI, machine learning is very broad and can include almost everything, so long as there is some inductive component to it. An example of a machine learning algorithm might be a Kalman filter, as sketched below.
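Since the Kalman filter is offered as the example, here is a minimal one-dimensional sketch in Python; the constant-signal model, the noise variances, and all variable names are my own assumptions, chosen purely for illustration:

```python
import random

def kalman_1d(measurements, process_var=1e-3, meas_var=0.5):
    """Minimal 1-D Kalman filter: track a roughly constant signal from noisy readings."""
    estimate, error_var = 0.0, 1.0   # initial state guess and its uncertainty
    estimates = []
    for z in measurements:
        # Predict step: the state is modeled as constant, so only uncertainty grows.
        error_var += process_var
        # Update step: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        estimates.append(estimate)
    return estimates

# Noisy readings of a true value of 5.0; the estimates converge toward it.
random.seed(1)
readings = [5.0 + random.gauss(0.0, 0.7) for _ in range(50)]
print(kalman_1d(readings)[-1])
```

The inductive component is visible here: each new observation updates the program's internal estimate, so its performance at the tracking task improves with experience.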
Data mining is an area that has taken much of its inspiration and many of its techniques from machine learning (and some, also, from statistics), but puts them to different ends. Data mining is carried out by a person, in a specific situation, on a particular data set, with a goal in mind. Typically, this person wants to leverage the power of the various pattern recognition techniques that have been developed in machine learning. Quite often, the data set is massive, complicated, and/or may have special problems (such as having more variables than observations). Usually, the goal is either to discover or generate preliminary insights in an area where there was little knowledge beforehand, or to be able to predict future observations accurately. Moreover, data mining procedures can be either 'unsupervised' (we don't know the answer: discovery) or 'supervised' (we know the answer: prediction). Note that the goal is generally not to develop a more sophisticated understanding of the underlying data generating process. Common data mining techniques include cluster analyses, classification and regression trees, and neural networks.
I suppose I needn't say much to explain what statistics is on this site, but perhaps I can say a few things. Classical statistics (here I mean both frequentist and Bayesian) is a sub-topic within mathematics. I think of it as largely the intersection of what we know about probability and what we know about optimization. Although mathematical statistics can be studied as simply a Platonic object of inquiry, it is mostly understood as more practical and applied in character than other, more rarefied areas of mathematics. As such (and notably in contrast to data mining above), it is mostly employed towards better understanding some particular data generating process. Thus, it usually starts with a formally specified model, and from this are derived procedures to accurately extract that model from noisy instances (i.e., estimation--by optimizing some loss function) and to be able to distinguish it from other possibilities (i.e., inferences based on known properties of sampling distributions). The prototypical statistical technique is regression.
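To make the "estimation by optimizing some loss function" idea concrete, here is a minimal sketch of the prototypical case: simple linear regression fit by minimizing squared-error loss. The data generating process and its coefficients are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# A formally specified data generating process: y = 2 + 3x + Gaussian noise.
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=100)

# Estimation: choose the intercept and slope that minimize squared-error loss.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to (2, 3)
```

The contrast with data mining is in the starting point: we posited a model of the process first, and the procedure exists to recover that model from noisy instances.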
Best Answer
A decision function is a function which takes a dataset as input and gives a decision as output. What the decision can be depends on the problem at hand. Examples include:

- Estimation problems: the decision is the estimate.
- Hypothesis testing problems: the decision is whether or not to reject the null hypothesis.
- Classification problems: the decision is to assign a new observation to one category or another.
Typically, there is an infinite number of decision functions available for a problem. If, for instance, we are interested in estimating the average height of Swedish males based on ten observations $\mathbf{x}=(x_1,x_2,\ldots,x_{10})$, we can use any of the following decision functions $d(\mathbf{x})$:

- The sample mean: $d(\mathbf{x})=\frac{1}{10}\sum_{i=1}^{10}x_i$.
- The sample median: $d(\mathbf{x})=\mathrm{median}(x_1,\ldots,x_{10})$.
- A constant function that ignores the data altogether, say $d(\mathbf{x})=1$. Silly, yes, but still a valid decision function.
How then can we determine which of these decision functions to use? One way is to use a loss function, which describes the loss (or cost) associated with all possible decisions. Different decision functions will tend to lead to different types of mistakes. The loss function tells us which type of mistakes we should be more concerned about. The best decision function is the function that yields the lowest expected loss. What is meant by expected loss depends on the setting (in particular, whether we are talking about frequentist or Bayesian statistics).
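To make this concrete, here is a small Monte Carlo sketch comparing the expected loss of the three decision functions above. The "true" height distribution (Normal with mean 1.80 m and standard deviation 0.07 m) and the choice of squared-error loss are assumptions made only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed truth for the simulation: heights ~ Normal(mean=1.80 m, sd=0.07 m).
true_mean, sd, n = 1.80, 0.07, 10
n_sims = 100_000

# Each row is one dataset of ten observed heights.
samples = rng.normal(true_mean, sd, size=(n_sims, n))

decision_functions = {
    "sample mean":   lambda x: x.mean(axis=1),
    "sample median": lambda x: np.median(x, axis=1),
    "constant 1":    lambda x: np.ones(x.shape[0]),
}

# Estimate the expected squared-error loss of each decision function.
for name, d in decision_functions.items():
    decisions = d(samples)
    expected_loss = np.mean((decisions - true_mean) ** 2)
    print(f"{name:14s} expected squared-error loss = {expected_loss:.5f}")
```

Under this setup the sample mean achieves the lowest expected loss, the median comes close, and the constant function does terribly, which is exactly how the loss function lets us rank decision functions.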
In summary:

- The decision function is what we use to turn data into a decision (an estimate, a classification, the rejection of a hypothesis, and so on).
- The loss function is what we use to evaluate and compare decision functions, by quantifying the cost of each possible mistake.

The best decision function for a given problem is then the one that minimizes the expected loss.