Difference between experimental data and observational data

Tags: causality, data mining, dataset

I'm a novice at data mining and have started to read about it. What exactly is the difference between experimental data and observational data? Both are obviously data, and many say observational data can lead to errors. But I guess it's not possible to run an experiment for every data set. I'm really confused: could someone explain what experimental data and observational data are, and when each should be used?

Thanks in advance.

Best Answer

wow, that's a tough one :-)

That question is far more widely relevant than just in data mining. It comes up in medicine and in the social sciences including psychology all the time.

The distinction is necessary when it comes to drawing conclusions about causality, that is, when you want to know if something (e.g. a medical treatment) causes another thing (e.g. recovery of a patient). Hordes of scientists and philosophers debate whether you can draw conclusions about causality from observational studies or not. You might want to look at the question "Statistics and causal inference?".

So what is an experiment? Concisely, an experiment is often defined as the random assignment of observational units to different conditions, where the conditions differ in how the units are treated. "Treatment" is a generic term that translates most easily to medical applications (e.g. patients are treated differently under different conditions), but it also applies to other areas. There are variations of experiments (you might want to start by reading the Wikipedia entries for Experiment and Randomized experiment), but the one crucial point is random assignment of subjects to conditions.
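To make "random assignment" concrete, here is a minimal sketch in Python. The function name `randomly_assign`, the condition labels, and the patient IDs are all made up for illustration; real trials often use more elaborate schemes (blocking, stratification), which this does not attempt.

```python
import random

def randomly_assign(units, conditions=("treatment", "control"), seed=None):
    """Shuffle the units, then deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {condition: [] for condition in conditions}
    for i, unit in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(unit)
    return groups

# Hypothetical observational units
patients = [f"patient_{i}" for i in range(10)]
groups = randomly_assign(patients, seed=42)
for condition, members in groups.items():
    print(condition, members)
```

The point is that neither the researcher nor the patient picks the condition, so nothing about a patient can systematically influence which group they end up in.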

With that in mind, it is definitely not possible to do an experiment for all kinds of hypotheses you want to test. For example, you sometimes can't do experiments for ethical reasons, e.g. you don't want people to suffer because of a treatment. In other cases, it might be physically impossible to conduct an experiment.

So whereas experimentation (controlled, randomized assignment to treatment conditions) is the primary way to draw conclusions about causality (and for some, it is the only way), people still want to do something empirical in those cases where experiments are not possible. That's when you want to do an observational study.

To define an observational study, I draw on Paul Rosenbaum's entry in the Encyclopedia of Statistics in Behavioral Science: An observational study is "an empiric comparison of treated and control groups in which the objective is to elucidate cause-and-effect relationships [. . . in which it] is not feasible to use controlled experimentation, in the sense of being able to impose the procedures or treatments whose effects it is desired to discover, or to assign subjects at random to different procedures." In an observational study, you try to measure as many variables as possible, and you want to test hypotheses about what changes in a set of those variables are associated with changes in other sets of variables, often with the goal of drawing conclusions about causality in these associations (see "Under what conditions does correlation imply causation?").

In what ways can observational studies lead to errors? Primarily when you want to draw conclusions about causality. There is always a chance that some variables you did not observe are the "real" causes (often called "unmeasured confounding"), so you might falsely conclude that one of your measured variables is causing something, whereas "in truth" it is one of the unmeasured confounders. In experiments, the general assumption is that random assignment balances potential confounders across conditions, so that on average they cancel out.
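A small simulation can illustrate this. The sketch below is a toy model with invented numbers: a hidden variable (`healthy`) drives both who gets the treatment and who recovers, while the treatment itself has no effect at all. The function name `simulate` and all probabilities are assumptions chosen for illustration only.

```python
import random

def simulate(n, randomized, seed=0):
    """Return recovery rate of treated minus recovery rate of controls."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        healthy = rng.random() < 0.5  # unmeasured confounder
        if randomized:
            # Experiment: a coin flip decides, independent of health
            gets_treatment = rng.random() < 0.5
        else:
            # Observational: healthier people take the treatment more often
            gets_treatment = rng.random() < (0.8 if healthy else 0.2)
        # The treatment has NO effect: recovery depends only on health
        recovered = rng.random() < (0.7 if healthy else 0.3)
        (treated if gets_treatment else control).append(recovered)
    return sum(treated) / len(treated) - sum(control) / len(control)

obs_diff = simulate(20000, randomized=False)
exp_diff = simulate(20000, randomized=True)
print(f"observational difference: {obs_diff:.3f}")
print(f"experimental difference:  {exp_diff:.3f}")
```

With these made-up probabilities, the observational comparison shows a sizeable apparent benefit (about 0.24 in expectation) even though the treatment does nothing, because the treated group contains more healthy people. Under random assignment the difference should hover around zero.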

If you want to know more, start by going through the links provided, and look at publications from people like Paul Rosenbaum or the book link provided by iopsych: Experimental and Quasi-Experimental Designs for Generalized Causal Inference (Shadish, Cook, and Campbell, 2002).
