Exploratory vs. Descriptive Statistical Analysis

The definition of descriptive statistics is pretty clear: it summarizes data using statistical measures such as the mean, median, mode, and spread.
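For concreteness, the summaries named above can be computed with Python's standard-library `statistics` module (Python 3.8+ for `quantiles`). This is a minimal sketch: the sample data are made up, and using the sample standard deviation and the interquartile range as the measures of spread is an assumption, not the only reasonable choice.

```python
import statistics

def describe(values):
    """Summarize a list of numbers with common descriptive statistics."""
    # Quartile cut points (statistics.quantiles defaults to the 'exclusive' method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range
    }

# Made-up sample data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(data))
```

Every number here is a fixed summary of the sample; nothing in the code looks for anything unanticipated, which is the distinction the answer below draws.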

However, today I came across the term "exploratory" while reading about data analysis in Python. What statistical methods are involved in this type of analysis, and how does it differ from descriptive statistics?

Best Answer

I'm not sure that these are sufficiently well defined anywhere to say definitively what is what in everyday conversation. I think if you look hard enough, you will be able to find something that an author or reviewer calls "descriptive" or "exploratory", but that someone else would say falls within their conception of the other.

That said, the idea of exploratory data analysis (EDA) was developed by John Tukey, who tried hard to make clear what he meant by it. In his 1980 article in The American Statistician, Tukey wrote:

Some have suggested that "exploratory data analysis" is just "descriptive statistics" brought somewhat up to date. Much effort, much intelligence and understanding has been devoted in recent years to convince us that "the map is not the region"! Perhaps an equal effort, at least among statisticians, is needed to persuade us of the equally true statement, "the usual bundle of techniques is not a field of intellectual activity"!

If we need a short suggestion of what exploratory data analysis is, I would suggest that

  1. It is an attitude, AND
  2. A flexibility, AND
  3. Some graph paper (or transparencies or both).

No catalog of techniques can convey a willingness to look for what can be seen, whether or not anticipated. Yet this is at the heart of exploratory data analysis. The graph paper—and transparencies—are there, not as a technique, but rather as a recognition that the picture-examining eye is the best finder we have of the wholly unanticipated.

On the other hand, some people may have suggested that EDA is just updated descriptive statistics because even a cursory skim of Tukey's book Exploratory Data Analysis shows that it lists many quick, simple techniques for describing data.
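To give a flavor of the quick techniques that book catalogs, here is a minimal Python sketch of a stem-and-leaf display, one of the displays Tukey popularized. It assumes non-negative integers, with the tens digits as stems and the units digits as leaves; the `ages` values are invented for illustration. In the spirit of the quote above, the display is something for the eye to examine, not EDA in itself.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return a stem-and-leaf display for non-negative integers.

    Stems are tens digits and leaves are units digits (an assumed layout).
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem in the range so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)

# Invented example data.
ages = [23, 27, 31, 35, 35, 38, 42, 47, 61]
print(stem_and_leaf(ages))
```

The empty row for stem 5 is the kind of thing the display is meant to surface: a gap, or an isolated value like 61, is easy to spot by eye and may prompt a question nobody planned to ask. For data on other scales, a different stem unit would be needed.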


I would say that there are three perhaps related, but conceptually distinguishable, things that are sometimes called EDA, of which only one is what I think of as properly being EDA. Those are:

  1. Data cleaning
  2. Initial / descriptive data analyses
  3. Question finding / hypothesis generation

Data cleaning is the work of getting your data into shape so that they can be analyzed. This requires describing your data (e.g., getting minimum and maximum values) and trying to figure out what is going on. For instance, are all of the values measured in the same units? (Because, hey, why do that before sending your data to the statistician?)

In biomedical research, descriptive data analysis mostly consists of constructing what is called "Table 1", which characterizes the sample on whom the study was run.

Only the last of the three, question finding, is true EDA as Tukey conceived it: What might have happened to produce these phenomena? Moving beyond our primary endpoint, what do these data suggest we look at next? What should be the central question for our follow-up study?
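The minimum/maximum check mentioned under data cleaning can be sketched in plain Python. Everything here is hypothetical: the site names, the height values (one site appears to have recorded meters instead of centimeters), and the plausibility bounds of 50 to 250 cm are assumptions chosen only to show how a units mix-up surfaces.

```python
# Hypothetical height measurements; site_b looks like it recorded meters, not cm.
heights = {
    "site_a": [162.0, 175.5, 181.2],
    "site_b": [1.68, 1.74, 1.59],
}

def describe_ranges(groups):
    """Return (min, max) for each group so inconsistent units stand out."""
    return {name: (min(vals), max(vals)) for name, vals in groups.items()}

def flag_implausible(groups, low, high):
    """Return (group, value) pairs that fall outside the [low, high] range."""
    return [
        (name, v)
        for name, vals in groups.items()
        for v in vals
        if not low <= v <= high
    ]

for name, (lo, hi) in describe_ranges(heights).items():
    print(f"{name}: min={lo}, max={hi}")

# The 50-250 cm bounds are an assumed plausibility range for adult height.
print(flag_implausible(heights, low=50.0, high=250.0))
```

A check like this only prepares the data; deciding what to do about site_b (rescale, query the site, exclude) is still a judgment call.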

Part of the confusion is that all of this is an iterative process. Exploring the data may lead to additional description and cleaning, etc. Nonetheless, the distinction, as I see it, is how you understand what you are doing: Are you preparing the data for analysis, stating what the data are, or looking for insights?
