The definition of descriptive statistics is pretty clear: it summarizes data using statistical measures such as the mean, median, mode, and spread.
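For concreteness, here is a minimal sketch of those summaries using Python's standard-library statistics module. The data values are made up, the function name describe is hypothetical, and the range and sample standard deviation are just two common stand-ins for "spread".

```python
import statistics

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),      # most frequent value
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),   # one simple measure of spread
        "stdev": statistics.stdev(values),    # sample standard deviation
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
print(describe(data))
```

Each of these numbers summarizes what the data are; none of them, by itself, suggests what to look at next.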
However, today I came across the term 'exploratory' while reading about data analysis in Python. What statistical methods are involved in this type of analysis, and how is it different from descriptive statistics?
Best Answer
I'm not sure that these are sufficiently well defined anywhere to say definitively what is what in everyday conversation. I think if you look hard enough, you will be able to find something that an author or reviewer calls "descriptive" or "exploratory", but that someone else would say falls within their conception of the other.
That said, the idea of exploratory data analysis (EDA) was developed by John Tukey, who tried hard to make the distinction clear. In his 1980 article in The American Statistician, Tukey wrote:
On the other hand, some people may have suggested that EDA is just updated descriptive statistics because a cursory skim of Tukey's book Exploratory Data Analysis shows that it lists many quick, simple techniques for describing data.
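As an illustration of the kind of quick technique the book is known for, here is a sketch of Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum. The function name is hypothetical, and the hinges follow one common convention: each is the median of its half of the sorted data, with the overall median included in both halves when n is odd.

```python
import statistics

def five_number_summary(values):
    """Minimum, lower hinge, median, upper hinge, maximum."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2   # each half includes the median when n is odd
    lower = xs[:half]
    upper = xs[n - half:]
    return (
        xs[0],
        statistics.median(lower),
        statistics.median(xs),
        statistics.median(upper),
        xs[-1],
    )

print(five_number_summary([2, 4, 4, 4, 5, 5, 7, 9]))  # → (2, 4.0, 4.5, 6.0, 9)
```

This is still description: it tells you where the data sit and how spread out they are, which is why a quick read of the book can make EDA look like "just" descriptive statistics.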
I would say that there are three possibly related, but conceptually distinguishable, things that are sometimes called EDA, of which only one is what I think of as properly being EDA. Those are:

1. Data cleaning
2. Descriptive data analysis
3. Exploratory data analysis

Data cleaning is the work of getting your data into shape so that they can be analyzed. This requires describing your data (e.g., getting minimum and maximum values) and trying to figure out what is going on. For instance, are all of the values measured in the same units? (Because, hey, why do that before sending your data to the statistician?)

In biomedical research, descriptive data analysis mostly consists of constructing what is called "Table 1", which characterizes the sample on whom the study was run.

Only the last of the three is true EDA as Tukey conceived it. It asks questions like: What might have happened to produce these phenomena? Moving beyond our primary endpoint, what do these data suggest we look at next? What should be the central question for our follow-up study?
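A toy sketch of the first two activities may help. Everything here is hypothetical: the records, the field names, and the rule that any height above 3 is taken to be in centimetres rather than metres. Checking the minimum and maximum exposes the unit problem (cleaning), and a per-arm summary of counts, means, and standard deviations is a bare-bones stand-in for a "Table 1" (description). Exploratory analysis in Tukey's sense is driven by open questions rather than a fixed recipe, so it is not shown.

```python
import statistics
from collections import namedtuple

# Hypothetical study records; height is meant to be in metres.
Record = namedtuple("Record", ["pid", "arm", "height", "age"])
records = [
    Record("p01", "control", 1.62, 34),
    Record("p02", "control", 1.75, 41),
    Record("p03", "control", 168.0, 29),  # suspicious: looks like centimetres
    Record("p04", "treated", 1.80, 52),
    Record("p05", "treated", 1.58, 47),
    Record("p06", "treated", 1.71, 38),
]

# Data cleaning: min/max quickly expose values on the wrong scale.
heights = [r.height for r in records]
print("height min/max:", min(heights), max(heights))  # → height min/max: 1.58 168.0

def to_metres(h, cm_threshold=3.0):
    """Assumed rule: heights above cm_threshold were entered in centimetres."""
    return h / 100 if h > cm_threshold else h

cleaned = [r._replace(height=to_metres(r.height)) for r in records]

# Descriptive analysis: a bare-bones "Table 1" with n and mean (SD) per arm.
def table_one(rows):
    table = {}
    for arm in sorted({r.arm for r in rows}):
        group = [r for r in rows if r.arm == arm]
        ages = [r.age for r in group]
        hts = [r.height for r in group]
        table[arm] = {
            "n": len(group),
            "age mean (SD)": (statistics.mean(ages), statistics.stdev(ages)),
            "height mean (SD)": (statistics.mean(hts), statistics.stdev(hts)),
        }
    return table

for arm, row in table_one(cleaned).items():
    print(arm, row)
```

A real "Table 1" usually also reports categorical variables as counts and percentages; this sketch sticks to two continuous variables to stay short.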
Part of the confusion is that all of this is an iterative process: exploring the data may lead to additional description and cleaning, and so on. Nonetheless, as I see it, the distinction lies in how you understand what you are doing. Are you preparing the data for analysis, stating what the data are, or looking for insights?