What are the most accepted ways to visualize the results of an independent two sample t-test? Is a numeric table more often used or some sort of plot? The goal is for a casual observer to look at the figure and immediately see that they are probably from two different populations.
How to visualize independent two sample t-test
data visualization, t-test
Related Solutions
There's not 'one correct way'; there are some good ways.
The obvious one to my mind would be a Cleveland dot-chart; it is designed for displaying a numeric value for each level of a factor.
Some people would use a bar chart for this purpose instead. If you have a useful classification (such as by region), you'd split by that classification.
With GDPs (whether raw or per capita), the variable covers several orders of magnitude, so it might make a great deal more sense to look on the log-scale (this also obviates any concerns some people might have with 0 not being on the scale above).
Such a plot has several uses: 1. explicit comparison between countries (is A larger than B?); 2. extracting a data value (what is A's GDP?).
The Cleveland dotchart (or Cleveland dot plot) is based on research[1] into the kinds of comparisons that people are good at or less good at. We're very good at comparison of position along common scales, slightly less good with relative lengths and quite bad at relative areas or angles. In respect of 1. above this comparison is between the values represented by the points (which point is further to the right). In 2. this comparison is between the point and the parallel axis, both comparisons we're good at. The plot eliminates almost all ink that doesn't serve to directly aid these comparisons.
Quick, which is bigger, lemon or lime?
Very thin bars would make for a very similar sort of plot to a Cleveland dot-chart and can sometimes do well (particularly when both plots include 0), but dotcharts have an advantage when you want to plot several numbers for each country, since they can be represented by different symbols. This advantage is even larger if you're only able to use black and white. You also can't really use a log-scale on bar charts (where does the bottom of the bar start and what does the bar-length represent?) and so it's less suitable for data that spans several orders of magnitude.
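As a rough illustration of the dot-chart idea discussed above, here is a minimal Python sketch. It assumes matplotlib is available; the function names (`sorted_for_dotchart`, `cleveland_dotchart`) and the default axis label are hypothetical choices made for this example, not part of any library.

```python
def sorted_for_dotchart(values_by_label):
    """Return (labels, values) sorted ascending, so the largest value is drawn at the top."""
    items = sorted(values_by_label.items(), key=lambda kv: kv[1])
    labels = [label for label, _ in items]
    values = [value for _, value in items]
    return labels, values


def cleveland_dotchart(values_by_label, log_scale=True, xlabel="value"):
    """Draw one dot per category along a common horizontal scale."""
    # Imported here so the sorting helper above works without matplotlib.
    import matplotlib.pyplot as plt

    labels, values = sorted_for_dotchart(values_by_label)
    if log_scale and min(values) <= 0:
        raise ValueError("a log scale needs strictly positive values")

    positions = list(range(len(labels)))
    fig, ax = plt.subplots(figsize=(5, 0.4 * len(labels) + 1))
    # Position along a common scale is the only encoding: no bars, no areas.
    ax.plot(values, positions, "o", color="black")
    ax.set_yticks(positions)
    ax.set_yticklabels(labels)
    # Light guide lines help the eye travel from each label to its dot.
    ax.grid(axis="y", linestyle=":", linewidth=0.5)
    if log_scale:
        ax.set_xscale("log")
    ax.set_xlabel(xlabel)
    return fig
```

Passing a dictionary that maps each category name to its value returns a figure that can be saved with `fig.savefig(...)`. Sorting by value is a design choice of this sketch; it makes the "is A larger than B?" comparison a matter of reading vertical order as well as horizontal position.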
[1]: Cleveland, W.S. and McGill, R. (1984), "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods," Journal of the American Statistical Association, 79:387 (Sep.), 531-554.
I think this is a common misunderstanding of the CLT. Not only does the CLT have nothing to do with preserving type II error (which no one has mentioned here) but it is often not applicable when you must estimate the population variance. The sample variance can be very far from a scaled chi-squared distribution when the data are non-Gaussian, so the CLT may not apply even when the sample size exceeds tens of thousands. For many distributions the SD is not even a good measure of dispersion.
To really use the CLT, one of two things must be true: (1) the sample standard deviation works as a measure of dispersion for the true unknown distribution or (2) the true population standard deviation is known. That is very often not the case. And an example of n=20,000 being far too small for the CLT to "work" comes from drawing samples from the lognormal distribution as discussed elsewhere on this site.
The sample standard deviation "works" as a dispersion measure if for example the distribution is symmetric and does not have tails that are heavier than the Gaussian distribution.
I do not want to rely on the CLT for any of my analyses.
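The point about skewed data can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the n=20,000 example mentioned above: it uses only the Python standard library, a lognormal distribution with a log-scale standard deviation of 2 (an arbitrary choice), samples of size 200, and a normal-approximation 95% interval built from the sample mean and sample standard deviation. It estimates how often that interval contains the true mean, for lognormal data and for Gaussian data.

```python
import math
import random
import statistics


def normal_ci_covers(sample, true_mean, z):
    """True if mean +/- z * SE (SE from the sample SD) contains the true mean."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se <= true_mean <= mean + z * se


def coverage(draw, true_mean, n, reps, level=0.95, seed=0):
    """Fraction of simulated samples whose interval covers the true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    hits = sum(
        normal_ci_covers([draw(rng) for _ in range(n)], true_mean, z)
        for _ in range(reps)
    )
    return hits / reps


SIGMA = 2.0  # assumed log-scale SD; larger values mean heavier skew

# The mean of a lognormal(mu=0, sigma) distribution is exp(sigma^2 / 2).
lognormal_cov = coverage(
    lambda rng: rng.lognormvariate(0.0, SIGMA),
    true_mean=math.exp(SIGMA ** 2 / 2),
    n=200,
    reps=1000,
)
normal_cov = coverage(
    lambda rng: rng.gauss(0.0, 1.0),
    true_mean=0.0,
    n=200,
    reps=1000,
)

print(f"nominal 95% interval, lognormal data: coverage = {lognormal_cov:.3f}")
print(f"nominal 95% interval, Gaussian data:  coverage = {normal_cov:.3f}")
```

Increasing `n` in the lognormal call shows how slowly the coverage approaches the nominal level when the sample standard deviation is a poor measure of dispersion.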
Best Answer
It is worth being clear on the purpose of your plot. In general, there are two different kinds of goals: you can make plots for yourself to assess the assumptions you are making and guide the data analysis process, or you can make plots to communicate a result to others. These are not the same; for example, many viewers / readers of your plot / analysis may be statistically unsophisticated, and may not be familiar with the idea of, say, equal variance and its role in a t-test. You want your plot to convey the important information about your data even to consumers like them. They are implicitly trusting that you have done things correctly. From your question setup, I gather you are after the latter type.
Realistically, the most common and accepted plot for communicating the results of a t-test (see note 1 below) to others (set aside whether it is actually the most appropriate) is a bar chart of means with standard error bars. This does match the t-test very well in that a t-test compares two means using their standard errors. When you have two independent groups, this will yield a picture that is intuitive, even for the statistically unsophisticated, and (data willing) people can "immediately see that they are probably from two different populations". Here is a simple example using @Tim's data:
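For readers who want to draw such a chart themselves, here is a minimal Python sketch of a bar chart of means with standard error bars. It is not the code behind the original figure, and @Tim's data are not reproduced here; the function names and the `groups` argument (a dictionary mapping each group name to its list of observations) are assumptions made for this example. The plotting part assumes matplotlib is installed.

```python
import math
import statistics


def mean_and_se(values):
    """Return the sample mean and its standard error, SD / sqrt(n)."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def bar_chart_with_se(groups, ylabel="outcome"):
    """Plot one bar per group at its mean, with +/- 1 SE error bars."""
    # Imported here so mean_and_se can be used without matplotlib.
    import matplotlib.pyplot as plt

    labels = list(groups)
    summaries = [mean_and_se(groups[name]) for name in labels]
    means = [mean for mean, _ in summaries]
    ses = [se for _, se in summaries]

    fig, ax = plt.subplots(figsize=(4, 4))
    ax.bar(labels, means, yerr=ses, capsize=6,
           color="lightgray", edgecolor="black")
    ax.set_ylabel(ylabel)
    return fig
```

Calling `bar_chart_with_se({"group A": a_values, "group B": b_values})` with your own two samples returns a figure that can be saved with `fig.savefig(...)`. It is worth stating in the caption that the bars show plus or minus one standard error, since error bars are also commonly used for standard deviations and confidence intervals.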
That said, data visualization specialists typically disdain these plots. They are often derided as "dynamite plots" (cf., Why dynamite plots are bad). In particular, if you have only a few data, it is often recommended that you simply show the data themselves. If the points overlap, you can jitter them horizontally (add a small amount of random noise) so that they no longer overlap. Because a t-test is fundamentally about means and standard errors, it is best to overlay the means and standard errors onto such a plot. Here is a different version:
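The jittering idea described above can be sketched as follows. This is one possible way to do it, assuming matplotlib for the drawing; the function names, the jitter width of 0.08, and the small horizontal offset used for the mean marker are all illustrative choices, not a standard.

```python
import math
import random
import statistics


def jittered_positions(center, count, width=0.08, seed=0):
    """Return `count` x-positions scattered uniformly within +/- width of center."""
    rng = random.Random(seed)
    return [center + rng.uniform(-width, width) for _ in range(count)]


def strip_plot_with_se(groups, ylabel="outcome"):
    """Show every observation (jittered) plus each group's mean +/- 1 SE."""
    # Imported here so jittered_positions can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(4, 4))
    for i, (name, values) in enumerate(groups.items()):
        # Raw data: noise is added only horizontally, so the values stay exact.
        xs = jittered_positions(i, len(values), seed=i)
        ax.plot(xs, values, "o", color="gray", alpha=0.6)

        # Mean and standard error, drawn slightly to the right of the points
        # so the marker does not hide any observation.
        mean = statistics.fmean(values)
        se = statistics.stdev(values) / math.sqrt(len(values))
        ax.errorbar(i + 0.25, mean, yerr=se, fmt="s",
                    color="black", capsize=5)

    ax.set_xticks(list(range(len(groups))))
    ax.set_xticklabels(list(groups))
    ax.set_xlim(-0.5, len(groups) - 0.25)
    ax.set_ylabel(ylabel)
    return fig
```

Fixing the seed keeps the jitter reproducible, so the figure does not change between runs.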
If you have a lot of data, boxplots may be a better choice to get a quick overview of the distributions, and you can overlay the means and SEs there too.
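A boxplot with the means and standard errors overlaid might look like the sketch below. As before, this assumes matplotlib, and the function names and marker style are hypothetical choices for the example.

```python
import math
import statistics


def mean_se_overlay(groups):
    """Return (position, mean, SE) per group.

    Positions start at 1 to match matplotlib's default boxplot positions.
    """
    overlay = []
    for position, values in enumerate(groups.values(), start=1):
        mean = statistics.fmean(values)
        se = statistics.stdev(values) / math.sqrt(len(values))
        overlay.append((position, mean, se))
    return overlay


def boxplot_with_means(groups, ylabel="outcome"):
    """Draw one box per group and mark each group's mean +/- 1 SE on top."""
    # Imported here so mean_se_overlay can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(4, 4))
    ax.boxplot([list(values) for values in groups.values()])
    for position, mean, se in mean_se_overlay(groups):
        # Diamonds distinguish the mean from the median line inside the box.
        ax.errorbar(position, mean, yerr=se, fmt="D",
                    color="black", capsize=5)
    ax.set_xticks(list(range(1, len(groups) + 1)))
    ax.set_xticklabels(list(groups))
    ax.set_ylabel(ylabel)
    return fig
```

Because the box shows the median and quartiles while the overlay shows the mean, a visible gap between the diamond and the median line is itself a quick hint that a group is skewed.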
Simple plots of the data, and boxplots, are sufficiently simple that most people will be able to understand them even if they aren't very statistically savvy. Bear in mind, though, that none of these make it easy to assess the validity of having used a t-test to compare your groups. Those goals are best served by different kinds of plots.
1. Note that this discussion assumes an independent samples t-test. These plots could be used with a dependent samples t-test, but could also be misleading in that context (cf., Is using error bars for means in a within-subjects study wrong?).