Solved – how to plot 1 dimensional dataset

data visualization

I have a series of results from some Hadoop experiments. There are more than 500 data points per experiment, and I want to show the overall timing.
I'm looking for an effective way to plot this dataset (one graph per experiment, obviously).
I'm not sure about an 'ordered' scatter plot like this (there is no real order other than the timing order; the X value here is just a running index):
[image: ordered scatter plot of the timings]

In this case I don't like the boxplot solution: sometimes Q1 and the mean are too close and the boxplot looks confusing.
Maybe a normal distribution to show the mean and variance visually? Are there tools for generating a normal-distribution line graph from data?

Other ideas?

Best Answer

You do not want a normal distribution if your data is in fact skewed, as it clearly is in your example. Here are some thoughts:

First, let's generate some data similar to yours:

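set.seed(1)                 # fixed seed so the example is reproducible
# 520 simulated times: a 40 s floor plus an exponential tail with mean 24 s
t <- 40+24*rexp(520)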

so something similar to your chart comes from

plot(sort(t), ylim=c(0,250), ylab="Time (s)")

while you say you do not like something like

boxplot(t, ylab="Time (s)")

and an empirical cumulative distribution would look like your original chart with the axes swapped (a reflection about the diagonal)

plot.ecdf(t, xlim=c(0,250), xlab="Time (s)")

so you might consider a histogram

hist(t, breaks=10*(0:25), xlab="Time (s)")

or perhaps a rather similar smoothed density, possibly with the mean of the data shown

plot(density(t), xlim=c(0,250), xlab="Time (s)", main="Density of times")
abline(v=mean(t), col="red")

with the last of these looking something like

[image: density of times with the mean marked by a red vertical line]
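
To see why a fitted normal curve would mislead for data like this, here is a minimal sketch that overlays one on the histogram, next to the kernel density. It assumes the same simulated data as above; the choice of breaks = 25, the colours and the variable name below_min are my own illustrative choices, not part of the original question.

```r
# Assumption: the same simulated data as earlier in this answer.
set.seed(1)
t <- 40 + 24 * rexp(520)

# Histogram on the density scale (freq = FALSE) so that curves can be overlaid.
# breaks = 25 is only a suggestion to hist(); it picks bins that span the data.
hist(t, breaks = 25, freq = FALSE, xlab = "Time (s)",
     main = "Times: fitted normal (blue) vs kernel density (red)")

# Normal curve using the sample mean and standard deviation.
curve(dnorm(x, mean = mean(t), sd = sd(t)), add = TRUE, col = "blue", lwd = 2)

# Kernel density estimate of the same data, for comparison.
lines(density(t), col = "red", lwd = 2)

# Share of the fitted normal's probability that lies below the smallest
# observed time, i.e. where there is no data at all.
below_min <- pnorm(min(t), mean = mean(t), sd = sd(t))
print(below_min)

# For right-skewed data the mean sits above the median.
print(c(mean = mean(t), median = median(t)))
```

If the blue curve puts visible mass to the left of the shortest run and misses the long right tail, that is the skewness at work, and the histogram or density is the more honest picture.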