Cumulative / Cumulative Plot (or “Visualizing a Lorenz Curve”)

Tags: data visualization, distributions, r

I don't know what such plots are called and thus I just gave this question a stupid title.

Let's say I have an ordered dataset as follows:

4253  4262  4270  4383  4394  4476  4635  ...

Each number corresponds to the number of postings a certain user contributed to a website. I am empirically investigating the "participation inequality" phenomenon as defined here.

To make this easy to grasp, I would like to produce a plot that lets the reader quickly deduce statements such as "10% of the users contribute 50% of the data". It should probably look similar to this admittedly pretty lousy Paint sketch:

[image: sketch of the desired plot]

I have no clue what this is called, so I don't know where to look. Also, if somebody had an implementation in R, that would be awesome.

Best Answer

If you want to do this with basic R commands only, the following code may help.

First, enter the data.

data <- c(4253, 4262, 4270, 4383, 4394, 4476, 4635)
person <- seq_along(data)   # user index 1..7

Then you can plot the contribution of each user.

plot(person, data)
lines(person, data)


You can also see how much the first two, three, four, ..., seven persons contribute together.

cdata <- cumsum(data)
plot(person, cdata)
lines(person, cdata)
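To make the cumulative step concrete, here is a small sketch that prints the running totals for the seven values above and the same totals as a fraction of all postings. The variable name `share` is introduced here only for illustration.

```r
data <- c(4253, 4262, 4270, 4383, 4394, 4476, 4635)

cdata <- cumsum(data)        # running total of postings, user by user
print(cdata)                 # values: 4253 8515 12785 17168 21562 26038 30673

share <- cdata / sum(data)   # running total as a fraction of all postings
print(round(share, 3))       # the last entry is 1, i.e. all of the data
```

The last cumulative value equals `sum(data)`, which is why dividing by it (or by `max(cdata)`) puts the y-axis on a 0 to 1 scale.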


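Finally, you can get the plot you want (with proportions on both axes) with the following commands. The data are sorted in decreasing order first so that the x-axis really does count the top contributors:

# cumulative postings, largest contributors first
ctop <- cumsum(sort(data, decreasing = TRUE))
plot(person/max(person), ctop/max(ctop), xlab="Top-contributing users", ylab="Data", col="red")
lines(person/max(person), ctop/max(ctop), col="red")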


I have labelled the axes as you wanted. The plot shows what percentage of the data is contributed by a given proportion of the users.
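To read off statements such as "10% of the users contribute 50% of the data" numerically rather than from the plot, a small helper can compute the share directly. This is a minimal sketch, not part of the original answer: `top_share` is a hypothetical name, and it assumes that "the top fraction p of users" means the ceiling(p * n) largest contributors. The second half draws the same kind of curve starting from the origin, with a dashed diagonal for perfectly equal contributions. Note that a conventional Lorenz curve sorts in increasing order and therefore lies below that diagonal, whereas this one sorts in decreasing order and lies above it.

```r
# Hypothetical helper: share of the total contributed by the top fraction p of users
top_share <- function(x, p) {
  x_desc <- sort(x, decreasing = TRUE)   # biggest contributors first
  k <- ceiling(p * length(x_desc))       # assumed rounding: round the user count up
  sum(x_desc[seq_len(k)]) / sum(x_desc)
}

data <- c(4253, 4262, 4270, 4383, 4394, 4476, 4635)
print(top_share(data, 0.10))   # share contributed by the top 10% of users

# Cumulative curve through the origin, top contributors first
users_frac <- c(0, seq_along(data) / length(data))
data_frac  <- c(0, cumsum(sort(data, decreasing = TRUE)) / sum(data))

plot(users_frac, data_frac, type = "o", col = "red",
     xlab = "Fraction of users (top contributors first)",
     ylab = "Fraction of postings")
abline(0, 1, lty = 2)          # reference line: every user contributes equally
```

With only seven users, the top 10% rounds up to a single user, who contributes 4635 of the 30673 postings (about 15%). On a sample this even the curve stays close to the diagonal; strong participation inequality would show up as a curve that rises steeply at the left.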