Solved – Plot the probability mass function

Tags: data visualization, discrete data, matplotlib, python, scipy

I am trying to plot the probability mass function of a sample of a discrete metric.

If it were continuous, I know that with pandas it would be as simple as calling:

sample.plot(kind="density")

But I'm afraid that this is not enough (or not right) for my sample. Is there a function within matplotlib, scipy, numpy, etc. that I could use for plotting it?

Best Answer

There are two parts to your question - how to display discrete data (a data visualization issue) and how to do it in Python (a "what function do I call" issue).

I will deal with the first one.

With discrete distributions, there are a number of possible ways to display data.

Leaving aside direct implementation issues for the present, I see three main competitors:

1. The empirical cdf.
2. A sample probability function.

These two are quite suitable for count data, for example.

3. A barplot.

This is quite suitable for ordered categories. If you order the bars from largest to smallest (or in some other way that is meaningful for your needs), it's also suitable for unordered categories.
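As a rough illustration of the three displays listed above, here is a minimal sketch using numpy and matplotlib. The data and all variable names are made up for the example, and this is only one of several reasonable ways to draw each plot, not a definitive recipe.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data, purely for illustration (not taken from the question).
counts_sample = np.array([0, 1, 1, 2, 2, 2, 3, 3, 5])
category_sample = np.array(["red", "blue", "blue", "green", "blue", "red"])

# Sample probability function: relative frequency of each distinct value.
values, counts = np.unique(counts_sample, return_counts=True)
pmf = counts / counts.sum()

# Empirical cdf: running total of the relative frequencies.
ecdf = np.cumsum(pmf)

# Relative frequencies of the categories, ordered from largest to smallest.
labels, label_counts = np.unique(category_sample, return_counts=True)
order = np.argsort(label_counts)[::-1]
bar_labels = labels[order].tolist()
bar_heights = label_counts[order] / label_counts.sum()

fig, (ax_cdf, ax_pmf, ax_bar) = plt.subplots(1, 3, figsize=(12, 3.5))

# Step function that jumps at each observed value.
ax_cdf.step(values, ecdf, where="post")
ax_cdf.set_title("Empirical cdf")

# Thin spikes at the observed values: height, not area, carries the probability.
ax_pmf.vlines(values, 0, pmf)
ax_pmf.plot(values, pmf, "o")
ax_pmf.set_title("Sample probability function")

# One bar per category on a categorical axis.
ax_bar.bar(bar_labels, bar_heights)
ax_bar.set_title("Barplot")

fig.tight_layout()
# Call plt.show() here in an interactive session to display the figure.
```

The spike plot keeps a visible gap at the unobserved value 4, which a continuous density estimate would smooth over.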

There are numerous other possibilities. However, I don't think a histogram is generally suitable for discrete data, especially not one where the bins are automatically chosen. The first problem is that a histogram density estimate uses area rather than height to convey relative probabilities, so it fairly directly conveys an impression of continuity. The second issue is with bin-width -- you need to choose it carefully or you may end up with alternating bins that combine either two categories or one, or with a smaller or larger gap between two categories than between the others (often at an end-category).

When that happens, the gaps are not of constant width, which throws off the impression the plot conveys.
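The bin-width problem is easy to reproduce numerically. The sketch below uses made-up data (ten integer values, each observed ten times) and `numpy.histogram`. The choice of 7 bins is arbitrary and only stands in for whatever an automatic rule might pick; the unit-width bins centred on each value are one possible careful choice, assumed here for illustration.

```python
import numpy as np

# Made-up data: the integers 0..9, each observed exactly 10 times.
sample = np.repeat(np.arange(10), 10)

# An arbitrary number of equal-width bins spanning 0..9.
# The bin width is 9/7, so some bins swallow two values and others only one.
counts_bad, edges_bad = np.histogram(sample, bins=7)
print(counts_bad)   # [20 10 10 20 10 10 20]

# Unit-width bins centred on each integer: one bin per distinct value.
edges_good = np.arange(-0.5, 10.5, 1.0)
counts_good, _ = np.histogram(sample, bins=edges_good)
print(counts_good)  # [10 10 10 10 10 10 10 10 10 10]
```

Every value is equally frequent in this sample, yet the first set of bins suggests that some regions are twice as likely as others.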

As for how to do things like this in Python: once you have chosen a display, that would probably make a good, more specific question, though probably one that is more on topic elsewhere. Worded right, it might fit better on StackOverflow (but you should check their help for what's on topic). With careful phrasing it might survive here, or it might work on Superuser.
