[TeX/LaTeX] How to create a bar chart in which the y-axis is expressed in percent

bar chart, pgfplots, tikz-pgf

Suppose there is a chart like the following:
[Figure: example of a chart]

This chart was created using the code:

\documentclass{article}
\usepackage{tikz}
\usepackage{pgfplots}
\begin{document}
\begin{tikzpicture}
\begin{axis}[
ybar,
enlargelimits=0.15,
symbolic x coords={Category 1,Category 2,Category 3},
xtick=data,
nodes near coords,
]
\addplot coordinates {(Category 1,6) (Category 2,2) (Category 3,3.5)};
\addplot coordinates {(Category 2,2) (Category 3,2.5)};
\addplot coordinates {(Category 3,4)};
\end{axis}
\end{tikzpicture}
\end{document}

Is it possible to modify this chart in pgfplots so that the y-axis shows
percentages? In this case, the blue bar in Category 1 would go up to
100%; the blue and red bars in Category 2 would go up to 50% each; and
the blue, red, and brown bars in Category 3 would go up to 35%, 25%, and
40% respectively. (In other words, the values of the bars in each
category are assumed to add up to 100%.) At the same time, I would like
the labels above the bars either to keep showing only the original
values (6 in Category 1; 2 and 2 in Category 2; and so on) or to combine
them with the percentages (6, 100% in Category 1; 2, 50% and 2, 50% in
Category 2; and so on).

In short, my question is: how can this be done in pgfplots? (And, if it is not possible there, then perhaps directly in PGF/TikZ.)

EDIT: Obviously, this can be done manually in pgfplots by calculating the percentages, entering them as coordinates such as (Category 1,100), giving the y-axis an appropriate description (e.g. “in percent”), and then changing the label of each bar so that it shows the original value. What I'm looking for, however, is a more automated solution 🙂

Best Answer

For things like this, I would always provide the data in the form of a pgfplotstable (which requires the package of the same name to be loaded). You can create a table using something like

\pgfplotstableread[col sep=comma,header=false]{
Category 1,6,0,0
Category 2,2,2,0
Category 3,3.5,2.5,4
}\data

The table is then available as \data.

You can then add a new column, sum, that contains the sum of each row, using

\pgfplotstablecreatecol[
    create col/expr={
        \thisrow{1} + \thisrow{2} + \thisrow{3}
    }
]{sum}{\data}
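
To check that the new column holds what you expect, one option is to typeset the table somewhere in the document body. This is only a debugging sketch and not part of the original answer; the string type setting for column 0 is an assumption about what is needed so that the category names are not parsed as numbers.

\pgfplotstabletypeset[
    columns/0/.style={string type}  % column 0 holds text, not numbers
]{\data}

For the data above, the entries in the sum column should correspond to 6, 4 and 10.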

You can then scale the data while you plot it, using

 \addplot table [y expr=\thisrow{1}/\thisrow{sum}*100,meta=1] {\data};

which specifies that the data point should be divided by the sum entry in the current row and multiplied by 100, while the label (the meta value) should still come from the unchanged column (column 1 in this case).
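
As a quick check of the expression, here is the arithmetic for the Category 3 row (sum = 3.5 + 2.5 + 4 = 10), worked out by hand rather than produced by pgfplots:

\[
\frac{3.5}{10}\times 100 = 35,\qquad
\frac{2.5}{10}\times 100 = 25,\qquad
\frac{4}{10}\times 100 = 40
\]

These are the 35%, 25% and 40% asked for in the question, while the meta values stay at 3.5, 2.5 and 4.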

You can wrap this into a more comfortable style, so that you only have to call

\addplot table [percentage series=1] {\data}; 

to add a plot.
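
The style behind that call can be sketched as follows. It is the same definition that appears in the complete code, with #1 standing for the name of the column passed to percentage series:

\pgfplotsset{
    percentage series/.style={
        % scale column #1 by the row sum; keep the raw value as meta
        table/y expr=\thisrow{#1}/\thisrow{sum}*100,
        table/meta=#1
    }
}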

Edit: To print both the original values and the percentages above the bars, you can pass the label code as the argument to nodes near coords={...}. Since two different values have to be printed but only one of them is available as \pgfplotspointmeta, the other has to be made available using visualization depends on=<value> \as \<macro>. Unfortunately, this doesn't work if some points are removed using restrict y to domain=... or y filter (this looks like a bug; I'll look into what can be done about it). So, to still show labels only for the bars with nonzero values, one workaround is to draw the axis on top of the chart (otherwise you'd see thin lines where the zero bars are) and to make the nodes near coords code check whether the value is nonzero.
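
If only the original values should appear above the bars (the first option in the question), the label code can be reduced to printing the meta value. The following is an untested variation on the nodes near coords setting used in the complete code, not something from the original answer. It assumes that point meta=explicit and visualization depends on={y \as \originalvalue} are in effect, as they are in the percentage plot style:

nodes near coords={
    % \originalvalue holds the plotted (scaled) y value; skip empty bars
    \pgfmathtruncatemacro\iszero{\originalvalue==0}
    \ifnum\iszero=0
        % \pgfplotspointmeta holds the unscaled value from the table
        \pgfmathprintnumber{\pgfplotspointmeta}
    \fi},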

Here's the complete code:

\documentclass{article}
\usepackage{tikz}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\begin{document}

\pgfplotstableread[col sep=comma,header=false]{
Category 1,6,0,0
Category 2,2,2,0
Category 3,3.5,2.5,4
}\data

\pgfplotstablecreatecol[
    create col/expr={
        \thisrow{1} + \thisrow{2} + \thisrow{3}
    }
]{sum}{\data}

\pgfplotsset{
    percentage plot/.style={
        % labels come from the meta column given to each \addplot
        point meta=explicit,
        every node near coord/.append style={
            align=center,
            text width=1cm
        },
        nodes near coords={
            % \originalvalue is the plotted y value (the percentage);
            % \pgfplotspointmeta is the unscaled value from the table
            \pgfmathtruncatemacro\iszero{\originalvalue==0}
            \ifnum\iszero=0
                \pgfmathprintnumber{\originalvalue}$\,\%$\\ \pgfmathprintnumber[fixed zerofill,precision=1]{\pgfplotspointmeta}
            \fi},
        nodes near coords align=vertical,
        yticklabel=\pgfmathprintnumber{\tick}\,$\%$,
        ymin=0,
        ymax=100,
        enlarge y limits={upper,value=0.18},
        % make the y value available inside nodes near coords
        visualization depends on={y \as \originalvalue}
    },
    percentage series/.style={
        % scale column #1 by the row sum; keep the raw value as meta
        table/y expr=\thisrow{#1}/\thisrow{sum}*100,table/meta=#1
    }
}

\begin{tikzpicture}
\begin{axis}[
    axis on top,
    width=10cm,
    percentage plot,ybar,bar width=0.75cm,
    enlarge x limits=0.25,
    symbolic x coords={Category 1,Category 2,Category 3},
    xtick=data
]
\addplot table [percentage series=1] {\data};
\addplot table [percentage series=2] {\data};
\addplot table [percentage series=3] {\data};
\end{axis}
\end{tikzpicture}
\end{document}