Solved – Is a heat-map of gene expression more informative if Z-scores are used instead of actual expression measurement values

data visualization, genetics

I have a heat-map of gene expression measurements (log2-transformed microarray signals, after inter-microarray data normalization, etc.) that I am using to illustrate the expression of 72 genes ('rows' of the heat-map) which I had identified as differentially expressed among different sub-groups of the 60 samples ('columns' of the heat-map, ordered by sub-groups) of my study. The measurements all fall between 1 and 12, with a different range for each gene (e.g., 4-8 for gene X, 2-10 for gene Y, and so on). It is a two-color heat-map, with the brightest green, black, and brightest red colors of the color scale used for values 1, 4 and 12, respectively.

A reviewer has commented that the heat-map will be more informative if Z-scores of the gene expression measurements are used instead. I don't get this because to me it seems that the heat-map will be less informative; Z-scoring will reduce the dimensionality of the data as one can no longer compare one gene to another for a given sample.

Can anyone comment on this? Thanks.

An image showing the current and reviewer-proposed heat-maps can be seen here: http://i.imgur.com/a2hmT.png

Best Answer

What the reviewer may be referring to is the legend at the bottom of your figure. It goes from 1 to 12, with 4 right in the middle, which is disconcerting. This makes your absolute log expression values difficult to interpret: when a gene goes from bright green to black (log2 value 1 to 4), its expression level is multiplied by $2^3 = 8$, but when it goes from black to bright red (log2 value 4 to 12), it is multiplied by $2^8 = 256$. In short, I don't think your figure could be "more informative", but the information could be more intuitive.
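The asymmetry of the legend can be checked with a few lines of arithmetic: a difference of $d$ on the log2 scale corresponds to a $2^d$-fold change on the linear scale. A minimal sketch, where `fold_change` is a helper name chosen here for illustration:

```python
def fold_change(log2_from: float, log2_to: float) -> float:
    """Linear-scale fold change between two log2 expression values."""
    return 2.0 ** (log2_to - log2_from)


# Legend anchors from the question: green = 1, black = 4, red = 12.
green_to_black = fold_change(1, 4)   # 3 log2 units
black_to_red = fold_change(4, 12)    # 8 log2 units

print(green_to_black, black_to_red)
```

The two halves of the color scale therefore cover very different fold changes, even though they take up the same length on the legend.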

As explained by @fosgen, Z-scores are centered and scaled, so the reader can interpret a color as $x$ standard deviations from the mean and get an intuitive idea of the relative variation of that value.

Like @fosgen, I think you should go for standardization by gene (standardization by cell type does not make sense to me in that context). Black will be the average expression across different cell types (set to 0) and the color distribution will be symmetrical on both sides.
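As a rough illustration of standardization by gene, here is a minimal sketch in plain Python. The function name `zscore_rows` and the two example rows are hypothetical (they only mimic the 4-8 and 2-10 ranges mentioned in the question), and the sample standard deviation is an assumption; some tools use the population version instead.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Standardize each row (gene) to mean 0 and sample SD 1."""
    scaled = []
    for row in matrix:
        m = mean(row)
        s = stdev(row)  # sample SD (n - 1 denominator); an assumption
        if s == 0:
            # A constant gene carries no relative variation.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - m) / s for x in row])
    return scaled


# Hypothetical log2 values for two genes across five samples.
expr = [
    [4.0, 5.0, 6.0, 7.0, 8.0],    # "gene X", range 4-8
    [2.0, 4.0, 6.0, 8.0, 10.0],   # "gene Y", range 2-10
]
z = zscore_rows(expr)
for row in z:
    print([round(v, 2) for v in row])
```

In this toy example both genes end up with identical Z-scores despite their different absolute ranges. That is the trade-off raised in the question: the relative pattern across samples becomes easy to read, but absolute levels can no longer be compared between genes.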

Showing the (relative) gene-wise variation of expression is standard in the field, but you might have specific reasons to show the (absolute) log2 microarray measurements, in which case you can explain them to the reviewer. Even then, I would still straighten the color gradient, so that black sits at the midpoint of the scale, to ease interpretation.
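One way to make the gradient symmetric, whichever values are shown, is to place black at the center of the scale and give green and red the same span on either side. The sketch below is only an illustration: `diverging_color`, `center` and `half_range` are hypothetical names, and the defaults assume Z-scores clipped at 3 standard deviations.

```python
def diverging_color(value, center=0.0, half_range=3.0):
    """Map a value to an (r, g, b) tuple in 0..1 on a green-black-red scale.

    Black sits at `center`; full green and full red are reached at
    `center - half_range` and `center + half_range`; values beyond are clipped.
    """
    t = (value - center) / half_range
    t = max(-1.0, min(1.0, t))  # clip to [-1, 1]
    if t >= 0:
        return (t, 0.0, 0.0)    # shades of red above the center
    return (0.0, -t, 0.0)       # shades of green below the center


# Z-scores: black at 0, saturation at +/- 3 SD.
print(diverging_color(-1.5))

# Absolute log2 values on a 1-12 scale: black at the midpoint 6.5.
print(diverging_color(12, center=6.5, half_range=5.5))
```

With this mapping, equal distances from black correspond to equal differences in the plotted value on both sides of the legend.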