Solved – Making a heatmap with a precomputed distance matrix and data matrix in R

data visualization, r

I have made a heatmap from a regular data matrix in R, using the pheatmap package. Regular clustering of my samples is performed by the distfun function within the package.

Now I want to attach a precomputed distance matrix (generated by Unifrac) to my previously generated matrix/heatmap. Is this possible?

Best Answer

Ok, so you can just look at the code by typing the name of the function at the R prompt, or use edit(pheatmap) to see it in your default editor.

Around lines 14 and 23, you'll see that another function is called to cluster the rows and the columns, given a distance function (R's dist) and a method (compatible with hclust for hierarchical clustering in R). What does this function do? Use getAnywhere("cluster_mat") to print it on screen, and you'll soon notice that it does nothing more than return an hclust object, that is, your dendrogram computed from the specified distance and linkage options.
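To make that concrete, here is a minimal base-R sketch of what such a helper boils down to: compute distances between rows, then hand them to hclust. The toy matrix and the variable names (mat, d, tree) are made up for illustration; this is not the package's actual source.

```r
# Toy data: 6 samples (rows) x 4 features (columns); the values are made up.
set.seed(1)
mat <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("s", 1:6), paste0("f", 1:4)))

d    <- dist(mat, method = "euclidean")  # pairwise distances between rows
tree <- hclust(d, method = "complete")   # dendrogram stored as an hclust object

class(tree)   # "hclust"
tree$labels   # row names, in the original input order
tree$order    # leaf order that a heatmap would use to arrange its rows
```

Anything that yields an equivalent hclust object can stand in for this step, which is why a precomputed distance matrix can be slotted in.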

So, if you already have your distance matrix, change line 14 (rows) or 23 (columns) so that it reads, e.g.

tree_row = hclust(my.dist.mat, method="complete")

where my.dist.mat is your own distance matrix (an object of class dist, which is what hclust expects), and complete is one of the several linkage methods available in hclust (see help(hclust)). Here, it is important to use fix(pheatmap) and not edit(pheatmap); otherwise, the edited function will not be callable in the correct environment/namespace.
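If the precomputed distances arrive as an ordinary square matrix (for example, read from a file), they need converting before hclust will accept them. The sketch below uses a hypothetical 4 x 4 matrix named dd as a stand-in for real UniFrac output; the numbers are invented.

```r
# Hypothetical symmetric distance matrix standing in for UniFrac output.
dd <- matrix(c(0.0, 0.2, 0.7, 0.8,
               0.2, 0.0, 0.6, 0.9,
               0.7, 0.6, 0.0, 0.3,
               0.8, 0.9, 0.3, 0.0),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

# Basic sanity checks: a distance matrix is symmetric with a zero diagonal.
stopifnot(isSymmetric(dd), all(diag(dd) == 0))

my.dist.mat <- as.dist(dd)                            # plain matrix -> dist
tree_row    <- hclust(my.dist.mat, method = "complete")

cutree(tree_row, k = 2)   # s1 and s2 in one group, s3 and s4 in the other
```

Note that the samples in the distance matrix must be in the same order as the rows of the data matrix shown in the heatmap, since the resulting tree reorders rows by position.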

This is a quick and dirty hack that I would not recommend with a larger package. It seems to work for me at least; that is, I can use a custom distance matrix with complete linkage for the rows.

In sum, assuming your distance matrix is stored in a variable named dd,

library(pheatmap)
fix(pheatmap)
# 1. change the function as you see fit
# 2. save and go back to R
# 3. if your custom distance matrix was simply read as a matrix, make sure
#    it is converted to a distance matrix
my.dist.mat <- dd  # or as.dist(dd) if dd is a plain matrix

Then, you can call pheatmap as you did before, but now it will use the results of hclust applied to my.dist.mat with complete linkage. Please note that you just have to ensure that cluster_rows=TRUE (which is the default). Now, you may be able to change

  • the linkage method
  • whether the custom tree is applied to rows or columns

by editing the package function appropriately.
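Depending on your pheatmap version, editing the function may not be needed at all: the documentation of more recent releases describes cluster_rows as accepting either a boolean or an hclust object. The following is only a sketch under that assumption, so check ?pheatmap for your installed version. The data are made up, and dd here is a placeholder for your precomputed matrix.

```r
# Toy data matrix: 5 samples x 4 features (values are made up).
set.seed(1)
mat <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("s", 1:5), paste0("f", 1:4)))

# Placeholder for a precomputed distance matrix; in practice this would be
# the UniFrac matrix, with samples in the same order as the rows of mat.
dd <- as.matrix(dist(mat, method = "manhattan"))

tree_row <- hclust(as.dist(dd), method = "complete")

# Assumes a pheatmap version whose cluster_rows argument accepts an hclust object.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat, cluster_rows = tree_row, cluster_cols = FALSE)
}
```

This keeps the package untouched, so the linkage method and the choice of rows or columns are controlled from your own script rather than from an edited copy of the function.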