Solved – Plotting a heatmap based on clustering in R

Tags: biostatistics, clustering, data mining, data visualization, r

  id       sam1  sam2  sam3  sam4
 gene1       33    23    88    98
 gene2        0     0    99    95
 gene3       77   100    44    65
 gene4        0     0     0     0
 gene5      100   100   100   100
  :
  :
 gene20000   58    33    78    56

I have 20K genes (rows) and four samples (columns), with values given as percentages. Each gene is common to all four samples. The hypothesis is that a gene at 100% in a sample is present in that sample, and a gene at 0% is absent. (E.g., gene3 is present in sample 2 (sam2), and gene4 is absent from all samples.)

Since I have 20K genes, I would like to use a clustering technique. Can I do biclustering and produce a heatmap? If so, how can I do this in R?

Best Answer

Try the pheatmap package. The link includes example code and images for clustering and displaying genetic data. It can apply a variety of clustering methods to your data before displaying them.

You should be able to get started with just

# mat: numeric matrix with genes in rows and samples in columns
library(pheatmap)
pheatmap(mat)

and then you'll have lots of options for adjusting the display and clustering parameters if the defaults don't suit you.
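For a slightly fuller starting point, here is a minimal sketch. It uses simulated data in place of the real table, so the matrix contents, the names `mat_var`, `row_hc`, and `col_hc`, and the choice of Euclidean distance with complete linkage are all assumptions rather than anything from the question. It also drops genes that are constant across the four samples (such as gene4 and gene5), which is an optional step I am adding because such rows carry no information for clustering and cause trouble if you later scale by row. Note that this clusters rows and columns separately with hierarchical clustering, which is what a clustered heatmap usually shows; it is not a true biclustering algorithm.

```r
# Simulated stand-in for the real data (assumption): 200 genes x 4 samples, values 0-100.
set.seed(1)
mat <- matrix(sample(0:100, 200 * 4, replace = TRUE), ncol = 4,
              dimnames = list(paste0("gene", 1:200), paste0("sam", 1:4)))
mat[2, ] <- c(0, 0, 99, 95)  # mimic gene2 from the question
mat[4, ] <- 0                # mimic gene4: absent in all samples
mat[5, ] <- 100              # mimic gene5: present in all samples

# Optional step (my assumption): drop genes with zero variance across samples.
keep <- apply(mat, 1, var) > 0
mat_var <- mat[keep, , drop = FALSE]

# Hierarchical clustering of genes (rows) and samples (columns), base R only.
row_hc <- hclust(dist(mat_var), method = "complete")
col_hc <- hclust(dist(t(mat_var)), method = "complete")

# Draw the heatmap if pheatmap is installed, reusing the trees computed above.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat_var,
                     cluster_rows = row_hc,
                     cluster_cols = col_hc,
                     show_rownames = FALSE)  # 20K row labels would be unreadable
}
```

With the full 20K genes, the row distance matrix holds roughly 2 x 10^8 values, so expect this to need a fair amount of memory and time. If that becomes a problem, pheatmap's `kmeans_k` argument, which aggregates rows into k-means clusters before drawing, may be worth trying.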