Solved – TMM-normalization of RNA-seq data in R language using edgeR package

biostatistics, normalization, r

My data is a numeric matrix of RNA-seq counts from the Illumina 2000 platform (alignment and other preprocessing already done), where columns represent subjects, rows represent genes, and the entries are raw expression counts. My goal is to use the normalized matrix for further analyses such as regression (with tools other than edgeR).
I wrote a function to do this:

##getNormalizedMatrix
##input: numeric matrix
##output: numeric matrix with normalized counts
##requires edgeR package
getNormalizedMatrix <- function(M){
  require(edgeR)
  norm.factors <- calcNormFactors(M, method = "TMM")
  return(equalizeLibSizes(DGEList(ah, norm.factors = norm.factors))$pseudo.counts)
}

Is this the way I am supposed to do the TMM-normalization?

Best Answer

Well, your function doesn't entirely make sense as written, because it depends on ah, a variable that is never defined inside the function and so has to exist as a global (presumably M was intended).

Assuming that M is a matrix of counts, the edgeR User's Guide advises you to use:

dge <- DGEList(M)
dge <- calcNormFactors(dge)
logCPM <- cpm(dge, log=TRUE)

if your aim is to get normalized quantities for plotting etc.
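For intuition about what the normalization factors do here: as far as I understand it, with default settings the non-log CPM for a DGEList is each count divided by the sample's effective library size (library size times normalization factor), multiplied by one million. The sketch below reproduces that arithmetic in base R on made-up numbers. The counts and the norm.factors values are hypothetical, not real TMM output, and the log = TRUE version additionally adds a small prior count before taking log2, which is not shown.

```r
# Toy counts: 3 genes (rows) x 2 samples (columns); all values are made up
counts <- matrix(c(10, 20, 70,
                   30, 30, 140),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

lib.size     <- colSums(counts)           # raw library sizes: 100 and 200
norm.factors <- c(s1 = 0.8, s2 = 1.25)    # hypothetical factors, NOT computed by TMM

# Effective library size = library size * normalization factor
eff.lib.size <- lib.size * norm.factors   # 80 and 250

# Divide each column by its effective library size and scale to "per million"
cpm.manual <- t(t(counts) / eff.lib.size) * 1e6
cpm.manual
```

A factor below 1 shrinks the effective library size and so raises that sample's CPM values; a factor above 1 does the opposite.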

The User's Guide advises you not to use equalizeLibSizes.
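Putting that advice into the shape of the function from the question, a minimal sketch could look like the following. The function name is kept from the question; the logScale argument and the simulated toy matrix are my own additions for illustration only, and equalizeLibSizes is not used at all.

```r
library(edgeR)

# Sketch: TMM-normalize a raw count matrix (genes in rows, samples in columns)
# and return counts per million. `logScale` is an illustrative extra argument.
getNormalizedMatrix <- function(M, logScale = TRUE) {
  dge <- DGEList(counts = M)                   # library sizes default to column sums
  dge <- calcNormFactors(dge, method = "TMM")  # TMM factors are stored in dge$samples
  cpm(dge, log = logScale)                     # CPM using the effective library sizes
}

# Toy data, simulated purely for illustration
set.seed(1)
M <- matrix(rnbinom(600, mu = 50, size = 5), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("sample", 1:6)))

normalized <- getNormalizedMatrix(M)
dim(normalized)   # same shape as M: 100 genes by 6 samples
```

Calling getNormalizedMatrix(M, logScale = FALSE) returns the normalized CPM on the natural scale instead, which may be the better input if the downstream tool applies its own transformation.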