Solved – What’s the difference between statistics and informatics

bioinformatics

We always say that statistics is just about dealing with data. But informatics is also about extracting knowledge from data analysis; for example, bioinformaticians can get by entirely without biostatistics. What is the essential difference between statistics and informatics?

Best Answer

Excellent question!!

I have heard several times that bioinformaticians can go without biostatistics, or even without statistics. That's perfectly true until it becomes false. In my opinion, a general lack of statistical knowledge has had disastrous effects in the field, as shown by Keith Baggerly. I have also observed that a lack of basic knowledge in statistics (and linear algebra) causes bioinformaticians to stagnate in the long run: without a deep knowledge of the theory, they tend to reinvent the wheel and resort to ad hoc solutions that solve nothing but their own problem.

But now, to answer your question: I agree that, overall, statistics can't do without computers these days. Yet one of the major aspects of statistics is inference, which has nothing to do with computers. Statistical inference is what makes statistics a science, because it tells you whether or not your conclusions hold up in other contexts.

In short, you can analyze the hell out of your data, but you will still need statistics to assess the validity of the predictions or decisions you make based on your analyses.
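To make the distinction between analysis and inference concrete, here is a minimal sketch in R. It assumes a hypothetical two-group comparison with simulated data, and it uses a permutation test as one simple example of inference (not the only or the "right" one). The observed difference in means is the descriptive analysis; the permutation p-value asks whether a difference that large is plausible when group labels carry no information. The helper `permutation_p_value()` and its `n_perm` parameter are illustrative names, not part of any package.

```r
# Hypothetical data: measurements of one feature in two groups of samples.
set.seed(42)
group_a <- rnorm(8, mean = 5.0, sd = 1)
group_b <- rnorm(8, mean = 5.5, sd = 1)

# Descriptive analysis: the observed difference in group means.
observed_diff <- mean(group_b) - mean(group_a)

# Inference: how often would randomly relabelling the samples produce an
# absolute difference in means at least as large as the observed one?
permutation_p_value <- function(a, b, n_perm = 10000) {
  pooled <- c(a, b)
  n_a <- length(a)
  obs <- abs(mean(b) - mean(a))
  null_diffs <- replicate(n_perm, {
    idx <- sample(length(pooled), n_a)  # random choice of "group A" samples
    abs(mean(pooled[-idx]) - mean(pooled[idx]))
  })
  # Add-one correction keeps the estimated p-value strictly positive.
  (sum(null_diffs >= obs) + 1) / (n_perm + 1)
}

p_value <- permutation_p_value(group_a, group_b)
cat("observed difference:", round(observed_diff, 3), "\n")
cat("permutation p-value:", round(p_value, 3), "\n")
```

The same observed difference can be convincing or meaningless depending on sample size and noise; only the inferential step tells you which.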

Related Solutions

Solved – Making sense out of statistics theory and applications

I can completely understand your situation. Even though I am a PhD student, I sometimes find it hard to relate theory to application. If you are willing to immerse yourself in the theory, it definitely pays off when you think about real-world problems, but the process can be frustrating.

One of the many references I like is Gelman and Hill's Data Analysis Using Regression and Multilevel/Hierarchical Models. They avoid theory wherever they can convey the underlying concept through simulation, which should suit you well given your experience with MCMC. Since you work in bioinformatics, Harrell's Regression Modeling Strategies is probably a great reference too.

I will make this a community wiki and let others add to it.

Solved – the advantage of median polish over the median

What you call (linearly) "borrowing strength" corresponds to what statisticians refer to as affine equivariance. In essence, you want an affine equivariant estimator of location that is also robust to outliers. The best-in-class estimators are the SDE[1] and the FastMCD[2].

Both have several implementations in R. In both cases, the best implementation is probably in the rrcov package, under the CovSde() and CovMcd() functions respectively.

library(MASS)        # mvrnorm()
library(rrcov)       # CovMcd(), CovSde()
library(matrixStats) # colMedians()

set.seed(1)  # for reproducibility
# Correlation matrix: unit variances, all pairwise correlations 0.95
CM <- matrix(0.95, 5, 5)
diag(CM) <- 1
# The real data are strongly correlated: you'd be better off borrowing
# strength from the other columns.
x <- mvrnorm(100, rep(0, 5), CM)
z <- mvrnorm(10, rep(50, 5), diag(5))  # the outliers
w <- rbind(x, z)

# All three are essentially similar:
CovMcd(w)@center
CovSde(w)@center
colMeans(x)

# Not the same because of the outliers:
colMeans(w)
# Not the same because it does not use the correlation structure:
colMedians(w)
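A location estimator T is affine equivariant if transforming every observation as y = A x + b (A nonsingular) transforms the estimate the same way: T(Y) = A T(X) + b. The minimal base-R sketch below checks this numerically for the column mean and the coordinatewise median under a 45-degree rotation plus a shift. The skewed exponential data, the rotation angle, and the helper names `affine()`, `col_median()` and `is_equivariant()` are all illustrative assumptions. The mean passes the check but, as `colMeans(w)` above shows, it is not robust; the coordinatewise median is robust but fails the check, which is why estimators such as SDE and MCD, which aim for both properties, are of interest here.

```r
set.seed(1)
# Skewed data, so that the mean and the median differ noticeably.
x <- matrix(rexp(200 * 2), ncol = 2)

theta <- pi / 4
A <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)  # rotation
b <- c(3, -1)                                                            # shift

# Apply y_i = A x_i + b to every row x_i of the data matrix.
affine <- function(x, A, b) sweep(x %*% t(A), 2, b, "+")

# Coordinatewise median: robust, but computed one column at a time.
col_median <- function(x) apply(x, 2, median)

# TRUE when estimator(A x + b) matches A estimator(x) + b (up to rounding).
is_equivariant <- function(estimator, x, A, b) {
  lhs <- estimator(affine(x, A, b))
  rhs <- as.vector(A %*% estimator(x)) + b
  isTRUE(all.equal(lhs, rhs))
}

is_equivariant(colMeans, x, A, b)    # TRUE: the mean is affine equivariant
is_equivariant(col_median, x, A, b)  # FALSE here: the coordinatewise median is not
```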

[1] R. A. Maronna and V. J. Yohai (1995). The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association, 90(429), 330–341.

[2] P. J. Rousseeuw and K. van Driessen (1999). A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics, 41, 212–223.
