Solved – Open source Java library for statistics at the level offered by a graduate statistics course

java, r, sas

I am taking a graduate course in Applied Statistics that uses the following textbook (to give you a feel for the level of the material being covered): Statistical Concepts and Methods, by G. K. Bhattacharyya and R. A. Johnson.

The professor requires us to use SAS for the homework.

My question: is there a Java library (or set of libraries) that can be used instead of SAS for the problems typically seen in such classes?

I am currently trying to make do with Apache Commons Math, and though I am impressed with the library (its ease of use and understandability), it seems to lack even simple things such as the ability to draw histograms (I am thinking of combining it with a charting library).

I have looked at Colt, but my initial interest died down pretty quickly.

I would appreciate any input. I have looked at similar questions on Stack Overflow but have not found anything compelling.

NOTE: I am aware of R, SciPy, and Octave, and of Java libraries that make calls to them. I am looking for a native Java library, or set of libraries, that can together provide the features I'm looking for.

NOTE: The topics covered in such a class typically include: one-sample and two-sample tests and confidence intervals for means and medians, descriptive statistics, goodness-of-fit tests, one- and two-way ANOVA, simultaneous inference, testing variances, regression analysis, and categorical data analysis.

Best Answer

When I am forced to use Java for basic statistics, Apache Commons Math is the way to go. For plots, I use and recommend JFreeChart. The latter is widely used, so Stack Overflow even has a well-populated tag for it.
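Regarding the histogram gap mentioned in the question: the binning step itself is small, so it can serve as glue between a statistics library and a charting library. Below is a minimal sketch in plain Java, using only the standard library. It does not call Commons Math or JFreeChart, and the names `HistogramSketch` and `binCounts` are hypothetical, introduced here only for illustration. It assumes equal-width bins spanning the range of the data.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of any library): equal-width histogram binning. */
final class HistogramSketch {

    /**
     * Counts how many values fall into each of `bins` equal-width intervals
     * spanning [min, max] of the data. The maximum value goes into the last bin.
     */
    static int[] binCounts(double[] data, int bins) {
        if (data.length == 0 || bins <= 0) {
            throw new IllegalArgumentException("need data and at least one bin");
        }
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double width = (max - min) / bins;
        int[] counts = new int[bins];
        for (double x : data) {
            // If all values are identical, width is 0 and everything goes in bin 0.
            int idx = (width == 0) ? 0 : (int) ((x - min) / width);
            // Clamp so that x == max lands in the last bin instead of out of range.
            counts[Math.min(idx, bins - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] sample = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(Arrays.toString(binCounts(sample, 3))); // prints [3, 3, 4]
    }
}
```

The resulting counts can then be handed to whichever charting library is used, as the bar heights of a bar chart.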

Edit

If one is looking for a suite, then maybe Deducer is an option. The GUI is based on JGR, while the statistical parts are computed in R. It seems to be extensible via both R and Java. One could, e.g., skip the calls to the Rengine and call referenced Java libraries instead. But I admit I have not tried it yet.

As far as I understand the OP, the ideal would be something like RapidMiner for statistics, since RapidMiner is a pure Java framework that supports GUI access (including visualizations), use as a library, and custom plugin development. To the best of my knowledge, nothing like that exists for statistics. I do not recommend RapidMiner itself for this task, because as far as I know it includes only the most basic statistical tests. The visualizations have been extended lately, but I cannot judge how customizable they are now.