Solved – Does anyone have experience with IBM’s “SPSS Modeler”?

data-mining, modeling, predictive-models, software, spss-modeler

SPSS Modeler seems like a great tool for data mining (especially for prediction), but it is extremely costly for an individual like me (around 20,000 euros excl. tax). There is also a video.

I am wondering if software as expensive as this really adds value. If it does, my university may consider purchasing it.

What seems unique about this software is that it lets you do advanced computations without the programming and statistical knowledge required for languages such as R.

What are your experiences with SPSS Modeler? How does it compare with other GUI-based statistical software?

Best Answer

My experiences with SPSS Modeler are very positive. It allows you to build complex systems by simply connecting and disconnecting nodes, and it gives you a nice graphical representation of the node network you have built. I prefer seeing the whole "picture", as it is easy to get lost in code if you write everything yourself and things get very complex.

There are nodes for manipulating rows, columns, outputs and inputs, for deriving new features, various modelling nodes, charting nodes and much more. To me it serves as a multi-purpose tool, because I can use SQL to query a database directly (through an ODBC connection) or use text files (and transform them if they are broken) when a database connection is not available. It even supports Excel files.
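For readers who think in code, the source-then-derive flow described above can be sketched in plain Python. This is only an analogy under stated assumptions, not how SPSS Modeler works internally: an in-memory SQLite database stands in for the ODBC source, and the table, column names and helper functions (`rows_from_sql`, `rows_from_text`, `derive`) are all made up for illustration.

```python
import csv
import io
import sqlite3


def rows_from_sql(conn, query):
    """Source step: run a SQL query and return the rows as dicts."""
    cur = conn.execute(query)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, r)) for r in cur.fetchall()]


def rows_from_text(text, expected_fields):
    """Source step: parse delimited text, skipping rows with the wrong field count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [dict(zip(header, r)) for r in reader if len(r) == expected_fields]


def derive(rows, name, fn):
    """Derive step: add a new column computed from each row."""
    return [{**r, name: fn(r)} for r in rows]


# Hypothetical in-memory database standing in for an ODBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("a", 10.0, 2), ("b", 9.0, 3)],
)

db_rows = derive(
    rows_from_sql(conn, "SELECT customer, amount, qty FROM sales ORDER BY customer"),
    "unit_price",
    lambda r: r["amount"] / r["qty"],
)

# Hypothetical text export with one broken line (a missing field).
raw = "customer,amount,qty\nc,8.0,4\nbroken,1.0\n"
file_rows = derive(
    rows_from_text(raw, 3),
    "unit_price",
    lambda r: float(r["amount"]) / int(r["qty"]),
)

print(db_rows)
print(file_rows)
```

Each function plays the role of one node, and chaining the calls plays the role of connecting nodes on the canvas. In the GUI the same chain is visible at a glance, which is the point made above about seeing the whole picture.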

It has a built-in legacy scripting language; a formula builder (to build your logic for calculations, node-specific manipulations, selections etc.); and since version 16 it also supports scripting in Python (scripting in SPSS Modeler is mostly used for automation of tasks - say, you need to regularly perform some calculations as part of some ETL process).

As for statistical modelling, there are a bunch of modelling-specific nodes (categorized under clusters, classifiers, networks) which can be fine-tuned with parameters (just as you would define your parameters in R, Python or elsewhere). You can even use R in a modelling node (there is a specific node for R). Those models can be saved, re-trained, compared etc. There are even auto-classifiers and auto-clusterers that try a bunch of models and then let you choose the best one (which, of course, mostly depends on the input data).
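To make the auto-classifier idea concrete, here is a minimal sketch of the general pattern: fit several candidate models on the same training data, score each on held-out data, and rank them. It is not SPSS Modeler's implementation. The three toy classifiers, the one-feature data set and the function names are assumptions chosen so the example runs with the standard library alone.

```python
from collections import Counter


def majority_classifier(train):
    """Always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_classifier(train):
    """Predict the label of the closest training point (1-NN on one feature)."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def nearest_mean_classifier(train):
    """Predict the label whose class mean is closest to x."""
    means = {}
    for label in {y for _, y in train}:
        values = [x for x, y in train if y == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda label: abs(means[label] - x))


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def auto_classify(candidates, train, test):
    """Fit every candidate on train, score it on test, return best-first ranking."""
    scores = [(name, accuracy(fit(train), test)) for name, fit in candidates.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data: (feature, label) pairs, invented for illustration.
train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (8.0, "high"), (9.0, "high")]
test = [(1.5, "low"), (2.5, "low"), (7.0, "high"), (10.0, "high")]

candidates = {
    "majority class": majority_classifier,
    "nearest neighbour": nearest_neighbour_classifier,
    "nearest mean": nearest_mean_classifier,
}

ranking = auto_classify(candidates, train, test)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

As in the GUI, the ranking is only a suggestion: which candidate comes out on top depends on the data you feed in, so the final choice is still yours.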

I believe this software has no real equal among other GUI-based statistical modelling software, and that it justifies its price.

-- 3 years late to reply, but maybe this helps someone :)