Solved – Classification model for movie rating prediction

cart, classification

I am somewhat new to data mining, and I am working on a classification model for movie rating prediction.

I have collected data sets from IMDb, and I am planning to use decision tree and nearest-neighbor approaches for my model. I would like to know which freely available data mining tools provide the functionality I require.

Best Answer

Hein,

There are a lot of tools and libraries that provide this functionality.

Which one to choose depends on whether you would like to work in a GUI or embed the model in some other program.

Standalone data mining tools (there are others, such as WEKA, which has a Java interface):

  • RapidMiner
  • Orange
  • Rattle GUI for R
  • KNIME

Text-based:

  • GNU R

Libs:

  • scikit-learn for Python
  • Mahout on Hadoop

If you know a programming language well enough, I would use a library for that language or give R a try. If not, you may want to try one of the GUI tools.

A tree example in R:

# we are using the iris dataset
data(iris)

# for our tree based model we use the rpart package
# to download it type install.packages("rpart")
library(rpart)

# Building the tree
fit <- rpart(Species ~ Petal.Length + Petal.Width, method="class", data=iris)

# Plot the tree
plot(fit)
text(fit)
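
For a prediction task you will also want to check how the tree does on data it has not seen. The following is only a sketch of one way to do that, assuming a simple random hold-out split; the split size (50 rows), the seed, and the variable names test_idx, train, test, pred, confusion and accuracy are my own choices for illustration and not part of the example above.

```r
# Sketch: hold out part of iris and evaluate the rpart tree on it
library(rpart)
data(iris)

# Assumption: a random 100/50 train/test split; the seed is arbitrary
set.seed(1)
test_idx <- sample(nrow(iris), 50)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit the same kind of tree as above, but on the training rows only
fit <- rpart(Species ~ Petal.Length + Petal.Width, method = "class", data = train)

# Predict class labels for the held-out rows
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and overall accuracy
confusion <- table(predicted = pred, actual = test$Species)
print(confusion)
accuracy <- mean(pred == test$Species)
print(accuracy)
```

With your movie data you would replace iris with your own data frame and Species with the rating column, which has to be a factor for method="class".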

As noted above, analysis in R requires you to write the code yourself, but you will find packages for most classification tasks that work out of the box. An overview can be found in the CRAN Machine Learning Task View.
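
Since the question also mentions nearest neighbors, here is a rough sketch of that approach in R using knn() from the class package, which normally ships with R. The choice of k = 5, the hold-out split, the seed, and the variable names are assumptions made for illustration; in practice you would tune k and probably scale your features first.

```r
# Sketch: k-nearest-neighbor classification on iris with the class package
library(class)
data(iris)

# Assumption: a random 100/50 train/test split; the seed is arbitrary
set.seed(42)
train_idx <- sample(nrow(iris), 100)
features  <- c("Petal.Length", "Petal.Width")

train_x <- iris[train_idx, features]
test_x  <- iris[-train_idx, features]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# knn() classifies each test row by majority vote among its k nearest training rows
pred_knn <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Confusion matrix and overall accuracy on the held-out rows
print(table(predicted = pred_knn, actual = test_y))
accuracy_knn <- mean(pred_knn == test_y)
print(accuracy_knn)
```

Note that knn() works on numeric features only, so categorical movie attributes such as genre would need to be encoded as numbers beforehand.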

To get started with RapidMiner, have a look at YouTube. There are some screencasts, including ones on decision trees.