Solved – Supervised approaches vs. topic models in sentiment analysis

machine learning, sentiment analysis, topic-models, unsupervised learning

I am researching Sentiment Analysis over social media, particularly classifying online texts such as blog posts as positive, negative or neutral.

Most of the approaches I have found for sentiment analysis are supervised (they need labeled data to train a classifier). However, I have also found a couple of papers that do it using joint topic-sentiment models (unsupervised) like this one.

According to the results in the topic model papers, the main advantage of unsupervised approaches based on topic models is that they do not need any labeled data (apart from prior "general" sentiment information, i.e. a dictionary of positive/negative words). However, they do not reach the accuracy of a supervised approach (about 2% lower accuracy).
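To make the "prior sentiment information" concrete, here is a minimal sketch of the kind of word dictionary I mean, used on its own as a naive word-counting classifier. The word lists and the `lexicon_label` function are hypothetical and only for illustration; as far as I understand, the topic-sentiment models use such a dictionary as a prior inside the topic model rather than counting words directly.

```python
import re

# Hypothetical toy lexicon; real sentiment dictionaries are much larger.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "awful", "hate", "terrible", "sad"}

def lexicon_label(text):
    """Label text by counting lexicon hits; ties (including no hits) are neutral."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_label("I love this blog, great posts"))
print(lexicon_label("Terrible layout and awful ads"))
print(lexicon_label("Posted on Tuesday"))
```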

Are there any other advantages/disadvantages for using topic-sentiment models for sentiment classification instead of supervised approaches?

Thanks.

Best Answer

One disadvantage of an unsupervised method like LDA is that it will generally take considerably longer to train than supervised methods. I'm also confused about the 2% difference you mention: based on Table 2, it looks like an 8% difference between the best supervised approach they compared against and their best unsupervised model.

While I generally like the idea of "how far can you push unsupervised learning", sentiment seems like a poor fit in practice. I say this because sentiment analysis is one of the domains where it is easiest (in cost and effort) to get labeled data, thanks to the massive amount of reviews and review-like content available on the internet. If your ultimate goal is to classify accurately, even the unsupervised paper you linked seems to suggest you will be better off spending your time scraping this data than building dictionaries of positive and negative words and incorporating priors.
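As a rough sketch of what I mean by cheap labels: star ratings attached to reviews can stand in for sentiment labels, and even a very simple supervised model can be trained on them. The example below is a toy multinomial Naive Bayes in plain Python. The reviews are made up, the star thresholds in `label_from_stars` are my assumption, and all function names are hypothetical; in practice you would more likely use an existing library classifier on real scraped data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def label_from_stars(stars):
    """Turn a star rating into a sentiment label (thresholds are an assumption)."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"

def train_naive_bayes(reviews):
    """reviews: iterable of (text, stars). Returns the counts needed for prediction."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of reviews
    vocab = set()
    for text, stars in reviews:
        label = label_from_stars(stars)
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab

def predict(model, text):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior plus log likelihoods with add-one smoothing.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up reviews standing in for scraped data.
reviews = [
    ("Great phone, love the screen", 5),
    ("Excellent battery and great camera", 4),
    ("Terrible battery, broke in a week", 1),
    ("Awful screen and terrible support", 2),
]
model = train_naive_bayes(reviews)
print(predict(model, "great camera"))
print(predict(model, "terrible support"))
```

The point is only that the labeling step costs nothing beyond collecting the reviews; no hand-built dictionary or prior is involved.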
