Solved – Analysing over 300,000 rows in Excel to make pretty graphs

data visualization, excel, time series

I'm doing a research module for my Computer Science degree, and for my topic I have collected over 500,000 tweets from the Twitter Streaming API, using a Ruby script to store them in a MongoDB database (BSON/JSON). I started recording the tweets on Tuesday 7th Feb and stopped the following Tuesday, so there is a week's worth of tweets.

Here is what the spreadsheet looks like.

I have successfully exported around 300,000 tweets to an Excel spreadsheet (I can hear groans already).

I would like to make some time series charts, for example volume of tweets over time, and eventually include followers_count as a weighting. But I'm unsure how I would calculate this. I think I need to make the created_at column more meaningful to Excel by converting it to a date/time it can understand.

I've also had a go with RapidMiner and managed to import a spreadsheet and convert the created_at field into something the program can understand, but I didn't really have any idea what I was doing after that!

I'd really appreciate some hints as I'm a bit stuck right now.

Best Answer

You can import the spreadsheet into R.

Then you can use the function qplot, from the ggplot2 package. It doesn't get much easier than that.
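
Whichever tool draws the chart, the step the question is stuck on is turning created_at into a real timestamp and then counting tweets per time bucket. Here is a minimal sketch of that step, written in Python rather than R purely for illustration. It assumes created_at still has Twitter's classic layout (e.g. "Tue Feb 07 14:03:21 +0000 2012"). It also assumes that "weighting by followers_count" means summing follower counts per hour, which is only one possible reading. The function names and the sample rows are made up for the example.

```python
from collections import Counter
from datetime import datetime

# Assumption: the exported created_at column uses Twitter's classic layout,
# e.g. "Tue Feb 07 14:03:21 +0000 2012". Adjust the format if yours differs.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(text):
    """Turn a created_at string into a timezone-aware datetime."""
    return datetime.strptime(text, CREATED_AT_FORMAT)


def hourly_volume(rows, weighted=False):
    """Aggregate (created_at, followers_count) pairs into hourly buckets.

    With weighted=False each tweet counts as 1; with weighted=True each
    tweet contributes its followers_count (one possible weighting).
    """
    totals = Counter()
    for created_at, followers in rows:
        # Truncate the timestamp to the start of its hour.
        hour = parse_created_at(created_at).replace(
            minute=0, second=0, microsecond=0
        )
        totals[hour] += followers if weighted else 1
    # Return the buckets in chronological order, ready for plotting.
    return dict(sorted(totals.items()))


# Hypothetical sample rows standing in for the exported spreadsheet.
sample = [
    ("Tue Feb 07 14:03:21 +0000 2012", 120),
    ("Tue Feb 07 14:47:05 +0000 2012", 30),
    ("Tue Feb 07 15:10:00 +0000 2012", 500),
]

counts = hourly_volume(sample)
weighted = hourly_volume(sample, weighted=True)

for hour, n in counts.items():
    print(hour.isoformat(), n, weighted[hour])
```

The resulting hour-to-count pairs are what a time series chart needs: the hour goes on the x axis and the count (or the weighted total) on the y axis. Changing the truncation step to keep only the date would give daily rather than hourly volume.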
