Data Visualization – Visualizing Movie and Actor Relationships

First off, I'm a programmer but my experience with true statistics ended at A-Level so I'm looking to all of you for help with a little side project I've been tinkering with.

At home I use Plex Media Center to display all of my movies. I built an export tool for it that generates an HTML file containing information on your library so that others can view it online. After I made this tool I realised I now had access to a wealth of data about films and the actors in them. And this is where you guys (and gals) hopefully come in.

I want to visualize the relationships between actors and movies somehow. Initially I just used a node graph library to map all actors who have been in more than one movie to all their movies and ended up with this: http://www.flickr.com/photos/dachande663/5574979625/ [section of a 5000x2500px image]

The problem is, with anything more than 250 movies it just turns into a mess of spaghetti that's impossible to follow. I've looked into arc diagrams but think they would be even more confusing.

My question therefore is: how do I visualize this? Size isn't too much of an issue as I'd love to print this out on a large canvas and actually hang it up. Also, I'll eventually replace the text with images of the respective movies and actors. What I'm trying to avoid is having a million lines snaking everywhere. I've tried to find the most important movies and place them more centrally but at the moment that's more guess work than actual logic.

Are there libraries that can do a better job of this, or even a better way of displaying the data (for example, dropping actors as nodes and adding them as edge labels instead)? I'm currently using Dracula graph, which provides an okay starting point, but I can change libraries as needed.

Any input will be much appreciated. Cheers.

Best Answer

N.B.: This was previously a (long) comment that I've converted to an answer. Hopefully I'll be able to post an example of what I describe below within a day or two.

Why not try something like a heatmap? Have movies as rows and actors as columns. Maybe sort the rows by the number of actors in each movie and the columns by the number of movies each actor has been in. Then color each cell where an actor appears in a movie. This is basically a visualization of the adjacency matrix of the movie-actor graph. The proposed sorting should bring out some interesting patterns, and the right use of color could make it both artistic and more informative. Maybe color by movie type or Netflix rating or proportion of male to female actors (or viewers!), etc.
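As a rough illustration of the idea, here is a minimal Python sketch that builds the movie-by-actor matrix and applies the sorting described above. The sample library, the function name build_sorted_matrix and the text rendering are all made up for the example; a real export would supply its own data and would probably draw colored cells rather than characters.

```python
from collections import defaultdict

# Hypothetical sample data: movie title -> list of actors.
library = {
    "Movie A": ["Actor 1", "Actor 2", "Actor 3"],
    "Movie B": ["Actor 1", "Actor 3"],
    "Movie C": ["Actor 3"],
    "Movie D": ["Actor 2", "Actor 4"],
}


def build_sorted_matrix(library):
    """Return (movies, actors, matrix), largest casts and busiest actors first."""
    # Count how many movies each actor appears in.
    appearances = defaultdict(int)
    for cast in library.values():
        for actor in set(cast):
            appearances[actor] += 1

    # Rows: movies with the most actors first (ties broken by title).
    movies = sorted(library, key=lambda m: (-len(set(library[m])), m))
    # Columns: actors with the most movies first (ties broken by name).
    actors = sorted(appearances, key=lambda a: (-appearances[a], a))

    # Cell is 1 where the actor appears in the movie, else 0.
    matrix = [[1 if a in library[m] else 0 for a in actors] for m in movies]
    return movies, actors, matrix


movies, actors, matrix = build_sorted_matrix(library)

# Crude text rendering: '#' marks a match, '.' marks an empty cell.
for movie, row in zip(movies, matrix):
    print(f"{movie:10s} " + " ".join("#" if cell else "." for cell in row))
```

The resulting list of lists could then be handed to any plotting routine that draws a grid of cells (matplotlib's imshow is one option), with the 1s swapped for a per-movie value such as genre or rating to drive the color.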
