
Sunday, April 20, 2014

Movie Genre Similarity using Genre Co-occurrences

This page shows a few ways of visualizing movie genre similarity. The similarity scores were obtained by iterating through an IMDb dataset of 10,000 movies from the 1950s to the 2010s and counting the number of times each genre appeared alongside each of the other genres. For example, if a movie's genres were Action, Adventure, and Comedy, the co-occurrence counts would go up by one for each of the following pairs: (Action, Adventure), (Action, Comedy), and (Adventure, Comedy). The resulting co-occurrence count matrix is shown below. 
The color map is as follows: 1.0 is perfect similarity, and 0 is no similarity at all. 
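
For readers who want to try the counting step themselves, here is a minimal sketch of it in plain Python. It is not the original analysis code. The function names and the toy movie list are made up for illustration, and the Jaccard-style scaling to the 0 to 1 range is only an assumption, since the post does not say how the counts were normalized.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(movies):
    """Count how often each unordered genre pair appears on the same movie.

    `movies` is an iterable of genre lists (hypothetical input format).
    Returns (pair_counts, genre_counts).
    """
    pair_counts = Counter()
    genre_counts = Counter()
    for genres in movies:
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        unique = sorted(set(genres))
        genre_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, genre_counts


def jaccard_similarity(pair_counts, genre_counts, a, b):
    """ASSUMED normalization: movies with both genres / movies with either."""
    both = pair_counts[tuple(sorted((a, b)))]
    either = genre_counts[a] + genre_counts[b] - both
    return both / either if either else 0.0


# Toy data, not taken from the IMDb dataset.
movies = [
    ["Action", "Adventure", "Comedy"],
    ["Action", "Comedy"],
    ["Comedy", "Romance"],
]
pair_counts, genre_counts = cooccurrence_counts(movies)
print(pair_counts[("Action", "Comedy")])
print(jaccard_similarity(pair_counts, genre_counts, "Action", "Comedy"))
```

The first movie in the toy list is the Action, Adventure, Comedy example from the text, so it bumps exactly the three pairs listed there.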

The next three visualizations focus on individual genres. In the Comedy genre plot below, the more similar a genre is to Comedy, the higher its label appears. The plot is not to scale: the genres are simply ordered from most to least similar, top to bottom. The data is also time-sliced by decade, as indicated at the bottom of the plot. Observe the changes in rank over time. 
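
The per-decade ordering described above could be computed along these lines. This is a rough sketch under stated assumptions, not the code behind the plots: the `(year, genres)` input format, the function names, and the sample movies are hypothetical, and genres are ranked by raw co-occurrence count with the target genre rather than by a normalized similarity score.

```python
from collections import Counter, defaultdict


def decade_of(year):
    """Map a year to the start of its decade, e.g. 1958 -> 1950."""
    return (year // 10) * 10


def rank_by_decade(movies, target):
    """Rank genres by how often they co-occur with `target`, per decade.

    `movies` is an iterable of (year, genres) tuples (assumed format).
    Returns {decade: [genre, ...]} ordered most-to-least co-occurring;
    ties are broken alphabetically.
    """
    counts = defaultdict(Counter)
    for year, genres in movies:
        unique = set(genres)
        if target in unique:
            counts[decade_of(year)].update(unique - {target})
    return {
        decade: [g for g, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for decade, c in sorted(counts.items())
    }


# Toy data, not taken from the IMDb dataset.
movies = [
    (1955, ["Comedy", "Romance"]),
    (1958, ["Comedy", "Romance", "Musical"]),
    (2011, ["Comedy", "Action"]),
    (2012, ["Comedy", "Action", "Romance"]),
    (2013, ["Drama", "Action"]),  # no Comedy, so it is ignored
]
ranks = rank_by_decade(movies, "Comedy")
for decade, ordered in ranks.items():
    print(decade, ordered)
```

Reading the lists across decades gives the same kind of rank changes the plot shows, such as Romance dropping from first place in the toy 1950s data to second in the 2010s.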




The second Comedy plot is to scale, although the scale itself is not shown. 

The plot below shows essentially the same thing as above, but as line plots, with the scale shown and the lower-similarity genres filtered out. 


There are 24 genres in total, and not enough room for all of them here. Feel free to request a particular genre, and I'll try to post it. 
For those interested in a more detailed view, below is a plot for the Sci-Fi genre using 1-year time slices (as opposed to the 10-year slices used above). 
The plot below, for the Mystery genre, is for 2-year time slices. Notice the rise in similarity to the Thriller genre, and the jagged wave shape of the Horror genre. Apparently, intense feelings of fear and shock, combined with puzzlement, are popular approximately once per generation. 


This analysis was made with Python, specifically NumPy, pandas, and Matplotlib. 

