If you’d like to take a look at the data referenced in this post and the script I wrote to gather them, feel free to head over to my GitHub.
I’ll admit it: I put stock in video game reviews. I don’t read them religiously, nor am I insulted when a site rates a favorite of mine a bit too low. I read reviews to find out which games I might like. With so many games and so little time, reviews and ratings are, for me, a much-needed filter.
But video game ratings come with problems. On a 10-point scale, what’s the difference between a 7.3 and a 7.5? Even a 7 and an 8? And why does it seem like reviewers hardly dish out scores from the bottom half of their scales? How can reviewers package their opinions, highly subjective and finicky things that they are, into definitive scores? Numbers feel a lot more objective than words. It might be this perceived objectivity that makes people protest the seemingly arbitrary scores that reviewers select. (I wouldn’t call these ratings arbitrary; they’re backed by opinion. Subjective? Yes.)
Present-day game rating aggregators add to these problems. Averaging across multiple sites, each with its own distribution, makes things messy. A 7 from one site might well be equivalent to a 5 from another. And what about those troublesome letter-grade ratings? Metacritic converts Cs to 50s, but last time I checked, a C mapped to a 70 percent or so in school. Metacritic claims that it generates its aggregate scores by running its data through a weighted, proprietary algorithm, and I’m sure it does. But I question the algorithm’s efficacy. Pick a few games on the website and do an old-fashioned average of the scores listed. The number you come out with will likely come close to the aggregate score.
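For anyone who wants to try the old-fashioned average, here is a minimal Python sketch. The critic scores and the listed aggregate are made-up placeholders, not real Metacritic numbers, and the function names are my own for illustration.

```python
from statistics import mean


def plain_average(scores):
    """Unweighted mean of critic scores on a 0-100 scale, rounded to an integer."""
    return round(mean(scores))


def gap_from_aggregate(scores, listed_aggregate):
    """How far the plain average lands from the aggregate shown on the site."""
    return plain_average(scores) - listed_aggregate


# Hypothetical values: copy the real ones from a game's page to run the check.
critic_scores = [90, 85, 80, 75, 70]
listed_aggregate = 81

print(plain_average(critic_scores))                         # 80
print(gap_from_aggregate(critic_scores, listed_aggregate))  # -1
```

A gap that stays within a point or two across several games would suggest that the weighting changes very little.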
But I digress.
I decided to investigate the nature of game ratings. I’m currently building a dataset of game ratings from different sites, and I started with IGN. After a bit of web scraping, I collected review data on the 75,005 games that IGN kept track of as of July 13, 2013. Of those, only 17,027 had ratings in IGN’s index (the others were not rated, marked as “NR”). I wrote a Python script to collect the data and used BeautifulSoup for HTML parsing. I analyzed the data with an evaluation copy of Wizard, a suite of statistical tools.
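To give a rough idea of the parsing step, here is a small sketch. It is not my actual script: it uses the standard library’s html.parser in place of BeautifulSoup so that it runs without extra installs, and the markup in SAMPLE_HTML is invented for illustration (it is not IGN’s real page structure). The class and function names are hypothetical too.

```python
from html.parser import HTMLParser

# Invented markup for illustration only; the real index pages look different.
SAMPLE_HTML = """
<div class="game"><span class="title">Game A</span><span class="score">8.5</span></div>
<div class="game"><span class="title">Game B</span><span class="score">NR</span></div>
<div class="game"><span class="title">Game C</span><span class="score">6.0</span></div>
"""


class ScoreParser(HTMLParser):
    """Collects (title, score text) pairs from the invented markup above."""

    def __init__(self):
        super().__init__()
        self._field = None  # "title" or "score" while inside such a span
        self._title = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("title", "score"):
            self._field = css_class

    def handle_data(self, data):
        if self._field == "title":
            self._title = data.strip()
        elif self._field == "score":
            self.rows.append((self._title, data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_scores(html):
    parser = ScoreParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rated_only(rows):
    """Drop entries marked "NR" and convert the remaining scores to floats."""
    return [(title, float(score)) for title, score in rows if score != "NR"]


rows = parse_scores(SAMPLE_HTML)
rated = rated_only(rows)
print(len(rows), len(rated))  # 3 2
```

The same filter is what separates the 17,027 rated games from the full list of 75,005.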
Without further ado, here are some summary statistics of IGN’s rating distribution, peppered with some tidbits of information that I found interesting.
Data retrieval date: July 13, 2013
Population size: 17,027 rated games
Mean score: Approximately 6.9
Standard deviation: Approximately 1.7
Interquartile range: 2.1
Lower quartile: 6.0
Median score: 7.2
Upper quartile: 8.1
IGN’s index erroneously assigns 0s to a number of games (at least one game, though, actually did manage to earn a zero). I didn’t manually prune these entries from my data, so the summary statistics above are slightly off.
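I computed the numbers above in Wizard, but the same kind of summary can be sketched with Python’s statistics module. The scores below are placeholders rather than the IGN data, and the choice of quartile method ("inclusive") is an assumption: different tools compute quartiles slightly differently, so results may not match Wizard’s to the last decimal.

```python
from statistics import mean, median, pstdev, quantiles


def summarize(scores):
    """Summary statistics for a full population of scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "size": len(scores),
        "mean": mean(scores),
        "stdev": pstdev(scores),  # population standard deviation
        "lower_quartile": q1,
        "median": median(scores),
        "upper_quartile": q3,
        "iqr": q3 - q1,
    }


# Placeholder scores, not the real dataset.
sample_scores = [0.0, 5.0, 6.0, 7.0, 7.5, 8.0, 8.5, 9.0, 10.0]

for name, value in summarize(sample_scores).items():
    print(f"{name}: {value:.2f}")
```

Erroneous zeros pull the mean down and can nudge the lower quartile too, which is why pruning them would shift the figures slightly.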
The highs and the lows
Only 314 games, or about two percent of rated games, received a score of 9.5 or above. IGN decorated 38 games with 10s. Classic games (the Zelda, Mario, and Pokémon franchises) dominated here. Some newer games did manage to break into the Hall of 10s, notably Naughty Dog’s third Uncharted entry and their recent The Last of Us. Rockstar’s GTA games and Red Dead DLC also carved out spots for themselves, as did a couple of Kojima’s Metal Gear Solid games. Some 10s you might not have heard of include Checkered Flag, Joust, and Shanghai (all Atari Lynx games), and Tornado Mania, a mobile game. Infinity Blade II was the sole representative of the iPhone.
2,913 games managed to be bad enough to nab a score of 5 or less, accounting for about 17 percent of rated games. With a zero, Olympic Hockey Nagano ’98 boasts the lowest score of all games. Looney Tunes: Back in Action: Zany Race followed closely with a 0.5. Two other games, Extreme PaintBrawl and Action Girlz Racing, joined the less-than-one ranks.
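The percentages in this section are taken against the 17,027 rated games, not all 75,005 games in the index. A quick Python check of the arithmetic, using only the counts quoted in this post (the share function is just a helper for illustration):

```python
RATED_GAMES = 17_027  # games with a rating in the index


def share(count, total=RATED_GAMES):
    """Percentage of `total` represented by `count`."""
    return 100 * count / total


print(f"{share(314):.1f}%")   # scores of 9.5 or above -> 1.8%
print(f"{share(2913):.1f}%")  # scores of 5 or less -> 17.1%
```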
Of the major modern consoles, the Nintendo Wii has the lowest median rating (6.8). The Wii U, Xbox 360, PS3, and PC all have median scores of 7.5.
A closing remark
With a median of 7.2 and an IQR of 2.1 (i.e., 50 percent of scores lie between 6.0 and 8.1), it does look like IGN hands out scores from the upper half of its scale more often than not. This does not mean that IGN is doing anything illicit (this might seem obvious, but you’d be surprised at some of the shoddy “journalism” out there that sensationally misinterprets data). Perhaps IGN thinks that most games just aren’t that bad.
I’ve got a hunch that other review sites’ distributions won’t match IGN’s, just as they likely won’t match each other. So I’ll gather more data — maybe I’ll be able to do something with them.
Update, July 23, 2013: Previously, this article linked to a website that I believed had misinterpreted ratings data. This post no longer links to that site.
Update, July 28, 2013: This post now links to my GitHub repository containing the source code and data referenced here.