Want To Predict The Best Picture Oscar Winner? Maybe Our Data Can Help
In response to last year's Best Picture envelope debacle, The New Yorker's Adam Gopnik wondered if the "Moonlight"-"La La Land" screw-up was proof that life's just one big computer simulation gone haywire. That was not yet four months after the 2016 US presidential election threw the world for a loop. For a while, everyone with a platform looked for profound insight, sly humor or both in every unexpected curve life threw out there.
The way we experienced the Oscars night slip-up made it fantastical (all the surrounding context, the films in question, the way the people involved reacted), but the envelope swap itself was an ordinary, seemingly inevitable screw-up. Something like that had to happen eventually, right? Weirder things have happened at the Oscars.
That envelope mix-up will forever color the way we look back on the outcome, but the real surprise of the moment deserves a less dramatic look: "Moonlight" won Best Picture even though most predictions favored "La La Land." This has been interpreted as a sign of shifting tastes and trends within the Academy's voting body, likely stemming from the initiative taken to make the voting body more diverse. If any film was going to upset "La La Land," it was going to be "Moonlight," but were all the predictors who picked "La La Land" leaning too hard on gut instincts? Were there trends they missed in the lead-up to the awards?
Wielding the twin powers of hindsight and historical data, we decided to see if they missed something, and if there's anything that can be learned from last year's Best Picture race to help predict 2018's.
Hi, We're Digg, We Have Lots Of Data
Do you use Digg Reader to follow the RSS feeds of your favorite websites? Maybe you've got Digg integrated into Slack or Facebook Messenger? Between these two services, we're slurping up content from across the internet all the time, including stuff that's relevant to the Oscars. We've got everything pulled in via the RSS feeds plugged into Digg Reader going back a few years.
So, for the 2017 and 2018 Best Picture nominees, we pulled all their mentions from January 1st of the prior year through to the end of nominations voting in mid-January of the awards year.[1] If a nominated film appears in an article's text at least once, that article counts as a single mention.
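The counting rule above (one mention per article, however many times the title shows up) can be sketched in a few lines of Python. This is only an illustration, not the pipeline we actually ran: the article dictionaries, the `count_mentions` helper and the exact January 15th cutoff are all assumptions made up for the example.

```python
from datetime import date

def count_mentions(articles, titles, start, end):
    """Count articles published in [start, end] that mention each title at least once."""
    counts = {title: 0 for title in titles}
    for article in articles:
        if not (start <= article["published"] <= end):
            continue  # outside the collection window
        text = article["text"].lower()
        for title in titles:
            if title.lower() in text:
                counts[title] += 1  # one mention per article, however often the title repeats
    return counts

# Hypothetical sample data, not real articles.
articles = [
    {"published": date(2016, 11, 2), "text": "Moonlight is stunning. Moonlight again."},
    {"published": date(2016, 12, 9), "text": "La La Land and Moonlight lead the field."},
    {"published": date(2017, 2, 1), "text": "Moonlight after the window closes."},
]

# The cutoff date is a placeholder for "mid-January".
mention_counts = count_mentions(
    articles, ["Moonlight", "La La Land"], date(2016, 1, 1), date(2017, 1, 15)
)
print(mention_counts)
```

Plain substring matching like this is exactly what gets a title like "Lion" into trouble, which is where the cleaning step comes in.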
There was lots of data cleaning to do. Not every movie has a title as unique as "Three Billboards Outside Ebbing, Missouri." Some, like "Fences," "The Post," and "Lion" (a particular bane of our collection process), needed better filtering. We went hunting for unusual patterns and created blacklists for troublesome cases (like removing all articles containing "Cannes de Lions" from the mentions of "Lion," since the film did not show at that festival). In some cases, we used whitelists to refine the results by looking for companion terms (e.g. looking for mentions of "Meryl Streep" or "Tom Hanks" alongside "The Post").
Digg Reader users can pull in whatever RSS feeds they like, but there's not much of a case for tallying every blog, YouTube channel or podcast under the sun. Instead, we restricted the results to the list of publications we've circulated here on Digg's front page (in other words: legit publications). After that, each of our 2017 and 2018 datasets had over 10,000 hits for their respective Best Picture nominees.
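Here's a rough Python sketch of how blacklist and whitelist filtering of that kind could work. The `is_valid_mention` function and its parameters are hypothetical names for illustration; the term lists just reuse the examples mentioned above and are not our full filters.

```python
def is_valid_mention(text, title, blacklist=(), companions=()):
    """Return True if `text` plausibly mentions the film `title`.

    blacklist:  phrases that disqualify the article outright.
    companions: if given, at least one must appear alongside the title.
    """
    lowered = text.lower()
    if title.lower() not in lowered:
        return False
    if any(term.lower() in lowered for term in blacklist):
        return False
    if companions and not any(term.lower() in lowered for term in companions):
        return False
    return True

# Blacklist example: drop festival coverage from "Lion" mentions.
lion_ad_story = is_valid_mention(
    "Winners at Cannes de Lions announced", "Lion", blacklist=["Cannes de Lions"]
)
lion_review = is_valid_mention(
    "Dev Patel stars in Lion", "Lion", blacklist=["Cannes de Lions"]
)

# Whitelist example: "The Post" only counts next to a companion term.
post_newspaper = is_valid_mention(
    "The Post reported today", "The Post", companions=["Meryl Streep", "Tom Hanks"]
)
post_film = is_valid_mention(
    "Tom Hanks stars in The Post", "The Post", companions=["Meryl Streep", "Tom Hanks"]
)
print(lion_ad_story, lion_review, post_newspaper, post_film)
```

A filter this simple still lets through any article that uses the word "lion" for other reasons, which is the sort of false hit that made "Lion" so hard to clean.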
Before You See Our Graphs, Here's A Crash-Course On Best Picture Voting
If you're trying to outperform your friends or coworkers in an Oscars pool, you've probably studied how the Academy goes about nominations and final voting for Best Picture. If you're hoping our graphs will give you an edge, you should definitely know how the votes are tallied in both phases. As it turns out, it's more complicated than you might think.
In 2009, when the Academy expanded the Best Picture field from 5 nominees to 10 (changed in 2011 to anywhere between 5 and 10), it brought the preferential instant runoff voting system from the nomination process to the final voting, with some tweaks. When selecting Best Picture nominees, each voter is allowed to rank 5 to 10 films in order of preference. For the final voting, they rank all of the nominees by order of preference.
Instant runoff systems basically work as follows: in the initial counting of ballots, all of the first place votes are tallied. If a choice passes a mathematically determined threshold on this initial count, it's in. For Best Picture nominations, a film must reach 1/11th of the total ballot count, rounded up. A choice that hits that threshold has its ballots set aside. After the initial count, if the threshold hasn't been reached by any choice or there's still potential for additional choices to reach the threshold after redistribution, whatever received the fewest first place votes in the initial count is dropped from consideration and those ballots are redistributed to their second place choices (or to the next choice still in consideration). This process is repeated until either the mathematical limit of winners is reached or until the list is whittled down to the limit.
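To make the threshold concrete, here's a small Python sketch of the "1/11th of the ballots, rounded up" arithmetic as described above. The function name and the ballot counts are made up for illustration; they are not real Academy figures.

```python
import math

def nomination_threshold(total_ballots, max_nominees=10):
    """First place votes needed to lock in a nomination: total / (slots + 1), rounded up."""
    return math.ceil(total_ballots / (max_nominees + 1))

# Hypothetical count of 6,000 ballots: 6000 / 11 is about 545.45, so a film needs 546.
example_threshold = nomination_threshold(6000)
print(example_threshold)
```

Dividing by 11 rather than 10 is what caps the field: no more than 10 films can each hold more than 1/11th of the ballots.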
There are additional rules in effect with Best Picture nominations that mean we don't always end up with 10 nominees. First, there's a surplus rule that kicks into effect for especially popular films: if a film gets over 10% more first place votes than the nomination threshold dictates, those ballots are split, with a fraction of each ballot staying with the first place choice (just enough for the kept fractions to add up to the threshold) and the remainder of the ballot redistributed to the voter's next choice still in the running. Second, a film must have at least 5% of first place votes to even be eligible for nomination, and any ballot with a first place film that gets less than 1% of the initial count is redistributed instantly.
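The surplus rule is easier to follow with numbers. The sketch below is one reading of the rule as described above: each ballot keeps `threshold / first_place_votes` of its weight with the popular film, and the rest moves on. The `split_surplus` helper and the vote totals are hypothetical.

```python
from fractions import Fraction

def split_surplus(first_place_votes, threshold, trigger=Fraction(11, 10)):
    """Return (kept, passed_on) weight per ballot under the surplus rule.

    The rule only applies when a film has over 10% more first place
    votes than the threshold; otherwise ballots stay whole.
    """
    if first_place_votes <= threshold * trigger:
        return Fraction(1), Fraction(0)
    kept = Fraction(threshold, first_place_votes)  # kept fractions sum to the threshold
    return kept, 1 - kept

# Hypothetical film with twice the votes it needs: half of each ballot moves on.
big_hit = split_surplus(1092, 546)
# Hypothetical film just over the threshold (under the 10% trigger): no split.
narrow_qualifier = split_surplus(600, 546)
print(big_hit, narrow_qualifier)
```

In the first case the 1,092 ballots keep half their weight each, which adds up to exactly the 546 the film needed, and the other half of each ballot goes to the voter's next choice.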
Once the nominees are chosen, things proceed to final Best Picture voting, where the threshold a nominee has to reach to win is 50%+1. Now a winner has to have widespread support across the Academy's voting body to win — before 2009, when Best Picture was a five nominee field, the final winner was determined by a plurality. This explainer from FiveThirtyEight illustrates how the preferential runoff system can lead to a different winner than whatever gets the plurality in an initial count.
In a year with several strong Best Picture nominees or particularly polarizing films, the second place position on ballots takes on special importance. "Moonlight" probably didn't win with over 50% of votes in 2017's initial count. It's almost certain that "La La Land" started with a plurality, and then a big group of ballots which didn't list "Moonlight" or "La La Land" first gave their second place to "Moonlight," pushing the film over 50% after however many rounds of redistribution.
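That kind of come-from-behind win is simple to simulate. Below is a minimal instant runoff sketch in Python with a toy nine-ballot electorate. The ballots are invented to mirror the scenario above and say nothing about how anyone actually voted; the alphabetical tie-break is also just an assumption to keep the example deterministic.

```python
from collections import Counter

def instant_runoff(ballots):
    """Run an instant runoff count. Returns (winner, tallies_per_round)."""
    remaining = {film for ballot in ballots for film in ballot}
    rounds = []
    while True:
        tally = Counter()
        for ballot in ballots:
            # Each ballot counts for its highest-ranked film still in the race.
            for film in ballot:
                if film in remaining:
                    tally[film] += 1
                    break
        rounds.append(dict(tally))
        active = sum(tally.values())
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > active or len(tally) == 1:  # more than half of live ballots
            return leader, rounds
        # Drop the film with the fewest votes (ties broken alphabetically, an assumption).
        loser = min(remaining, key=lambda film: (tally.get(film, 0), film))
        remaining.discard(loser)

# Hypothetical ballots: "La La Land" leads on first place votes but lacks a majority.
ballots = (
    [["La La Land", "Moonlight", "Arrival"]] * 4
    + [["Moonlight", "Arrival", "La La Land"]] * 3
    + [["Arrival", "Moonlight", "La La Land"]] * 2
)
winner, rounds = instant_runoff(ballots)
print(winner)
for tally in rounds:
    print(tally)
```

In this toy count "La La Land" opens with a 4-3-2 plurality, "Arrival" is eliminated, and its two ballots move to "Moonlight," which wins 5 to 4.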
So, is there anything in our data that suggests we could've better predicted a wave of second place support for "Moonlight" that put it over the top? If so, do we see similar patterns emerge among this year's nominees?
2017 Nominees By The Numbers
Here's the breakdown by month of all 2017's Best Picture nominees, minus "Lion."[2] With all of the graphs to follow, you can hover over each bar to see how many stories about the movie came out in that time period and what percentage of those stories each film accounts for. A bar with diagonal shading indicates the film's initial US release happened in that time period, and you can hover over it for the exact date.
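For the curious, the two numbers behind each bar (story count and share of that period's stories) boil down to a group-and-divide. Here's a minimal Python sketch, assuming mentions arrive as `(film, month)` pairs; the `monthly_shares` name and the sample counts are invented for illustration.

```python
from collections import defaultdict

def monthly_shares(mentions):
    """Map (month, film) to (story_count, percent_of_that_month's_stories).

    mentions: iterable of (film, "YYYY-MM") pairs, one per counted article mention.
    """
    month_totals = defaultdict(int)
    per_film = defaultdict(int)
    for film, month in mentions:
        month_totals[month] += 1
        per_film[(month, film)] += 1
    return {
        key: (count, 100 * count / month_totals[key[0]])
        for key, count in per_film.items()
    }

# Hypothetical sample: four counted mentions in October 2016.
sample = [("Moonlight", "2016-10")] * 3 + [("Arrival", "2016-10")]
shares = monthly_shares(sample)
print(shares)
```

Because the percentages are relative to all nominee mentions in the same period, a quiet month can make a handful of stories look like a big share, which matters for the "Lion" footnote.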
Those big indigo bars in December 2016 and January 2017? That's the effect of "La La Land's" guild awards rampage coupled with the fact that it started in extremely limited release and then kept getting pushed out to more and more theaters through December and into January. Those bars capture the rave reviews for the film, the negative criticism, the awards buzz, all of it.
So what about "Moonlight"? Well, despite coming out late in October and in a fairly limited release, it dominates the month. It has a strong post-release tail in coverage. It raked in wins at critics' awards and smaller festivals through November and December. More stories were written about it in December than in its month of release. At its peak in November it was only in 650 theaters. Compare "Moonlight" to "Arrival," which opened with a wide release in November and dominated that month's discussion: its share of mentions in December and January drops off a cliff. Same with "Hacksaw Ridge" and "Hell or High Water" earlier in the year.
In FiveThirtyEight's model, which looks at guild and critic awards and weights their results based on past predictive power, "La La Land's" awards from the Producers and Directors guilds put it far ahead of "Moonlight," which sat in a distant second and barely ranked higher than "Arrival." In contrast, our coverage view says "Moonlight" was talked about post-release far more than "Arrival" was. Yes, "La La Land" dominates in total mentions, as it dominated those guild awards. Still, "Moonlight" comes in second for total mentions in the year and its long tail post-release indicates its buzz relative to the other films stayed strong. Here, in a week-by-week breakdown of the data from September 1st through January, you can see "Moonlight's" staying power in a little more detail:
An analysis from The Ringer concluded that "Moonlight" was better off for having opened in October than in a later month, as late December nominees have been performing worse in recent years. "Moonlight" had ample time to find its champions amongst critics, run a strong Oscars campaign and, at the very least, establish itself as the common number two choice amongst voters.
This isn't robust data analysis, just an effort to see if there were any signs of "Moonlight" being a stronger contender than it was given credit for. What we want to look for in 2018's coverage, given "Moonlight's" success, are films with patterns of strong post-release coverage that could likely scoop up second place votes over the ostensible frontrunners, "Three Billboards Outside Ebbing, Missouri" and "The Shape of Water."
2018 Is A Wild, Wild Race
Once "La La Land" had its Golden Globe and Directors Guild Award, along with a record-tying 14 Oscar nominations, most prognosticators thought Best Picture was a done deal. Folks are a little more hesitant to call 2018's race. "Three Billboards Outside Ebbing, Missouri" and "The Shape of Water" split the guilds: the actors liked "Billboards" while producers and directors went for "Shape." The Writers Guild Awards gave Best Screenplay to "Get Out," and both that film and "Lady Bird" have been cleaning up at smaller critics' awards shows. Since guild awards are better predictors, it looks like "Shape of Water" and "Three Billboards" are neck and neck.
So, you know how "La La Land" was last year's Best Picture frontrunner, it had the most mentions in our data, and it lost? Well, "Get Out" is seen as a Best Picture underdog for many reasons: its release date, genre, subject matter… but hey, just look at this:
"Get Out" is the most talked-about Best Picture nominee, with "Dunkirk" not far behind. Though their total mentions are close, "Dunkirk" has its big isolated summer blockbuster month in July and then drops off. "Get Out" managed to be the most discussed movie of October — over 7 months after its release — thanks in large part to Jordan Peele's surprise appearance at a UCLA class on the film. The film is so good that whole college courses are being built around it. It spawned multiple memes and cemented "the Sunken Place" as a potent metaphor for black marginalization. None of this is the same as running a strong Oscars campaign (that calls for advertising, Q&As with Academy members, etc.), but is it that hard to believe a movie that was the talk of the entire year could scoop up a boatload of second place votes and repeat a "Moonlight"-style upset?
Well, "Lady Bird" will probably pick up a fair share of first and second place votes too: it came out two weeks later than "Moonlight" did the year before and it shows similarly strong post-release coverage. Even as "Three Billboards" was met with praise, spurred a backlash and went home with the Best Motion Picture Drama Golden Globe on January 7th, "Lady Bird" consistently outpaced it in coverage and picked up the Best Motion Picture Comedy Globe on the same night (last year, "Moonlight" and "La La Land" picked up those wins). "Three Billboards" was the ostensible Oscars frontrunner after the Golden Globes, but "Lady Bird" shouldn't be counted out.
So, assuming "The Shape of Water" and "Three Billboards" have the lead but neither has the support to clinch 50%+1 in the initial ballot count, what films do we think will drop out first? Probably "Darkest Hour," "The Post" and "Phantom Thread" in some order. Where would their ballots move? Do "Get Out" and "Lady Bird" end up fighting for second place on ballots to both films' detriments?
If you're inclined to lean on conventional Oscars wisdom, you still should look at the guild support between "Shape" and "Billboards" and pick based on that. Given its 13 nominations, "The Shape of Water" is still the sensible pick and "Shape" also has the edge over "Three Billboards" in our data.
That said, if you feel an upset coming in your bones, if you think deep down that 2017 was the year of "Get Out" whether it wins or not, then its strong coverage throughout the year should reassure you that, no, you're not way off base here. It really could take Best Picture. If it does, that also supports our "Moonlight" takeaway about the significance of post-release coverage.
If you make your pick with the assistance of this article and, on Sunday night, you're proven wrong? Well, first: sorry. Second, hopefully this data dive helps give you perspective. Just because Best Picture goes one way or another doesn't indicate that the world is somehow broken. In the (misguided) words of Joaquin Phoenix, the Oscars are "total, utter bullshit." Sometimes there are pleasant surprises, sometimes not. Oscars voting is still a group of humans ranking their favorite films, and not, like, an index that tracks with media mentions.
Now if there's another Best Picture envelope mix-up this year? Then we should probably start worrying that we're all in a busted simulation.
Special thanks to Sarah Ruddy, Rob Okrzesik and Shivram Subramanian for their help wrangling the data and making it presentable — and thanks to Digg's dev team for letting us borrow their smarts for a bit.
[1] If we tried to include the period leading up to final voting (which ended on Feb 27th this year) we wouldn't have been able to finish this piece before the Oscars. C'est la vie.
[2] We were able to adequately filter out most erroneous mentions for "Lion." In other graphs, the broad patterns matched up with what common sense dictates the film's buzz cycle would be. That said, there were still enough false hits that, given our choice of graph, when we included "Lion" it looked as though it dominated the conversation in early 2016. It didn't: because 2016's Best Picture nominees had their releases clustered later in the year, there aren't as many stories in the year's first half. All it would take is a handful of erroneously tagged stories about lions to make it look like "Lion" was being covered extensively earlier in the year, so we left it out completely.