Post by kevin on Nov 23, 2019 13:47:50 GMT
Hey, everyone. I've recently been working on a system to try to predict the Best Picture winner and I've finally finished it, so here it is.
Important Note: Here is an overview of the Oscar tables, in terms of both nomination and win coefficients. If you want to know how everything works and what it all means, the explanation can be found after the table.
2020 Oscars Table
Award show progress (weighted percentage of Award shows already included in calculation) = 81.28%
RANKED BY FINAL WIN COEFFICIENT (FWC) full table:
TOP 10 RANKED BY NOMINATION COEFFICIENT (NC) = current expected nominees
(all movies with a rounded 100%+ nomination coefficient are bolded, movies with a log value better than -1 are colored red, movies with a log value of exactly 0 are colored green):
# - Movie - (nomination coefficient) (log value)
1. Parasite (100%) (0)
2. Once Upon a Time in Hollywood (100%) (0)
3. Joker (100%) (0)
4. 1917 (100%) (0)
5. The Irishman (100%) (~0)
6. Marriage Story (100%) (~0)
7. Jojo Rabbit (100%) (-0.01)
8. Little Women (100%) (-0.2)
9. Knives Out (100%) (-1.2)
10. The Two Popes (100%) (-12)
Coefficients and values (what do they mean)
While the coefficients are given as percentages, they don't equal the actual chance of a movie getting nominated (i.e. a 90% nomination coefficient doesn't mean a 90% chance of an Oscar nom). Why that is, is explained later in this post; for now, just see them as a way to score which movies are performing best. Here's an overview of the coefficients used:
*Nomination coefficient = how likely is the movie to get nominated? Since these values quickly tend toward 100%, another way of representing this is the log value: 1000*log(nomination coefficient). This scale still lets us see significant differences, even when all movies have a rounded 100% nomination coefficient. The log value is a non-positive number and is better the closer it gets to 0, so -50 is better than -400.
*Win coefficient = roughly how likely is the movie to win the Oscar? This is calculated independently from the nomination coefficient.
*Final Win coefficient = a better representation of how likely the movie is to win the Oscar. When actually predicting which movie will win the BP Oscar, the final win coefficient is always used instead of the win coefficient. The final win coefficient is simply the 'nomination coefficient' times the 'win coefficient', and thus takes into account both the chance of getting nominated in the first place and the chance of winning after that. Logically, once the nominations are known in January and the nomination coefficient for all nominated movies is set to 100% (there is no uncertainty anymore, we then know for sure), the difference between the win coefficient and the final win coefficient disappears.
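As a concrete illustration, the two derived values can be computed like this. This is a minimal sketch; I'm assuming the log is base 10, which matches the -2.1 log value quoted for Vice in the history cheatsheet (a coefficient of roughly 99.5%):

```python
import math

def log_value(nomination_coefficient):
    # 1000 * log10 of the coefficient (as a 0-1 fraction).
    # 0 is the best possible value; more negative is worse.
    return 1000 * math.log10(nomination_coefficient)

def final_win_coefficient(nomination_coefficient, win_coefficient):
    # Chance of a nom, times chance of winning once nominated.
    return nomination_coefficient * win_coefficient

# A coefficient of 99.52% rounds to 100% but still has a visible log value:
print(round(log_value(0.9952), 1))                    # -2.1
print(round(final_win_coefficient(0.9952, 0.98), 3))  # 0.975
```

This shows why the log scale is useful: two movies that both display as "100%" can still be separated by their log values.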
History cheatsheet (what coefficient values are considered good)
NOMINATION COEFFICIENT STATS
*Average nomination coefficient of nominated movies = 100% (so at the end of the season you usually need 100% for a nom)
*Lowest nomination coefficient for a nominated movie = 100% (Vice, so you need a rounded 100%) -> in this case a log value of -2.1
*Range of nomination coefficients for nominated movies = 100% (log value of -2.1) to 100% (log value of 0)
*Average log value of a nominated movie = -0.26 (standard deviation of roughly 0.75); -1 or better basically secures a nom
(FINAL) WIN COEFFICIENT STATS
*Average win coefficient of the Best Picture winners = 99%
*Lowest win coefficient for a Best Picture winner = 98% (Green Book)
How does it work?
I went through the most important festivals, industry award shows and critic circles and looked at how well each of them predicted the Best Picture winner. For each of them, I collected all their nominees and winners since 2009 (when the Academy extended its nominations to a max. of 10). Here's how it works. Let's say there is a Random Award Show that announces nominations in December and its winner in January. First, the nominations round: say this Random Award Show nominates 5 movies for Best Picture, so since 2009 that's 50 nominations. Now I look at the Academy Awards since 2009 and see that of the 50 movies nominated for the Random Award Show, 12 (= 24%) were nominated for the Best Picture Oscar and 3 (= 6%) even won Best Picture. Then every movie nominated for the Random Award Show gets a 24% chance to be nominated for the Best Picture Oscar and a 6% chance to win it. But now say that of the 10 movies that won the Random Award Show, 8 (= 80%) were nominated for the Best Picture Oscar and 2 (= 20%) won it. So once the winner is announced in January, the one movie that won gets an increased chance: from 24% to 80% to get nominated, and from 6% to 20% to win the BP Oscar.
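The counting in the Random Award Show example boils down to four simple ratios (the numbers below are the hypothetical ones from the example, not real data):

```python
# Hypothetical Random Award Show history since 2009
nominee_count = 50         # 5 nominees per year over 10 seasons
nominees_with_bp_nom = 12  # later nominated for the Best Picture Oscar
nominees_with_bp_win = 3   # later won Best Picture
winner_count = 10          # one winner per year
winners_with_bp_nom = 8
winners_with_bp_win = 2

# Probabilities attached to a Random Award Show nomination...
p_nom = nominees_with_bp_nom / nominee_count  # 0.24
p_win = nominees_with_bp_win / nominee_count  # 0.06

# ...upgraded for the one movie that wins the Random Award Show
p_nom_winner = winners_with_bp_nom / winner_count  # 0.8
p_win_winner = winners_with_bp_win / winner_count  # 0.2
print(p_nom, p_win, p_nom_winner, p_win_winner)
```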
This is done for every single award in the database. All the probabilities are combined assuming the award shows are independent. So if a movie was nominated for Award A and Award B, where Award A gives a 40% chance of a BP Oscar nom and Award B gives a 12% chance, then the combined probability is 1 - (1 - 0.4) * (1 - 0.12) = 0.472 = 47.2% chance to be nominated. This calculation doesn't translate perfectly when using many award shows, because I don't include any covariance information. For example (just a guess), maybe the real probability of getting nominated with both an Award A and an Award B nom is 45% instead of 47.2% due to covariance. Covariance means the actual chance to be nominated is slightly smaller, since Award A and Award B are not completely independent: there is an increased chance of being nominated for Award B if you're nominated for Award A. So far I have only calculated the correlations with respect to the Oscars, which is about 97 correlation calculations. Extending that to every single pair of award shows would mean calculating 96*97 = 9312 other correlations, which I really don't want to do. I could write a program to do it for me, but to be honest I don't have the time at the moment; maybe I'll do that for next year.
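That combination rule can be sketched in a few lines (a simplified illustration of the independence assumption, not the full 53-show pipeline):

```python
def combine(probabilities):
    # P(at least one "hit") = 1 - product of (1 - p) over all award shows,
    # which treats the award shows as independent of each other.
    miss = 1.0
    for p in probabilities:
        miss *= (1.0 - p)
    return 1.0 - miss

# Award A gives a 40% chance of a BP nom, Award B gives 12%:
print(round(combine([0.40, 0.12]), 3))  # 0.472
```

Note that adding more shows can only push the combined value upward, which is exactly why the coefficients pile up near 100% by the end of the season.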
So, in conclusion, the percentage values don't directly correspond to a movie's actual chance of winning. It's likely that every movie in my top 20 will have a value above 90% by the end of the season, which is of course impossible if at most 10 movies get nominated. So, is it all for nothing? Luckily, no. While the values are not realistic percentage chances, the ordering isn't significantly impacted by the covariance, so the 10 movies with the highest nomination coefficient are still the 10 movies most likely to get nominated/to win. The system can therefore still be used to predict Oscar noms and wins; the percentage scale is just slightly transformed. So instead of percentages, I'll give them a different name: nomination & win coefficients.
Which award shows did I take into account?
I used a total of 53 award shows, film festivals and critic circles for my calculations. The green ones are already fully included in the calculation. If the text is colored red, only the nominees are included in the calculations and the winner isn't included yet. This is for example the case with the Spirit Awards, for which we already know the nominations but not the actual winner just yet. Here's a list of all of them:
AFI
Annie Awards
Austin Film Critics
BAFTAs
Black Reel
Boston Society of Film Critics
British Independent Film Awards
Camerimage Film Festival
Cannes
Capri
César Award
Chicago Film Critics Association
Critics Choice
Dallas-Fort Worth Film Critics Association
Detroit Film Critics Society
Dublin Film Critics Circle
European Film Awards
Florida Film Critics Circle
Georgia Film Critics Association
Golden Globes
Gotham Independent Film Awards
Houston Film Critics Society
IMDb highest rated movies
International Cinephile Society
Las Vegas Film Critics Society
London Film Critics Circle
Los Angeles Film Critics Circle
Lumières Award
Metacritic highest rated movies
National Board of Review
National Society of Film Critics
New York Film Critics Circle
New York Film Critics Online
North Carolina Film Critics Circle
Online Film Critics Circle
People's Choice Awards
Philadelphia Film Critics Circle
Phoenix Film Critics Circle
Producers Guild Awards (PGA)
Rotten Tomatoes highest rated movies
Rotterdam International Film Festival
San Diego Film Critics Society
San Francisco Film Critics Circle
Satellite Awards
Seattle Film Critics
Seattle International Film Festival
Spirit Awards
St. Louis Film Critics Association
Toronto Film Critics Association
Toronto International Film Festival
Vancouver Film Critics Circle
Venice Film Festival
Washington DC Film Critics Association
What does the table show?
The full award table is ranked by the 'final win' coefficient. First, the table shows how many of the 53 award shows listed above the movie has been nominated for. After that it shows the 'nomination coefficient', which as mentioned above reflects how big the chance is of the movie being nominated. Then we have the 'win coefficient', which says how big the chance is of the movie winning Best Picture. Finally we have the 'final win coefficient', which is simply the nomination coefficient times the win coefficient and gives a more realistic picture of a movie's Oscar chances. Once the Oscar nominations have been announced, the nomination coefficient will become obsolete and there will be only one coefficient left (maybe I'll call it the Best Picture coefficient), but we'll get to that when we need it.
How well does it perform?
The algorithm works very well in predicting the Oscar nominations and winners. Here are the results from last year. As mentioned before, because covariance isn't modeled, the coefficients don't represent realistic percentages, but with some transformations you can still get realistic numbers. Here are the transformed percentages, to give a better idea of how big the actual win chances were relative to the other movies. The first list shows the movies most likely to get nominated; the second list shows the relative chance of each actual nominee to win. As you can see below, every movie in the top 10 most likely to get nominated had a (rounded) 100% coefficient, so in the third column I took 1000*log(nomination coefficient), which better shows the gaps between nomination chances. The more negative the number, the worse the nomination chances.
2018
Most likely to get nominated
# / Movie / Nomination coefficient / Log value
1. Roma / 100% / 0
2. Green Book / 100% / 0
3. The Favourite / 100% / 0
4. A Star Is Born / 100% / 0
5. Bohemian Rhapsody / 100% / 0
6. If Beale Street Could Talk / 100% / -9*10^-7
7. BlacKkKlansman / 100% / -3*10^-5
8. First Reformed / 100% / -0.002
9. Black Panther / 100% / -0.003
10. Eighth Grade / 100% / -0.77
Most likely to win
# / Movie / Win coefficient / Transformed realistic percentage
1. Roma / 100% / 31%
2. Green Book / 99% / 27%
3. The Favourite / 97% / 19%
4. A Star Is Born / 97% / 18%
5. BlacKkKlansman / 91% / 5%
6. Black Panther / 83% / 1%
7. Vice / 65% / 0%
8. Bohemian Rhapsody / 37% / 0%
While the algorithm was still slightly in favor of Roma winning, it also showed there was a pretty big chance of a Green Book upset, and it showed this much better than most betting polls before the Oscars. All 8 nominated movies were also in the top 10 most likely to get nominated. According to the algorithm, If Beale Street Could Talk was the big snub of the award show: the system was quite certain it would at least get a nom. In 2017, all 9 nominated movies were also in the top 10 most likely to get nominated; The Florida Project was the only unexpected snub. The algorithm also correctly predicted The Shape of Water to win the big award. I'm still adding a few awards to the 2017 version, so I've taken those tables off this page for now.