The Panda

The Panda's Extremely Poor Simple Linear Regression Analysis: Do YouTube Trailer Views Correlate With Higher Box Office Grosses?

11 hours ago, The Panda said:

 

Yeah I was only measuring YouTube trailer views and DOM Gross/OW.

 

Sometime in the future I may try and do a better test with more movies (top 200), and add in a few extra variables like reviews, whether it's a franchise movie or not, whether it has Oscar nods or not, etc.

 

This analysis was pretty basic.

 

Don't some movies have duplicate videos for the same trailer across YouTube, though?  And what about movies that are based on IPs with high brand awareness, which naturally get higher views on YouTube because of their name?  Saban's Power Rangers has 23.6 million views for its trailer, and based on your recommended calculations, it would have a $51 million opening weekend.  I'm skeptical of course, from that club I started for it, and especially with it opening behind Beauty and the Beast.  Maybe at some point, at the highest openings for the year, the YouTube view count just does not work as well?




1 hour ago, Outrageous! said:

 

Don't some movies have duplicate videos for the same trailer across YouTube, though?  And what about movies that are based on IPs with high brand awareness, which naturally get higher views on YouTube because of their name?  Saban's Power Rangers has 23.6 million views for its trailer, and based on your recommended calculations, it would have a $51 million opening weekend.  I'm skeptical of course, from that club I started for it, and especially with it opening behind Beauty and the Beast.  Maybe at some point, at the highest openings for the year, the YouTube view count just does not work as well?

 

Obviously it's not a perfect-fit model, so I don't recommend basing your predictions solely on it.  Also, as I mentioned before, I really didn't account for anything besides YouTube views and BO grosses.  It's obviously very flawed, and there are obviously other factors; I just wasn't able to take the time to do a full model at this moment.

 

From the data I did gather though, we know that YouTube views are a good indicator of BO gross.
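For anyone who wants to try the same kind of fit, here is a minimal sketch of a one-variable least squares regression in plain Python. The view counts and opening weekends below are made-up placeholder numbers, not the dataset from this thread, and `fit_line` and `r_squared` are just illustrative helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# HYPOTHETICAL data: trailer views (millions) and opening weekends ($ millions).
views_m = [5.0, 10.0, 15.0, 20.0, 30.0]
opening_m = [12.0, 25.0, 30.0, 48.0, 60.0]

slope, intercept = fit_line(views_m, opening_m)
r2 = r_squared(views_m, opening_m, slope, intercept)

# Point prediction for a trailer with 23.6 million views, using the toy fit.
predicted = intercept + slope * 23.6
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
print(f"predicted opening for 23.6M views: {predicted:.1f}")
```

A single point prediction like this says nothing about how wide the error band is, which is part of why a one-variable model shouldn't be trusted on its own.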




This is great. I actually did this exact project two years ago (I posted it in the CC thread) and found very similar results. I used about 12 different variables and found trailer views to be a statistically significant factor compared to reviews or audience score, for example.




33 minutes ago, Cmasterclay said:

This is great. I actually did this exact project two years ago (I posted it in the CC thread) and found very similar results. I used about 12 different variables and found trailer views to be a statistically significant factor compared to reviews or audience score, for example.

 

Did you compare them separately, or did you do a multiple regression?

 

I've wanted to do something like that for a while. The main thing stopping me is that pretty much every source of data other than Box Office Mojo, with the possible exception of Metacritic, makes it an absolute pain in the ass to gather the data.


20 minutes ago, Jason said:

 

Did you compare them separately, or did you do a multiple regression?

 

I've wanted to do something like that for a while. The main thing stopping me is that pretty much every source of data other than Box Office Mojo, with the possible exception of Metacritic, makes it an absolute pain in the ass to gather the data.

I did a multiple regression model. My laptop crashed and so I lost it but I'm going to try and see if I saved it in some drive. 




Wasn't able to find my whole project with graphs and explanations, but found a PowerPoint with my regression analysis. Trailer views and RT audience score were very solid indicators. Things like budgets and theater count had to be in there as controls. Super Franchise was a multi-tiered dummy variable indicating what level of franchise it was (starting up based on existing material, sequels, or completely original).

 

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  9,    41) =   20.21
       Model |  529194.749     9  58799.4166           Prob > F      =  0.0000
    Residual |  119268.862    41  2908.99664           R-squared     =  0.8161
-------------+------------------------------           Adj R-squared =  0.7757
       Total |  648463.612    50  12969.2722           Root MSE      =  53.935

------------------------------------------------------------------------------
       Gross |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  ProdBudget |  -1.236758   .4330427    -2.86   0.007    -2.111305   -.3622103
MarketingBud |   2.151938   .9392942     2.29   0.027     .2549945    4.048881
TrailerViews |   1.101537   .3635741     3.03   0.004     .3672848     1.83579
RottenToma~s |   .1106297   1.490837     0.07   0.941    -2.900176    3.121436
   RTAverage |   .5181892   1.060976     0.49   0.628    -1.624496    2.660874
  RTAudience |   3.671616   1.013446     3.62   0.001     1.624921     5.71831
TheaterCount |   .1041881   .0323623     3.22   0.003     .0388312     .169545
   SuperFran |   65.13007   26.70952     2.44   0.019      11.1891    119.0711
        RTsq |  -.0067532   .0122797    -0.55   0.585    -.0315526    .0180462
       _cons |  -500.1232   136.2386    -3.67   0.001    -775.2626   -224.9839
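As a quick sanity check on how the summary figures in that output relate to each other, the sketch below recomputes F, R-squared, adjusted R-squared, Root MSE, and the TrailerViews t-statistic from the sums of squares and degrees of freedom shown above. Only numbers from the pasted output are used; the variable names are my own.

```python
import math

# Values copied from the regression output above.
ss_model = 529194.749
ss_resid = 119268.862
df_model = 9
df_resid = 41

ss_total = ss_model + ss_resid      # total sum of squares
df_total = df_model + df_resid      # n - 1 = 50 for 51 observations

ms_model = ss_model / df_model      # mean square for the model
ms_resid = ss_resid / df_resid      # mean square for the residuals

f_stat = ms_model / ms_resid                     # overall F(9, 41)
r2 = ss_model / ss_total                         # R-squared
adj_r2 = 1 - ms_resid / (ss_total / df_total)    # adjusted R-squared
root_mse = math.sqrt(ms_resid)                   # Root MSE

# t-statistic for a coefficient is coefficient / standard error.
t_trailer_views = 1.101537 / 0.3635741

print(f"F = {f_stat:.2f}")
print(f"R-squared = {r2:.4f}")
print(f"Adj R-squared = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.3f}")
print(f"t(TrailerViews) = {t_trailer_views:.2f}")
```

The gap between R-squared and adjusted R-squared comes from the penalty for using 9 predictors with only 51 observations.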




49 minutes ago, Cmasterclay said:

Wasn't able to find my whole project with graphs and explanations, but found a PowerPoint with my regression analysis. Trailer views and RT audience score were very solid indicators. Things like budgets and theater count had to be in there as controls. Super Franchise was a multi-tiered dummy variable indicating what level of franchise it was (starting up based on existing material, sequels, or completely original).

 


 

That's similar to what I'm wanting to do, but I think I'll do separate tests for different genres/types of movies, as I'd reckon RT scores, trailer views, etc. will have a varying effect on Suicide Squad vs. La La Land, for example.




4 minutes ago, The Panda said:

 

That's similar to what I'm wanting to do, but I think I'll do separate tests for different genres/types of movies, as I'd reckon RT scores, trailer views, etc. will have a varying effect on Suicide Squad vs. La La Land, for example.

There's no really good way around this. What I did was track the top 20 or so movies for 2013, 2014, and 2015, to try to cut through that noise and give the data an opportunity to work itself out. There's no one factor that is THAT statistically significant, sadly. It's very messy and the confidence is mediocre, but that's fitting, given what we know about box office.






Okay, so I ran another regression test.  This time I added the variables Tomatometer, YouTube Views, Audience Score, Screen Count, Budget, Best Picture Nomination, Franchise Film, Summer Film, Holiday Film, Major Franchise Film (had a movie this decade that grossed 200m+), Animation, and Comic Book.

 

What I found is that even when accounting for all of these extra variables, YouTube views remained significant (albeit much less significant than when they're the only variable, as you might expect).  However, the model does a pretty piss-poor job of actually predicting Domestic Totals and Opening Weekends (as you might expect).  I also found the variables Animation and Major Franchise to be significant.

 

I may make a thread sharing the model later, but I need to improve it some.  I may add some adjusted grosses from other movies this decade to try and get a better set.  I also might try making separate models for different kinds of films, instead of treating the type of film as a variable: a model for big-budget studio films, a model for animated movies, etc.  I suspect separate models might work better.
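To illustrate why separate models per film type can behave differently from one pooled fit, here is a small sketch in plain Python. The two groups and all of their numbers are invented purely for illustration (they are not from my dataset), and `fit_line` is just a hypothetical helper for a one-variable least squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# HYPOTHETICAL data: (trailer views in millions, gross in $ millions) per film type.
films = {
    "animation": [(10, 30), (20, 50), (30, 70)],
    "studio": [(10, 15), (20, 20), (30, 25)],
}

# One pooled model that ignores film type entirely.
all_points = [p for points in films.values() for p in points]
pooled_slope, pooled_intercept = fit_line(
    [x for x, _ in all_points], [y for _, y in all_points]
)
print(f"pooled: slope={pooled_slope:.2f} intercept={pooled_intercept:.2f}")

# One model per film type, so each type gets its own slope and intercept.
per_type = {}
for kind, points in films.items():
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    per_type[kind] = fit_line(xs, ys)
    slope, intercept = per_type[kind]
    print(f"{kind}: slope={slope:.2f} intercept={intercept:.2f}")
```

A plain dummy variable only shifts the intercept for a film type, while separate models (or interaction terms) also let the effect of trailer views differ by type. The trade-off is that each separate model is fit on fewer films.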






1 hour ago, The Panda said:

Okay, so I ran another regression test.  This time I added the variables Tomatometer, YouTube Views, Audience Score, Screen Count, Budget, Best Picture Nomination, Franchise Film, Summer Film, Holiday Film, Major Franchise Film (had a movie this decade that grossed 200m+), Animation, and Comic Book.

 

What I found is that even when accounting for all of these extra variables, YouTube views remained significant (albeit much less significant than when they're the only variable, as you might expect).  However, the model does a pretty piss-poor job of actually predicting Domestic Totals and Opening Weekends (as you might expect).  I also found the variables Animation and Major Franchise to be significant.

 

I may make a thread sharing the model later, but I need to improve it some.  I may add some adjusted grosses from other movies this decade to try and get a better set.  I also might try making separate models for different kinds of films, instead of treating the type of film as a variable: a model for big-budget studio films, a model for animated movies, etc.  I suspect separate models might work better.

 

Perhaps you could also try regressing without certain outliers or influential points. Or perhaps a regression based on medians instead of averages? That line would be a lot more useful if it shot straight through the center of the group instead of pulling upward as it does now.
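One concrete median-based option is the Theil-Sen estimator, which takes the median of the slopes between every pair of points. To be clear, that is my pick for illustration, not something already used in this thread, and the data below is invented: five points on the line y = 2x plus one extreme outlier.

```python
from itertools import combinations
from statistics import median


def ols_line(xs, ys):
    """Ordinary least squares fit, for comparison."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def theil_sen_line(xs, ys):
    """Theil-Sen fit: median of pairwise slopes, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with identical x to avoid dividing by zero
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# HYPOTHETICAL data: the last point is a huge outlier (think one mega-opener).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 60]

ols_slope, ols_intercept = ols_line(xs, ys)
ts_slope, ts_intercept = theil_sen_line(xs, ys)

print(f"OLS:       slope={ols_slope:.2f} intercept={ols_intercept:.2f}")
print(f"Theil-Sen: slope={ts_slope:.2f} intercept={ts_intercept:.2f}")
```

On this toy set the least squares line gets dragged upward by the single outlier, while the median-based line stays with the bulk of the points.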






2 hours ago, The Panda said:

Okay, so I ran another regression test.  This time I added the variables Tomatometer, YouTube Views, Audience Score, Screen Count, Budget, Best Picture Nomination, Franchise Film, Summer Film, Holiday Film, Major Franchise Film (had a movie this decade that grossed 200m+), Animation, and Comic Book.

 

What I found is that even when accounting for all of these extra variables, YouTube views remained significant (albeit much less significant than when they're the only variable, as you might expect).  However, the model does a pretty piss-poor job of actually predicting Domestic Totals and Opening Weekends (as you might expect).  I also found the variables Animation and Major Franchise to be significant.

 

I may make a thread sharing the model later, but I need to improve it some.  I may add some adjusted grosses from other movies this decade to try and get a better set.  I also might try making separate models for different kinds of films, instead of treating the type of film as a variable: a model for big-budget studio films, a model for animated movies, etc.  I suspect separate models might work better.

 

What about factoring in social media followers of the movie accounts and lead star(s)?

Edited by Jay Beezy




