Hello, and welcome to my blog. I know it has been a long time since my
last post, and I apologize for that. I have been quite busy for the past few weeks
with some projects and have not had any time to write. One of the things that
has kept me busy is a set of courses I have been taking on Coursera,
particularly the Statistics with R specialization.
In this post, I will present the project I did for one of the courses of this
specialization.
The project involved answering a research question using a dataset
containing information from Rotten Tomatoes and IMDb for a random sample of
movies released before 2016. The goal of this project was to identify
attributes of movies that are associated with their popularity, using a
multiple linear regression model. To view the project, click here.
For my research question, I developed two multiple regression
models: one for predicting a movie's audience score and one for predicting its
critics score.
INTERPRETATION OF RESULTS
The results of the project indicate that the ‘Drama’ genre is
significantly associated with higher audience scores (its coefficient is
positive and statistically significant). Concretely, if all other variables in
the model are held constant, the audience score for movies that belong to the
Drama genre is expected on average to be about 10.3 points higher than for
movies in the model's reference genre.
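To make the "all other variables held constant" reading concrete, here is a minimal sketch in Python (my project itself was done in R). The data below is made up purely for illustration and is not the course dataset; the names `runtime`, `is_drama` and `predict` are hypothetical. The sketch fits an ordinary least squares model with a 0/1 genre indicator and shows that the indicator's coefficient equals the gap between two predictions that differ only in genre.

```python
import numpy as np

# Toy data invented for illustration only (NOT the course dataset).
rng = np.random.default_rng(0)
n = 200
runtime = rng.uniform(80, 180, size=n)   # hypothetical numeric predictor (minutes)
is_drama = rng.integers(0, 2, size=n)    # 1 if the movie is a drama, else 0
noise = rng.normal(0, 5, size=n)
audience_score = 40 + 0.1 * runtime + 10 * is_drama + noise

# Design matrix: intercept column, runtime, drama indicator.
X = np.column_stack([np.ones(n), runtime, is_drama])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, audience_score, rcond=None)
intercept, b_runtime, b_drama = coef

def predict(runtime_min, drama):
    """Predicted audience score for a movie with the given runtime and genre flag."""
    return intercept + b_runtime * runtime_min + b_drama * drama

# Two movies that are identical except for genre.
gap = predict(120, 1) - predict(120, 0)

print(f"drama coefficient: {b_drama:.2f}")
print(f"prediction gap at equal runtime: {gap:.2f}")
```

The two printed numbers match: the coefficient on the indicator is exactly the expected difference between a drama and a non-drama with the same values for every other predictor.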
We also see that the variable best_dir_win (i.e. whether or not the director
of the movie has won an Oscar) is significantly associated with higher critics
scores. Concretely, if all other variables in the model are held constant, the
critics score for movies whose directors have won an Oscar is expected on
average to be higher by about 13.4 points. Note that even though this variable
is a significant predictor of critics scores, it is not a significant predictor
of audience scores. This is not too surprising: I suspect audiences do not care
much whether the director of a movie has won an Oscar, whereas it may matter
to critics.
PREDICTION
Next, I predicted the critics and audience scores for the
movie X-Men Apocalypse. I obtained a
critics score of 60% with a 95% confidence interval of (46.1, 73.2) and an
audience score of 69% with a 95% confidence interval of (59.4, 79.0). In order
to verify how good my predictions were, I checked the Rotten Tomatoes website
for the actual scores for the movie.
The audience score for the movie was 71%, which is pretty close to the model’s prediction of 69%. The critics score was 48%, which is not very close to the prediction of 60% but is still within the 95% interval.
Note that even though the point predictions were not exact, both 95% intervals contained the actual scores, which is encouraging.
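The comparison above can be written out as a small check. This is only a sketch in Python (the project itself was done in R); the numbers are the ones reported in this post, while the names `predictions` and `summarize` are mine and purely illustrative.

```python
# Predicted scores, 95% intervals and actual Rotten Tomatoes scores,
# as reported in this post for X-Men Apocalypse.
predictions = {
    "critics": {"predicted": 60.0, "interval": (46.1, 73.2), "actual": 48.0},
    "audience": {"predicted": 69.0, "interval": (59.4, 79.0), "actual": 71.0},
}

def summarize(p):
    """Return the prediction error and whether the actual score is inside the interval."""
    low, high = p["interval"]
    return {
        "error": p["actual"] - p["predicted"],
        "inside": low <= p["actual"] <= high,
    }

results = {name: summarize(p) for name, p in predictions.items()}

for name, r in results.items():
    print(f"{name}: error = {r['error']:+.1f} points, inside interval = {r['inside']}")
```

The critics prediction is off by 12 points and the audience prediction by 2 points, and both actual scores fall inside their intervals.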
SUMMARY
In this blog post, I presented a project that aimed to
identify the variables associated with higher or lower critics and audience
scores. I interpreted the results of the project and also used the models to
predict the critics and audience scores for a movie.
I also have some news to announce. I am currently in
a sort of ‘partnership’ with Jumia Travel, so I will be publishing some articles
related to travel, tourism and hospitality in Nigeria in the coming weeks. The
first of these posts should hopefully be out this coming week. I hope you enjoy
them in addition to my posts about data science and machine learning.
Thank you once again for reading my blog in spite of my
absence. I will try to post more frequently in the future. Please
subscribe to my blog if you have not done so already. Until next time. Cheers!