This project predicts the opening-weekend box office performance of movies. The group collects data from four different movie sites and aims to help marketers decide how many theaters a movie should be released in.
Some highlights of the project:
Data examples and features are pulled from many sources and aggregated, and the process is clearly outlined and explained at each step. The reasoning is included, as are definitions of the values that make up each feature.
The process of cleaning the scraped data and extracting examples and features from it is very methodical, and each decision to remove information is well explained to the reader.
The visualizations for each model make it easy to follow how the model performs and to compare the accuracy of each.
Some room for future improvements:
The number of sources the team pulls data from is impressive. I understand that data scraping is time consuming, but I am not convinced that the number of samples used to train the model is sufficient to make accurate predictions going forward. Using more than one year of movies would also likely capture trends better.
You mention using the column mean to fill missing values, but given that 84 and 41 values are missing out of only 165, I wonder whether the mean is really the best choice for, say, all 84 missing values. Did you try any other methods, such as matrix completion, or removing this column altogether to see how it actually affects the accuracy of the predictions? It is later said that director_gross does not seem to be highly correlated with open_gross, and I wonder whether this is because so many values were imputed with the mean of the remaining 81 of 165 values in the column.
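To illustrate the concern, here is a minimal sketch on synthetic data (not the project's data). The function name `compare_imputation`, the Gaussian feature, and the noise level are all assumptions made for illustration; only the counts 165 and 84 come from the report. It compares the Pearson correlation computed on the observed rows only with the correlation computed after filling the missing entries with the column mean.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def compare_imputation(n_rows=165, n_missing=84, seed=0):
    """Return (r_observed, r_imputed) for a synthetic feature/target pair.

    Hypothetical setup: the feature stands in for a column such as
    director_gross, the target for open_gross. Values are simulated.
    """
    rng = random.Random(seed)
    feature = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    # Target depends on the feature plus noise, so a real correlation exists.
    target = [x + rng.gauss(0.0, 1.0) for x in feature]

    # Hide n_missing feature values completely at random.
    missing = set(rng.sample(range(n_rows), n_missing))
    obs_x = [x for i, x in enumerate(feature) if i not in missing]
    obs_y = [y for i, y in enumerate(target) if i not in missing]

    # Mean imputation: every missing entry gets the observed column mean.
    col_mean = sum(obs_x) / len(obs_x)
    imputed_x = [col_mean if i in missing else x for i, x in enumerate(feature)]

    return pearson(obs_x, obs_y), pearson(imputed_x, target)


r_observed, r_imputed = compare_imputation()
print(f"correlation on observed rows only: {r_observed:.3f}")
print(f"correlation after mean imputation: {r_imputed:.3f}")
```

In this setup the imputed rows add nothing to the covariance but still add to the target's spread, so the correlation after mean imputation can never be larger in magnitude than the one on the observed rows. That would be consistent with director_gross looking weakly correlated with open_gross, although the real data may of course behave differently.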
The page of visualizations is a little hard to follow. Including only the most indicative graphs of the important features might have made it easier to read. As a reader, I am not sure what to focus on on this page.
Overall, I can tell a lot of work has been put into this project, and it was definitely one of the more interesting ones to read. Great work!