The goal of this project is for you to use EDA, visualization, data cleaning, preprocessing, and linear models to predict home prices from the features of each home, and to interpret your linear models to find out which features add value to a home! This project is a bit more open-ended than project 1.
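Here is a minimal sketch of the fit-and-interpret step, assuming pandas and NumPy. It fits ordinary least squares with `np.linalg.lstsq` and returns the coefficients as a labeled Series. Each coefficient can then be read as the estimated change in sale price per unit of that feature, holding the other features fixed. The helper name `fit_linear_model` and the toy rows are hypothetical. `GrLivArea` and `OverallQual` are column names from the Kaggle data, but the values below are invented so that the true coefficients are known in advance.

```python
import numpy as np
import pandas as pd


def fit_linear_model(features: pd.DataFrame, target: pd.Series) -> pd.Series:
    """Fit ordinary least squares; return the intercept plus one coefficient per feature."""
    # Prepend a column of ones so the first fitted coefficient is the intercept.
    design_matrix = np.column_stack(
        [np.ones(len(features)), features.to_numpy(dtype=float)]
    )
    coefficients, *_ = np.linalg.lstsq(
        design_matrix, target.to_numpy(dtype=float), rcond=None
    )
    return pd.Series(coefficients, index=["intercept", *features.columns])


# Hypothetical toy rows (not real Ames data), built so that each extra square foot
# adds $100 and each OverallQual point adds $15,000 on top of a $20,000 base.
toy_homes = pd.DataFrame({
    "GrLivArea": [1000, 1500, 1800, 2200, 2600],
    "OverallQual": [5, 6, 7, 7, 9],
})
toy_prices = 20_000 + 100 * toy_homes["GrLivArea"] + 15_000 * toy_homes["OverallQual"]

price_coefficients = fit_linear_model(toy_homes, toy_prices)
print(price_coefficients.round(2))
```

On real data, scikit-learn's `LinearRegression` exposes the same quantities as `intercept_` and `coef_`. To compare coefficient magnitudes across features measured in different units, standardize the features first.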
Be sure to:
- Think carefully about the choices you make with the data. Be ready to defend your decisions!
- Use lots of plots to dig deeper into the data! Describe the plots and convey what you learned from them.
- Don't forget to read the description of the data on the Kaggle website! It has valuable information that will help you clean and impute data. `NaN` means something in many of the columns (for example, a missing `PoolQC` means the home has no pool)! Don't just drop or fill them!
- Try fitting many models! Document your work and note what you've tried.
- Apply what you've learned in class, books, videos, and blog posts.
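Here is one way to act on the `NaN` advice, sketched under stated assumptions. By default, pandas' `read_csv` treats the literal string `NA` as missing. Categories that the Kaggle data description defines as "NA = no such feature" (for example, `PoolQC` NA = No Pool, `Alley` NA = No alley access) therefore arrive as `NaN`. The sketch replaces them with an explicit `"None"` category for a hypothetical, partial list of such columns (`NA_MEANS_ABSENT_COLUMNS`, helper `fill_absent_features`) and leaves every other column alone. Numeric gaps such as `LotFrontage` still need their own decision. Verify each column against the data description before reusing the list.

```python
import numpy as np
import pandas as pd

# Hypothetical, partial list: columns whose NA the Kaggle data description defines
# as "feature absent" (e.g. PoolQC NA = No Pool). Check the description for others.
NA_MEANS_ABSENT_COLUMNS = ["Alley", "FireplaceQu", "GarageType", "PoolQC", "Fence"]


def fill_absent_features(homes: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where NaN becomes an explicit "None" category in NA-means-absent columns."""
    filled_homes = homes.copy()
    for column in NA_MEANS_ABSENT_COLUMNS:
        if column in filled_homes.columns:
            filled_homes[column] = filled_homes[column].fillna("None")
    return filled_homes


# Toy rows for illustration only (not real Ames records).
raw_homes = pd.DataFrame({
    "PoolQC": [np.nan, "Ex", np.nan],
    "LotFrontage": [65.0, np.nan, 80.0],  # numeric; NA is not a defined category here
})
clean_homes = fill_absent_features(raw_homes)
print(clean_homes)
print(clean_homes.isna().sum())  # LotFrontage still has one NaN to handle separately
```

Working on a copy keeps `raw_homes` intact, so you can compare the data before and after filling.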
From the Kaggle competition website:
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
- Assigned: Tuesday, 10/10/2017
- Project Due Date: Friday, 10/20/2017 (Submission form here)
- Self-Evaluation Due Date: Sunday, 10/22/2017 (Submission form here)
- You should be working on a fork of the GA project one repository.
- Use git to manage versions of your project. Make sure to `add`, `commit`, and `push` your changes to your fork on GitHub.
- You will be generating long data structures, so avoid displaying the whole thing. Display just the first or last few entries, and check the length or shape to confirm that your code returns what you expect.
- Make functions whenever possible!
- Be explicit with your naming. You may forget what `this_list` is, but you will have an idea of what `passenger_fare_list` is. Good variable names will help you in the long run!
- Don't forget about tab autocomplete!
- Use markdown cells to document your planning, thoughts, and results.
- Delete cells you will not include in your final submission
- Try to solve your own problems using this framework:
  - Check your spelling.
  - Google your errors. Is the answer on Stack Overflow?
  - Ask your classmates.
  - Ask a TA or instructor.
- Do not include errors or stack traces (fix them!)
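The small sketch below combines three of the tips above: wrap a step in a function, give it an explicit name, and check the shape and a few rows instead of printing everything. The helper `add_total_square_feet` and the `TotalSF` feature are hypothetical illustrations. The input column names `TotalBsmtSF`, `1stFlrSF`, and `2ndFlrSF` come from the Kaggle data, but the 100 toy rows are invented.

```python
import pandas as pd


def add_total_square_feet(homes: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of homes with TotalSF = basement + first-floor + second-floor area."""
    homes_with_total_sf = homes.copy()
    homes_with_total_sf["TotalSF"] = (
        homes_with_total_sf["TotalBsmtSF"]
        + homes_with_total_sf["1stFlrSF"]
        + homes_with_total_sf["2ndFlrSF"]
    )
    return homes_with_total_sf


# 100 invented rows: long enough that printing the whole frame would just be noise.
toy_homes = pd.DataFrame({
    "TotalBsmtSF": range(500, 1500, 10),
    "1stFlrSF": range(800, 1800, 10),
    "2ndFlrSF": [0] * 100,
})
homes_with_total_sf = add_total_square_feet(toy_homes)

# Check the shape and peek at a few rows instead of displaying everything.
print(homes_with_total_sf.shape)
print(homes_with_total_sf.head(3))
assert homes_with_total_sf.shape == (len(toy_homes), toy_homes.shape[1] + 1)
```

The function returns a new frame rather than modifying its input, so rerunning notebook cells out of order will not silently change `toy_homes`.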