- SCHOOL_LOCATIONS_2014_2015_JSON.json
- SCHOOL_LOCATIONS_2015_2016_JSON.json
- SCHOOL_LOCATIONS_2016_2017_JSON.json
- HS_SQR_2014_2015_Summary.csv
- HS_SQR_2015_2016_Summary.csv
- HS_SQR_2016_2017_Summary.csv
- EMS_SQR_2014_2015_Summary.csv
- EMS_SQR_2015_2016_Summary.csv
- EMS_SQR_2016_2017_Summary.csv
B. Modify the HDFS paths in School_Locations.scala, Updated_EMS_HS.scala, and Updated_Join_Latitude_Longitude.scala to match your environment
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
:load School_Locations.scala
:load Updated_EMS_HS.scala
:load Updated_Join_Latitude_Longitude.scala
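Each script loaded above reads its raw files from HDFS using the spark-csv package pulled in by `--packages`. As a hedged sketch (the path and options here are illustrative, not taken from the scripts), reading one of the SQR summary CSVs looks like:

```scala
// Sketch only: reading one SQR summary CSV in spark-shell,
// assuming the spark-csv package was loaded via --packages above.
// The HDFS path is a placeholder; use the paths set in step B.
val hsSqr = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")        // first row contains column names
  .option("inferSchema", "true")   // let spark-csv guess column types
  .load("hdfs:///path/to/HS_SQR_2014_2015_Summary.csv")

hsSqr.printSchema()
```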
- resources/crime/RawData/misdemeanor-offenses-by-precinct-2000-2017.csv
- resources/crime/RawData/non-seven-major-felony-offenses-by-precinct-2000-2017.csv
- resources/crime/RawData/seven-major-felony-offenses-by-precinct-2000-2017.csv
- resources/crime/RawData/violation-offenses-by-precinct-2000-2017.csv
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
:load CrimeDataETL.scala
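CrimeDataETL.scala processes the four offense files listed above. As a sketch, under the assumption that the four CSVs share the same columns (the read pattern is illustrative, not copied from the script), they could be loaded and stacked like this:

```scala
// Sketch: loading the four per-precinct offense CSVs and stacking them.
// Assumes the four files share identical columns.
val offenseFiles = Seq(
  "misdemeanor-offenses-by-precinct-2000-2017.csv",
  "non-seven-major-felony-offenses-by-precinct-2000-2017.csv",
  "seven-major-felony-offenses-by-precinct-2000-2017.csv",
  "violation-offenses-by-precinct-2000-2017.csv")

val crimeRaw = offenseFiles
  .map(f => sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load(s"resources/crime/RawData/$f"))
  .reduce(_ unionAll _)   // Spark 1.x DataFrame union
```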
All files are in resources/housing/rawData, or available on Dumbo at:
- /user/sc2936/housingSalesRaw/rollingsales_bronx.csv
- /user/sc2936/housingSalesRaw/rollingsales_brooklyn.csv
- /user/sc2936/housingSalesRaw/rollingsales_manhattan.csv
- /user/sc2936/housingSalesRaw/rollingsales_queens.csv
- /user/sc2936/housingSalesRaw/rollingsales_statenisland.csv
Each of the following five "borough"_sales_prices folders contains separate files of data per year, 2005-2016
- /user/sc2936/housingSalesRaw/bronx_sales_prices
- /user/sc2936/housingSalesRaw/brooklyn_sales_prices
- /user/sc2936/housingSalesRaw/manhattan_sales_prices
- /user/sc2936/housingSalesRaw/queens_sales_prices
- /user/sc2936/housingSalesRaw/staten_island_sales_prices
- You must run Recent-And-2017-Sales.scala first, as it generates "housingSalesClean/new2017_5.1.18", the input file for Historical-Housing.scala
- It also generates "summary_2017_2018_5.1.2018", an input of recent sale prices for the model
- Historical-Housing.scala generates one of the model's input files, "housingSalesClean/historical_all_buildingType_5.1.18"
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
:load Recent-And-2017-Sales.scala
:load Historical-Housing.scala
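The ordering above matters because Historical-Housing.scala starts from the intermediate file written by Recent-And-2017-Sales.scala. A hedged sketch of that handoff (the read options are illustrative, not copied from the script):

```scala
// Sketch: Historical-Housing.scala begins by reading the intermediate
// output of Recent-And-2017-Sales.scala, hence the required run order.
val new2017 = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("housingSalesClean/new2017_5.1.18")
```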
Output files can be found at:
- resources/housing/historical_all_buildingType_5.1.18.csv
- resources/housing/summary_2017_2018_5.2.2018
- resources/crime/CrimeWithPrediction.csv
- resources/school/Final_Output/NYC_School_Data.csv
The model's output is the input for the GeoJSON builder -> resources/model/Output_H.0.35_C.0.4_S.0.25_6.csv
The weights for the model are hard-coded within the file and changed for each iteration. The best weights found are currently in use:
val housing_weight = .35
val crime_weight = .4
val school_weight = .25
The model will also output a file for manually reviewing goodness -> resources/model/Goodness_H.0.35_C.0.4_S.0.25_6.csv
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
:load Model.scala
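Given the weights above, the model presumably scores each area as a weighted sum of its normalized housing, crime, and school components. A sketch of that combination (the DataFrame and column names are illustrative placeholders, not taken from Model.scala):

```scala
// Sketch: combining per-area component scores with the hard-coded weights.
// `joined` and its column names are hypothetical placeholders.
import sqlContext.implicits._   // enables the $"column" syntax

val housing_weight = .35
val crime_weight   = .4
val school_weight  = .25

val scored = joined.withColumn("score",
  $"housing_score" * housing_weight +
  $"crime_score"   * crime_weight +
  $"school_score"  * school_weight)
```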
- model result: resources/model/Output_H.0.35_C.0.4_S.0.25_6.csv
- NYC GeoJSON: resources/nyc.geojson
- Result: resources/model/JsonBuilderResult.json
- Run `python -m SimpleHTTPServer` (Python 2)
- Open index.html or go to http://0.0.0.0:8000
- The red areas reveal the top five neighborhoods that are about to pop with respect to your budget and home needs (some budget ranges will reveal fewer than five)
- These areas will update as you change your budget and home type
- If you would like to see the data behind the top five ranks, view the browser console log.