hawkwatchers

hawkwatchers is an interface that lets users predict fluctuations in the US federal funds rate from the text of Federal Reserve press releases, helping them make more informed financial decisions.

Getting Started

hawkwatchers runs on Django and makes use of the following Python 3.6 packages:

  1. pandas
  2. numpy
  3. scikit-learn
  4. nltk
  5. pyenchant
  6. beautifulsoup4
  7. urllib3
  8. django

The following built-in modules are also used: sys, re, math, and csv.

All requisite packages can be installed as follows:

$ pip install [package name]
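Alternatively, all of the external packages can be installed in a single command:

$ pip install pandas numpy scikit-learn nltk pyenchant beautifulsoup4 urllib3 django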

After installing all requisite packages, run the following command within the home directory to view the Django site locally:

$ python3 hawksite/manage.py runserver

Then, type the following URL into your favorite browser to view the site!

http://127.0.0.1:8000/hawk_tracker/

Authors

  • Elena Badillo Goicoechea
  • Natasha Mathur
  • Joseph Denby

We are graduate students in the CAPP and MACSS programs at the University of Chicago.

Note on Modified Code

Code from outside sources was used in the following contexts and manners:

Web Scraping: We used the following util functions from files provided in course CAPP 30122 as part of PA #2.

  • util.read_request()
  • util.get_request()
  • util.is_absolute_url()
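These helpers wrap the request/parse cycle. A hypothetical sketch of how they chain together (the exact signatures live in scraper/util.py):

```python
import util  # the course-provided scraper/util.py

url = "https://www.federalreserve.gov/newsevents/pressreleases.htm"
if util.is_absolute_url(url):              # sanity-check the link format
    request = util.get_request(url)        # issue the HTTP request (None on failure)
    if request is not None:
        html = util.read_request(request)  # extract the raw HTML from the response
```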

Text Processing & Model Exploration: Code was inspired by (and heavily modified from) materials from the GitHub repos for Computational Content Analysis and Perspectives on Computational Modeling.

Website Construction:

  • We used, with few or no modifications, the functions necessary to run a Django application, such as migrations.py and manage.py, which ship with Django.
  • We wrote the classes and functions needed for users to interact with the site online, submitting queries (i.e., texts with monetary-policy content they want analyzed) and receiving our models' predictions; a hypothetical sketch of this pattern follows this list. This closely followed the guidelines and material from the Django workshop provided by the course instructors, as well as the tutorials and documentation they suggested.
  • When designing the aesthetics of the web pages, we used HTML code snippets from a Bootstrap template.
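As an illustration of that query-and-predict pattern only (the names here are hypothetical, not the actual hawk_tracker code), a query form and the view that feeds it to the model might look like:

```python
from django import forms
from django.shortcuts import render

from .nn_model import predict_rate_change  # assumed entry point into nn_model.py


class QueryForm(forms.Form):
    # Free-text field for the monetary-policy text a user wants analyzed
    text = forms.CharField(widget=forms.Textarea, label="Text to analyze")


def query(request):
    # Render the form; on POST, validate and run the model on the submitted text
    prediction = None
    if request.method == "POST":
        form = QueryForm(request.POST)
        if form.is_valid():
            prediction = predict_rate_change(form.cleaned_data["text"])
    else:
        form = QueryForm()
    return render(request, "hawk_tracker/query.html",
                  {"form": form, "prediction": prediction})
```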

Data Collection

Web Scraping

The press releases we used were scraped from the Federal Reserve. The releases are stored in different locations on the Fed website (one for 2013–2018, another for pre-2013) and in different formats for pre- and post-2006 releases, so we employed separate web-scraping functions and scripts.
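A minimal sketch of that pattern (the URL and link filter here are illustrative, not the exact selectors scrape.py uses):

```python
import urllib3
from bs4 import BeautifulSoup

# Fetch an index page of press releases and pull out candidate links
http = urllib3.PoolManager()
resp = http.request("GET", "https://www.federalreserve.gov/newsevents/pressreleases.htm")
soup = BeautifulSoup(resp.data, "html.parser")

links = [a["href"] for a in soup.find_all("a", href=True)
         if "pressreleases" in a["href"]]
```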

scraper/scrape.py (Elena, Natasha, Joseph) (Original / Heavily Modified)

scraper/util.py (Direct copy)

Rate Sources

The effective federal funds rate (EFFR) was collected from the Federal Reserve Bank of St. Louis. There is a stretch of time during which the EFFR does not change, staying at its zero lower bound; for that range we used the shadow interest rate calculated by Wu/Xia, which measures the other means of monetary easing the Fed used at the time, sourced from the Federal Reserve Bank of Atlanta. The combined series is housed within data/all_rates.csv.
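Conceptually, the two series were spliced into one: the EFFR everywhere except the zero-lower-bound window, where the Wu/Xia shadow rate fills in. A pandas sketch, assuming hypothetical column names and approximate window dates:

```python
import pandas as pd

# Assumed columns: 'date', 'effr', 'shadow_rate'
rates = pd.read_csv("data/all_rates.csv", parse_dates=["date"])

# Substitute the Wu/Xia shadow rate while the EFFR sat at the zero lower bound
zlb = rates["date"].between("2008-12-16", "2015-12-16")  # approximate window
rates["rate"] = rates["effr"].where(~zlb, rates["shadow_rate"])
```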

The Labor Market Conditions Index was used as a means of comparison to our text model. The data was downloaded from the Federal Reserve Bank of Kansas City. It is housed within data/Labor_Conditions_Index.csv.

Combined Data

The data as used for modeling is consolidated in the following .csv files:

data/allratesdf.csv (Joseph) (Original)

  • Contains Federal Funds rate data since Feb. 1994

data/allreleasescleaned.csv (Joseph) (Original)

  • Contains press release text data since Feb. 1994

data/allreleaserates.csv (Joseph) (Original)

  • Merged representation of the above two datasets (a sketch of the join follows this list)

data/Labor_Conditions_Index.csv (Elena and Joseph) (Original)

  • Contains LMCI data since Feb. 1994
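The merge itself is a date join. A sketch of how allreleaserates.csv could be rebuilt from the two source files, assuming both share a date column:

```python
import pandas as pd

releases = pd.read_csv("data/allreleasescleaned.csv", parse_dates=["date"])
rates = pd.read_csv("data/allratesdf.csv", parse_dates=["date"])

# One row per press release, tagged with the rate in effect on its release date
merged = releases.merge(rates, on="date", how="inner")
merged.to_csv("data/allreleaserates.csv", index=False)
```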

Model Construction

nltk_processing.ipynb (Joseph) (Original / Heavily Modified)

The above notebook contains the code to clean and combine the aforementioned data sources, as well as several modeling techniques and methods of validation.
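As a flavor of the kind of nltk cleaning involved, a minimal sketch (function and variable names are ours, not the notebook's):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

nltk.download("punkt")      # one-time: tokenizer model
nltk.download("stopwords")  # one-time: stopword list

def clean_text(raw):
    """Lowercase, tokenize, drop stopwords/non-words, and stem a press release."""
    stemmer = PorterStemmer()
    stops = set(stopwords.words("english"))
    tokens = nltk.word_tokenize(raw.lower())
    return [stemmer.stem(t) for t in tokens if t.isalpha() and t not in stops]
```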

The consolidated modeling code used for the Django site is outlined in the file below:

hawkwatchers/hawksite/hawk_tracker/nn_model.py (Joseph) (Original)

Website Construction

The form classes, model classes, views.py, and templates are original.

In the hawkwatchers folder:

hawksite/hawk_tracker/__pycache__ (Generated by Python)

hawksite/hawk_tracker/migrations (Direct copy)

hawksite/hawk_tracker/static/hawk_trackerstyle2.css (Natasha) (Modified)

hawksite/hawk_tracker/admin.py (Elena) (Modified)

hawksite/hawk_tracker/apps.py (Generated by Django)

hawksite/hawk_tracker/forms.py (Elena) (Original/Heavily Modified)

hawksite/hawk_tracker/models.py (Elena) (Original/Heavily Modified)

hawksite/hawk_tracker/nn_model.py (Joseph) (Original)

hawksite/hawk_tracker/tests.py (Generated by Django)

hawksite/hawk_tracker/urls.py (Elena) (Heavily Modified)

hawksite/hawk_tracker/views.py (Elena and Natasha) (Heavily Modified)

hawksite/hawk_tracker/templates/hawk_tracker (Elena and Natasha) (Heavily Modified)

Instructions to Run All Code

Link & Text Scraping

All functions and scripts relating to data collection via web scraping are contained within the scraper directory.

Running the following command within that directory will scrape all relevant press releases (by aggregating links to press releases and scraping via the appropriate HTML format) and output the file scrapeddata.csv, which contains all Fed press releases since January 1994 paired with their release date:

$ python3 scrape.py

Ignore any InsecureRequestWarning messages; they are merely a product of the scraping package used and are inconsequential for our purposes.
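If the warnings clutter the console, urllib3 can silence them explicitly before the scrape runs:

```python
import urllib3

# Suppress the (harmless, for our purposes) InsecureRequestWarning noted above
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
```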

Data Cleaning and Model Exploration

nltk_processing.ipynb contains the code used to clean and aggregate the data scraped in the above step into a manageable format. It also contains the code responsible for processing the text data and using it to train a variety of classification models from scikit-learn, along with code for assessing and validating those models (e.g., classification reports, LOOCV, bootstrapping). The notebook is not essential to running our Django site; it stands simply as a record of our data cleaning, EDA, and modeling thought processes. Anyone wishing to recreate the steps behind our final modeling decisions will find everything they need in this file.
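For reference, the LOOCV step can be reproduced with scikit-learn's cross-validation utilities. A sketch with a TF-IDF pipeline; the classifier choice and toy data here are placeholders, not the notebook's final model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data; in the notebook, texts/labels come from the merged .csv files
texts = ["rates held steady", "the committee raised the target range",
         "the committee lowered the target range", "policy remains accommodative"]
labels = ["flat", "up", "down", "flat"]

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, texts, labels, cv=LeaveOneOut())  # one fold per release
print("LOOCV accuracy:", scores.mean())
```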

Final Model & Website

All code related to the website is contained within the hawksite directory. To view the website locally, ensure that you have all requisite packages installed, then run the following command within that directory:

$ python3 manage.py runserver

Then, type the following URL into your favorite browser to view the site!

http://127.0.0.1:8000/hawk_tracker/