2017.5.11: For Edgar MD&A Extraction, see edgar-10k-mda
To see all command-line options, run: python crawl10k.py -h

Class FormIndex:
- First, download the full indexes for the given year range (these list the URLs of the Form 10-K files)
- Save them to a csv file
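
The index step can be sketched roughly as below. This is a minimal illustration, not the code in crawl10k.py: it assumes the quarterly, pipe-delimited master.idx files under EDGAR's full-index directory, and the helper names (index_urls, parse_10k_rows, rows_to_csv) are hypothetical. Fetching is left out so the sketch runs offline.

```python
import csv
import io

BASE = "https://www.sec.gov/Archives/"


def index_urls(start_year, end_year):
    """Build one master.idx URL per quarter in the (inclusive) year range."""
    return [
        f"{BASE}edgar/full-index/{year}/QTR{quarter}/master.idx"
        for year in range(start_year, end_year + 1)
        for quarter in range(1, 5)
    ]


def parse_10k_rows(index_text):
    """Keep only the 10-K rows of a master.idx file.

    Data rows look like: CIK|Company Name|Form Type|Date Filed|Filename.
    Header and separator lines do not match and are skipped.
    """
    rows = []
    for line in index_text.splitlines():
        fields = line.split("|")
        if len(fields) == 5 and fields[2] == "10-K":
            cik, company, form_type, date_filed, path = fields
            rows.append([cik, company, form_type, date_filed, BASE + path])
    return rows


def rows_to_csv(rows):
    """Serialize the rows to csv text (column names are an assumption)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["cik", "company", "form_type", "date_filed", "url"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling index_urls(2015, 2016) gives eight URLs, one per quarter. Each downloaded index would then go through parse_10k_rows before being written out with rows_to_csv.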

Class Form:
- Download the forms over HTTP requests (EDGAR has closed its FTP service since 2017), using the previously downloaded form indices
- The 10-K forms are stored in html format, so BeautifulSoup is used to parse the raw html and to preprocess the text so that the MDA section is easier to find
- Save the text to the txt dir as 'filename.txt'
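
The html-to-text step might look something like the sketch below. Note the swap: to stay dependency-free, it uses the standard library's html.parser in place of BeautifulSoup, and the cleanup rules (drop script/style content, break lines at block tags, collapse whitespace) are assumptions about what "preprocess" means here, not the repo's actual rules. TextExtractor and html_to_text are hypothetical names.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "tr", "li", "table", "h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block tags start a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(raw_html):
    """Return cleaned text: one non-empty, whitespace-normalized line per row."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = "".join(parser.parts).replace("\xa0", " ")  # non-breaking spaces
    text = re.sub(r"[ \t]+", " ", text)
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Normalizing whitespace and putting headings on their own lines is what lets a later pattern such as "Item 7." be matched at the start of a line.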

Class MDAParser:
- Try to extract the MDA section from the preprocessed text
- Save the file to the mda dir as 'filename.mda'
- Save the parsing results to 'parsing.log', which shows SUCCESS/FAILURE for each file
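
One possible extraction heuristic is sketched below. It is an assumption for illustration, not the parser in this repo: it takes the text between an "Item 7 ... Management" heading and the next "Item 8 ... Financial Statements" heading, and keeps the longest such span so that the short table-of-contents entry is not mistaken for the section. The function names and the log line format are hypothetical.

```python
import re

# Headings are matched at the start of a line, case-insensitively.
START = re.compile(
    r"^\s*item\s*7\s*[.:\-]?\s*management", re.IGNORECASE | re.MULTILINE
)
END = re.compile(
    r"^\s*item\s*8\s*[.:\-]?\s*financial\s+statements",
    re.IGNORECASE | re.MULTILINE,
)


def extract_mda(text):
    """Return the longest Item 7 .. Item 8 span, or None if none is found."""
    best = None
    for start in START.finditer(text):
        end = END.search(text, start.end())
        if end is None:
            continue
        section = text[start.start():end.start()].strip()
        if best is None or len(section) > len(best):
            best = section
    return best


def log_line(filename, mda):
    """Format one parsing.log entry (the exact format is an assumption)."""
    status = "SUCCESS" if mda else "FAILURE"
    return f"{filename}\t{status}"
```

Filings that label the section differently, or that incorporate it by reference, would come back as None and be logged as FAILURE.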
II. Sentiment Analysis with Bill McDonald's Code (Code can be found at http://sraf.nd.edu/textual-analysis/)
- Specify mda files, dictionary file & result csv file in Generic_Parser.py
- Run 'python Generic_Parser.py'
- The code has been modified for this repo to add the CIK (the CIK is included in the filenames produced in the first section)
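
To illustrate the idea of carrying the CIK into the results, here is a toy sketch. It is not Generic_Parser.py: the filename layout (CIK digits, then an underscore) and the tiny word lists are assumptions for illustration only, and cik_from_filename and score are hypothetical names. The real dictionary file comes from the site linked above.

```python
import re
from collections import Counter


def cik_from_filename(filename):
    """Read the leading digits of the filename as the CIK.

    Assumption: filenames look like '<CIK>_<rest>.mda'.
    """
    match = re.match(r"(\d+)_", filename)
    return match.group(1) if match else ""


def score(text, negative, positive):
    """Count total words and dictionary hits in uppercase text."""
    tokens = re.findall(r"[A-Za-z]+", text.upper())
    counts = Counter(tokens)
    return {
        "words": len(tokens),
        "negative": sum(counts[word] for word in negative),
        "positive": sum(counts[word] for word in positive),
    }


def result_row(filename, text, negative, positive):
    """Build one result row with the CIK as the first column."""
    stats = score(text, negative, positive)
    return [
        cik_from_filename(filename),
        filename,
        stats["words"],
        stats["negative"],
        stats["positive"],
    ]
```

Putting the CIK in its own column makes it easy to join the sentiment results with other firm-level data.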