pezon/rushing

▄▄▄  ▄• ▄▌.▄▄ ·  ▄ .▄▪   ▐ ▄  ▄▄ • 
▀▄ █·█▪██▌▐█ ▀. ██▪▐███ •█▌▐█▐█ ▀ ▪
▐▀▀▄ █▌▐█▌▄▀▀▀█▄██▀▐█▐█·▐█▐▐▌▄█ ▀█▄
▐█•█▌▐█▄█▌▐█▄▪▐███▌▐▀▐█▌██▐█▌▐█▄▪▐█
.▀  ▀ ▀▀▀  ▀▀▀▀ ▀▀▀ ·▀▀▀▀▀ █▪·▀▀▀▀

Rush Ingest: Crawl, download, and parse TV news rush transcripts!

Broadcast and cable news networks often publish rush transcripts on their websites. Rushing contains loosely coupled utilities for parsing these rush transcripts, plus a few lightweight utilities for screen scraping and crawling transcript indexes.
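To make the crawling step concrete, here is a minimal stdlib-only sketch of pulling transcript links out of an index page you have already downloaded. It is not Rushing's crawler: `TranscriptLinkCollector`, `collect_transcript_links`, and the `TRANSCRIPT_URL` pattern (modeled on the CNN URL in the usage example) are hypothetical, and each news source would need its own pattern.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical filter modeled on the CNN example URL,
# e.g. .../TRANSCRIPTS/1701/29/rs.01.html. Adjust per news source.
TRANSCRIPT_URL = re.compile(r"/TRANSCRIPTS/\d{4}/\d{2}/[^/]+\.html$")


class TranscriptLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> targets that look like transcript pages."""

    def __init__(self, base_url, pattern=TRANSCRIPT_URL):
        super().__init__()
        self.base_url = base_url
        self.pattern = pattern
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # anchors without an href (e.g. <a name="top">) are ignored
        url = urljoin(self.base_url, href)  # resolve relative links
        if self.pattern.search(url) and url not in self.links:
            self.links.append(url)  # keep first-seen order, skip duplicates


def collect_transcript_links(index_html, base_url):
    """Return transcript page URLs linked from an index page's HTML."""
    collector = TranscriptLinkCollector(base_url)
    collector.feed(index_html)
    collector.close()
    return collector.links
```

Filtering on the resolved absolute URL means relative and absolute links are treated the same way, so duplicates written in different forms collapse to one entry.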

It's left to the user to find news sources to crawl, build searchable databases, and analyze the transcripts. Rushing just converts them to machine-usable text!
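As an illustration of what "machine-usable" can mean, the sketch below splits transcript text into (speaker, utterance) pairs. It assumes speaker labels are upper-case names followed by a colon at the start of a line (e.g. `BLITZER:`). `split_speaker_turns` is a hypothetical helper, not Rushing's `parse_transcript`, and it ignores many real-world edge cases: an upper-case line such as `NOTE: ...` would be misread as a speaker.

```python
import re

# Assumption: a speaker turn starts at the beginning of a line with an
# all-caps label followed by a colon, e.g. "BLITZER: ..." or
# "WOLF BLITZER, CNN ANCHOR: ...".
SPEAKER_LINE = re.compile(r"^([A-Z][A-Z .,'()-]*[A-Z)]):\s*(.*)$")


def split_speaker_turns(text):
    """Split transcript text into a list of (speaker, utterance) tuples."""
    turns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # Continuation line: append it to the current speaker's utterance.
            speaker, said = turns[-1]
            turns[-1] = (speaker, f"{said} {line}".strip())
        # Lines before the first speaker label (page headers) are skipped.
    return turns
```

Each tuple can then be stored as a row, which is the kind of structure a searchable database or analysis step would build on.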

Installing

git clone https://github.com/pezon/rushing.git
cd rushing
python setup.py install

Usage

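from rushing.parser import parse_transcript
from rushing.util.web import Webpage

# Example: a CNN rush transcript page.
url = 'http://transcripts.cnn.com/TRANSCRIPTS/1701/29/rs.01.html'
# XPath selecting the table that holds the transcript body on that page.
body_xpath = "//table[@id='cnnArticleWireFrame']"

with Webpage(url) as webpage:
    # Extract only the text under body_xpath, then parse it.
    webpage_text = webpage.text(select=body_xpath)
    transcript = parse_transcript(webpage_text)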
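`Webpage.text(select=...)` is Rushing's own helper and its internals are not covered here. As a rough stdlib-only analogue of what selecting the transcript body involves, the sketch below collects the text inside the first element matching a tag and id, assuming the body is the `<table id="cnnArticleWireFrame">` that `body_xpath` targets. `ElementTextExtractor` and `element_text` are hypothetical names, and only tag-plus-id matching is supported, not general XPath.

```python
from html.parser import HTMLParser


class ElementTextExtractor(HTMLParser):
    """Collect the text inside the first element with a given tag and id."""

    def __init__(self, tag, element_id):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.depth = 0      # nesting depth of `tag` while inside the target
        self.done = False   # set once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag
        elif tag == self.tag and dict(attrs).get("id") == self.element_id:
            self.depth = 1  # entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def element_text(html, tag, element_id):
    """Return the stripped text chunks of the target element, one per line."""
    parser = ElementTextExtractor(tag, element_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Tracking nesting depth matters because transcript pages often nest tables; closing the outer table only when the depth returns to zero keeps inner-table text and excludes page navigation and footers.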

Notes

Currently this project is a development stub extracted from a different project. The library is undocumented and untested: there are no tests, and there are still known edge cases. It may not even install properly.

