Skip to content
/ rushing Public

Rush Ingest: Crawl, download, and parse TV news rush transcripts!

License

Notifications You must be signed in to change notification settings

pezon/rushing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

▄▄▄  ▄• ▄▌.▄▄ ·  ▄ .▄▪   ▐ ▄  ▄▄ • 
▀▄ █·█▪██▌▐█ ▀. ██▪▐███ •█▌▐█▐█ ▀ ▪
▐▀▀▄ █▌▐█▌▄▀▀▀█▄██▀▐█▐█·▐█▐▐▌▄█ ▀█▄
▐█•█▌▐█▄█▌▐█▄▪▐███▌▐▀▐█▌██▐█▌▐█▄▪▐█
.▀  ▀ ▀▀▀  ▀▀▀▀ ▀▀▀ ·▀▀▀▀▀ █▪·▀▀▀▀ 

Rush Ingest: Crawl, download, and parse news TV rush transcripts!

Broadcast and cable news networks often publish rush transcripts on their website. Rushing contains loosely-coupled utilities to allow users to parse rush transcripts. There are also light utilities to screen scrape and crawl transcript indeces.

It's left to the user to find news sources to crawl, build searchable databases, and analyze the transcripts. Rushing just converts it to machine-useable text!

Installing

git clone https://github.com/pezon/rushing.git
python setup.py install

Usage

from rushing.parser import parse_transcript
from rushing.util.web import Webpage

url = http://transcripts.cnn.com/TRANSCRIPTS/1701/29/rs.01.html
body_xpath = '//table[@id=\'cnnArticleWireFrame\']'

with Webpage(url) as webpage:
	webpage_text = webpage.text(select=body_xpath)
	transcript = parse_transcript(webpage_text)

Notes

Currently this project is a development stub pulled out from a different project. This library is not documented and untested. There are no tests, and there are still known edge cases. It may not even install properly.

About

Rush Ingest: Crawl, download, and parse TV news rush transcripts!

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages