notnews/good_nyt: Patterns in NYT production from 1987 to 2007

The Good NYT

The New York Times (NYT) is the nation's newspaper of record. It is both well regarded and popular. It has won more Pulitzer Prizes than any other newspaper, and it is the 30th most visited website in the U.S. (as of October 2017).

We explore patterns in NYT production between 1987 and 2007 using the New York Times Annotated Corpus.

Data Analysis

Convert NYT Corpus to CSV and Recode
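The corpus ships as one XML file per article, so the first step is to flatten each file into a CSV row. Below is a minimal sketch using only the Python standard library. It assumes a simplified NITF-style layout in which the news desk and online section sit in `<meta name="dsk">` and `<meta name="online_sections">` tags, the normalized byline in `<byline class="normalized_byline">`, and the word count in the `item-length` attribute of `<pubdata>`. Those tag names, the helper names, and the sample article are assumptions for illustration, not the repository's conversion script; check them against the corpus documentation.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed mapping from <meta name="..."> values to CSV column names.
META_FIELDS = {"dsk": "news.desk", "online_sections": "online.section"}
COLUMNS = ["news.desk", "online.section", "normalized.byline", "word.count"]

def article_to_row(xml_text):
    """Flatten one NITF-style article (assumed layout) into a dict of CSV columns."""
    root = ET.fromstring(xml_text)
    row = {col: "" for col in COLUMNS}
    for meta in root.iter("meta"):
        name = meta.get("name")
        if name in META_FIELDS:
            row[META_FIELDS[name]] = meta.get("content", "")
    byline = root.find(".//byline[@class='normalized_byline']")
    if byline is not None and byline.text:
        row["normalized.byline"] = byline.text.strip()
    pubdata = root.find(".//pubdata")
    if pubdata is not None:
        row["word.count"] = pubdata.get("item-length", "")
    return row

def rows_to_csv(rows):
    """Write rows to a CSV string with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up article in the assumed layout.
sample = """<nitf><head>
<meta name="dsk" content="Foreign Desk"/>
<meta name="online_sections" content="World"/>
<pubdata item-length="850"/>
</head><body><body.head>
<byline class="normalized_byline">Doe, Jane</byline>
</body.head></body></nitf>"""

print(rows_to_csv([article_to_row(sample)]))
```

Recoding (for example, grouping desks into broader categories) would then operate on these flat rows.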
Not News
Has the proportion of news stories about topics unrelated to politics or the economy, such as cooking, travel, fashion, and music, gone up over time?

We measure the kind of news story using the news.desk and online.section fields. (See the script for other ways to measure the kind of news.)
- Proportion of Apolitical News Over Time: Script; Figures: Entire Newspaper (Using News Desk), Section A1 (Using News Desk), and Entire Newspaper (Using News Desk and Online Section)
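The news-desk measure boils down to a per-year share: classify each article's news.desk as apolitical or not, then divide by the year's total. A minimal sketch, assuming (year, news.desk) pairs as input. The `APOLITICAL_DESKS` set is hypothetical; which desks count as apolitical is a judgment call, and the project's script may draw the line differently.

```python
from collections import defaultdict

# Hypothetical desk list; the repository's own mapping may differ.
APOLITICAL_DESKS = {"Travel Desk", "Style Desk", "Arts and Leisure Desk", "Home Desk"}

def apolitical_share_by_year(articles):
    """Return {year: share of articles whose news.desk is in APOLITICAL_DESKS}."""
    totals = defaultdict(int)
    apolitical = defaultdict(int)
    for year, desk in articles:
        totals[year] += 1
        if desk in APOLITICAL_DESKS:
            apolitical[year] += 1
    return {year: apolitical[year] / totals[year] for year in sorted(totals)}

# Toy (publication year, news.desk) pairs, made up for illustration.
articles = [
    (1987, "Foreign Desk"), (1987, "Travel Desk"),
    (2007, "Style Desk"), (2007, "Travel Desk"),
    (2007, "Arts and Leisure Desk"), (2007, "National Desk"),
]
print(apolitical_share_by_year(articles))
```

Restricting the input to front-page (Section A1) articles before calling the function gives the A1 variant of the same measure.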
Urban Vs. Rural
We use the locations (hand-indexed), online.locations (algorithmically generated), and dateline fields to estimate rural vs. urban coverage within the US.
- Script and Figure
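One way to use the dateline field is to pull out the place name that opens it (for example, "CHICAGO, Nov. 3") and look it up in a list of urban places. The sketch below assumes datelines follow that "PLACE, ..." pattern and have already been restricted to US stories. `URBAN_PLACES` is a hypothetical, deliberately tiny list standing in for a real gazetteer, and treating every unlisted place as rural is a crude simplification.

```python
from collections import Counter

# Hypothetical stand-in for a real urban-area gazetteer.
URBAN_PLACES = {"NEW YORK", "LOS ANGELES", "CHICAGO", "HOUSTON", "WASHINGTON"}

def dateline_place(dateline):
    """Return the place name before the first comma, upper-cased."""
    return dateline.split(",", 1)[0].strip().upper()

def classify_dateline(dateline):
    """Label a dateline 'urban', 'rural', or 'unknown' (empty dateline).

    Any place not in URBAN_PLACES is treated as rural.
    """
    if not dateline or not dateline.strip():
        return "unknown"
    return "urban" if dateline_place(dateline) in URBAN_PLACES else "rural"

# Made-up datelines for illustration.
datelines = ["CHICAGO, Nov. 3", "MISSOULA, Mont., March 3", "WASHINGTON, Jan. 5", ""]
print(Counter(classify_dateline(d) for d in datelines))
```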
National Vs. International
We use stories from the foreign news desk (the news.desk field) to estimate coverage of foreign news. We can also use the locations (hand-indexed), online.locations (algorithmically generated), and dateline fields to estimate national vs. international coverage.
- Proportion of Foreign Desk Stories Over Time: Script and Figure.
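The two approaches above can be compared side by side: a desk-based rule and a location-based rule, each turned into a per-year share. A minimal sketch, assuming each article is a dict with `year`, `news.desk`, and a list of `locations`. The desk label "Foreign Desk" and the tiny `US_PLACES` set are assumptions; a real location rule would need a full list of US place names as spelled in the corpus.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny set of US place names.
US_PLACES = {"United States", "New York City", "California", "Texas", "Washington (DC)"}

def is_foreign_by_desk(article):
    """Desk rule: the story came from the foreign desk (assumed label)."""
    return article["news.desk"] == "Foreign Desk"

def is_foreign_by_location(article):
    """Location rule: the article has tagged locations and none are US places."""
    locations = article["locations"]
    return bool(locations) and not any(loc in US_PLACES for loc in locations)

def foreign_share_by_year(articles, rule):
    """Share of articles per year for which `rule` returns True."""
    totals, hits = defaultdict(int), defaultdict(int)
    for article in articles:
        totals[article["year"]] += 1
        hits[article["year"]] += int(rule(article))
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Made-up articles for illustration.
articles = [
    {"year": 1990, "news.desk": "Foreign Desk", "locations": ["Iraq"]},
    {"year": 1990, "news.desk": "National Desk", "locations": ["Texas"]},
    {"year": 1990, "news.desk": "Business Desk", "locations": ["Japan"]},
    {"year": 1990, "news.desk": "Sports Desk", "locations": []},
]
print(foreign_share_by_year(articles, is_foreign_by_desk))
print(foreign_share_by_year(articles, is_foreign_by_location))
```

The gap between the two shares hints at how much international coverage is filed outside the foreign desk, for example business stories about other countries.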
Corrections
We use the correction.date and correction.text fields to estimate the rate of corrections over time and, in later work, what is being corrected.
- Proportion of Corrections Per Year: Script and Figure.
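A simple correction rate is the share of articles in each year that carry a non-empty correction.date. The sketch below reads a CSV like the one produced by the conversion step. It groups by the publication year of the corrected article, which is one choice among several (grouping by the year of the correction itself is another). The `publication.year` column name and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def correction_rate_by_year(csv_text):
    """Share of articles per publication year with a non-empty correction.date."""
    totals, corrected = defaultdict(int), defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["publication.year"])  # hypothetical column name
        totals[year] += 1
        if (row.get("correction.date") or "").strip():
            corrected[year] += 1
    return {year: corrected[year] / totals[year] for year in sorted(totals)}

# Made-up rows for illustration.
sample = """publication.year,correction.date,correction.text
2000,,
2000,2000-03-02,Example correction.
2000,,
2000,,
2001,2001-05-10,Example correction.
2001,,
"""
print(correction_rate_by_year(sample))
```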
Length of Articles
We use the word.count field to estimate the average length of articles and how it has changed over time.
- Article Word Count Over Time: Script, Figure: Average Word Count, and Figure: Median Word Count.
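Reporting both the mean and the median is useful because a few very long features can pull the mean up while leaving the median unchanged. A minimal sketch with the standard `statistics` module, assuming (year, word.count) pairs where missing counts are `None`; the sample values are made up.

```python
import statistics
from collections import defaultdict

def word_count_summary(articles):
    """Return {year: (mean word count, median word count)}, skipping missing counts."""
    by_year = defaultdict(list)
    for year, words in articles:
        if words is not None:
            by_year[year].append(words)
    return {
        year: (statistics.mean(counts), statistics.median(counts))
        for year, counts in sorted(by_year.items())
    }

# Made-up (year, word.count) pairs; one long feature skews 1987's mean.
articles = [(1987, 400), (1987, 600), (1987, 2000), (2007, 500), (2007, 700), (2007, None)]
print(word_count_summary(articles))
```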
Number of Authors per Article
We use the normalized.byline field to estimate the number of authors per article and how that has changed over time.
- Average Number of Authors per Article Over Time: Script and Figure.
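Counting authors means splitting the byline into individual names. The sketch below assumes co-authors in normalized.byline are joined by " and " or by semicolons; that separator is an assumption to check against the actual field. Unsigned articles (empty bylines) are skipped so they do not drag the average toward zero.

```python
import re
import statistics

# Assumed separators between co-authors in normalized.byline.
SEPARATOR = re.compile(r"\s+and\s+|\s*;\s*", flags=re.IGNORECASE)

def count_authors(byline):
    """Number of names in a byline; 0 for an empty byline."""
    if not byline or not byline.strip():
        return 0
    return len([part for part in SEPARATOR.split(byline.strip()) if part.strip()])

def mean_authors_by_year(articles):
    """Average authors per signed article, by year."""
    by_year = {}
    for year, byline in articles:
        n = count_authors(byline)
        if n:  # skip unsigned articles
            by_year.setdefault(year, []).append(n)
    return {year: statistics.mean(v) for year, v in sorted(by_year.items())}

# Made-up bylines for illustration.
articles = [
    (1990, "Doe, Jane"),
    (1990, "Doe, Jane and Roe, Richard"),
    (1990, ""),
    (2005, "Doe, Jane; Roe, Richard; Poe, Edgar"),
]
print(mean_authors_by_year(articles))
```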
Number of Articles per Author per Year
One common conjecture is that reporters are producing more than they used to. Is that true? We use the normalized.byline field to estimate the average number of articles per author per year and how that metric has evolved over time.
- Number of Articles per Author per Year Over Time: Script and Figure.
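The metric can be computed by counting articles for each (year, author) pair and averaging those counts within each year. A minimal sketch, assuming bylines have already been split into lists of author names. Here each co-author gets full credit for a shared article; giving fractional credit (1/k for k authors) is an alternative that would lower the numbers for years with more co-authored pieces.

```python
from collections import Counter, defaultdict

def articles_per_author_by_year(articles):
    """articles: iterable of (year, [author, ...]).

    Returns {year: mean number of articles per distinct author}, giving each
    co-author full credit for a shared article.
    """
    counts = Counter()
    for year, authors in articles:
        for author in set(authors):
            counts[(year, author)] += 1
    per_year = defaultdict(list)
    for (year, _author), n in counts.items():
        per_year[year].append(n)
    return {year: sum(v) / len(v) for year, v in sorted(per_year.items())}

# Made-up author lists for illustration.
articles = [
    (1995, ["Doe, Jane"]),
    (1995, ["Doe, Jane", "Roe, Richard"]),
    (1995, ["Doe, Jane"]),
    (2005, ["Poe, Edgar"]), (2005, ["Poe, Edgar"]),
    (2005, ["Poe, Edgar"]), (2005, ["Poe, Edgar"]),
]
print(articles_per_author_by_year(articles))
```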
Proportion of Wire Stories
We use the byline field to identify stories from wire services such as the AP and Reuters.
- Proportion of AP and Reuters Stories Over Time: Script and Figure
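A byline-based rule can be as simple as a few pattern matches. The sketch below assumes wire copy carries bylines such as "By The Associated Press" or "By REUTERS"; the patterns are assumptions and may miss other spellings used in the corpus. The standalone abbreviation "AP" is matched case-sensitively so that names beginning with "Ap" are not flagged.

```python
import re
from collections import defaultdict

def wire_service(byline):
    """Return 'AP', 'Reuters', or None based on (assumed) byline wording."""
    text = byline or ""
    if re.search(r"associated press", text, re.IGNORECASE) or re.search(r"\bAP\b", text):
        return "AP"
    if re.search(r"\breuters\b", text, re.IGNORECASE):
        return "Reuters"
    return None

def wire_share_by_year(articles):
    """Share of (year, byline) articles attributed to AP or Reuters."""
    totals, wire = defaultdict(int), defaultdict(int)
    for year, byline in articles:
        totals[year] += 1
        if wire_service(byline):
            wire[year] += 1
    return {year: wire[year] / totals[year] for year in sorted(totals)}

# Made-up bylines for illustration.
articles = [(2000, "By REUTERS"), (2000, "By Jane Doe"),
            (2001, "By The Associated Press"), (2001, "By Apple Smith")]
print(wire_share_by_year(articles))
```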
Race and Gender of Reporters
We use the normalized.byline field to get reporters' names, and the gender and ethnicolr packages to impute their gender and race.
- Proportion of Female Journalists on Staff Over Time: Script and Figure.
- Average Number of Female Journalists per Article Over Time: Script and Figure.
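The two gender metrics differ in their unit of analysis: the staff share counts each distinct bylined reporter once per year, while the per-article average counts female bylines on each story. The sketch below shows only the aggregation step. The `IMPUTED_GENDER` lookup is a hypothetical toy stand-in for the output of the gender package, names are assumed to be in "Last, First" order, and names the lookup cannot classify are left out of the staff share's denominator.

```python
from collections import defaultdict

# Hypothetical stand-in for imputed genders keyed by first name.
IMPUTED_GENDER = {"Jane": "female", "Richard": "male", "Maria": "female"}

def first_name(normalized_name):
    """'Doe, Jane M.' -> 'Jane' (assumes 'Last, First' order); '' if absent."""
    parts = normalized_name.split(",", 1)
    if len(parts) == 2 and parts[1].strip():
        return parts[1].strip().split()[0]
    return ""

def is_female(name):
    return IMPUTED_GENDER.get(first_name(name)) == "female"

def gender_metrics_by_year(articles):
    """articles: iterable of (year, [author, ...]).

    Returns {year: (share of classifiable distinct authors imputed female,
                    mean number of female authors per signed article)}.
    """
    staff = defaultdict(set)
    female_per_article = defaultdict(list)
    for year, authors in articles:
        if not authors:  # skip unsigned articles
            continue
        staff[year].update(authors)
        female_per_article[year].append(sum(is_female(a) for a in authors))
    result = {}
    for year in sorted(staff):
        known = [IMPUTED_GENDER[first_name(a)] for a in staff[year]
                 if first_name(a) in IMPUTED_GENDER]
        share = known.count("female") / len(known) if known else float("nan")
        counts = female_per_article[year]
        result[year] = (share, sum(counts) / len(counts))
    return result

# Made-up author lists; "Poe, Edgar" is not in the lookup and is left unclassified.
articles = [(1995, ["Doe, Jane"]), (1995, ["Doe, Jane", "Roe, Richard"]),
            (1995, ["Poe, Edgar"]), (1995, [])]
print(gender_metrics_by_year(articles))
```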

Author

Gaurav Sood

License

Released under CC BY 2.0.
