The files in this repository are work towards an investigation of philosophical content, broadly understood, in early New Zealand newspaper writing, using the National Library of New Zealand's Papers Past Newspaper Open Data Pilot dataset (https://natlib.govt.nz/about-us/open-data/papers-past-metadata/papers-past-newspaper-open-data-pilot).
The directories contain:
- NPOD_Starter: the starter corpus from the National Library of New Zealand.
- classifiers: trained classification models (pickled).
- dictionaries: dictionaries generated with gensim from various subsets of the corpus.
- lda_models: trained LDA topic models.
- pickles: pickles of various subsets of the dataset (see the loading sketch after this list). Note: some pickled corpora are too large for GitHub.
- presentation: LaTeX source for the project presentation.
- report: LaTeX source for the project report.
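Once generated, the artifacts in these directories can be reloaded along the following lines. This is a minimal sketch only: the file names are hypothetical placeholders, not the actual files shipped in each directory.

```python
import pickle
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Hypothetical file names; substitute the real files from each directory.
with open('classifiers/nb_classifier.pickle', 'rb') as f:
    classifier = pickle.load(f)          # a pickled trained classifier

dictionary = Dictionary.load('dictionaries/corpus.dict')  # a gensim dictionary
lda = LdaModel.load('lda_models/corpus.lda')              # a trained LDA model

with open('pickles/corpus_subset.pickle', 'rb') as f:
    corpus_df = pickle.load(f)           # a pickled subset of the dataset
```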
The Jupyter notebooks have the following roles:
- 'Classifying texts.ipynb': code used to assign categorical labels to articles.
- 'Entity Extraction *.ipynb': application of spaCy to extract named entities and proper nouns from corpora (see the first sketch after this list).
- 'NaiveBayes_PhilosoClassification*.ipynb': Naive Bayes classifiers trained on the labelled dataset and then applied to the corpus as a whole (see the second sketch after this list).
- '*_exp.ipynb': use of collocation, cooccurrence, and concordancing to explore candidate corpora (see the third sketch after this list).
- 'starter_topicmodels.ipynb': use of gensim topic modelling to explore the 'Starter kit' of the dataset.
- 'Religion and Evolution in the REL corpus.ipynb': what the filename says.
- 'NZ Content': looking for NZ-specific content in the NB2 corpus.
- 'Relabelling.ipynb': proposals to improve labelling, begun but not completed.
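The entity-extraction notebooks apply spaCy roughly as follows. A minimal sketch, assuming the en_core_web_sm model is installed; the iterable of article texts and the dataframe column name are hypothetical.

```python
import spacy

nlp = spacy.load('en_core_web_sm')  # assumes this spaCy model has been downloaded

def extract_entities(texts):
    """Yield (entity text, entity label) pairs from an iterable of article texts."""
    for doc in nlp.pipe(texts):
        for ent in doc.ents:
            yield ent.text, ent.label_

# Proper nouns can be collected similarly, e.g. tokens with tok.pos_ == 'PROPN'.
# e.g.: entities = list(extract_entities(corpus_df['text']))
```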
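The classification notebooks follow the usual supervised pattern: train on the hand-labelled articles, then label the rest of the corpus. A minimal scikit-learn sketch of that pattern (the notebooks may differ in vectorisation and preprocessing; labelled_texts, labels, and all_texts are hypothetical names):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# labelled_texts, labels: the hand-labelled training articles (hypothetical names)
model = make_pipeline(TfidfVectorizer(min_df=2), MultinomialNB())
model.fit(labelled_texts, labels)

# apply the trained classifier to the corpus as a whole
predicted_labels = model.predict(all_texts)
```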
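The exploration notebooks lean on collocation and concordance views of candidate corpora. A minimal NLTK sketch of both (NLTK here is an assumption about tooling; tokens and the query term are hypothetical):

```python
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# tokens: a flat list of word tokens from one candidate corpus (hypothetical input)
measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(5)            # ignore rare pairs
print(finder.nbest(measures.pmi, 10))  # top collocations by pointwise mutual information

# keyword-in-context concordance for a hypothetical query term
nltk.Text(tokens).concordance('providence', width=79, lines=10)
```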
Various scripts are also included:
- 'NL_helpers.py': a set of helper functions used in the notebooks above.
- 'NL_topicmodels.py': a corpus class for use with gensim and helpers specifically for the topic modelling side of the project.
- 'generate_corpus_df.py': script to go from the dataset stored in tarballs to a collection of pickled pandas dataframes (see the first sketch after this list).
- 'keywords_from_corpus.py': a script to search for keywords in the complete corpus using the dataframes generated by 'generate_corpus_df.py'.
- 'cooccurrence.py': a script to generate cooccurrence scores for given terms and store the results in a dataframe (see the second sketch after this list). This is particularly useful for the Dash app (in a separate GitHub repository).
- 'add_cooccurrence_terms.py': used to add terms to previously generated cooccurrence dataframes.
- 'generate_*.py': scripts to generate various useful outputs.
- 'corpus2markdown.py': takes a corpus and saves it as a series of Markdown files with links to the Papers Past website.
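For orientation, 'generate_corpus_df.py' does something along these lines. A minimal sketch only: the tarball name, the internal layout (one plain-text file per article), and the column names are all assumptions, and the real dataset may require parsing rather than a straight read.

```python
import tarfile
import pandas as pd

rows = []
# hypothetical tarball name and layout: one text file per article
with tarfile.open('papers_past_subset.tar.gz', 'r:gz') as tar:
    for member in tar:
        if member.isfile():
            text = tar.extractfile(member).read().decode('utf-8')
            rows.append({'filename': member.name, 'text': text})

df = pd.DataFrame(rows)
df.to_pickle('pickles/papers_past_subset.pickle')  # one pickled dataframe per subset
```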
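And 'cooccurrence.py' computes something in the spirit of the sketch below: for each query term, count which words appear within a fixed window of it across the corpus. The function name, window size, and raw-count scoring are assumptions; the actual script may score differently.

```python
from collections import Counter
import pandas as pd

def cooccurrence_counts(tokenised_docs, terms, window=10):
    """Count words within `window` tokens of each query term (hypothetical helper)."""
    counts = {term: Counter() for term in terms}
    for tokens in tokenised_docs:
        for i, tok in enumerate(tokens):
            if tok in counts:
                neighbourhood = tokens[max(0, i - window):i + window + 1]
                counts[tok].update(t for t in neighbourhood if t != tok)
    # rows: co-occurring words; columns: query terms
    return pd.DataFrame(counts).fillna(0)

# e.g.: cooc_df = cooccurrence_counts(docs, ['evolution', 'design'])
```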
This repository contains almost all of the code I have used in the course of the project, but not all of the data (some files are too big for GitHub). Much of the code is in rough-and-ready script form and has not been tidied to the standard that a complete recreation of the project would require.