Jupyter notebooks for digital humanities

En: Jupyter notebooks are useful for organizing and documenting code, and embedding code within scholarly arguments and/or pedagogical materials. The following list of notebooks for digital humanities purposes was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you only want notebooks in English, search for "en".

De: Jupyter-Notizbücher eignen sich zum Organisieren und Dokumentieren von Code und zum Einbetten von Code in wissenschaftliche Argumente und / oder pädagogische Materialien. Die folgende Liste von Notizbüchern für die Digital Humanities wurde im Juni 2019 von Twitter bezogen, PRs mit Ergänzungsvorschlägen sind jedoch willkommen! Wenn Sie Notizbücher nur in Deutsch möchten, suchen Sie nach der Abkürzung "de".

Es: Los cuadernos Jupyter son útiles para organizar y documentar código, e incorporar código dentro de argumentos académicos y / o materiales pedagógicos. La siguiente lista de cuadernos para fines de humanidades digitales se obtuvo de Twitter en junio de 2019, ¡pero los PR con sugerencias de adiciones son bienvenidos! Si quieres cuadernos en español, busca en esta página por "es".

Fr: Les cahiers Jupyter sont utiles pour organiser et documenter le code, et pour incorporer du code dans des arguments scientifiques et / ou du matériel pédagogique. La liste suivante de cahiers à des fins de sciences humaines numériques a été extraite de Twitter en juin 2019, mais les PR avec les ajouts suggérés sont les bienvenus! Si vous voulez des cahiers en français, cherchez sur cette page l'abréviation "fr".

Research & projects

(de/Python) Modul Datenanalyse & -auswertung: Die Repräsentanz von weiblichen Sprecherinnen in den Theaterstücken der deutschen und französischen DramaCorpora. Von Janina Pingel und Vivian Schlosser.
(en/Python) Notebooks for "A Gospel of Health and Salvation: Modeling the Religious Culture of Seventh-day Adventism, 1843-1920" dissertation. By Jeri Wieringa.
(en/Python) AKU Classification: using machine learning to classify Hieratic script. By Bernhard Bermeitinger.
(es/Python) Autoría y estilo: Análisis estilométrico mediante clasificación de la Conquista de Jerusalén: Código de análisis del artículo realizado por Juan Cerezo Soler y José Calvo Tello.
(en/Python) Bughunt: text mining of English children's literature 1789-1914 for the representation of insects and other creepy crawlies, by Mary Chester-Kadwell, on Binder
(en/Python) CORDE: analyzes the genres of the Spanish diachronic corpus, by José Calvo.
(en/Python) Delta Inside Valle-Inclán: Stylometric Classification of Periods and Groups of His Novels: uses machine learning and stylometry to analyze the periodization and groups of novels from the Spanish novelist Valle-Inclán. By José Calvo.
(en/Python) Exploring A Medical History of British India: a pedagogical notebook demonstrating how to investigate questions using the Natural Language Toolkit (NLTK), including "How are the native Indian populations discussed?" and "What was the colonial attitude towards prostitution?" and "What was the perception of people with mental illness?". Developed by the National Library of Scotland.
(en/Python) Exploring Britain and UK Handbooks: a pedagogical notebook exploring how to investigate questions using the Natural Language Toolkit (NLTK), including "How has the UK portrayed itself to the outside world?" and "Which topics were written about with more or less detail as the years passed?". Developed by the National Library of Scotland.
(en/Python) Exploring Edinburgh Ladies’ Debating Society: a pedagogical notebook exploring how to investigate questions using the Natural Language Toolkit (NLTK), including "How similar or different is the vocabulary used in the two publications?" and "Who is named in the publications?" and "Which topics were written about with more or less detail as the years passed?". Developed by the National Library of Scotland.
(en/Python) Exploring Lewis Grassic Gibbon First Editions: a pedagogical notebook exploring how to investigate questions using the Natural Language Toolkit (NLTK), including "What are the most common words and bigrams in Gibbon’s works?" and "How does Gibbon’s vocabulary change from one of his works to another?" and "How can one visualise the diversity of Gibbon’s word choice for each year he published a work?". Developed by the National Library of Scotland.
(en/Python) Exploring The National Bibliography of Scotland (version 1): a pedagogical notebook showing how to load and work with XML, and export results as a CSV. Research questions include "What years are books published in the National Bibliography of Scotland so far?" and "In what languages are works written in the National Bibliography of Scotland so far?". Developed by the National Library of Scotland.
(en/Python) Genizah Medical Data Visualisation: using Python data science tools to undertake exploratory data analysis on metadata from Cambridge Digital Library's Cairo Genizah collection, by Hal Blackburn, on Binder.
(en/Python) "Late Style" PCA: code and prose related to a computational literary analysis that attempts to test literary critical claims of authorial periodicity known as "late style," popularized most recently by Edward Said's book On Late Style. By Jonathan Reeve.
(en/Python) Linguistic Fingerprints on Translation’s Lens: Jupyter notebook for April 15, 2019 presentation: workflow and analysis for a presentation, by Quinn Dombrowski, Yulia Ilchuk, Antonio Lenzo, J.D. Porter.
(en/Python) Phraseorom: evaluates whether different formalizations of the concept of the "lexical unit" get better results when evaluating literary genres in Spanish, by José Calvo.
(en/Python) Save Page Now: notebooks that document research into the Internet Archive's Save Page Now web archive data. The project is a collaboration between Shawn Walker, Jess Ogden and Ed Summers.
(en/Python) Socialist Realism Project: uses text analysis techniques to examine the characteristics of Soviet socialist realism of the pre-WWII era. By Sarah McEleney.
(en/Python) Topic modeling Public Archive of Police Violence in Cleveland transcripts: an experiment to see if it's possible to use topic modeling to guide the use of tags or a controlled vocabulary for the content that's stored in Omeka. By Ed Summers.
(en/Python) What We Talk About When We Talk About Digital Humanities: supporting code for a blog post, looking at the use of the terms 'digital' and 'humanities' as they are used in Matthew K. Gold's edited volume Debates in the Digital Humanities, by Teddy Roland.
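Several of the notebooks above ask which words and bigrams are most common in a corpus. As a rough illustration of that kind of question (not code taken from any listed notebook), the sketch below counts n-grams with the standard library only. The sample sentence and the `tokenize` and `top_ngrams` helpers are assumptions made for this example; the notebooks themselves typically rely on NLTK.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(tokens, n, k):
    """Return the k most common n-grams as (tuple_of_words, count) pairs."""
    # zip over n staggered copies of the token list to build the n-grams
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(ngrams).most_common(k)


# Made-up sample text, used only to exercise the functions
sample = "the digital humanities and the digital archive and the digital humanities"
tokens = tokenize(sample)

print(top_ngrams(tokens, 1, 3))  # most common single words
print(top_ngrams(tokens, 2, 3))  # most common bigrams
```

Swapping `sample` for the contents of a text file is enough to try this on a real document; a proper tokenizer will handle punctuation and non-English text better than the simple pattern used here.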

Course materials

(en/Python) Applied Data Analysis as taught at DHOxSS 2019: covers tidying data, visualization, modeling, and advanced applications of data analysis. By Giovanni Colavizza and Matteo Romanello. On Binder
(en/Python) Applied Natural Language Processing course at UC Berkeley: covers impact of tokenization choices on sentiment classification, distinctive terms using different methods, text classification, hyperparameter choices for classification accuracy, hypothesis testing, word embeddings, CNN and LSTM, social networks in literary texts, and more. By David Bamman.
(en/Python) Becoming a Historian course at UC Berkeley: notebook with introduction to Python using AHA job posting data, by Chris Hench.
(en/Python) Chinatown and the Culture of Exclusion course at UC Berkeley: using demographic data from the 20th-21st century, this module has students analyzing how a specific Chinatown, such as SF Chinatown, has changed over time. Students use some simple computational text analysis methods to explore and compare the structures of poems written on Angel Island and in Chinatown publications from the early 20th century. By Michaela Palmer, Maya Shen, Cynthia Leu, Chris Cheung, course taught by Amy Lee.
(en/Python) Data Arts course at UC Berkeley: notebooks looking at coincidence, correlation, and causation; and the evolution of social networks over time. Course by Greg Niemeyer.
(en/Python) Deconstructing Data Science course at UC Berkeley, by David Bamman, on Binder.
(en/Python) European Economic History course at UC Berkeley: notebooks related to the Industrial Revolution and the rise of the European economy to world dominance in the 19th century, emphasizing the diffusion of the industrial system and its consequences, the world trading system, and the rise of modern imperialism. Developed by Alec Kan, Beom Jin Lee, Anusha Mohan.
(en/Python) History data science connector course at UC Berkeley: various notebooks for analyzing historical data using data science methods.
(en/Python) Introduction to Cultural Analytics: Jupyter Book (collection of Jupyter notebooks) textbook for learning Python for humanities and social science research. Text analysis chapter includes a section on working in languages beyond English, with examples for Chinese, Danish, Portuguese, Russian, and Spanish. Developed by Melanie Walsh.
(en/Python) Introduction to Regular Expressions: teaching students the basics of regular expressions. Developed by Lauren F. Klein.
(en/Python) Japanese Internment course at UC Berkeley: notebook for mapping the beginning and end coordinates of people who migrated from one location to another after being placed in internment camps. By Melanie Yu, Andrew Linxie, Nga Pui Leung, and Francis Kumar.
(en/Python) Literature and data course at UC Berkeley: a mix of readings and Jupyter notebooks that experiment with popular statistical methods that have recently gained visibility in literary study, and consider them as forms of “distant reading.” By Teddy Roland.
(de/Python) Seminar »Methoden computergestützter Textanalyse«: Das Seminar wendet sich diesen Möglichkeiten computergestützter Textanalyse zu. Neben der Diskussion der theoretischen und methodologischen Grundlagen geht es insbesondere um die praktische Anwendung der entsprechenden Verfahren. Am Beispiel der Programmiersprache Python soll gezeigt werden, wie sich konkretes Textmaterial aufbereiten, analysieren und interpretieren lässt. Von Frederik Elwert.
(en/Python) Sumerian text analysis course at UC Berkeley: intro to Python, how to find differences between texts based on their words, and how to visualize the results. By Jonathan Lin, Stephanie Kim, Erik Cheng, and Sujude Dalieh; course taught by Niek Veldhuis.
(en/Python) Text analysis for graduate medievalists course at UC Berkeley: an introduction to parsing and performing text analysis on medieval manuscripts using Python. By Mingyue Tang, Sierra Blatan, Shubham Gupta, Tejas Priyadarshan, and Sasank Chaganty.
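For readers new to regular expressions, the topic of one of the notebooks above, here is a minimal sketch using Python's built-in `re` module. The sample sentence and the year pattern are assumptions chosen for illustration, not material from the listed course.

```python
import re

# Made-up sample sentence
text = "Editions appeared in 1789, 1843 and 1914; the 2019 reprint is abridged."

# \b marks a word boundary; (?:17|18|19)\d{2} matches a four-digit year from 1700 to 1999
year_pattern = re.compile(r"\b(?:17|18|19)\d{2}\b")

# findall returns every non-overlapping match as a string
years = [int(match) for match in year_pattern.findall(text)]
print(years)  # [1789, 1843, 1914]
```

The 2019 reprint is skipped because the pattern only accepts years beginning with 17, 18, or 19.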

Learning Python

(es/Python) Escuela de Verano 2019 - UNED: Introducción al modelado conceptual, introducción al procesamiento de textos con Python, usando librerías de Python para procesamiento de texto. Una introducción práctica a la exploración, análisis y manipulación del texto, aproximaciones modernas al análisis de texto, por Javier de la Rosa.
(en/Python) Intro to Python (Stanford CIDR): covers basic syntax in Python for variables, functions, and control flow. By Scott Bailey, Javier de la Rosa, Ashley Jester. (Filled-in version)
(fr/Python) Introduction à Python et au développement web avec Python pour les sciences humaines, par Thibault Clérice, traduit du matériel de Matthew Munson.
(en/Python) Introduction to Text-Mining with Python: notebooks from the Cambridge Digital Humanities 'Introduction to Text-Mining with Python' workshop series in the Cambridge Digital Humanities Learning programme 2019, by Mary Chester-Kadwell, on Binder.
(en/Python) Python Introduction Notebooks, by Erik Fredner
(en/Python) Python Programming for the Humanities, by Matthew Munson; 4 chapters adapted from Folgert Karsdorp.
(en/Python) Python Programming for the Humanities, by Folgert Karsdorp; 10 chapters.
(en/Python) Introduction to Python: created for the “Introduction to Python” session at the Historical Network Research workshop 2015. By Frederik Elwert.

Text analysis

Word vectors

(en/Python) Beyond the Black Box: Word2vec: uses one model for generating word vectors, word2vec, to explain how distance between terms is calculated by these models, and will show participants how to create, and how to begin to interpret, their own word embedding models. By Teddy Roland.
(en/Python) Word vector workshop materials: includes Understanding Word Vectors with Visualization, Word Vectors via Word2Vec, Pre-trained Models and Extended Vector Algorithms, and Role of Bias in Word Embeddings. By Eun Seo Jo, Javier de la Rosa, and Scott Bailey. On Binder: part 1, part 2, part 3, part 4.
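The notebooks in this section explain how distance between terms is calculated in word embedding models. One common measure is cosine similarity, sketched below in plain Python. The three-dimensional vectors are toy values invented for the example (real word2vec vectors usually have hundreds of dimensions), and `most_similar` is a hypothetical helper, not the gensim method of the same name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration only
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def most_similar(word):
    """Return the other word whose vector has the highest cosine similarity (hypothetical helper)."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("king"))
```

Cosine similarity ignores vector length and compares direction only, which is one reason it is often preferred over Euclidean distance for embeddings.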

Text classification

(en/Python) Intro to Machine Learning for text: covers how to identify when a problem related to a textual dataset might be approached using a classification strategy, and the different steps involved in the general workflow of machine learning using scikit-learn. Uses document classification to sort literary text by genre. By Scott Bailey, Javier de la Rosa, Ashley Jester. (Filled-in version)
(en/Python) Training and Fine-Tuning BERT for Classification: Classifying Goodreads Reviews By Book Genre - part of BERT for Humanists, where you can "fine-tune a BERT model on Goodreads reviews from the UCSD Book Graph with the goal of predicting the genre of the book being reviewed". By David Mimno, Melanie Walsh, Maria Antoniak. Uses Google Colab.

General NLP

(en/Python) Intro to NLP: provides a basic understanding of natural language processing, and enough familiarity with one NLP package, Textblob, to perform basic NLP tasks like tokenization and part of speech tagging. By Scott Bailey, Javier de la Rosa, Ashley Jester. Filled-in version.
(en/Python) Old Norse Notebook: Uses CLTK for analysis of Old Norse texts. By Clément Besnier.
(en/Python) SpaCy workshop from DH2019: using Jupyter notebooks for teaching text analysis with spaCy. By Andrew Janco.

Part-of-speech tagging

(en/Python) Using word endings and sentence position to determine part of speech, for ancient Greek. By Patrick Burns.

Named-entity recognition

(en/Python) Named entity recognition in CSVs: named-entity recognition on specified columns from a CSV file using spaCy; should work for any language where spaCy has a model with entity support (Dutch, English, French, German, Greek, Italian, Lithuanian, Norwegian, Spanish, Portuguese, as of October 2019). By Quinn Dombrowski.
(en/Python) Introduction to Named Entity Recognition with Python. A series of notebook lessons for students with different levels of coding. By Mary Chester-Kadwell. On Binder.

Sentiment analysis

(en/Python + R) Katia and the Sentiment Snobs: sentiment analysis using VADER and TextBlob (Python) and Syuzhet (R). By Katherine Bowers and Quinn Dombrowski.

Topic modeling

(en/Python) Topic modeling seminar: using extracted text and abstracts from CSCW 2016 for the topic modeling exercise. PDFs were downloaded and text/abstracts generated with pdf2text.py. By Ed Summers. On Binder.
(en/Python) Topic modeling workflow - using Italian texts from Project Gutenberg, slides available here. By Christof Schöch, Daniel Schlör, Ulrike Henny-Krahmer.

Multiple methods

(en/Python) Chapter 5: Text analysis from "Introduction to Cultural Analytics and Python" by Melanie Walsh. Includes a section on working in languages beyond English, with code for Chinese, Danish, Portuguese, Russian, and Spanish. Covers topic modeling, named entity recognition, part-of-speech tagging, and keyword extraction.
(es/Python) Escuela de Verano 2019 - UNED: Introducción al modelado conceptual, introducción al procesamiento de textos con Python, usando librerías de Python para procesamiento de texto. Una introducción práctica a la exploración, análisis y manipulación del texto, aproximaciones modernas al análisis de texto, por Javier de la Rosa.
(en/Python) Notebooks from Computational Text Analysis Workgroup for Classical Languages at the University of Kansas. Includes notebooks for installing CLTK, tokenization, lemmatization, and stopwords & stoplists, concordances and n-gram analysis, and stylometry.
(en/Python) Notebooks from Critical Digital Humanities: The Search for a Methodology by Jed Dobson. Includes code for collocations, feature distance, topic models (LDA, N-grams), sentiment analysis, k-nearest neighbors, word2vec similarity, and text alignment, along with related notebooks.
(en/Python) Text analysis for humanities research: a workshop taking a research-oriented approach to computational text analysis. Each session explores a published literary study that has used computational evidence along with close reading. Specifically, we will look at code and practice exercises that reproduce the researchers' method or main finding, while building proficiency with common text analysis methods. By Teddy Roland.
(en/Python) Various notebooks for ancient world study, by Patrick Burns. Includes Plotting Generic Diction in Latin Poetry with Scattertext, Latin word embeddings and synonymity, Classifying Roman Names by Gender, and Plotting Words per Line over Narrative Space in Virgil’s Aeneid, along with many others.
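Concordances are one of the methods mentioned in this section. The sketch below shows the basic keyword-in-context idea in plain Python; it is a simplified illustration under stated assumptions, not the CLTK or NLTK implementation. The `concordance` function and its `window` parameter are hypothetical names, and the sample is the opening line of the Aeneid, lowercased.

```python
def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            # Slice up to `window` tokens on each side, clamping at the start of the list
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


tokens = "arma virumque cano troiae qui primus ab oris".split()

for left, keyword, right in concordance(tokens, "cano"):
    print(f"{left:>20} [{keyword}] {right}")
```

Widening `window` gives more context per hit; matching on lemmas rather than surface forms would be the natural next step for inflected languages such as Latin.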

Using APIs

(en/Python) Archaeological data: how to retrieve information from the Portable Antiquities Scheme database, which could also be analyzed in particular ways. This notebook is in two parts; the first shows how to transform the JSON search results into a CSV. The second uses that data to find the filepaths to download images of the objects into nicely sorted folders. By Shawn Graham and Dan Pett, on Binder.
(en/Python) Chronicling America API search: by Eric Kansa, part of Open Context Jupyter (on Binder)
(en/Python) Collecting events: Analysis of a different approach for collecting Twitter data for events, by Ed Summers.
(fr/Python) Exploitation des données enrichies à l'aide du SPARQL endpoint d'ISIDORE: nous allons voir quelle est la répartition disciplinaire des données du programme de recherche AsilEuropeXIX (voir : https://asileurope.huma-num.fr) à l'aide des enrichissements sémantiques produits par ISIDORE. Par Stephane Pouyllau.
(en/Python) HTRC connector course at UC Berkeley: covers working with extracted features, visualization, mapping, and classification. By Sasank Chaganty, Alex Chan, Nathan Magee.
(en/Python) Library of Congress Data Exploration: includes notebooks for accessing images from the loc.gov JSON API for image analysis, looking at dominant colors in images, and others.
(en/Python) Open Context API search by Eric Kansa, part of Open Context Jupyter (on Binder)
(en/Python) Vine tweets: describes how to download and work with the Vine-Tweets Dataset which was generated as part of an effort by the ArchiveTeam to archive videos from the Vine social media platform after they announced it was sunsetting the service, by Ed Summers.
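A step that recurs in API notebooks like those above is turning JSON search results into a CSV. The sketch below shows one way to do that with the standard library. The response structure, the `results` key, and the field names are all hypothetical; a real API such as the Portable Antiquities Scheme will return different fields, so check its documentation before adapting this.

```python
import csv
import io
import json

# Hypothetical search response; real API responses will use different field names
response_text = json.dumps({
    "results": [
        {"id": 1, "objecttype": "coin", "county": "Kent", "extra": "ignored"},
        {"id": 2, "objecttype": "brooch", "county": "Norfolk"},
    ]
})


def results_to_csv(json_text, fieldnames):
    """Convert the 'results' list of a JSON response into CSV text (hypothetical helper)."""
    records = json.loads(json_text)["results"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in fieldnames
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


csv_text = results_to_csv(response_text, ["id", "objecttype", "county"])
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained; replacing `io.StringIO()` with `open("results.csv", "w", newline="")` would write a file instead.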

Scraping

(en/Python) Archives Unleashed DC: working with Internet Archive CDX files of museum websites, by Ed Summers.
(en/Python) Job Census (Google Colab): scraping the Academic Jobs Wiki over time, by Ryan Heuser.
(en/Python) Scraping: building a Scrapy (Python) scraper, by Shawn Graham (on Binder).
(it/Julia) Scraping dei vincitori del Leone d'Oro alla Mostra di Venezia: ottiene i dati da un articolo di Wikipedia e crea un grafico dei vincitori, di Marco Goldin.

Data cleaning

(en/Python) EEBO-TCP full-text document cleaning: code to make EEBO-TCP texts more easily analyzed in Natural Language Processing (NLP), though most of the edits can be used on any text file. By Jamel Ostwald.
(en/Python) Data manipulation workshop: covers how to load data into a Pandas DataFrame, perform basic cleaning and analysis, and visualize relevant aspects of a dataset, using a dataset of tweets. By Scott Bailey, Javier de la Rosa, Ashley Jester. (filled-in version)
(en/Python) Japanese text segmentation: uses the RakutenMA Python module to segment Japanese text, by Quinn Dombrowski.
(en/Python) Unicode to ASCII: notebook for converting Unicode text to ASCII, by Quinn Dombrowski.
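Converting Unicode text to ASCII, as the last notebook above does, can be approximated with the standard library's `unicodedata` module. This is a minimal sketch under the assumption that stripping diacritics is acceptable for your data; it is not the code from the listed notebook, and `to_ascii` is a hypothetical helper name.

```python
import unicodedata


def to_ascii(text):
    """Strip diacritics by decomposing characters, then dropping anything non-ASCII."""
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" silently discards the combining marks
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Valle-Inclán, Schöch, Clérice"))  # Valle-Inclan, Schoch, Clerice
```

Characters that have no ASCII decomposition (for example "ß" or "ø") are dropped entirely rather than transliterated, so this approach loses information and is best reserved for tools that cannot handle Unicode.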

Mapping

(en/Python) Mapping Geographic Subjects using the HathiTrust Extracted Features Dataset: Retrieves a book-level dataset from the HathiTrust Extracted Features Dataset, "recreates" the book's text using token-frequency data (i.e. tokenPosCount), runs the text through a named entity recognition tagger (Stanford NER Tagger), separates out 'location' NER data, queries the Geonames API for geographic coordinates for all locations, and maps the coordinates using Folium. By Patrick Burns.
(en/Python) Mapping with iPyLeaflet by Eric Kansa, part of Open Context Jupyter (on Binder)
(en/Python) Using NER to Map MARC Geographic Subject Headings: Retrieves a MARC record from the NYU Library LMS, runs the text of the MARC record through a named entity recognition tagger (Stanford NER Tagger), separates out 'location' NER data, queries the Geonames API for geographic coordinates for all locations, and maps the coordinates using Folium. By Patrick Burns.

Analyzing Images

(en/Python) Distant Viewing with Deep Learning: An Introduction to Analyzing Large Corpora of Images: provides a hands-on introduction to the use of deep learning techniques in the study of large image corpora. By Taylor Arnold and Lauren Tilton.
(en/Python) Identifying Similar Images with TensorFlow (O-Date): based on Doug Duhaime's tutorial, by Shawn Graham, on Binder.
(en/Python) O-Date image classifier: As a demonstration, it contains a small gallery of training images and a few images on which you may test the classifier once it is trained. (on Binder)
(en/Python) TensorFlow for Poets (Google Colab): trains an image recognition model using transfer learning

Image Generation

(en/Python-Colab) Multi-Perceptor VQGAN + CLIP (v.3.2021.11.29): create an image based on a text prompt; works especially well if you include an artist's name. By Remi Durant. (context on Twitter)

Archaeology

(en/Python) abm3: notebooks for running NetLogo models without the GUI by specifying parameters for an experiment, by Shawn Graham (on Binder).
(en/Python) creativity2: Generating a fantasy-style world map and history using several connected models in sequence, for archaeogaming. By Shawn Graham, on Binder
(en/Python) LiDAR: downloading and visualizing LiDAR data from the City of Montreal. By Shawn Graham, adapted from Tyler Sloan. On Binder.
(en/Python) Open Context Zooarchaeology Measurements by Eric Kansa: gets measurement data from Open Context, part of Open Context Jupyter (on Binder)
(en/Python) Sonification by Shawn Graham: Using the miditime package to represent time-series data in sound (on Binder).
(en/Python) Spatial Archaeology course exercises by Rachel Opitz, on Binder.

Large language models (e.g. BERT, GPT-2)

(en/Python) Bardbot: Google Colab notebook for using GPT-2 to attempt to write rhyming poetry, by Ryan Heuser. (context on Twitter)
(en/Python) Measuring Word Similarity with BERT (English Language Public Domain Poems) - part of BERT for Humanists, where you can "look for words that have a similar vector to a query word from a collection of poems. The results are illustrative of what BERT vectors represent, but also of the limitations of the tokenization scheme that it uses." By David Mimno, Melanie Walsh, Maria Antoniak. Uses Google Colab.

Other

(es/Python) Análisis chocométrico: al final de la conferencia HD Hispanicas 2019 hubo análisis chocométrico: 4 chocolates Lindt para probar, con diferentes porcentajes de cacao: 50, 78, 90 y 99%. El objetivo era saber a nivel personal y social qué porcentaje gusta más. Por José Calvo.
(en/Python) GLAM Workbench: over 50 notebooks with tools and examples to help you work with data from galleries, libraries, archives, and museums (the GLAM sector), focusing on Australia and New Zealand, by Tim Sherratt.
(en/Python & R) Notebook templates for Binder: contains all you need to set up Jupyter notebooks for use with mybinder.org. mybinder.org builds a Docker image from the repository and then launches it for you in the Jupyter Notebook executable environment. By clicking on a button, you get a version of the repository as it currently exists. You can then create new Python or R notebooks and run the code, writing your analysis and your code together. You can also open a terminal within Jupyter and use git to push any changes you make back to this repository. By Shawn Graham, on Binder.
(en/Python) Processing: example of integrating the Processing language for sketches and visualizations into Python and Jupyter notebooks, by Shawn Graham, on Binder.
(en/Python & R) SPARQL and LOD: introduces SPARQL and Linked Open Data, by Shawn Graham, on Binder
(en/Python & R) sqlite: introduces some of the basic commands for querying and modifying a database using the Structured Query Language, SQL; illustrates writing a query into a 'dataframe', a table that you can then manipulate or visualize; shows how to load a sqlite database into R. By Shawn Graham, on Binder.
(en/Python) WARC Processing with Spark and Python: a brief guide for processing WARC (Web Archive) data using PySpark and warcio, by Ed Summers.
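The sqlite notebook listed above introduces basic commands for querying and modifying a database with SQL. For a sense of what that looks like from Python, here is a small sketch using the built-in `sqlite3` module and an in-memory database. The table name, columns, and rows are placeholder values invented for the example and do not come from the notebook.

```python
import sqlite3

# An in-memory database avoids creating a file on disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notebooks (title TEXT, language TEXT, topic TEXT)")

# Placeholder rows, invented for illustration
conn.executemany(
    "INSERT INTO notebooks VALUES (?, ?, ?)",
    [
        ("Notebook A", "en", "scraping"),
        ("Notebook B", "it", "scraping"),
        ("Notebook C", "en", "mapping"),
    ],
)

# Querying: count notebooks per topic
rows = conn.execute(
    "SELECT topic, COUNT(*) FROM notebooks GROUP BY topic ORDER BY topic"
).fetchall()
print(rows)

# Modifying: rename a topic, then confirm how many rows changed
cursor = conn.execute(
    "UPDATE notebooks SET topic = ? WHERE topic = ?", ("web scraping", "scraping")
)
updated = cursor.rowcount
conn.commit()
print(updated)
```

The `?` placeholders let `sqlite3` handle quoting, which is safer than building SQL strings by hand. Query results can also be loaded into a pandas DataFrame for the kind of manipulation and visualization the notebook describes.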

quinnanya/dh-jupyter