
A collection of Jupyter notebooks in many human and computer languages for doing digital humanities. PRs welcome!


quinnanya/dh-jupyter


Jupyter notebooks for digital humanities

En: Jupyter notebooks are useful for organizing and documenting code, and embedding code within scholarly arguments and/or pedagogical materials. The following list of notebooks for digital humanities purposes was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you only want notebooks in English, search for "en".

De: Jupyter notebooks are well suited to organizing and documenting code and to embedding code in scholarly arguments and/or pedagogical materials. The following list of notebooks for the digital humanities was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you only want notebooks in German, search for the abbreviation "de".

Es: Jupyter notebooks are useful for organizing and documenting code, and for embedding code within scholarly arguments and/or pedagogical materials. The following list of notebooks for digital humanities purposes was sourced from Twitter in June 2019, but PRs with suggested additions are welcome! If you want notebooks in Spanish, search this page for "es".

Fr: Jupyter notebooks are useful for organizing and documenting code, and for embedding code in scholarly arguments and/or teaching materials. The following list of notebooks for digital humanities purposes was taken from Twitter in June 2019, but PRs with suggested additions are welcome! If you want notebooks in French, search this page for the abbreviation "fr".

Research & projects

Course materials

  • (en/Python) Applied Data Analysis as taught at DHOxSS 2019: covers tidying data, visualization, modeling, and advanced applications of data analysis. By Giovanni Colavizza and Matteo Romanello. On Binder.
  • (en/Python) Applied Natural Language Processing course at UC Berkeley: covers the impact of tokenization choices on sentiment classification, distinctive terms using different methods, text classification, hyperparameter choices for classification accuracy, hypothesis testing, word embeddings, CNNs and LSTMs, social networks in literary texts, and more. By David Bamman.
  • (en/Python) Becoming a Historian course at UC Berkeley: a notebook introducing Python using AHA job posting data, by Chris Hench.
  • (en/Python) Chinatown and the Culture of Exclusion course at UC Berkeley: using demographic data from the 20th and 21st centuries, this module has students analyze how a specific Chinatown, such as SF Chinatown, has changed over time. Students use some simple computational text analysis methods to explore and compare the structures of poems written on Angel Island and in Chinatown publications from the early 20th century. By Michaela Palmer, Maya Shen, Cynthia Leu, Chris Cheung; course taught by Amy Lee.
  • (en/Python) Data Arts course at UC Berkeley: notebooks looking at coincidence, correlation, and causation; and the evolution of social networks over time. Course by Greg Niemeyer.
  • (en/Python) Deconstructing Data Science course at UC Berkeley, by David Bamman, on Binder.
  • (en/Python) European Economic History course at UC Berkeley: notebooks related to the Industrial Revolution and the rise of the European economy to world dominance in the 19th century, emphasizing the diffusion of the industrial system and its consequences, the world trading system, and the rise of modern imperialism. Developed by Alec Kan, Beom Jin Lee, and Anusha Mohan.
  • (en/Python) History data science connector course at UC Berkeley: various notebooks for analyzing historical data using data science methods.
  • (en/Python) Introduction to Cultural Analytics: Jupyter Book (collection of Jupyter notebooks) textbook for learning Python for humanities and social science research. Text analysis chapter includes a section on working in languages beyond English, with examples for Chinese, Danish, Portuguese, Russian, and Spanish. Developed by Melanie Walsh.
  • (en/Python) Introduction to Regular Expressions: teaches students the basics of regular expressions. Developed by Lauren F. Klein.
  • (en/Python) Japanese Internment course at UC Berkeley: notebook for mapping the beginning and end coordinates of people who migrated from one location to another after being placed in internment camps. By Melanie Yu, Andrew Linxie, Nga Pui Leung, and Francis Kumar.
  • (en/Python) Literature and data course at UC Berkeley: a mix of readings and Jupyter notebooks that experiment with popular statistical methods that have recently gained visibility in literary study, and consider them as forms of “distant reading.” By Teddy Roland.
  • (de/Python) Seminar »Methoden computergestützter Textanalyse« ("Methods of Computer-Assisted Text Analysis"): the seminar explores the possibilities of computer-assisted text analysis. Alongside a discussion of the theoretical and methodological foundations, it focuses on the practical application of these methods. Using the Python programming language as an example, it shows how concrete textual material can be prepared, analyzed, and interpreted. By Frederik Elwert.
  • (en/Python) Sumerian text analysis course at UC Berkeley: intro to Python, how to find differences between texts based on their words, and how to visualize the results. By Jonathan Lin, Stephanie Kim, Erik Cheng, and Sujude Dalieh; course taught by Niek Veldhuis.
  • (en/Python) Text analysis for graduate medievalists course at UC Berkeley: an introduction to parsing and performing text analysis on medieval manuscripts using Python. By Mingyue Tang, Sierra Blatan, Shubham Gupta, Tejas Priyadarshan, and Sasank Chaganty.
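As a taste of the pattern matching taught in Lauren F. Klein's Introduction to Regular Expressions, here is a minimal sketch using Python's standard `re` module. The sample sentence and the year pattern are illustrative assumptions, not material from the notebook:

```python
import re

# Hypothetical sample text (not taken from any notebook listed above)
text = "Letters dated 1862 and 1875 survive, along with a postcard from 1901 (catalogue no. 204501)."

# \b marks word boundaries, so only standalone four-digit numbers can match;
# the group (1[89]\d\d) restricts matches to the years 1800-1999.
year_pattern = re.compile(r"\b(1[89]\d\d)\b")

years = [int(y) for y in year_pattern.findall(text)]
print(years)  # [1862, 1875, 1901]
```

The `\b` word boundaries are what stop the pattern from matching digits inside the longer catalogue number.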

Learning Python

Text analysis

Word vectors

  • (en/Python) Beyond the Black Box: Word2vec: uses word2vec, one model for generating word vectors, to explain how these models calculate the distance between terms, and shows participants how to create, and begin to interpret, their own word embedding models. By Teddy Roland.
  • (en/Python) Word vector workshop materials: includes Understanding Word Vectors with Visualization, Word Vectors via Word2Vec, Pre-trained Models and Extended Vector Algorithms, and Role of Bias in Word Embeddings. By Eun Seo Jo, Javier de la Rosa, and Scott Bailey. On Binder: part 1, part 2, part 3, part 4.
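Word2vec and similar models represent each word as a vector, and the "distance" between terms is usually measured with cosine similarity. A minimal sketch with made-up three-dimensional vectors (real embeddings are learned from a corpus and typically have hundreds of dimensions):

```python
import math

# Hypothetical toy vectors; a trained model would supply these
vectors = {
    "king":  [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = most_similar("king", vectors)
print(ranking)  # "queen" ranks above "apple"
```

Libraries such as gensim expose this kind of ranking directly as `KeyedVectors.most_similar`.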

Text classification

  • (en/Python) Intro to Machine Learning for text: covers how to identify when a problem related to a textual dataset might be approached using a classification strategy, and the different steps involved in the general workflow of machine learning using scikit-learn. Uses document classification to sort literary texts by genre. By Scott Bailey, Javier de la Rosa, Ashley Jester. (Filled-in version)
  • (en/Python) Training and Fine-Tuning BERT for Classification: Classifying Goodreads Reviews By Book Genre - part of BERT for Humanists, where you can "fine-tune a BERT model on Goodreads reviews from the UCSD Book Graph with the goal of predicting the genre of the book being reviewed". By David Mimno, Melanie Walsh, Maria Antoniak. Uses Google Colab.
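The general workflow in Intro to Machine Learning for text (turn documents into word counts, fit a classifier on labeled examples, predict labels for new ones) can be sketched without dependencies. This version swaps in a hand-written multinomial naive Bayes classifier with add-one smoothing in place of the workshop's scikit-learn code, and the four training snippets and genre labels are hypothetical:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesGenreClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled snippets standing in for literary texts
train_texts = [
    "the detective examined the body and questioned the butler",
    "a murder in the library and a missing revolver",
    "the starship drifted past the moons of a distant planet",
    "robots and aliens fought for control of the galaxy",
]
train_labels = ["mystery", "mystery", "science fiction", "science fiction"]

clf = NaiveBayesGenreClassifier().fit(train_texts, train_labels)
print(clf.predict("the butler hid the revolver"))          # mystery
print(clf.predict("an alien planet in a distant galaxy"))  # science fiction
```

In scikit-learn the same steps would typically be a `CountVectorizer` feeding a `MultinomialNB` model.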

General NLP

  • (en/Python) Intro to NLP: provides a basic understanding of natural language processing, and enough familiarity with one NLP package, TextBlob, to perform basic NLP tasks like tokenization and part-of-speech tagging. By Scott Bailey, Javier de la Rosa, Ashley Jester. Filled-in version.
  • (en/Python) Old Norse Notebook: Uses CLTK for analysis of Old Norse texts. By Clément Besnier.
  • (en/Python) spaCy workshop from DH2019: using Jupyter notebooks for teaching text analysis with spaCy. By Andrew Janco.

Part-of-speech tagging

Named-entity recognition

  • (en/Python) Named entity recognition in CSVs: named-entity recognition on specified columns from a CSV file using spaCy; should work for any language where spaCy has a model with entity support (Dutch, English, French, German, Greek, Italian, Lithuanian, Norwegian, Spanish, Portuguese, as of October 2019). By Quinn Dombrowski.
  • (en/Python) Introduction to Named Entity Recognition with Python: a series of notebook lessons for students with different levels of coding experience. By Mary Chester-Kadwell. On Binder.

Sentiment analysis

  • (en/Python + R) Katia and the Sentiment Snobs: sentiment analysis using VADER and TextBlob (Python) and Syuzhet (R). By Katherine Bowers and Quinn Dombrowski.
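VADER and Syuzhet are both lexicon-based: words carry pre-assigned sentiment scores that are combined into a score for the passage. A minimal sketch of that idea, assuming a tiny hypothetical lexicon; the `x / sqrt(x*x + alpha)` squashing with `alpha=15` mirrors how VADER normalizes its compound score, but VADER's handling of negation, intensifiers, capitalization, and punctuation is left out:

```python
import math
import re

# Hypothetical valence lexicon; VADER's real lexicon is far larger
LEXICON = {"love": 3.2, "delightful": 2.8, "good": 1.9,
           "dull": -1.7, "awful": -2.5, "hate": -2.7}

def normalize(score, alpha=15):
    """Squash an unbounded sum into (-1, 1), VADER-style."""
    return score / math.sqrt(score * score + alpha)

def compound_sentiment(text):
    """Sum lexicon valences word by word (unknown words score 0), then normalize."""
    words = re.findall(r"[a-z']+", text.lower())
    raw = sum(LEXICON.get(word, 0.0) for word in words)
    return normalize(raw)

for sentence in ["What a delightful, good book.", "Dull plot, awful ending.", "The train left."]:
    print(f"{compound_sentiment(sentence):+.3f}  {sentence}")
```

Because the score is a plain sum, "not good" would still count as positive here; handling cases like that is much of what distinguishes VADER from a bare lexicon lookup.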

Topic modeling

  • (en/Python) Topic modeling seminar: using extracted text and abstracts from CSCW 2016 for the topic modeling exercise. PDFs were downloaded and text/abstracts generated with pdf2text.py. By Ed Summers. On Binder.
  • (en/Python) Topic modeling workflow: uses Italian texts from Project Gutenberg; slides available here. By Christof Schöch, Daniel Schlör, Ulrike Henny-Krahmer.
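Topic modeling is most often done with latent Dirichlet allocation (LDA). As a dependency-free sketch of what LDA estimates, not the code used in either notebook above, here is a small collapsed Gibbs sampler; the toy documents, hyperparameters, and iteration count are assumptions, and real projects would rely on an established tool such as gensim or MALLET:

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (theta, phi): per-document topic
    proportions and per-topic word distributions (as dicts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # n_dk: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # n_kw: times word w is assigned to topic k
    topic_total = [0] * K                      # n_k: tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Take the token's current assignment out of the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # P(z = t | rest) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Turn the final counts into smoothed probability estimates
    theta = [[(row[k] + alpha) / (sum(row) + K * alpha) for k in range(K)] for row in doc_topic]
    phi = [{vocab[v]: (topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)}
           for k in range(K)]
    return theta, phi

docs = [
    "ship sea sail harbor sea ship".split(),
    "sea harbor ship sail wave".split(),
    "vote senate law vote election".split(),
    "law senate election vote court".split(),
]
theta, phi = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(phi):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

On a corpus this small the topics may or may not separate cleanly; the point is the sampling step, which reassigns each token in proportion to how much its document uses a topic and how much that topic uses the word.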

Multiple methods

Using APIs

Scraping

Data cleaning

  • (en/Python) EEBO-TCP full-text document cleaning: code to make EEBO-TCP texts easier to analyze with natural language processing (NLP), though most of the edits can be applied to any text file. By Jamel Ostwald.
  • (en/Python) Data manipulation workshop: covers how to load data into a pandas DataFrame, perform basic cleaning and analysis, and visualize relevant aspects of a dataset, using a dataset of tweets. By Scott Bailey, Javier de la Rosa, Ashley Jester. (Filled-in version)
  • (en/Python) Japanese text segmentation: uses the RakutenMA Python module to segment Japanese text, by Quinn Dombrowski.
  • (en/Python) Unicode to ASCII: notebook for converting Unicode text to ASCII, by Quinn Dombrowski.
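The Unicode to ASCII notebook's exact method isn't described here; one common standard-library approach is to decompose accented characters and then drop anything that isn't ASCII. A minimal sketch, using names from this list as sample input:

```python
import unicodedata

def to_ascii(text):
    """Approximate `text` in ASCII by stripping diacritics.

    NFKD decomposition splits a character like 'ö' into 'o' plus a combining
    diaeresis (and expands compatibility forms like the 'ﬁ' ligature into 'fi');
    encoding to ASCII with errors="ignore" then drops the combining marks and
    any other non-ASCII character.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Christof Schöch, Clément Besnier, José Calvo"))
# Christof Schoch, Clement Besnier, Jose Calvo
```

Characters with no decomposition, such as "ø" or Japanese script, are dropped entirely rather than transliterated, so this approach suits accented Latin text best.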

Mapping

  • (en/Python) Mapping Geographic Subjects using the HathiTrust Extracted Features Dataset: retrieves a book-level dataset from the HathiTrust Extracted Features Dataset, "recreates" the book's text using token-frequency data (i.e., tokenPosCount), runs the text through a named entity recognition tagger (Stanford NER Tagger), separates out the 'location' NER data, queries the Geonames API for geographic coordinates for all locations, and maps the coordinates using Folium. By Patrick Burns.
  • (en/Python) Mapping with ipyleaflet: by Eric Kansa, part of Open Context Jupyter (on Binder).
  • (en/Python) Using NER to Map MARC Geographic Subject Headings: retrieves a MARC record from the NYU Library LMS, runs the text of the MARC record through a named entity recognition tagger (Stanford NER Tagger), separates out the 'location' NER data, queries the Geonames API for geographic coordinates for all locations, and maps the coordinates using Folium. By Patrick Burns.
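Both of Patrick Burns's notebooks share a step between tagging and mapping: keep only the 'location' entities, then geocode them. A minimal sketch, assuming the tagger returns (token, tag) pairs with Stanford-style LOCATION tags, as NLTK's `StanfordNERTagger.tag` does; the tagged sentence and the GeoNames username are placeholders, and the code only builds the GeoNames `searchJSON` request URL rather than sending it:

```python
from urllib.parse import urlencode

def extract_locations(tagged_tokens):
    """Merge runs of consecutive LOCATION-tagged tokens into place names."""
    locations, current = [], []
    for token, tag in tagged_tokens:
        if tag == "LOCATION":
            current.append(token)
        elif current:
            locations.append(" ".join(current))
            current = []
    if current:  # a place name at the very end of the input
        locations.append(" ".join(current))
    return locations

def geonames_search_url(place, username):
    """Build a GeoNames searchJSON query URL for one place name (no request is sent)."""
    params = {"q": place, "maxRows": 1, "username": username}
    return "http://api.geonames.org/searchJSON?" + urlencode(params)

# Hypothetical tagger output
tagged = [("Letters", "O"), ("from", "O"), ("New", "LOCATION"), ("York", "LOCATION"),
          ("and", "O"), ("Paris", "LOCATION"), ("by", "O"), ("Edith", "PERSON")]
places = extract_locations(tagged)
print(places)  # ['New York', 'Paris']
print(geonames_search_url(places[0], username="your_geonames_username"))
# http://api.geonames.org/searchJSON?q=New+York&maxRows=1&username=your_geonames_username
```

Two different place names with no other token between them would be merged into one here; taggers that emit BIO-style tags avoid that ambiguity. The coordinates in each GeoNames response could then be plotted with Folium.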

Analyzing Images

Image Generation

Archaeology

Large language models (e.g. BERT, GPT-2)

  • (en/Python) Bardbot: Google Colab notebook for using GPT-2 to attempt to write rhyming poetry, by Ryan Heuster. (context on Twitter)

  • (en/Python) Measuring Word Similarity with BERT (English Language Public Domain Poems) - part of BERT for Humanists, where you can "look for words that have a similar vector to a query word from a collection of poems. The results are illustrative of what BERT vectors represent, but also of the limitations of the tokenization scheme that it uses." By David Mimno, Melanie Walsh, Maria Antoniak. Uses Google Colab.

Other

  • (es/Python) Análisis chocométrico (chocometric analysis): at the end of the HD Hispanicas 2019 conference there was a chocometric analysis: 4 Lindt chocolates to taste, with different cocoa percentages: 50, 78, 90, and 99%. The goal was to find out, at both a personal and a social level, which percentage people like most. By José Calvo.
  • (en/Python) GLAM Workbench: over 50 notebooks with tools and examples to help you work with data from galleries, libraries, archives, and museums (the GLAM sector), focusing on Australia and New Zealand, by Tim Sherratt.
  • (en/Python & R) Notebook templates for Binder: contains everything you need to set up Jupyter notebooks for use with mybinder.org, which builds a Docker image from the repository and launches it for you as an executable Jupyter Notebook environment. Clicking the launch button gives you a copy of the repository as it currently exists, where you can create new Python or R notebooks and run the code, writing your analysis and your code together. You can also open a terminal within Jupyter and use git to push any changes you make back to this repository. By Shawn Graham, on Binder.
  • (en/Python) Processing: an example of integrating the Processing language for sketches and visualizations into Python and Jupyter notebooks, by Shawn Graham, on Binder.
  • (en/Python & R) SPARQL and LOD: introduces SPARQL and Linked Open Data, by Shawn Graham, on Binder.
  • (en/Python & R) sqlite: introduces some of the basic commands for querying and modifying a database using the Structured Query Language, SQL; illustrates writing a query into a 'dataframe', a table that you can then manipulate or visualize; shows how to load a sqlite database into R. By Shawn Graham, on Binder.
  • (en/Python) WARC Processing with Spark and Python: a brief guide for processing WARC (Web Archive) data using PySpark and warcio, by Ed Summers.
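Shawn Graham's sqlite notebook introduces basic SQL querying. A minimal sketch with Python's built-in `sqlite3` module, using a hypothetical `notebooks` table filled with a few entries from this list:

```python
import sqlite3

# In-memory database with a hypothetical table of notebook listings
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notebooks (title TEXT, language TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO notebooks VALUES (?, ?, ?)",
    [("GLAM Workbench", "en", "Tim Sherratt"),
     ("Análisis chocométrico", "es", "José Calvo"),
     ("SPARQL and LOD", "en", "Shawn Graham"),
     ("sqlite", "en", "Shawn Graham")],
)

# Count English-language notebooks per author; the "?" placeholder passes
# the value separately instead of pasting it into the SQL string
rows = conn.execute(
    "SELECT author, COUNT(*) AS n FROM notebooks WHERE language = ? "
    "GROUP BY author ORDER BY n DESC, author",
    ("en",),
).fetchall()
print(rows)  # [('Shawn Graham', 2), ('Tim Sherratt', 1)]
conn.close()
```

To land query results in a dataframe, as that notebook describes, `pandas.read_sql_query(sql, conn)` accepts the same kind of connection.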
