This is source code for transforming PDFs from the Mamluk journal project to Simple Archive Format import objects for knowledgespace.uchicago.edu

uchicago-library/mamluk-knowledgespace-import

Introduction

This tool takes PDF files and lets the user extract metadata from those PDFs into a CSV file, so that stakeholders can edit the metadata in a single place. From that CSV file and the location of the PDF files, it then creates a series of Simple Archive Format (SAF) objects.

Quickstart

  1. create a virtualenv
  2. git clone this repo
  3. cd into the repo directory
  4. activate the virtualenv
  5. run python setup.py install
  6. run either extractor.py on the PDF directory to extract the metadata into a CSV file, or build_safs.py on the CSV metadata file and the PDF directory to generate the SAFs
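
To make the SAF-building step more concrete, here is a minimal sketch of what turning a metadata CSV plus a PDF directory into Simple Archive Format item directories could look like. It is not the code in build_safs.py. The function name, the CSV column names (filename, title, creator), and the item_NNNN directory naming are all assumptions made for illustration. What it relies on is the general SAF layout: one directory per item, containing a dublin_core.xml file, a contents manifest, and the listed files.

```python
import csv
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def build_safs(csv_path, pdf_dir, out_dir):
    """Write one SAF item directory per CSV row (hypothetical helper)."""
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    created = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            # Assumed naming scheme: item_0000, item_0001, ...
            item_dir = out_dir / f"item_{index:04d}"
            item_dir.mkdir(parents=True, exist_ok=True)

            # dublin_core.xml carries the descriptive metadata for the item.
            root = ET.Element("dublin_core")
            for element in ("title", "creator"):  # assumed CSV column names
                value = (row.get(element) or "").strip()
                if value:
                    node = ET.SubElement(
                        root, "dcvalue", element=element, qualifier="none"
                    )
                    node.text = value
            ET.ElementTree(root).write(
                str(item_dir / "dublin_core.xml"),
                encoding="utf-8",
                xml_declaration=True,
            )

            # Copy the PDF into the item and list it in the contents manifest.
            pdf_name = row["filename"]  # assumed CSV column name
            shutil.copy(pdf_dir / pdf_name, item_dir / pdf_name)
            (item_dir / "contents").write_text(pdf_name + "\n", encoding="utf-8")
            created.append(item_dir)
    return created
```

Calling build_safs("metadata.csv", "pdfs", "safs") with these assumed names would leave one directory per CSV row under safs, each holding the copied PDF, a contents file naming it, and a dublin_core.xml built from the edited metadata.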
