An HTTP interface to the Project Gutenberg corpus.

cxdy/gutenberg-http

Gutenberg-HTTP

Overview

This project is an HTTP wrapper for the Python Gutenberg API. It lets you search for books, retrieve book metadata, and fetch the full text of books through a small set of HTTP endpoints.

The API is implemented using the Flask web framework and served from a Docker container. To run the project locally:

docker-compose up --build web

This serves the API at http://localhost:8000. The first start takes a while because the Gutenberg metadata cache must be populated before the service can answer requests.

To refresh the Gutenberg metadata cache and reload the service after the initial server start, you can run:

./scripts/update-data.sh

Endpoints

Fetch all metadata for a book

# fetch all metadata for a book-id
curl 'http://localhost:8000/texts/2701'
{
  "metadata": {
    "title": ["Moby Dick; Or, The Whale"],
    "rights": ["Public domain in the USA."],
    "author": ["Melville, Herman"],
    "subject": [
      "Mentally ill -- Fiction",
      "Whaling -- Fiction",
      "Ship captains -- Fiction",
      "Sea stories",
      "Whaling ships -- Fiction",
      "Psychological fiction",
      "Ahab, Captain (Fictitious character) -- Fiction",
      "PS",
      "Whales -- Fiction",
      "Adventure stories"
    ],
    "language": ["en"]
  },
  "text_id": 2701
}
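
The same request can be made from Python. The snippet below is a minimal sketch using only the standard library; it assumes the service is running locally on port 8000 as described above, and the helper names (`metadata_url`, `parse_metadata`, `fetch_metadata`) are illustrative, not part of this project.

```python
import json
from urllib.request import urlopen

# Assumption: the service runs locally via docker-compose, as described above.
BASE_URL = "http://localhost:8000"


def metadata_url(text_id, base_url=BASE_URL):
    """Build the URL of the metadata endpoint for one book id."""
    return f"{base_url}/texts/{text_id}"


def parse_metadata(payload):
    """Split a JSON response body into (text_id, metadata dict)."""
    doc = json.loads(payload)
    return doc["text_id"], doc["metadata"]


def fetch_metadata(text_id, base_url=BASE_URL):
    """Request the metadata for one book and parse the response."""
    with urlopen(metadata_url(text_id, base_url)) as response:
        return parse_metadata(response.read().decode("utf-8"))
```

With the service running, `fetch_metadata(2701)` should return the book id together with the metadata dictionary shown in the response above. Each metadata value is a list, so a single title is read as `metadata["title"][0]`.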

Fetch specific metadata for a book

# fetch specific metadata for a book-id
curl 'http://localhost:8000/texts/2701?include=title,author'
{
  "metadata": {
    "author": ["Melville, Herman"],
    "title": ["Moby Dick; Or, The Whale"]
  },
  "text_id": 2701
}
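
When building such URLs in code, the `include` parameter is a comma-separated list of field names. A small sketch of a URL builder follows; the function name `text_url` is hypothetical and the base URL assumes the local setup described above.

```python
from urllib.parse import quote

# Assumption: the service runs locally via docker-compose, as described above.
BASE_URL = "http://localhost:8000"


def text_url(text_id, include=None, base_url=BASE_URL):
    """Build a metadata URL, optionally restricted to the given fields."""
    url = f"{base_url}/texts/{text_id}"
    if include:
        # Field names are joined with literal commas, as in the curl example.
        url += "?include=" + ",".join(quote(field) for field in include)
    return url
```

For example, `text_url(2701, ["title", "author"])` produces the URL used in the curl call above.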

Fetch the text of a book

# fetch the text for a book-id
curl 'http://localhost:8000/texts/2701/body'
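
Book bodies can be large, so it may be preferable to stream the response to disk instead of holding it in memory. The sketch below does this with the standard library. The response format of this endpoint is not shown in this README, so the code saves the raw bytes unchanged and makes no assumption about their structure; `body_url`, `save_body` and the `opener` parameter are illustrative names.

```python
import shutil
from urllib.request import urlopen

# Assumption: the service runs locally via docker-compose, as described above.
BASE_URL = "http://localhost:8000"


def body_url(text_id, base_url=BASE_URL):
    """Build the URL of the body endpoint for one book id."""
    return f"{base_url}/texts/{text_id}/body"


def save_body(text_id, path, opener=urlopen, base_url=BASE_URL):
    """Stream the raw response of the body endpoint into the file at `path`.

    `opener` is any callable that takes a URL and returns a readable,
    context-managed object; it defaults to urllib's `urlopen`.
    """
    with opener(body_url(text_id, base_url)) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

Calling `save_body(2701, "2701.out")` against a running service writes whatever the endpoint returns to `2701.out`.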

Simple search for books

# simple single-predicate query with field expansion
curl 'http://localhost:8000/search/title+eq+Moby+Dick?include=author,rights,language'
{
  "texts": [
    {
      "author": ["Melville, Herman"],
      "language": ["en"],
      "text_id": 9147,
      "rights": ["Copyrighted. Read the copyright notice inside this book for details."]
    },
    {
      "author": ["Melville, Herman"],
      "language": ["en"],
      "text_id": 15,
      "rights": ["Public domain in the USA."]
    }
  ]
}
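
In the URL above, the query `title eq Moby Dick` is written with `+` in place of spaces. The following sketch builds such a search URL and extracts the book ids from a response. It assumes that form-style encoding (spaces as `+`) is what the service expects, based only on the curl examples in this README; `search_url` and `text_ids` are hypothetical helper names.

```python
import json
from urllib.parse import quote_plus

# Assumption: the service runs locally via docker-compose, as described above.
BASE_URL = "http://localhost:8000"

# Characters left unescaped so the URL matches the curl examples in this README.
QUERY_SAFE = '",'


def search_url(query, include=None, base_url=BASE_URL):
    """Build a search URL from a query string such as 'title eq Moby Dick'."""
    url = f"{base_url}/search/{quote_plus(query, safe=QUERY_SAFE)}"
    if include:
        url += "?include=" + ",".join(include)
    return url


def text_ids(payload):
    """Return the text_id of every hit in a search response body."""
    return [hit["text_id"] for hit in json.loads(payload)["texts"]]
```

For example, `search_url("title eq Moby Dick", ["author", "rights", "language"])` reproduces the URL used in the curl call above.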

Conjunctive query for books

# conjunctive query
curl 'http://localhost:8000/search/author+eq+"Melville,+Herman"+and+rights+eq+"Public+domain+in+the+USA."+and+title+eq+"Moby+Dick"'
{"texts": [{"text_id": 15}]}
