This repository contains the JSON schemas used in the Impresso project.
Impresso JSON Schemas are used to define, declare, validate and document the structure, constraints and data types of Impresso JSON documents, which can represent data or processes (e.g. manifests).
We define schemas for:
- Canonical format:
  - Issue (draft 06)
  - Page (draft 06)
  - Content Item (draft 06)
- Rebuilt format:
  - Rebuilt (todo)
- Topic Model
  - Topic Assignment (draft 06)
  - Topic Description (draft 06)
- Language Identification
  - Language Identification (draft 06)
- Entities
  - Entities (2020-12)
- OCR Quality Assessment (OCR-QA)
- Data processing manifests (todo)
- Data release manifests (todo)
- json/: subdirectory for JSON schemas
- examples/: subdirectory for example/test files
- docs/: documentation of schemas in markdown format
To validate an instance (example file) against a JSON schema, run:
make tests
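To give a rough idea of what validating an instance against a schema involves, here is a minimal, standard-library-only sketch. It is not what `make tests` runs: it only handles a small subset of JSON Schema keywords (`type` as a single string, `required`, `properties`, `items`), and the schema, the field names (`id`, `lg`, `pages`) and the example values are hypothetical, not taken from the actual Impresso schemas. For real validation, rely on the schemas in this repository and a full JSON Schema validator.

```python
import json

# Hypothetical, heavily simplified schema; NOT one of the actual Impresso schemas.
SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["id", "lg"],
  "properties": {
    "id": {"type": "string"},
    "lg": {"type": "string"},
    "pages": {"type": "array", "items": {"type": "integer"}}
  }
}
""")

# Mapping from JSON Schema type names to Python types (numbers handled separately).
JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "boolean": bool,
    "null": type(None),
}


def matches_type(value, type_name):
    """Check a value against a single JSON Schema type name."""
    # bool is a subclass of int in Python, so exclude it for numeric types.
    if type_name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, JSON_TYPES[type_name])


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance passed."""
    expected = schema.get("type")
    if expected is not None and not matches_type(instance, expected):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


# Hypothetical instances: one that satisfies the schema and one that does not.
valid = {"id": "example-0001", "lg": "fr", "pages": [1, 2]}
invalid = {"id": 42, "pages": [1, "two"]}

print(validate(valid, SCHEMA))
for message in validate(invalid, SCHEMA):
    print(message)
```

The second instance fails for three separate reasons (a missing required property, a wrongly typed `id`, and a non-integer entry in `pages`), and the sketch reports each one with its location in the document.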
The schema documentation is generated with jsonschema2md by running:
make documentation
The 'impresso - Media Monitoring of the Past' project is funded by the Swiss National Science Foundation (SNSF) under grant number CRSII5_173719 (Sinergia program). The project aims at developing tools to process and explore large-scale collections of historical newspapers, and at studying the impact of this new tooling on historical research practices. More information at https://impresso-project.ch.
Copyright (C) 2020 The impresso team. Contributors to this program include: Simon Clematide, Maud Ehrmann and Matteo Romanello.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU Affero General Public License for more details.