RDFIngest - A simple tool for ingesting local and remote RDF data sources into a triplestore.
WARNING: This project is in an early stage of development and should be used with caution.
- Python >= 3.11
RDFIngest is available on PyPI:
pip install rdfingest
The RDFIngest CLI can also be installed with pipx:
pipx install rdfingest
For installation from source, either use poetry or run pip install . from the package folder.
RDFIngest reads two YAML files:
- a config file for obtaining triplestore credentials and
- a registry which defines the RDF sources to be ingested.
service:
  endpoint: "https://sometriplestore.endpoint"
  user: "admin"
  password: "supersecretpassword123"
graphs:
  - source: https://someremote.ttl
    graph_id: https://somenamedgraph.id
  - source: [
      somelocal.ttl,
      https://someotherremote.ttl
    ]
    graph_id: https://someothernamedgraph.id
  - source: https://someremote.trig
  - source: [
      https://someotherremote.trig,
      someotherlocal.ttl,
      yetanotherremote.ttl
    ]
    graph_id: https://yetanothernamedgraph.id
RDFIngest parses all registered RDF sources and ingests the data as named graphs into the specified triplestore by executing POST requests for every source.
By default, a SPARQL DROP operation is also run for every graph ID before POSTing.
Contextless RDF sources require a graph_id field. RDF Datasets/Quad formats do not, because they carry their own named graph identifiers.
For Datasets, the default graph is ignored (at least for now), because running automated DROP and/or POST operations on a remote default graph is considered somewhat dangerous.
Namespaces are one honking great idea -- let's do more of those!
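The DROP-then-POST sequence described above can be sketched with the standard library alone. This is not RDFIngest's actual implementation: the function names, the example endpoint URLs, and the choice of a direct SPARQL Update POST plus a Graph Store Protocol POST are assumptions made for illustration. The requests are only built, never sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_drop_request(update_endpoint: str, graph_id: str) -> Request:
    """Build (but do not send) a SPARQL Update request dropping one named graph."""
    # SILENT keeps the update from failing if the graph does not exist yet.
    update = f"DROP SILENT GRAPH <{graph_id}>"
    return Request(
        update_endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


def build_post_request(store_endpoint: str, graph_id: str, turtle: str) -> Request:
    """Build (but do not send) a POST adding Turtle data to one named graph."""
    # The graph IRI is passed percent-encoded in the ?graph= query parameter.
    url = f"{store_endpoint}?graph={quote(graph_id, safe='')}"
    return Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


# Hypothetical endpoints; a real triplestore documents its own paths.
drop = build_drop_request("https://example.org/update", "https://somenamedgraph.id")
post = build_post_request(
    "https://example.org/data",
    "https://somenamedgraph.id",
    "<urn:s> <urn:p> <urn:o> .",
)
print(drop.data.decode("utf-8"))
print(post.full_url)
```

Dropping before posting makes each run replace a graph's contents instead of appending to them, which is presumably why the DROP step is on by default.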
The tool accepts both local and remote RDF data sources.
Consider the following entry:
graphs:
  - source: [
      https://someremote.trig,
      somelocal.ttl,
      anotherremote.ttl
    ]
    graph_id: https://somenamedgraph.id/
In this case, every named graph in the Dataset https://someremote.trig is ingested under its own named graph identifier, while somelocal.ttl and anotherremote.ttl are ingested into the named graph https://somenamedgraph.id/.
Run the rdfingest command:
rdfingest --config ./config.yaml --registry ./registry.yaml
The default values for config and registry are ./config.yaml and ./registry.yaml.
Also see rdfingest --help.
Point an RDFIngest instance to a config file and a registry and invoke run_ingest.
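# Assumes RDFIngest is exported at the package top level.
from rdfingest import RDFIngest

rdfingest = RDFIngest(
    config="./config.yaml",
    registry="./registry.yaml",
    drop=True,
    debug=False,
)
# Run the ingest for every source listed in the registry.
rdfingest.run_ingest()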