This is a pilot project to produce an intermediate data format that makes the
bulk ingest of data into the Fedora Commons repository software simple. While the goal
is to provide as simple a format as possible, some affordances are made for
defining standard datastreams used by Hydra project front-ends, such as the
rightsMetadata datastream.
See spec/fixtures/vecnet-citation.json for a sample two-object model.
An overview of the format is in bulk-ingest.md.
Sample command line usage:
$ bin/rof ingest --fedora 'http://localhost:8983/fedora' --user fedoraAdmin:fedoraAdmin spec/fixtures/vecnet-citation.json
1. Ingesting vecnet:d217qs82g ...ok. 0.882s
2. Ingesting vecnet:h415pf50x ...ok. 0.283s
Total time 1.165s
0 errors
ROF does more than just ingest. Should an object already exist in Fedora, it will be updated to match what is provided in the source file. However, this only applies to datastreams which are mentioned in the source file; unmentioned datastreams are left untouched.
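The update rule described above can be pictured as a shallow merge keyed by datastream name. The following is only an illustrative sketch, not ROF's actual implementation: it assumes an object can be modeled as a dictionary of datastreams, and the function name `apply_update` and the sample values are hypothetical.

```python
def apply_update(existing, source):
    """Return the datastreams of an object after an update.

    Assumption: both arguments map datastream names to their content.
    Datastreams named in `source` replace the existing ones; datastreams
    that `source` does not mention are carried over unchanged.
    """
    merged = dict(existing)   # copy, so the caller's data is not modified
    merged.update(source)     # only the mentioned datastreams are overwritten
    return merged


# Hypothetical example: "content" is not mentioned, so it is left alone.
in_fedora = {"descMetadata": "old description", "content": "file-v1"}
in_source = {"descMetadata": "new description", "rightsMetadata": "public"}
print(apply_update(in_fedora, in_source))
```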
If the Fedora path and user are omitted, rof only lints the JSON file.
$ bin/rof ingest spec/fixtures/vecnet-citation.json
1. Verifying vecnet:d217qs82g ...ok. 0.108s
2. Verifying vecnet:h415pf50x ...ok. 0.002s
Total time 0.111s
0 errors
There is a filter which assigns identifiers to objects. It requires an external noids service to provide the identifiers. See labels.md.
$ bin/rof filter label spec/fixtures/label.json --noids localhost:13001:test-pool --prefix temp
[
  {
    "type": "fobject",
    "pid": "temp:0k225999n60"
  },
  {
    "type": "fobject",
    "rels-ext": {
      "partOf": [
        "temp:0k225999n60"
      ],
      "refines": [
        "temp:0r96736668t"
      ]
    },
    "pid": "temp:0p096682x75"
  },
  {
    "type": "fobject",
    "pid": "temp:0r96736668t",
    "rels-ext": {
      "partOf": [
        "temp:0r96736668t",
        "temp:0k225999n60",
        "another"
      ]
    }
  }
]
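The output above suggests a two-pass process: every object receives a pid, and references to labeled objects inside rels-ext are rewritten to the new pids, while unknown strings such as "another" pass through unchanged. The sketch below illustrates that idea only. It is not ROF's code: the `$(name)` label syntax is an assumption (the real syntax is described in labels.md), and `make_minter` is a hypothetical stand-in for the external noids service.

```python
import itertools
import re

# Assumed label syntax, e.g. "$(zero)". See labels.md for the real format.
LABEL = re.compile(r"^\$\((\w+)\)$")


def make_minter(prefix):
    """Hypothetical stand-in for a noids service: prefix:1, prefix:2, ..."""
    counter = itertools.count(1)
    return lambda: f"{prefix}:{next(counter)}"


def resolve(value, labels):
    """Replace a label reference with its pid; leave anything else as-is."""
    match = LABEL.match(value)
    if match and match.group(1) in labels:
        return labels[match.group(1)]
    return value


def assign_labels(objects, mint):
    labels = {}
    # Pass 1: mint a pid for every object whose pid is a label or is missing.
    for obj in objects:
        match = LABEL.match(obj.get("pid", ""))
        if match or "pid" not in obj:
            new_id = mint()
            if match:
                labels[match.group(1)] = new_id
            obj["pid"] = new_id
    # Pass 2: rewrite label references inside rels-ext.
    for obj in objects:
        rels = obj.get("rels-ext", {})
        for predicate, targets in list(rels.items()):
            rels[predicate] = [resolve(t, labels) for t in targets]
    return objects
```

Two passes are used so that a reference to a label defined later in the file can still be resolved.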
It is envisioned that there could be higher-level objects, and that the ingest into Fedora done by this utility will simply be the final step of many. Other ideas for transformations:
- A service to expand higher-level objects, say an image-collection, into a sequence of fobjects.
- The ability to run file characterizations and create derivatives before ingest.
Since the files are JSON, any tool for working with JSON files will work with these.
For example, the jq tool makes it easy to extract the pid field from every object in a file and return the results as a JSON array:
jq '[.[]|.pid]' < spec/fixtures/vecnet-citation.json
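The same extraction can be done in a general-purpose language. Here is a minimal Python sketch, assuming the file is a top-level JSON array of objects as in the examples above. The function name `extract_pids` is illustrative and not part of ROF, and the inline sample stands in for a real source file.

```python
import json


def extract_pids(objects):
    """Return the "pid" of every object, or None where the field is absent.

    This mirrors jq's `.pid`, which yields null for a missing key.
    """
    return [obj.get("pid") for obj in objects]


# Inline sample in place of a file; to read a real file, use
# json.load(open(path)) instead of json.loads(...).
sample = json.loads("""
[
  {"type": "fobject", "pid": "vecnet:d217qs82g"},
  {"type": "fobject", "pid": "vecnet:h415pf50x"}
]
""")
print(json.dumps(extract_pids(sample)))
```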