From 4dbfbd9d3f593053f7614a38d67eb62aa6cd15ff Mon Sep 17 00:00:00 2001 From: Jan van Mansum Date: Wed, 3 Jul 2024 11:08:35 +0200 Subject: [PATCH] wrapping up --- docs/data-vault-storage-root.md | 9 ++- docs/ocfl-repo.md | 97 --------------------------------- mkdocs.yml | 1 + 3 files changed, 7 insertions(+), 100 deletions(-) delete mode 100644 docs/ocfl-repo.md diff --git a/docs/data-vault-storage-root.md b/docs/data-vault-storage-root.md index 6de220b..b538277 100644 --- a/docs/data-vault-storage-root.md +++ b/docs/data-vault-storage-root.md @@ -71,12 +71,15 @@ a 1-to-_n_ relationship. be done by a superuser and is known as **"updatecurrent"**. A new Dataset Version Export will be created and therefore a new OCFL Object Version will be created as well. The Data Station version history, however, will **not** display an additional version. -### Metadata - - +### Identifying metadata +To identify datasets, versions and data files in the OCFL repository, the following metadata is used: +![Vault metadata](vault-metadata.png){: .align-center} +The full metadata of each dataset version is stored, but the way it is stored depends on the export format used. The current export format is based on the Dataverse +implementation of the [RDA Research Data Repository Interoperability WG recommendations]{:target=_blank}. +[RDA Research Data Repository Interoperability WG recommendations]: {{ rda_research_data_repo_interoperability_wg_recommendations }} [bag]: {{ bagit_specs }} [Oxford Common File Layout]: {{ ocfl_url }} \ No newline at end of file diff --git a/docs/ocfl-repo.md b/docs/ocfl-repo.md deleted file mode 100644 index 8d36ae2..0000000 --- a/docs/ocfl-repo.md +++ /dev/null @@ -1,97 +0,0 @@ -OCFL repository -=============== - -The DANS Data Vault is implemented as a collections of [OCFL]{:target=_blank} Storage Roots. OCFL stands for Oxford Common File -Layout. It is a community specification for the layout of a repository that stores digital objects. 
- -Mapping of conceptual dataset model to OCFL objects ---------------------------------------------------- - -The diagram below details how the Data Vault stores the datasets in the OCFL repository. Datasets are conceived as a collection of -Dataset Versions. Dataset versions are exported to Dataset Version Exports (DVEs). Currently, this is done as an RDA compliant -bag. In the future other packaging formats may be supported. - -![vault-impl](./vault-impl.png){:width="75%"} - -Since every Dataset Version is stored in a separated OCFL object, multiple DVEs can be stored for the same Dataset Version. This -is necessary to support the following scenarios: - -* Replacing a version that was updated in place, i.e. with "updatecurrent" in Dataverse. -* Repackaging dataset versions in a different packaging format. - -[OCFL]: {{ ocfl_url }} - -Layout of a repository ----------------------- - -An OCFL repository consists of a Storage Root and objects stored hierarchically under that Root. Although the hierarchy in -question does not have to reside on a hierarchical file system, it is conceptually represented as a file/folder tree. - -The picture below shows an example of a complete valid OCFL Storage Root, containing one object with two versions. 
- -```plaintext -example-ocfl-storage-root -├── 0004-hashed-n-tuple-storage-layout.md -├── 0=ocfl_1.1 -├── ocfl_1.1.md -├── ocfl_layout.json -├── 866 -│ └── 456 -│ └── e5a -│ └── 866456e5a267286c35b3a697b4a4ecf42ff34a3060699ce4dc88da9b58862341 -│ ├── 0=ocfl_object_1.1 -│ ├── inventory.json -│ ├── inventory.json.sha512 -│ ├── v1 -│ │ ├── content -│ │ │ └── test -│ │ │ └── path -│ │ │ └── packaged-dataset-version.zip -│ │ ├── inventory.json -│ │ └── inventory.json.sha512 -│ └── v2 -│ ├── content -│ │ └── test -│ │ └── path -│ │ └── repackaged-dataset-version.zip -│ ├── inventory.json -│ └── inventory.json.sha512 -└── extensions - └── 0004-hashed-n-tuple-storage-layout - └── config.json -``` - -Packaging an OCFL Storage Root in immutable TAR files ------------------------------------------------------ - -### Aggregating OCFL objects into a TAR file - -The tape storage facility used to implement the DANS Data Vault requires the stored files to be on average 1G or larger. Since the -typical dataset versions stored by DANS are much smaller, a number of OCFL Objects is first collected and packaged together as a -TAR file containing an OCFL Storage Root. This Storage Root will be valid if each OCFL Object contains a continuous range of -version starting with version 1. - -`````` - -### Storing new OCFL Object versions - -As long as OCFL objects for a TAR file are still being collected, new versions for existing objects can be added without breaking -the validity of the OCFL Storage Root. What happens if a new version for an OCFL Object is created, when that object is already -part of a TAR file on tape? - -The answer is that instead of a complete Object an *Object Layer* will be included in the currently open OCFL Storage Root. This -means that—in turn—the open OCFL Storage Root will no longer be valid and is merely an OCFL Storage Root Layer. - -`````` - -The TAR files on tape therefore contain a mix of OCFL Storage Root Layers and OCFL Storage Roots. 
The OCFL Storage Roots can be -conceived as a special case of OCFL Storage Root Layers, namely those that contain OCFL Objects with a continuous range of -versions starting with version 1. In general, we can therefore say that the TAR files on tape contain OCFL Storage Root Layers. - -### Restoring an OCFL Storage Root from tape - - - - - - \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 6e9434a..3b63aa6 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -76,6 +76,7 @@ extra: ocfl_url: https://ocfl.io/ bagit_specs: https://www.rfc-editor.org/rfc/rfc8493.html dans_layer_store_lib: https://dans-knaw.github.io/dans-layer-store-lib/ + rda_research_data_repo_interoperability_wg_recommendations: http://doi.org/10.15497/RDA00025 markdown_extensions: - attr_list