# dd-dataverse-ingest

Service for ingesting datasets into Dataverse via the API.

This module is a component in the DANS Data Station Architecture.

## SYNOPSIS

    sudo systemctl {start|stop|restart|status} dd-dataverse-ingest
## DESCRIPTION

### Deposit directories

The datasets are prepared as deposit directories (or "deposits" for short) in the ingest area. A deposit is a directory with the following structure:
```text
087920d1-e37d-4263-84c7-1321e3ecb5f8
├── bag
│   ├── bag-info.txt
│   ├── bagit.txt
│   ├── data
│   │   ├── file1.txt
│   │   ├── file2.txt
│   │   └── subdirectory
│   │       └── file3.txt
│   ├── dataset.yml
│   └── manifest-sha1.txt
└── deposit.properties
```

The name of the deposit directory must be a UUID. The deposit directory contains the following files:
| File                 | Description |
|----------------------|-------------|
| `deposit.properties` | Contains instructions for dd-dataverse-ingest on how to ingest the dataset. |
| `bag/`               | A bag, i.e. a directory with the files to be ingested, laid out according to the BagIt specification. The name of the bag does not have to be "bag"; it may be any valid filename. |
Instead of one bag, multiple bags may be included; see below.
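For orientation, a minimal `deposit.properties` might look like the sketch below. Only keys that are mentioned elsewhere in this document are shown; the timestamp format and the exact set of required keys are assumptions, not a specification.

```properties
# Sketch only: the service may require or accept additional keys.
# Assumed ISO-8601 timestamp; used to order deposits within a batch.
creation.timestamp: '2024-01-15T10:00:00.000+01:00'
# Only present when this deposit creates a new version of an existing dataset (see below).
updates-dataset: 'doi:10.5072/FK2/ABCDEF'
```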
### Metadata and instructions

In the root of the bag, the following files can be included to provide metadata and instructions for the ingest process. The files are in YAML format and closely follow the JSON that is passed to the Dataverse API.
| File                   | Description |
|------------------------|-------------|
| `dataset.yml`          | Dataset level metadata. |
| `edit-files.yml`       | Instructions for deleting, replacing or moving files, or updating the file metadata; also included: restricting and embargoing files. |
| `edit-metadata.yml`    | Edit dataset level metadata, including metadata value deletions. |
| `edit-permissions.yml` | Role assignments to create or delete on the dataset. |
| `update-state.yml`     | Whether to publish the dataset version or submit it for review. |
#### dataset.yml

The format is the same as the JSON that is passed to the createDataset endpoint of the Dataverse API. Note that the `files` field is not used; it will be set to the empty list by the service, because otherwise Dataverse will reject the request.
```yaml
datasetVersion:
  license:
    name: "CC0 1.0"
    uri: "http://creativecommons.org/publicdomain/zero/1.0"
  fileAccessRequest: true
  metadataBlocks:
    citation:
      displayName: "Citation Metadata"
      name: "citation"
      fields:
        - typeName: "title"
          multiple: false
          typeClass: "primitive"
          value: "My dataset"
        # Add more metadata fields and blocks as needed
```
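For comparison, here is a sketch of how such JSON is normally submitted to Dataverse's createDataset endpoint. The service performs this call for you; `$SERVER_URL`, `$API_TOKEN` and the collection alias `root` are placeholders, not values prescribed by dd-dataverse-ingest.

```bash
# Sketch only: dataset.yml mirrors the JSON body accepted by the native createDataset API.
export SERVER_URL=https://demo.dataverse.org          # placeholder
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx # placeholder

curl -H "X-Dataverse-key:$API_TOKEN" \
     -X POST "$SERVER_URL/api/dataverses/root/datasets" \
     --upload-file dataset.json
```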
#### edit-files.yml

```yaml
editFiles:
  deleteFiles:
    - 'file1.txt'
    - 'subdirectory/file3.txt'
  replaceFiles:
    - 'file2.txt'
  addRestrictedFiles:
    - 'file4.txt'
    - 'subdirectory/file5.txt'
  addUnrestrictedFiles:
    - 'file6.txt'
  moveFiles:
    - from: 'file6.txt'                 # Old location in the dataset
      to: 'subdirectory/file6.txt'      # New location in the dataset
  updateFileMetas:
    - description: "This is the first file"
      label: "file1.txt"
      directoryLabel: "subdirectory"
      restricted: false
      categories: [ 'Testlabel' ]
  autoRenameFiles:
    - from: "Unsanitize'd/file?"        # Local file name
      to: "Sanitize_d/file_"            # The file name assigned in the dataset
  addEmbargoes:
    - filePaths: [ 'file1.txt' ]        # All other files will NOT be embargoed
      dateAvailable: '2030-01-01'
      reason: 'Pending publication'
    - filePaths: [ 'file2.txt' ]        # All other files will be embargoed
      dateAvailable: '2040-01-01'
      reason: 'Pending publication'
```
The actions specified in this file correspond roughly to the actions available in the dropdown menu in the file view of a dataset in Dataverse.

The replacement file is looked up in the bag, under the `data` directory, at the same path as the original file has in the dataset. Note that files in `replaceFiles` are automatically skipped in the add-files step; the deleted files, however, are not. In other words, it is also possible to remove a file and add a file back to the same location in one deposit. In that case, there will be no continuous history of the file in the dataset.

The `addRestrictedFiles` action is included because it allows you to add a large number of restricted files more efficiently than by first adding them unrestricted and then updating the file metadata of each file individually. The default action is to add files unrestricted, so there is no explicit action for that.

`updateFileMetas` contains items in the format of the JSON that is passed to the updateFileMetadata endpoint of the Dataverse API.
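To illustrate that mapping (a sketch, not the authoritative API reference), the first `updateFileMetas` item above corresponds to a JSON payload along these lines:

```json
{
  "description": "This is the first file",
  "label": "file1.txt",
  "directoryLabel": "subdirectory",
  "restricted": false,
  "categories": [ "Testlabel" ]
}
```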
#### edit-metadata.yml

```yaml
editMetadata:
  addFieldValues:
    - typeName: "subject"
      typeClass: "controlledVocabulary"
      multiple: true
      value:
        - 'Astronomy and Astrophysics'
  replaceFieldValues:
    - typeName: "producer"
      typeClass: "compound"
      multiple: true
      value:
        - producerName:
            typeName: "producerName"
            value: "John Doe"
        - producerAffiliation:
            typeName: "producerAffiliation"
            value: "University of Somewhere"
  deleteFieldValues:
    - typeName: "subject"
      typeClass: "controlledVocabulary"
      multiple: true
      value:
        - 'Astronomy and Astrophysics'
```
Allows you to selectively delete, add or replace metadata field values. The format is based on the JSON that is passed to the editDatasetMetadata and deleteDatasetMetadata endpoints of the Dataverse API. However, unlike in the JSON accepted by Dataverse, the `typeClass` and `multiple` fields are not optional in the YAML file. This is due to the library used to parse the YAML files, which uses a deserializer that was designed to parse the JSON that is returned by the Dataverse API (and that JSON does include these fields).

The only difference between `addFieldValues` and `replaceFieldValues` is that the latter will pass the `replace=true` parameter to the API. See the API documentation for the exact behavior.

Unlike in the editing of files, deletion of field values takes place at the end of the process, so that we don't create a situation where a required field is temporarily empty and Dataverse refuses to save the metadata.
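To make the endpoints mentioned above concrete, here is a sketch of the native API calls involved. The service issues them on your behalf; `$SERVER_URL`, `$API_TOKEN`, `$PID` and the JSON file names are placeholders.

```bash
# addFieldValues: add field values; existing values are kept.
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT \
     "$SERVER_URL/api/datasets/:persistentId/editMetadata?persistentId=$PID" \
     --upload-file add-field-values.json

# replaceFieldValues: same endpoint, but with replace=true so existing values are replaced.
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT \
     "$SERVER_URL/api/datasets/:persistentId/editMetadata?persistentId=$PID&replace=true" \
     --upload-file replace-field-values.json

# deleteFieldValues maps to the deleteMetadata endpoint.
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT \
     "$SERVER_URL/api/datasets/:persistentId/deleteMetadata?persistentId=$PID" \
     --upload-file delete-field-values.json
```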
#### edit-permissions.yml

```yaml
editPermissions:
  deleteRoleAssignments:
    - role: 'admin'
      assignee: '@user1'
  addRoleAssignments:
    - role: 'admin'
      assignee: '@user2'
```
Allows you to selectively delete or add role assignments on the dataset. The format is the same as the JSON that is passed to the assignNewRole and deleteRoleAssignments endpoints of the Dataverse API.
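For example, a single `addRoleAssignments` item translates into a JSON body like this (sent by the service; shown only for illustration):

```json
{
  "assignee": "@user2",
  "role": "admin"
}
```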
#### update-state.yml

```yaml
action: 'submit-for-review'
# One of the following actions:
# - 'leave-draft' (default)
# - 'publish-major-version'
# - 'publish-minor-version'
# - 'submit-for-review'
```
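For orientation, the publish and submit-for-review actions correspond to native API calls along the lines of the sketch below. The service issues these for you; `$SERVER_URL`, `$API_TOKEN` and `$PID` are placeholders.

```bash
# publish-major-version (use type=minor for publish-minor-version)
curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
     "$SERVER_URL/api/datasets/:persistentId/actions/:publish?persistentId=$PID&type=major"

# submit-for-review
curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
     "$SERVER_URL/api/datasets/:persistentId/submitForReview?persistentId=$PID"
```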
### New versions of existing datasets

A deposit can also be used to create a new version of an existing dataset. In this case, the `deposit.properties` file must contain the following property:

```properties
updates-dataset: 'doi:10.5072/FK2/ABCDEF'
```

in which the value is the DOI of the dataset to be updated.
Instead of one bag directory, the deposit may contain multiple bags. In this case the directories are processed in lexicographical order, so you should name the bags accordingly, e.g. `1-bag`, `2-bag`, `3-bag`, etc., or `001-bag`, `002-bag`, `003-bag`, etc., depending on the number of bags. A possible layout is sketched below.
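For illustration, a deposit with multiple bags might be laid out as follows; the bag names are only examples, and `...` stands for the usual bag contents shown earlier.

```text
087920d1-e37d-4263-84c7-1321e3ecb5f8
├── 001-bag
│   └── ...
├── 002-bag
│   └── ...
├── 003-bag
│   └── ...
└── deposit.properties
```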
### DANS bag

A DANS bag is a directory in the BagIt format that also conforms to the DANS bag profile. This is a legacy format that is used by the DANS SWORD2 service. The service can convert a DANS deposit to the standard one described above.
### Processing

The deposit area is a directory with the following structure:

```text
imports
├── inbox
│   └── path
│       └── to
│           ├── batch1
│           │   ├── 0223914e-c053-4ee8-99d8-a9135fa4db4a
│           │   ├── 1b5c1b24-de40-4a40-9c58-d4409672229e
│           │   └── 9a47c5be-58c0-4295-8409-8156bd9ed9e1
│           └── batch2
│               ├── 5e42a936-4b90-4cac-b3c1-798b0b5eeb0b
│               └── 9c2ce5a5-b836-468a-89d4-880efb071d9d
└── outbox
    └── path
        └── to
            └── batch1
                ├── failed
                ├── processed
                │   └── 7660539b-6ddb-4719-aa31-a3d1c978081b
                └── rejected
```
#### Processing a batch

The deposits to be processed are to be placed under `inbox`. All the files in it must be readable and writable by the service. When the service is requested to process a batch, it will do the following:

1. Sort the deposits in the batch by their `creation.timestamp` property in `deposit.properties`, in ascending order.
2. Process each deposit in the batch in order.

#### Processing a deposit

1. Sort the bags in the deposit in lexicographical order.
2. Process each bag in the deposit in order.
3. Move the deposit to:
    * `outbox/path/to/batch/processed` if all the versions were published successfully, or to
    * `outbox/path/to/batch/rejected` if one or more of the versions were not valid, or to
    * `outbox/path/to/batch/failed` if some other error occurred.
+The actions described in the Yaml files will be executed in same order as they are listed above. Note that changing the order of the actions in the Yaml files
+has no effect on the order in which they are executed. All files and all action fields (e.g., addRestrictedFiles
) are optional, except for dataset.yml
, when
+creating a new dataset.
## Development

### Local debugging

To debug locally, you need to have the following services running:
* A Dataverse instance. Internal DANS developers can use the vagrant boxes with development versions of the Data Stations for this. You will need to configure access to the admin interface to use the unblock-key:

    ```bash
    curl -X PUT -d s3kretKey http://localhost:8080/api/admin/settings/:BlockedApiKey
    curl -X PUT -d unblock-key http://localhost:8080/api/admin/settings/:BlockedApiPolicy

    # When done debugging, you can reset the policy to localhost-only:
    curl -X PUT -d localhost-only http://localhost:8080/api/admin/settings/:BlockedApiPolicy/?unblock-key=s3kretKey
    ```

* dd-validate-dans-bag. Note that its `validation.baseFolder` configuration property should point to the deposit area or an ancestor of it (see the sketch below).
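A hypothetical fragment of the dd-validate-dans-bag configuration, only to illustrate the property mentioned above; the YAML nesting and the example path are assumptions, not taken from that service's documentation.

```yaml
# Hypothetical sketch of dd-validate-dans-bag's config.yml
validation:
  # Must point to the deposit area itself or an ancestor of it
  baseFolder: /path/to/deposit-area
```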
Calling dd-dataverse-ingest is most conveniently done through the dd-dataverse-ingest-cli command line tool.
## INSTALLATION AND CONFIGURATION

Currently, this project is built as an RPM package for RHEL8/Rocky8 and later. The RPM will install the binaries to `/opt/dans.knaw.nl/dd-dataverse-ingest` and the configuration files to `/etc/opt/dans.knaw.nl/dd-dataverse-ingest`.

For installation on systems that do not support RPM and/or systemd:

1. Build the tarball (see next section).
2. Extract it to some location on your system, for example `/opt/dans.knaw.nl/dd-dataverse-ingest`.
3. Start the service with the following command:

    ```bash
    /opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
    ```
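Put together, the steps above might look roughly like this in a shell session. This is a sketch only: the tarball's exact name and location under `target/`, as well as the use of `sudo`, are assumptions.

```bash
# Sketch only: adjust the tarball path/name to what your build actually produced.
mvn clean install assembly:single
sudo mkdir -p /opt/dans.knaw.nl
sudo tar -xzf target/dd-dataverse-ingest-*.tar.gz -C /opt/dans.knaw.nl
/opt/dans.knaw.nl/dd-dataverse-ingest/bin/dd-dataverse-ingest server /opt/dans.knaw.nl/dd-dataverse-ingest/cfg/config.yml
```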
## BUILDING FROM SOURCE

Prerequisites:

* Java 17 or higher
* Maven 3.3.3 or higher
* RPM

Steps:

```bash
git clone https://github.com/DANS-KNAW/dd-dataverse-ingest.git
cd dd-dataverse-ingest
mvn clean install
```
If the `rpm` executable is found at `/usr/local/bin/rpm`, the build profile that includes the RPM packaging will be activated. If `rpm` is available, but at a different path, then activate the profile by using Maven's `-P` switch: `mvn -Prpm install`.

Alternatively, to build the tarball, execute:

```bash
mvn clean install assembly:single
```