Import dataset #8

Merged (7 commits, Dec 14, 2024)
docs/description.md: 47 changes (39 additions, 8 deletions)

@@ -39,12 +39,41 @@ closely follow the JSON that is passed to the Dataverse API.

| File | Description |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| `init.yml`             | Preconditions and instructions for creating a new dataset.                                                                                 |
| `dataset.yml`          | Dataset-level metadata.                                                                                                                    |
| `edit-files.yml`       | Instructions for deleting, replacing, or moving files, or updating file metadata;<br> also covers restricting and embargoing files.        |
| `edit-metadata.yml`    | Edits to dataset-level metadata, including deletion of metadata values.                                                                    |
| `edit-permissions.yml` | Role assignments to create or delete on the dataset.                                                                                       |
| `update-state.yml`     | Whether to publish the dataset version or submit it for review.                                                                            |
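Putting this together, the root of a deposit bag might look like this (a sketch based on the table above and the paths used by the service; each of these files is optional and the exact set depends on the deposit):

```
deposit-bag/
├── init.yml               # preconditions / import instructions
├── dataset.yml            # dataset-level metadata
├── edit-files.yml         # file operations
├── edit-metadata.yml      # metadata edits
├── edit-permissions.yml   # role assignments
├── update-state.yml       # publish instructions
└── data/                  # payload files referenced by edit-files.yml
```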

##### init.yml

The init file initializes the ingest process. It can be used to verify that a precondition is met before the ingest starts:

```yaml
init:
expect:
state: 'released' # or 'draft', 'absent'.
```

If the state of the dataset does not match the expected state, the ingest procedure will be aborted. The state can be either `released`, `draft` or `absent`
(meaning that the dataset should not exist). By default, no check will be performed.

It can also be used to instruct the service to import the bag as a dataset with an existing DOI:

```yaml
init:
create:
importPid: 'doi:10.5072/FK2/ABCDEF'
```

In this case, the `updates-dataset` property in `deposit.properties` should not be set; if it is, it will be ignored. By default, a new dataset is created, and its persistent identifier is assigned by Dataverse.

The user is responsible for providing expectations and instructions that do not conflict with each other. For example, if the `importPid` property is set and the `state` property is set to `released`, the ingest cannot succeed: either the dataset already exists, in which case the import fails, or it does not exist, in which case the precondition check aborts the ingest.
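Both keys live under the same `init` mapping, so an init file can in principle combine an expectation with a create instruction (a sketch; whether a given combination is meaningful follows from the rules above, and the DOI shown is a placeholder):

```yaml
init:
  expect:
    state: 'absent'                      # precondition: the dataset must not exist yet
  create:
    importPid: 'doi:10.5072/FK2/ABCDEF'  # import the bag under this pre-assigned DOI
```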

##### dataset.yml

The format is the same as the JSON that is passed to the [createDataset]{:target=_blank} endpoint of the Dataverse API. Note that the `files` field is not used.
@@ -83,7 +112,7 @@ editFiles:
- 'file4.txt'
- 'subdirectory/file5.txt'
addUnrestrictedFiles:
    - 'file6.txt'
moveFiles:
- from: 'file6.txt' # Old location in the dataset
to: 'subdirectory/file6.txt' # New location in the dataset
@@ -107,7 +136,7 @@ editFiles:

The actions specified in this file correspond roughly to the actions available in the dropdown menu in the file view of a dataset in Dataverse.

The replacement file is looked up in the bag, under the `data` directory, at the same path as the original file has in the dataset. Note that files in
`replaceFiles` are automatically skipped in the add-files step; deleted files, however, are not. In other words, it is also possible to remove a
file and add a new file at the same location in one deposit. In that case, there will be no continuous history of the file in the dataset.
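The remove-and-add-back scenario could be expressed in a single `edit-files.yml` like this (a sketch; `addUnrestrictedFiles` appears in the example above, while `deleteFiles` is an assumed key analogous to the other file lists):

```yaml
editFiles:
  deleteFiles:
    - 'subdirectory/report.txt'   # delete the existing file
  addUnrestrictedFiles:
    - 'subdirectory/report.txt'   # add a new file at the same path; no continuous history
```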

Expand Down Expand Up @@ -183,14 +212,16 @@ Allows you to selectively delete or add role assignments on the dataset. The for
##### update-state.yml

```yaml
updateState:
  publish: major # or 'minor'
```

```yaml
updateState:
  releaseMigrated: 2021-01-01
```


#### New versions of existing datasets

A deposit can also be used to create a new version of an existing dataset. In this case, the `deposit.properties` file must contain the following property:
@@ -91,8 +91,8 @@ public void run(final DdDataverseIngestConfiguration configuration, final Enviro
DansDepositSupportFactory dansDepositSupportFactoryImport = new DansDepositSupportDisabledFactory();
if (dansDepositConversionConfig != null) {
var dansBagMappingServiceImport = createDansBagMappingService(false, dansDepositConversionConfig, dataverseService);
            var validateDansBagImport = new ValidateDansBagServiceImpl(dansDepositConversionConfig.getValidateDansBag(), false);
            dansDepositSupportFactoryImport = new DansDepositSupportFactoryImpl(validateDansBagImport, dansBagMappingServiceImport, dataverseService, yamlService);
}
var depositTaskFactoryImport = new DepositTaskFactoryImpl(bagProcessorFactory, dansDepositSupportFactoryImport);
var importJobFactory = new ImportJobFactoryImpl(dataverseIngestDepositFactory, depositTaskFactoryImport);
@@ -105,9 +105,9 @@ public void run(final DdDataverseIngestConfiguration configuration, final Enviro
*/
DansDepositSupportFactory dansDepositSupportFactoryMigration = new DansDepositSupportDisabledFactory();
if (dansDepositConversionConfig != null) {
            var dansBagMappingServiceMigration = createDansBagMappingService(true, dansDepositConversionConfig, dataverseService);
            var validateDansBagMigration = new ValidateDansBagServiceImpl(dansDepositConversionConfig.getValidateDansBag(), true);
            dansDepositSupportFactoryMigration = new DansDepositSupportFactoryImpl(validateDansBagMigration, dansBagMappingServiceMigration, dataverseService, yamlService);
}
var depositTaskFactoryMigration = new DepositTaskFactoryImpl(bagProcessorFactory, dansDepositSupportFactoryMigration);
var migrationJobFactory = new ImportJobFactoryImpl(dataverseIngestDepositFactory, depositTaskFactoryMigration);
@@ -24,7 +24,11 @@
import nl.knaw.dans.dvingest.core.yaml.EditMetadataRoot;
import nl.knaw.dans.dvingest.core.yaml.EditPermissions;
import nl.knaw.dans.dvingest.core.yaml.EditPermissionsRoot;
import nl.knaw.dans.dvingest.core.yaml.Init;
import nl.knaw.dans.dvingest.core.yaml.InitRoot;
import nl.knaw.dans.dvingest.core.yaml.UpdateAction;
import nl.knaw.dans.dvingest.core.yaml.UpdateState;
import nl.knaw.dans.dvingest.core.yaml.UpdateStateRoot;
import nl.knaw.dans.lib.dataverse.model.dataset.Dataset;

import java.io.IOException;
@@ -35,6 +39,7 @@
public class DataverseIngestBag implements Comparable<DataverseIngestBag> {
private final YamlServiceImpl yamService;

public static final String INIT_YML = "init.yml";
public static final String DATASET_YML = "dataset.yml";
public static final String EDIT_FILES_YML = "edit-files.yml";
public static final String EDIT_METADATA_YML = "edit-metadata.yml";
@@ -56,6 +61,14 @@ public boolean looksLikeDansBag() {
return Files.exists(bagDir.resolve("metadata/dataset.xml"));
}

public Init getInit() throws IOException, ConfigurationException {
if (!Files.exists(bagDir.resolve(INIT_YML))) {
return null;
}
var initRoot = yamService.readYaml(bagDir.resolve(INIT_YML), InitRoot.class);
return initRoot.getInit();
}

public Dataset getDatasetMetadata() throws IOException, ConfigurationException {
if (!Files.exists(bagDir.resolve(DATASET_YML))) {
return null;
@@ -89,11 +102,12 @@ public EditPermissions getEditPermissions() throws IOException, ConfigurationExc
return editPermissionsRoot.getEditPermissions();
}

    public UpdateAction getUpdateState() throws IOException, ConfigurationException {
if (!Files.exists(bagDir.resolve(UPDATE_STATE_YML))) {
return null;
}
        var updateStateRoot = yamService.readYaml(bagDir.resolve(UPDATE_STATE_YML), UpdateStateRoot.class);
        return updateStateRoot.getUpdateState();
}

@Override
@@ -21,6 +21,8 @@
import nl.knaw.dans.dvingest.core.DataverseIngestBag;
import nl.knaw.dans.dvingest.core.service.DataverseService;
import nl.knaw.dans.dvingest.core.service.UtilityServices;
import nl.knaw.dans.dvingest.core.yaml.UpdateState;
import nl.knaw.dans.dvingest.core.yaml.UpdateStateRoot;
import nl.knaw.dans.lib.dataverse.DataverseException;

import java.io.IOException;
@@ -39,7 +41,7 @@ public class BagProcessor {

@Builder
private BagProcessor(UUID depositId, DataverseIngestBag bag, DataverseService dataverseService, UtilityServices utilityServices) throws IOException, ConfigurationException {
        this.datasetVersionCreator = new DatasetVersionCreator(depositId, dataverseService, bag.getInit(), bag.getDatasetMetadata());
this.filesEditor = new FilesEditor(depositId, bag.getDataDir(), bag.getEditFiles(), dataverseService, utilityServices);
this.metadataEditor = new MetadataEditor(depositId, bag.getEditMetadata(), dataverseService);
this.permissionsEditor = new PermissionsEditor(depositId, bag.getEditPermissions(), dataverseService);
@@ -19,6 +19,8 @@
import lombok.NonNull;
import lombok.extern.slf4j.Slf4j;
import nl.knaw.dans.dvingest.core.service.DataverseService;
import nl.knaw.dans.dvingest.core.yaml.Expect;
import nl.knaw.dans.dvingest.core.yaml.Init;
import nl.knaw.dans.lib.dataverse.DataverseException;
import nl.knaw.dans.lib.dataverse.model.dataset.Dataset;

@@ -36,15 +38,27 @@ public class DatasetVersionCreator {
@NonNull
private final DataverseService dataverseService;

private final Init init;

private final Dataset dataset;

public String createDatasetVersion(String targetPid) throws IOException, DataverseException {
if (init != null && init.getExpect() != null) {
checkExpectations(init.getExpect(), targetPid);
}

var pid = targetPid;
if (targetPid == null) {
if (dataset == null) {
throw new IllegalArgumentException("Must have dataset metadata to create a new dataset.");
}
            if (init != null && init.getCreate() != null && init.getCreate().getImportPid() != null) {
importDataset(init.getCreate().getImportPid());
pid = init.getCreate().getImportPid();
}
else {
pid = createDataset();
}
}
// Even if we just created the dataset, we still need to update the metadata, because Dataverse ignores some things
// in the create request.
@@ -54,6 +68,46 @@ public String createDatasetVersion(String targetPid) throws IOException, Dataver
return pid;
}

private void checkExpectations(Expect expect, String targetPid) throws DataverseException, IOException {
if (targetPid == null) {
return;
}

if (expect != null && expect.getState() != null) {
switch (expect.getState()) {
case draft:
case released:
var state = dataverseService.getDatasetState(targetPid);
if (expect.getState().name().equals(state.toLowerCase())) {
log.debug("Expected state {} found for dataset {}", expect.getState(), targetPid);
}
else {
throw new IllegalStateException("Expected state " + expect.getState() + " but found " + state + " for dataset " + targetPid);
}
break;
case absent:
try {
dataverseService.getDatasetState(targetPid);
                        throw new IllegalStateException("Expected dataset " + targetPid + " to be absent, but it exists");
}
catch (DataverseException e) {
if (e.getMessage().contains("404")) {
log.debug("Expected state absent found for dataset {}", targetPid);
}
else {
throw e;
}
}
}
}
}

private void importDataset(String pid) throws IOException, DataverseException {
log.debug("Start importing dataset for deposit {}", depositId);
dataverseService.importDataset(pid, dataset);
log.debug("End importing dataset for deposit {}", depositId);
}

private String createDataset() throws IOException, DataverseException {
log.debug("Start creating dataset for deposit {}", depositId);
var pid = dataverseService.createDataset(dataset);
@@ -122,16 +122,34 @@ private void deleteFiles() throws IOException, DataverseException {
log.debug("End deleting files for deposit {}", depositId);
}

    private void replaceFiles() throws IOException {
        log.debug("Start replacing {} files for deposit {}", editFiles.getReplaceFiles().size(), depositId);
for (var filepath : editFiles.getReplaceFiles()) {
log.debug("Replacing file: {}", filepath);
var fileMeta = filesInDatasetCache.get(filepath);
            utilityServices.wrapIfZipFile(dataDir.resolve(filepath)).ifPresentOrElse(
zipFile -> {
replaceFileOrThrow(pid, fileMeta, zipFile);
FileUtils.deleteQuietly(zipFile.toFile());
},
() -> {
var fileToUpload = dataDir.resolve(filepath);
replaceFileOrThrow(pid, fileMeta, fileToUpload);
}
);
}
log.debug("End replacing files for deposit {}", depositId);
}

private void replaceFileOrThrow(String pid, FileMeta fileMeta, Path fileToUpload) {
try {
dataverseService.replaceFile(pid, fileMeta, fileToUpload);
}
catch (IOException | DataverseException e) {
throw new RuntimeException(e);
}
}

private void addRestrictedFiles() throws IOException, DataverseException {
log.debug("Start adding {} restricted files for deposit {}", editFiles.getAddRestrictedFiles().size(), depositId);
var iterator = new PathIterator(getRestrictedFilesToUpload());
@@ -18,7 +18,9 @@
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import nl.knaw.dans.dvingest.core.service.DataverseService;
import nl.knaw.dans.dvingest.core.yaml.PublishAction;
import nl.knaw.dans.dvingest.core.yaml.ReleaseMigratedAction;
import nl.knaw.dans.dvingest.core.yaml.UpdateAction;
import nl.knaw.dans.lib.dataverse.DataverseException;
import nl.knaw.dans.lib.dataverse.model.dataset.UpdateType;

@@ -29,29 +31,18 @@
@RequiredArgsConstructor
public class StateUpdater {
private final UUID depositId;
    private final UpdateAction updateAction;
private final DataverseService dataverseService;

private String pid;

public void updateState(String pid) throws DataverseException, IOException {
this.pid = pid;
        if (updateAction instanceof PublishAction) {
            publishVersion(((PublishAction) updateAction).getUpdateType());
        }
        else if (updateAction instanceof ReleaseMigratedAction) {
            releaseMigrated(((ReleaseMigratedAction) updateAction).getReleaseDate());
        }
}

Expand All @@ -62,4 +53,10 @@ private void publishVersion(UpdateType updateType) throws DataverseException, IO
log.debug("End publishing version for deposit {}", depositId);
}

public void releaseMigrated(String date) throws DataverseException, IOException {
log.debug("Start releasing migrated version for deposit {}", depositId);
dataverseService.releaseMigratedDataset(pid, date);
dataverseService.waitForState(pid, "RELEASED");
log.debug("End releasing migrated version for deposit {}", depositId);
}
}