Expected behaviour
When an AIP is re-ingested, the original AIP is replaced in its entirety in storage.
Current behaviour
We use DuraCloud for storage and store AIPs as 7z packages compressed with bzip2. Some re-ingested AIPs come out smaller than the original AIPs, which seems understandable since we use the fastest compression level (though is an 869.92 MB difference too much to chalk up to the compression level?).
In some cases, the size drops below a whole-GB boundary (e.g. 3.05GB to 2.98GB, as opposed to 3.05GB to 3.01GB). Since DuraCloud stores packages in 1GB chunks, the re-ingested AIP then needs fewer chunks than the original, and the surplus chunks of the original AIP are left orphaned in storage instead of the original AIP being replaced entirely.
For example:
Original ingest size (2024-03-04): 36155.86 MB (35.3 GB)
number of associated chunks per dura-manifest: 38 (dura-chunk-0000 to dura-chunk-0037)
Reingest size (2024-04-10): 35450.7 MB (34.62 GB)
number of associated chunks per dura-manifest: 37 (dura-chunk-0000 to dura-chunk-0036)
number of associated chunks in storage: 38 (dura-chunk-0000 to dura-chunk-0037)
last modified date of dura-chunk-0037: 2024-03-04
last modified date of all other dura-chunks and dura-manifest: 2024-04-10
From the DuraCloud audit log, it looks like the re-ingested AIP overwrites the original AIP in place, so the additional chunks from the original AIP are not accounted for and remain in storage.
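The orphaned chunk in the example above can be identified by comparing the chunks listed in the dura-manifest with the chunks actually present in storage. The following is only a minimal sketch of that comparison: the helper names `chunk_names` and `find_orphaned_chunks` are hypothetical, the chunk names are simplified to the `dura-chunk-NNNN` suffix used in this report, and the two lists are built by hand rather than fetched from DuraCloud.

```python
def chunk_names(first, last, prefix="dura-chunk-"):
    """Build the set of chunk names from first to last, inclusive.

    Hypothetical helper: names are simplified to the suffix shown in
    this report (e.g. dura-chunk-0037).
    """
    return {f"{prefix}{i:04d}" for i in range(first, last + 1)}


def find_orphaned_chunks(manifest_chunks, stored_chunks):
    """Return chunks present in storage but not listed in the manifest."""
    return sorted(set(stored_chunks) - set(manifest_chunks))


# Values taken from the example in this report:
# the re-ingest manifest lists dura-chunk-0000 to dura-chunk-0036,
# while storage still holds dura-chunk-0000 to dura-chunk-0037.
manifest_chunks = chunk_names(0, 36)
stored_chunks = chunk_names(0, 37)

orphans = find_orphaned_chunks(manifest_chunks, stored_chunks)
print(orphans)  # ['dura-chunk-0037']
```

In this example the only orphan is dura-chunk-0037, which is also the one chunk whose last modified date (2024-03-04) predates the re-ingest.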
Steps to reproduce
Set the processing configuration to "compression algorithm: 7z using bzip2" and "compress level: 1 - fastest level"
Ingest a package that is multiple GB (since DuraCloud chunks in 1GB segments)
Observe the size, number of chunks, and last modified date associated with the ingested package (e.g. in the browser interface or via the dura-manifest or audit log)
Re-ingest the package using either the metadata-only or partial (normalize for access only) workflow
Observe the size, number of chunks, and last modified date associated with the re-ingested package (e.g. in the browser interface or via the dura-manifest or audit log)
Note: it may take a few tries to create a package that will see a change in GB (e.g. 3.05GB to 2.98GB, as opposed to 3.05GB to 3.01GB) after re-ingest
Your environment (version of Archivematica, operating system, other relevant details)
AM 1.14, SS 0.20
DuraCloud 7.1
CentOS 7
For Artefactual use:
Before you close this issue, you must check off the following:
All pull requests related to this issue are properly linked
All pull requests related to this issue have been merged
A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
Documentation regarding this issue has been written and merged (if applicable)
Details about this issue have been added to the release notes (if applicable)