A Bash script to download an Archivematica normalization report in CSV format.
If you normalize your files for preservation and/or access, it is good practice to click on the report icon next to the Approve Normalization job to review the normalization report. This report opens in a new tab and presents 10 entries at a time, summarizing the normalization attempts and outcomes, among other format-based information. If you are processing large transfers, it may be challenging to review files 10 at a time.
Only two commands need to be installed:
- curl
- xmllint
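Before running the script, it can help to confirm that both commands are on your PATH. The following is a minimal sketch; `check_deps` is a hypothetical helper name used for illustration and is not part of report_downloader.sh.

```shell
# Report any required command that is not on the PATH.
# check_deps is a hypothetical helper, not part of report_downloader.sh.
check_deps() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_deps curl xmllint; then
  echo "All required commands are available."
fi
```

If a command is reported missing, install it with your system's package manager before continuing.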
./report_downloader.sh ${REPORT_URL} ${NUMBER_PAGES} csrftoken=${CSRF_TOKEN} sessionid=${SESSION_ID} > report.csv
- REPORT_URL: the URL for the first page of the normalization report.
- NUMBER_PAGES: the total number of pages in the report.
- CSRF_TOKEN: a token stored in your browser's cookies that prevents unauthorized commands from being sent on behalf of a trusted user. For security, treat this cookie like a password: it should not be shared.
- SESSION_ID: a unique identifier that indicates a user's web session. For security, treat this cookie like a password: it should not be shared.
Note: If you log out or are logged out of Archivematica, you will need to retrieve a new CSRF token and session ID.
After the script is done, you will find that a file named report.csv was created in the directory where you ran the script. You can change the name of the file by editing the name at the end of the command.
If running the command returns a "Permission denied" error, you may need to first run chmod +x report_downloader.sh to grant permission to execute the script.
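The permission fix above can also be applied only when it is needed. This is a small sketch; `ensure_executable` is a hypothetical helper name, and it assumes you are in the folder that contains report_downloader.sh.

```shell
# Add the execute bit only if the file is not already executable.
# ensure_executable is a hypothetical helper used for illustration.
ensure_executable() {
  if [ ! -x "$1" ]; then
    chmod +x "$1"
  fi
}

# Only act if the script is present in the current folder.
if [ -f report_downloader.sh ]; then
  ensure_executable report_downloader.sh
fi
```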
The report URL (REPORT_URL) must end in / and is structured as: https://your-Archivematica-instance.scholarsportal.info/ingest/normalization-report/uuid-of-the-package-in-the-Ingest-tab/
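Because the URL must end in /, a copied URL that lacks the trailing slash can be normalized before it is passed to the script. The sketch below uses a hypothetical helper, `ensure_trailing_slash`, and the placeholder URL from the description above; substitute your own instance and package UUID.

```shell
# Append a trailing slash only when the URL does not already end in one.
# ensure_trailing_slash is a hypothetical helper used for illustration.
ensure_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

# Placeholder URL: replace the host and UUID with your own values.
REPORT_URL=$(ensure_trailing_slash "https://your-Archivematica-instance.scholarsportal.info/ingest/normalization-report/uuid-of-the-package-in-the-Ingest-tab")
echo "$REPORT_URL"
```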
To calculate the number of pages (NUMBER_PAGES), divide the total number of items in the report by 10. If there is a remainder, round up (e.g., 1143/10 = 114.3, so total pages = 115).
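The rounding rule above can be computed in the shell with integer arithmetic. This is a minimal sketch; `page_count` is a hypothetical helper name, and the page size of 10 comes from the report's 10 entries per page.

```shell
# Ceiling division: number of pages needed for a given number of items.
# page_count is a hypothetical helper used for illustration.
page_count() {
  local items=$1
  local per_page=10   # the report shows 10 entries per page
  echo $(( (items + per_page - 1) / per_page ))
}

NUMBER_PAGES=$(page_count 1143)
echo "$NUMBER_PAGES"   # prints 115
```

Adding `per_page - 1` before dividing rounds any remainder up, while exact multiples of 10 are left unchanged (1140 items gives 114 pages).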
Note: (1) there is currently an upper limit of 10,000,000 for the page count; (2) if no page count is given, the script will only return the last page of results.
- Log in to Archivematica
- In that tab, open the Developer Console for your browser and navigate to the Storage tab (example, below)
- Select the Archivematica entry under Cookies
- Copy the values for csrftoken and sessionid
- Download the script and note the path to the folder where the script is stored.
- Gather your variable information.
- Open your preferred CLI tool.
- Navigate to the folder where the script is stored:
cd /path/to/folder
- Call the script using your gathered variables:
./report_downloader.sh ${REPORT_URL} ${NUMBER_PAGES} csrftoken=${CSRF_TOKEN} sessionid=${SESSION_ID} > report.csv
- Locate report.csv in the folder where the script is stored and review!
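As a quick sanity check before opening the file, you can confirm that report.csv exists, is not empty, and has a plausible number of lines. The sketch below makes no assumptions about the CSV's columns; `check_report` is a hypothetical helper name, not part of report_downloader.sh.

```shell
# Confirm the downloaded file exists and is non-empty, then report its line count.
# check_report is a hypothetical helper used for illustration.
check_report() {
  local file=$1
  if [ ! -s "$file" ]; then
    echo "$file is missing or empty" >&2
    return 1
  fi
  local lines
  # tr strips the padding some wc implementations add around the count.
  lines=$(wc -l < "$file" | tr -d ' ')
  echo "$file has $lines lines"
}

if ! check_report report.csv; then
  echo "Re-run the download; you may need a new CSRF token and session ID." >&2
fi
```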