This library enables the packaging of files and metadata from StudentWeb into Bags and the unpackaging of these Bags for use in Duo-DSpace.
The Bags have the following structure:
<base directory>/
| bagit.txt
| manifest-md5.txt
| tagmanifest-md5.txt
\--- data/
\--- final
| [final version files]
\--- supporting
| [supporting files]
\--- licence/
| licence.txt
\--- metadata/
| metadata.xml
\--- tagfiles/
| supporting.access.txt
| formats.txt
| final.sequence.txt
| supporting.sequence.txt
More information is available in the technical overview of the StudentWeb integration available at: https://docs.google.com/a/cottagelabs.com/document/d/17Iiswcz_LkSMgdhEZIesV1BTrijRNsPWAyH29-L1rQg/edit
This library has no significant external dependencies, so it can be cloned from the repository and compiled straight away as follows:
git clone https://github.com/nye-duo/BagItLibrary.git
then in the resulting directory:
mvn clean package
to install into your local maven repository just use:
mvn install
###Construction
A BagIt object must be constructed with the path to a zip file; this path may point to an existing file from which the library should construct an object, or it may point to a non-existent file, in which case the BagIt will be constructed by-reference and then ultimately serialised to that location.
BagIt bag = new BagIt("/path/to/bag.zip");
Note that you cannot read a BagIt file from an existing zip file and then re-write it back to that same file (due to the low-memory approach the library uses to read and write files). Attempting to do so will result in an exception.
###Adding Files
The library supports the semantics required by StudentWeb and DUO for the structure of the BagIt objects created. As such there are a number of methods which can be used to add content files to the Bag, so that they appear in the correct place in the package:
addFinalFile
addSupportingFile
addMetadataFile
addLicenceFile
Each of these maps on to the appropriate part of the package structure as defined above. For example, addFinalFile will add files to the data/final directory.
####addFinalFile
Add a File object to the BagIt in the specified sequence position as a final file
public void addFinalFile(File file, int sequence)
public void addFinalFile(File file)
If no sequence is supplied, then the file will be appended to the list of existing files with an automatically assigned sequence number.
When the BagIt is eventually serialised, the sequences for these files will be written to
tagfiles/final.sequence.txt
####addSupportingFile
Add a File object to the BagIt in the specified sequence position as a supporting file, with the specified level of access.
public void addSupportingFile(File file, int sequence, String access)
public void addSupportingFile(File file, String access)
If no sequence is supplied, then the file will be appended to the list of existing files with an automatically assigned sequence number.
The access is a free string, but SHOULD be either "open" or "closed" for suitable interpretation by the DUO implementation.
When the BagIt is eventually serialised, the sequences for these files will be written to
tagfiles/supporting.sequence.txt
####addMetadataFile
Add a File object to the BagIt which will represent the metadata for the packaged item.
public void addMetadataFile(File file)
When the BagIt is serialised, this file will be written to
data/metadata/metadata.xml
You may instead wish to add metadata using the Metadata object - see below.
####addLicenceFile
Add a File object to the BagIt which will represent the licence for the packaged item
public void addLicenceFile(File file)
When the BagIt is serialised, this file will be written to
data/licence/licence.txt
###Adding Metadata Explicitly
Metadata can be added by giving the BagIt object a metadata file as described above, using
public void addMetadataFile(File file)
But you may also add metadata using the Metadata object:
public void addMetadata(Metadata metadata)
This will have the same overall effect as adding a metadata file, but guarantees that the file format is in-line with the BagIt specification for the StudentWeb/DUO integration. See the section below on Metadata for more details.
###Serialising
Once you have constructed a BagIt object which represents the content you wish to pack, it can be serialised. The serialisation takes place at the file location provided in the constructor, and will always output a ZIP file
// if /path/to/bag.zip does not exist, the Bag will only exist in-memory
BagIt bag = new BagIt("/path/to/bag.zip");
// add all the content to the Bag. This happens in-memory
bag.addFinalFile(finalFile)
bag.addSupportingFile(supportingFile)
bag.addMetadata(metadata)
bag.addLicence(licence)
// write the contents of the Bag to /path/to/bag.zip
bag.writeToFile()
In the process of writing the file, all the relevant tagfiles and manifests will be automatically created.
###Reading from the Bag
When extracting content from the bag, a new BagIt object should be constructed over an existing ZIP file. Once this has been done we can use the following methods to read the data from the bag
getSequencedFinals
getSequencedSecondaries
getMetadataFile
getLicenceFile
####getSequencedFinals
This returns a TreeMap (which is intrinsically ordered by the natural ordering of the key object) of the final files in the BagIt.
public TreeMap<Integer, BaggedItem> getSequencedFinals()
The value of the map at any sequence point is a BaggedItem which is a simple object which wraps the filename, the format, the sequence number and an input stream for retrieving the content. The key in the TreeMap is the sequence number of the item, so they can easily be processed in intended sequence order by the caller.
####getSequencedSecondaries
This returns a TreeMap (which is intrinsically ordered by the natural ordering of the key object) of the final files in the BagIt. The accessRights argument filters the returned BaggedItems by whether they support that access condition (the values of the accessRights argument SHOULD be "open" or "closed").
public TreeMap<Integer, BaggedItem> getSequencedSecondaries(String accessRights)
The value of the map at any sequence point is a BaggedItem which is a simple object which wraps the filename, the format, the sequence number and an input stream for retrieving the content. The key in the TreeMap is the sequence number of the item, so they can easily be processed in intended sequence order by the caller.
####getMetadataFile
This returns a BaggedItem representation of the Metadata file in
data/metadata/metadata.xml
A BaggedItem is a simple object which wraps the filename, the format, the sequence number and an input stream for retrieving the content.
####getLicenceFile
This returns a BaggedItem representation of the Licence file in
data/licence/licence.txt
A BaggedItem is a simple object which wraps the filename, the format, the sequence number and an input stream for retrieving the content.
###Other operations
The BagIt object has a number of other operations (see the Javadoc) which are not documented here. They are mostly used internally or by the StudentWeb library to integrate the BagIt with SWORDv2.
###Introduction
The metadata format for use in the BagIt package when transferring content from StudentWeb to DUO is defined in the functional overview https://docs.google.com/a/cottagelabs.com/document/d/17Iiswcz_LkSMgdhEZIesV1BTrijRNsPWAyH29-L1rQg/edit
The Metadata object in this library provides an interface to create and read documents which meet this specification to ensure that they are correctly constructed for the package.
###Construction
A blank metadata object can be constructed thus:
Metadata metadata = new Metadata();
If you have a pre-existing XOM XML Element, the metadata object can be initialised around it thus:
Metadata metadata = new Metadata(xomElement);
###Static Constants
The metadata object provides a list of static constants which represent the metadata fields supported by the StudentWeb/DUO schema. These are:
Metadata.NAME
Metadata.GIVEN_NAME
Metadata.FAMILY_NAME
Metadata.STUDENT_NUMBER
Metadata.UID
Metadata.FOEDSELSNUMMER
Metadata.POSTAL_ADDRESS
Metadata.EMAIL
Metadata.TELEPHONE_NUMBER
Metadata.SUBJECT
Metadata.SUBJECT_CODE
Metadata.SUBJECT_TITLE
Metadata.UNITCODE
Metadata.UNIT_NAME
Metadata.EMBARGO_TYPE
Metadata.EMBARGO_END_DATE
Metadata.GRADE
Metadata.TITLE
Metadata.LANGUAGE
Metadata.ABSTRACT
Metadata.TYPE
These MUST be used when setting metadata fields, as the Metadata object will be able to correctly interpret the field names and namespaces, and ensure that the resulting document is valid.
###Adding Metadata
There are 3 methods for adding metadata
addField
addSubject
setEmbargo
"addField" is a generic method which should be used in all cases except when setting the subject or the embargo metadata, in which case the relevant one of the other two methods should be used.
####addField
Add a field to the metadata document with the given value (and optionally, language).
public void addField(String fieldName, String value)
public void addField(String fieldName, String value, String language)
For example:
Metadata metadata = new Metadata();
metadata.addField(Metadata.TITLE, "The title")
metadata.addField(Metadata.TITLE, "Titelen", "nob");
####addSubject
Add a subject code and its associated title to the metadata document
public void addSubject(String code, String title)
For example
Metadata metadata = new Metadata();
metadata.addSubject("A001", "Computing");
metadata.addSubject("B023", "Social Sciences");
####setEmbargo
Set the embargo metadata in the document
public void setEmbargo(String type, Date date)
For example:
Metadata metadata = new Metadata();
metadata.setEmbargo("restricted", new Date());
###Reading Metadata
There are 2 methods for retrieving metadata from the object
getField
getSubjects
Note that there is no getEmbargo method, as this can be achieved by
String end = metadata.getField(Metadata.EMBARGO_END_DATE).get(0);
String type = metadata.getField(Metadata.EMBARGO_TYPE).get(0);
####getField
Get the list of values which are in the metadata document for the requested field
public List<String> getField(String fieldName)
public List<String> getField(String fieldName, String language)
If a language is supplied, only fields which have the xml:lang property set to that language will be returned.
For example:
Metadata metadata = new Metadata(xomElement);
List<String> titles = metadata.getField(Metadata.TITLE);
List<String> nobTitles = metadata.getField(Metadata.TITLE, "nob");
####getSubjects
Get a list of 2 element String arrays which represent the subject code and subject title for each subject in the metadata document.
public List<String[]> getSubjects()
The List is a list of all of the subjects, and each String[] is a 2 element array where the first element is the code and the second element is the title.
For example:
Metadata metadata = new Metadata(xomElement);
List<String[]> subjects = metadata.getSubjects();
for (String[] codeTitle : subjects)
{
String code = codeTitle[0];
String title = codeTitle[1];
}
###Serialisation
To serialise the metadata document, just call toXML():
Metadata metadata = new Metadata();
metadata.addField(Metadata.TITLE, "The title")
String xml = metadata.toXML();
If you wish to access the inner XOM Element to serialise in your own way, you can just use
Element element = metadata.getElement();