This repository contains a Python library to download and process the SSH Open Marketplace dataset, and a set of notebooks with examples and use cases for the library.
The library has been designed to be used by the SSH Open Marketplace Editorial Team and provides a set of ad hoc functions that can be used in Python notebooks or programs. The notebooks included in this repository allow any user to gain an overview of the SSH Open Marketplace (notebook 2), and allow authenticated users to write specific curation information back to the SSH Open Marketplace. See the SSH Open Marketplace user documentation for more details.
To use the library:
A - Create an instance of sshmarketplacelib.MPData and load the Marketplace (MP) data locally. The `getMPItems` function downloads the MP dataset and stores it locally. The data is provided as a DataFrame, i.e. it is organized in tabular form, with columns labeled with the names of the attributes in the MP data model.
Example:
```python
from sshmarketplacelib import MPData as mpd

mpdata = mpd()
ts_df = mpdata.getMPItems("publications", True)
```
The data is returned as a DataFrame:
id | category | label | persistentId | lastInfoUpdate | status | description | contributors | properties | externalIds |
---|---|---|---|---|---|---|---|---|---|
10414 | publication | 3D-ICONS -- 3D Digitisation of Icons of Europe... | jOum8c | 2021-06-23T17:03:55+0000 | approved | 3D-ICONS was a pilot project funded under the ... | [] | [{'id': 41261, 'type': {'code': 'language', 'l... | [] |
7454 | publication | 4 Default Text Structure - The TEI Guidelines | Y3Vmhy | 2021-06-22T13:30:43+0000 | approved | No description provided. | [] | [{'id': 41094, 'type': {'code': 'language', 'l... | [] |
10738 | publication | 9 Dictionaries - The TEI Guidelines | vQ7Bvs | 2021-06-23T17:04:34+0000 | approved | No description provided. | [] | [{'id': 41163, 'type': {'code': 'language', 'l... | [] |
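Two columns of the returned DataFrame need a little unpacking: `lastInfoUpdate` is an ISO-8601 string with a `+0000` offset, and `properties` holds a list of nested dicts. The sketch below shows one way to handle both with pandas. It does not call the library: the DataFrame is a hypothetical stand-in built from the sample rows above, the `properties` dicts are trimmed to the keys visible in that output, and the cut-off date is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by getMPItems(); only a few
# columns are reproduced and the 'properties' dicts are trimmed to the keys
# visible in the sample output.
items = pd.DataFrame({
    "id": [10414, 7454, 10738],
    "persistentId": ["jOum8c", "Y3Vmhy", "vQ7Bvs"],
    "lastInfoUpdate": [
        "2021-06-23T17:03:55+0000",
        "2021-06-22T13:30:43+0000",
        "2021-06-23T17:04:34+0000",
    ],
    "status": ["approved", "approved", "approved"],
    "properties": [
        [{"id": 41261, "type": {"code": "language"}}],
        [{"id": 41094, "type": {"code": "language"}}],
        [{"id": 41163, "type": {"code": "language"}}],
    ],
})

# Parse the timestamps into timezone-aware UTC datetimes (%z reads "+0000").
items["lastInfoUpdate"] = pd.to_datetime(
    items["lastInfoUpdate"], format="%Y-%m-%dT%H:%M:%S%z", utc=True
)

# Extract the type code of every property attached to each item.
items["propertyTypes"] = items["properties"].apply(
    lambda props: [p["type"]["code"] for p in props]
)

# Keep only the items updated on or after the (arbitrary) cut-off date.
cutoff = pd.Timestamp("2021-06-23", tz="UTC")
recent = items[items["lastInfoUpdate"] >= cutoff]
print(recent[["persistentId", "propertyTypes"]])
```

With real data, replace the stand-in with the DataFrame returned by `getMPItems`.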
B - Use the helper functions to analyse the Marketplace data. For example, the function below returns the number of missing (null) values for each property in each item category:
Example:
```python
from sshmarketplacelib import helper as hlpr

utils = hlpr.Util()
nv_df = utils.getNullValues()
```
Returns:
property (number of missing values) | dataset | publication | tool-or-service | training-material | workflow |
---|---|---|---|---|---|
accessibleAt | 1 | 7 | 475 | 14 | 1 |
composedOf | 305 | 137 | 1671 | 321 | 0 |
concept.candidate | 46 | 5 | 157 | 0 | 0 |
... | ... | ... | ... | ... | ... |
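The idea behind this kind of report can be sketched in a few lines of pandas: mark missing cells, group them by item category, sum, and transpose so that properties become rows. This is only an illustration of the technique, not the implementation of `getNullValues`; the input DataFrame is hypothetical, and the sketch counts only `None`/NaN cells (the library may also treat empty lists or strings as missing).

```python
import pandas as pd

# Hypothetical items from three categories; None marks a missing value.
items = pd.DataFrame({
    "category": ["dataset", "dataset", "publication", "workflow"],
    "accessibleAt": [["https://example.org/a"], None, None, ["https://example.org/w"]],
    "description": ["A dataset", "Another dataset", None, "A workflow"],
})

# For each category, count the missing values of every other column, then
# transpose so that properties are rows and categories are columns.
missing = (
    items.drop(columns="category")
    .isna()
    .groupby(items["category"])
    .sum()
    .T
)
print(missing)
```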
The notebook LibTest.ipynb shows how to use the library in a notebook.
Complete documentation is in preparation [TBD].
It is recommended to install the library in a virtual environment to avoid dependency clashes. To install the library, enter the cloned directory and install it with pip, passing the project's requirements.txt explicitly:
- Clone the repository, enter the directory and install the requirements:
  ```shell
  git clone https://github.com/SSHOC/marketplace-curation.git
  cd marketplace-curation
  pip install ./ -r ./requirements.txt
  ```
- Edit the config.yaml.template file, set the values, then rename the file to config.yaml.
- Create a folder called 'data' in the same folder as your notebooks/programs.
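The last two setup steps can be checked from Python before loading any data. The helper below is hypothetical (it is not part of the library) and assumes that config.yaml and the 'data' folder both live in the working directory of your notebooks/programs; it creates the 'data' folder if needed and reports whether config.yaml is still missing.

```python
from pathlib import Path


def check_setup(workdir: Path) -> list[str]:
    """Hypothetical helper: create 'data' if absent and list missing setup steps.

    Assumes config.yaml and the 'data' folder live in `workdir`.
    """
    missing = []
    # Create the 'data' folder; exist_ok avoids an error if it already exists.
    (workdir / "data").mkdir(exist_ok=True)
    # config.yaml must be created by hand from config.yaml.template.
    if not (workdir / "config.yaml").is_file():
        missing.append(
            "config.yaml not found: edit config.yaml.template, "
            "set the values and rename it to config.yaml"
        )
    return missing
```

For example, calling `check_setup(Path.cwd())` at the top of a notebook returns an empty list once both steps are done.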