
data submission workflow

Brinda Vallat edited this page Oct 22, 2024 · 37 revisions

Details of which scripts need to be called and which tables each script needs to update.

mmCIF file processing (Step 2)

Installation

Installation Instruction for python-ihm

  1. pip install ihm

Installation Instruction for biopython (no longer needed)

  1. pip install biopython

Installation Instruction for mmcif

  1. yum install cmake
  2. pip install mmcif

Installation Instruction for rcsb utils

  1. pip install rcsb.utils.io
  2. pip install rcsb.utils.chemref
  3. pip install rcsb.utils.ec
  4. pip install rcsb.utils.seq
  5. pip install rcsb.utils.struct
  6. pip install rcsb.utils.taxonomy
  7. pip install rcsb.utils.multiproc
  8. pip install rcsb.utils.validation
  9. pip install rcsb.utils.config

Note: The above packages can be installed from PyPI.
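
A quick way to confirm that the packages listed above are importable in the current environment is sketched below. It assumes the import names match the pip package names (e.g. `rcsb.utils.io`) and that python-ihm and mmcif import as `ihm` and `mmcif`; the helper name `find_missing` is illustrative only and is not part of any of these packages.

```python
import importlib.util

# Import names are assumed to match the pip package names listed above.
REQUIRED = [
    "ihm",
    "mmcif",
    "rcsb.utils.io",
    "rcsb.utils.chemref",
    "rcsb.utils.ec",
    "rcsb.utils.seq",
    "rcsb.utils.struct",
    "rcsb.utils.taxonomy",
    "rcsb.utils.multiproc",
    "rcsb.utils.validation",
    "rcsb.utils.config",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `rcsb`) is itself absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

Calling `find_missing(REQUIRED)` returns an empty list when everything is installed; any names it returns can be passed to `pip install`.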

In addition, the latest version of the py-rcsb_db.tar.gz file, which contains the rcsb/db directory already configured to run the required scripts, is available on the salilab server (managed by Arthur).

Installation Instruction for CifCheck

  • Follow the instructions from RCSB Software Tools (only the first two steps, download and build), OR
  • Install from source:
  git clone --recurse-submodules https://github.com/rcsb/cpp-dict-pack.git
  cd cpp-dict-pack
  mkdir build
  cd build
  cmake .. -DMINIMAL_DICTS=ON
  make
  # The build generates a bin folder under build

Installation Instruction for IHMValidation

Deploying the IHMValidation pipeline requires several actions:

  1. Download the pre-built binary image with third-party dependencies
  2. Pull the IHMValidation code from the GitHub repo
  3. Create the necessary directory structure

The exact commands are available in the IHMValidation deployment script and have already been incorporated into the dev and prod deployment scripts.

Workflow detail

  1. Convert partial mmCIF (user uploaded file) to mmCIF using python-ihm:
# From the scripts/make-mmCIF directory run:

python3 make-mmcif.py input.cif

Note: This package is used to produce mmCIF that can be converted to JSON and loaded into ERMrest. It is not used to create mmCIF in the submission workflow.

Requirements for this step:

  • Biopython
  • make-mmcif.py (provided by Brinda)
  • Input CIF file (e.g., input.cif) uploaded by user
  2. Copy output.cif from the previous step to py-rcsb_db/rcsb/db/tests-validate/test-output/ihm-files
  3. Convert mmCIF to JSON using py-rcsb_db:
# From the scripts/make-json/py-rcsb_db directory run:

python3 rcsb/db/tests-validate/testSchemaDataPrepValidate-ihm.py

Note: Output JSON files are written to rcsb/db/tests-validate/test-output.

Requirements for this step:

  • Brinda will provide the following files, which need to be properly installed:
    • a Python script, i.e., rcsb/db/tests-validate/testSchemaDataPrepValidate-ihm.py
    • a YAML file, i.e., rcsb/db/config/exdb-config-example-ihm.yml
    • a JSON file, i.e., CACHE/data_type_and_coverage/scan-ihm_dev-type-map.json
    • the IHM dictionary file, i.e., ihm-extension.dic in CACHE/dictionaries
  4. Use the JSON file to populate the following tables:
  • struct (editable)
  • entity (editable)
  • entity_poly (not editable)
  • entity_poly_seq (not editable)
  • pdbx_poly_seq_scheme (not editable)
  • chem_comp (not editable)
  • atom_type (not editable)
  • struct_asym (not editable)
  • ihm_entity_poly_segment (editable)
  • ihm_struct_assembly (editable)
  • ihm_struct_assembly_details (editable)
  • ihm_model_representation (editable)
  • ihm_model_representation_details (editable)
  • ihm_modeling_protocol (editable)
  • ihm_model_list (not editable)
  • ihm_model_group (editable)
  • ihm_model_group_link (editable)

Upload File processing (Step 4)

  • Check out a file from the Entry_Related_File table that has not been processed yet.
  • Retrieve the file from Hatrac.
  • Populate the table corresponding to the file (determined by its File_Type) with the file content. Make sure that each individual row gets a foreign key to Entry_Related_File.
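
The foreign key requirement in the last bullet can be illustrated with a small helper that stamps every parsed row with a reference back to its source file before insertion. The column name `Entry_Related_File` and the idea of passing the file record's identifier as `file_key` are assumptions here; the actual foreign key column depends on the schema of the target table.

```python
def attach_file_reference(rows, file_key, fk_column="Entry_Related_File"):
    """Return copies of `rows`, each carrying a foreign key to the source file.

    `fk_column` is a hypothetical column name; replace it with the real
    foreign key column of the target table.
    """
    return [{**row, fk_column: file_key} for row in rows]


# Example with made-up rows parsed from an uploaded file.
parsed_rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
stamped = attach_file_reference(parsed_rows, file_key="FILE-1")
```

The helper copies each row rather than modifying it in place, so the parsed content stays available unchanged if the insert has to be retried.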

Export entry into mmCIF File (export)

mmCIF validator

  • Get the mmCIF dictionary software suite from the RCSB Software Tools website.
  • Follow steps 1 and 2 of the installation instructions.
  • The serialized sdb file (mmcif_ihm_vx.xx.sdb) can be obtained from the IHM-dictionary Git repository. Brinda will provide the version that needs to be used, since the Deriva data model is a few versions behind the current dictionary version.
  • Execute the command for validating an mmCIF file (step 4): ./bin/CifCheck -f mmCIF_filename -dictSdb sdb_filename
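
When the validator is called from a script, the command above can be assembled as an argument list and passed to `subprocess`. This is a minimal sketch: only the `-f` and `-dictSdb` flags shown above are used, and the function names and the default `./bin/CifCheck` location are illustrative assumptions. How CifCheck reports problems (exit status versus log files) is not covered here.

```python
import subprocess


def cifcheck_command(cif_path, sdb_path, cifcheck="./bin/CifCheck"):
    """Build the CifCheck argument list for one mmCIF file."""
    return [cifcheck, "-f", str(cif_path), "-dictSdb", str(sdb_path)]


def run_cifcheck(cif_path, sdb_path, cifcheck="./bin/CifCheck"):
    """Run CifCheck and return the CompletedProcess for inspection."""
    return subprocess.run(
        cifcheck_command(cif_path, sdb_path, cifcheck),
        capture_output=True,
        text=True,
        check=False,  # leave interpretation of the result to the caller
    )
```

Passing a list instead of a shell string avoids quoting problems when file names contain spaces.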

Generate validation report

To generate a validation report, run the following command as the pdbihm user from the /mnt/vdb1/pdbihm folder:

singularity exec --pid --bind IHMValidation/:/opt/IHMValidation,input:/ihmv/input,output:/ihmv/output,cache:/ihmv/cache ihmv_20231222.sif /opt/IHMValidation/ihm_validation/ihm_validator.py --output-root /ihmv/output --cache-root /ihmv/cache --force -f input/mmCIF_filename
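
For scripted use, the same command can be built as an argument list, which makes the bind mounts and options easier to read. The sketch below reproduces the command above; the function name `validation_command` is an illustrative assumption, and the image name is taken from the command as written and may differ between deployments.

```python
def validation_command(cif_name, image="ihmv_20231222.sif"):
    """Build the singularity command that validates input/<cif_name>."""
    # Host paths are relative to the working directory (/mnt/vdb1/pdbihm).
    binds = ",".join([
        "IHMValidation/:/opt/IHMValidation",
        "input:/ihmv/input",
        "output:/ihmv/output",
        "cache:/ihmv/cache",
    ])
    return [
        "singularity", "exec", "--pid", "--bind", binds, image,
        "/opt/IHMValidation/ihm_validation/ihm_validator.py",
        "--output-root", "/ihmv/output",
        "--cache-root", "/ihmv/cache",
        "--force",
        "-f", f"input/{cif_name}",
    ]
```

The resulting list can be handed to `subprocess.run(..., cwd="/mnt/vdb1/pdbihm")` when running as the pdbihm user.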