Improve ma_qa_metric_local_pairwise description #19

aozalevsky · 2024-09-24T20:55:29Z

Right now the ma_qa_metric_local_pairwise description doesn't say anything about how complete the data should be. For instance, some metrics are expected to produce symmetric square matrices, so storing only the upper triangle should be enough. I guess the initial definition was intentionally generic, but maybe we can extend the category description with some case-specific details (e.g. for PAE) to give software developers some guidance.

This came up as a part of the discussion in chaidiscovery/chai-lab#52
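To make the upper-triangle idea concrete, here is a minimal Python sketch. It assumes the full matrix is held as nested lists and uses a simplified row layout of (residue 1, residue 2, value) with 1-based residue indices. The real category identifies residues through its label_*_1 / label_*_2 items, and the function name upper_triangle_rows is hypothetical. The sketch refuses asymmetric input, because dropping the lower triangle of a matrix such as PAE would lose data.

```python
from typing import List, Sequence, Tuple

# Hypothetical simplified row: (residue index 1, residue index 2, value),
# with 1-based indices mirroring mmCIF seq_id conventions.
Row = Tuple[int, int, float]


def upper_triangle_rows(matrix: Sequence[Sequence[float]],
                        tol: float = 1e-9) -> List[Row]:
    """Return one row per unordered residue pair (i <= j) of a symmetric matrix.

    Raises ValueError if the matrix is not square or not symmetric, since
    keeping only the upper triangle would then discard information.
    """
    n = len(matrix)
    if any(len(r) != n for r in matrix):
        raise ValueError("matrix must be square")
    rows: List[Row] = []
    for i in range(n):
        for j in range(i, n):
            # Symmetry check: the (j, i) entry is about to be dropped.
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                raise ValueError(
                    f"matrix is not symmetric at ({i + 1}, {j + 1})")
            rows.append((i + 1, j + 1, matrix[i][j]))
    return rows
```

For N residues this keeps N(N+1)/2 rows (diagonal included) instead of N², so roughly half the storage.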


gtauriello · 2024-09-26T17:29:25Z

Given that mmCIF generally does not guarantee that data is provided for every atom, residue, residue pair or anything else, I am not sure how one would emphasize this in the description.

Whether information is provided for all pairs really depends on the model itself. On the other hand, the description could benefit from a note that the whole category can be extracted into a separate file. Here is a possible addition to the description:

In cases where the metric is symmetric, it is enough to store just one value per pair. For asymmetric metrics, the order of residues is expected to be meaningful (e.g. PAE, where PAE_ij is defined by aligning on residue i (label_*_1) and measuring the error on residue j (label_*_2)). In all cases, it is perfectly valid to provide values for only a subset of residue pairs.
Data in this category is expected to be very large and can therefore be extracted into a separate file, which is linked to the main file using the categories ma_associated_archive_file_details or ma_entry_associated_files with file_content set to "local pairwise QA scores".

Would this work?
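A reader following the proposed wording would need to respect pair order for asymmetric metrics and tolerate missing pairs. The sketch below assumes residues are reduced to plain integer indices and that the caller already knows whether the metric is symmetric. The PairwiseScores class is hypothetical and not part of python-modelcif.

```python
from typing import Dict, Iterable, Optional, Tuple

Pair = Tuple[int, int]


class PairwiseScores:
    """Lookup over sparse pairwise QA rows of the form (res1, res2, value).

    symmetric=True: a value stored for (i, j) is also returned for (j, i).
    symmetric=False (e.g. PAE): order matters; (i, j) means "align on
    residue i, measure the error on residue j".
    Pairs with no stored value return None, since providing only a subset
    of residue pairs is valid.
    """

    def __init__(self, rows: Iterable[Tuple[int, int, float]],
                 symmetric: bool) -> None:
        self.symmetric = symmetric
        self._values: Dict[Pair, float] = {}
        for res1, res2, value in rows:
            self._values[(res1, res2)] = value

    def get(self, i: int, j: int) -> Optional[float]:
        value = self._values.get((i, j))
        # Only symmetric metrics may fall back to the swapped pair.
        if value is None and self.symmetric:
            value = self._values.get((j, i))
        return value
```

Keeping the symmetry flag on the reader side means the file itself never has to duplicate symmetric values, while PAE-style data is never silently transposed.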

aozalevsky · 2024-09-26T19:18:00Z

I think it's good! Also, as @benmwebb pointed out, to properly read an external file, its content has to be concatenated with the main file. But it seems to be more complicated than that, because the external file (at least in the ma-dm-hisrep-003 example you mentioned) has an additional header:

data_ma-dm-hisrep-003
_entry.id ma-dm-hisrep-003
_entry.ma_collection_id ma-dm-hisrep

which causes the following error:

     84 def _check_residue(r):
     85     """Make sure that a residue is not out of range of its Entity"""
---> 86     if r.seq_id > len(r.entity.sequence) or r.seq_id < 1:
     87         raise IndexError("Residue %d out of range for %s (1-%d)"
     88                          % (r.seq_id, r.entity, len(r.entity.sequence)))

AttributeError: 'NoneType' object has no attribute 'sequence'

After deleting the duplicated line

data_ma-dm-hisrep-003

I was able to parse the concatenated file. I wonder if it's possible to make the process slightly more user-friendly and cover it in the input section of the python-modelcif docs.
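The manual workaround described above can be sketched in Python as a naive text-level merge. It assumes both files are text mmCIF (not BinaryCIF) and share the same data block name; first_block_name and merge_mmcif_text are hypothetical helpers. As in the report, only the extra data_ header line is removed, so the duplicated _entry.* items remain and a stricter CIF parser might reject them. The helper also scans lines without understanding CIF syntax, so it assumes no text field contains a line starting with data_.

```python
import re
from typing import List, Optional

# A CIF data block header such as "data_ma-dm-hisrep-003"
# (CIF reserved words are case-insensitive).
_DATA_RE = re.compile(r"^data_(\S+)$", re.IGNORECASE)


def first_block_name(text: str) -> Optional[str]:
    """Return the name of the first data block in an mmCIF string, or None."""
    for line in text.splitlines():
        match = _DATA_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


def merge_mmcif_text(main: str, extra: str) -> str:
    """Append an associated mmCIF file to the main file as one data block.

    Only the extra file's first data_ header is removed; everything else,
    including duplicated _entry.* items, is kept verbatim.
    """
    main_name = first_block_name(main)
    extra_name = first_block_name(extra)
    if main_name is None or main_name != extra_name:
        raise ValueError(
            f"data block names differ: {main_name!r} vs {extra_name!r}")
    kept: List[str] = []
    header_dropped = False
    for line in extra.splitlines():
        if not header_dropped and _DATA_RE.match(line.strip()):
            header_dropped = True  # skip the duplicated data_ header only
            continue
        kept.append(line)
    return main.rstrip("\n") + "\n" + "\n".join(kept) + "\n"
```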

benmwebb · 2024-09-26T19:30:41Z

> Also, as @benmwebb pointed out, to properly read an external file, its content has to be concatenated with the main file.

You can't just glue the two files together, because the external file might be in BinaryCIF rather than mmCIF format, for example. By "concatenation" I meant that the two files logically share the same data model: IDs in one file can refer to the other.

> the external file (at least in the ma-dm-hisrep-003 example you mentioned) has an additional header
>
> data_ma-dm-hisrep-003
> _entry.id ma-dm-hisrep-003
> _entry.ma_collection_id ma-dm-hisrep
>
> which causes the following error:

Right, the Python library assumes that a new data block corresponds to a new System object, so it'll get confused by the IDs there (e.g. any entity IDs will point to empty entities since they are defined in a different system). One simple fix would be to assume that if the names of the two data blocks are the same, it is the same system.
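The suggested fix, treating data blocks with the same name as one system, could look roughly like the sketch below. System, merge_blocks and check_rows are hypothetical stand-ins, not python-modelcif's actual classes. Entities are reduced to an ID-to-sequence map and each pairwise row to (entity ID, residue 1, residue 2, value). check_rows mimics the range check shown in the traceback above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical simplified row: (entity id, seq_id 1, seq_id 2, value).
Row = Tuple[str, int, int, float]
# Hypothetical parsed data block: (block name, entity id -> sequence, rows).
Block = Tuple[str, Dict[str, str], List[Row]]


@dataclass
class System:
    """Stand-in for the object a reader builds per system (not the real API)."""
    name: str
    entities: Dict[str, str] = field(default_factory=dict)
    pairwise: List[Row] = field(default_factory=list)


def merge_blocks(blocks: List[Block]) -> List[System]:
    """Create one System per distinct data block name.

    Blocks that share a name are merged into the same System, so entity IDs
    used in an associated file resolve against entities defined in the main
    file instead of pointing at empty entities.
    """
    systems: Dict[str, System] = {}
    for name, entities, rows in blocks:
        system = systems.setdefault(name, System(name))
        system.entities.update(entities)
        system.pairwise.extend(rows)
    return list(systems.values())


def check_rows(system: System) -> None:
    """Raise if a pairwise row references an unknown entity or bad seq_id."""
    for entity_id, seq_id_1, seq_id_2, _value in system.pairwise:
        sequence = system.entities.get(entity_id)
        if sequence is None:
            raise KeyError(f"entity {entity_id!r} not defined in {system.name}")
        for seq_id in (seq_id_1, seq_id_2):
            if not 1 <= seq_id <= len(sequence):
                raise IndexError(
                    f"residue {seq_id} out of range for entity {entity_id} "
                    f"(1-{len(sequence)})")
```

With the main and associated blocks both named ma-dm-hisrep-003, merge_blocks returns a single System and check_rows passes; with different names, the associated rows land in a separate System and check_rows raises KeyError.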

gtauriello · 2024-09-27T09:18:22Z

If I remember correctly, the idea was that the extra file for local pairwise QA scores should by itself be a valid mmCIF file (i.e. include a data block and all parent data items). That's why there is a bit of redundancy between the files.

In terms of reading a main ModelCIF file together with an accompanying file in python-modelcif, this might be better handled in ihmwg/python-modelcif#10?

brindakv · 2024-11-28T19:28:24Z

Addressed in #25.

This was referenced Sep 24, 2024:

Making CIF metrics usable chaidiscovery/chai-lab#52 (Open)

Improving ma_qa_metric_local_pairwise description ihmwg/python-modelcif#39 (Closed)

brindakv self-assigned this Nov 27, 2024

brindakv mentioned this issue Nov 28, 2024:

Updates addressing multiple issues #25 (Open)
