Extension of ModelCIF for AF3 quality estimates #21
Notes from discussions with @benmwebb, @brindakv and @aozalevsky (on Oct. 16):
Example AF3 output (cut to only include one model instead of 5): fold_test_fold_job_number_one_cut.zip. Info on content:
Suggested ModelCIF extension:
@gtauriello, I just wanted to follow up on this. With the AF3 code and weights now released, and with the recent addition of restraints to Chai-1, we can expect rapid growth in the number of deposited models. It would be nice to have the scores in those models.
I agree. @brindakv was waiting for me to decide on a separate issue that we wanted to address in the same ModelCIF update, and I have now added it here as issue #23. Hence, I think she can now make the updates according to the open issues here. Afterwards, we can try to suggest changes in `alphafold3/model/mmcif_metadata.py` to include this (and check whether anything else in their files is invalid).
@gtauriello please clarify my questions below.
The main use case for it is to be able to handle pairs between an atom and a residue in
This would make the main existing use case in AF3 more verbose than necessary (we would need a feature for each polymer residue to handle the PAE matrix), while I currently do not have a use case for contiguous residue ranges. If we need those ranges in the future, I would prefer to have them in a separate table.
The default ranking score in AF3 is calculated as
Thanks for clarifying @gtauriello.
Should the enumeration for … Never mind. Boolean is good.
@gtauriello I suggest we add enumerations to … It can be generic (…
For … For …
Thanks, @gtauriello. Updates have been committed; please see #25.
Related to #20 and the issues mentioned there, I would suggest extending ModelCIF to capture all new types of quality estimates introduced with AlphaFold 3 (AF3). I also had a look at RoseTTAFold-AllAtom, and the suggestions below would also capture anything needed there. I also believe that this should cover anything needed for chaidiscovery/chai-lab#52. Here are my suggested additions:
- `_ma_qa_metric.type` to include: …
- `_ma_qa_metric.mode` to include "per-chain", "per-chain-pairwise", "per-atom" and "per-atom-pairwise" (and yes, I know it's a bit unfortunate that we used "local" for "per-residue", but ok...)
- `_ma_qa_metric_per_chain`: same as `_ma_qa_metric_local` but without `label_comp_id` and `label_seq_id`
- `_ma_qa_metric_per_chain_pairwise`: same as `_ma_qa_metric_local_pairwise` but without `label_comp_id*` and `label_seq_id*`
- `_ma_qa_metric_per_atom`: same as `_ma_qa_metric_local` but using `atom_id` (linked to `_atom_site.id`) instead of `model_id` and `label_*`
- `_ma_qa_metric_per_atom_pairwise`: same as `_ma_qa_metric_local_pairwise` but using `atom_id_1` and `atom_id_2` (linked to `_atom_site.id`) instead of `model_id` and `label_*`
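Each of the four proposed tables is defined relative to an existing one, so their item lists can be derived mechanically. The Python sketch below does that. It assumes the item names of `_ma_qa_metric_local` and `_ma_qa_metric_local_pairwise` as recalled here (`ordinal_id`, `model_id`, the `label_*` items, `metric_id`, `metric_value`), which should be checked against the current ModelCIF dictionary; the helpers `without` and `replace_model_and_labels` are hypothetical.

```python
# Assumed item lists of the existing ModelCIF tables (as recalled; verify
# against the ModelCIF dictionary before relying on them).
MA_QA_METRIC_LOCAL = [
    "ordinal_id", "model_id", "label_asym_id", "label_seq_id",
    "label_comp_id", "metric_id", "metric_value",
]
MA_QA_METRIC_LOCAL_PAIRWISE = [
    "ordinal_id", "model_id",
    "label_asym_id_1", "label_seq_id_1", "label_comp_id_1",
    "label_asym_id_2", "label_seq_id_2", "label_comp_id_2",
    "metric_id", "metric_value",
]


def without(items, prefixes):
    """Hypothetical helper: drop every item whose name starts with a prefix."""
    return [item for item in items if not item.startswith(tuple(prefixes))]


def replace_model_and_labels(items, atom_ids):
    """Hypothetical helper: put the atom_id item(s) where model_id was and
    drop all label_* items (atom_id would link to _atom_site.id)."""
    out = []
    for item in items:
        if item == "model_id":
            out.extend(atom_ids)
        elif not item.startswith("label_"):
            out.append(item)
    return out


PROPOSED = {
    "_ma_qa_metric_per_chain": without(
        MA_QA_METRIC_LOCAL, ["label_comp_id", "label_seq_id"]),
    "_ma_qa_metric_per_chain_pairwise": without(
        MA_QA_METRIC_LOCAL_PAIRWISE, ["label_comp_id", "label_seq_id"]),
    "_ma_qa_metric_per_atom": replace_model_and_labels(
        MA_QA_METRIC_LOCAL, ["atom_id"]),
    "_ma_qa_metric_per_atom_pairwise": replace_model_and_labels(
        MA_QA_METRIC_LOCAL_PAIRWISE, ["atom_id_1", "atom_id_2"]),
}

# Print the derived item names of each proposed table.
for category, items in PROPOSED.items():
    print(category)
    for item in items:
        print(f"    {category}.{item}")
```

With these assumptions, `_ma_qa_metric_per_atom` ends up with only `ordinal_id`, `atom_id`, `metric_id` and `metric_value`, while the per-chain tables keep `model_id` and the `label_asym_id*` items.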
Concretely, for AF3 output (e.g. looking at the JSON files in one of their examples), here is how each of the scores would map to a `_ma_qa_metric.mode` and `.type`:

- `fraction_disordered`: "global", "normalized score"
- `has_clash`: "global", "boolean"
- `iptm`: "global", "ipTM"
- `ptm`: "global", "pTM"
- `ranking_score`: "global", "normalized score"
- `chain_ptm`: "per-chain", "pTM"
- `chain_iptm`: "per-chain", "ipTM"
- `chain_pair_iptm`: "per-chain-pairwise", "ipTM"
- `chain_pair_pae_min`: "per-chain-pairwise", "PAE"
- `atom_plddts`: "per-atom", "pLDDT to polymer"
- `contact_probs`: "per-atom-pairwise", "contact probability"
- `pae`: "per-atom-pairwise", "PAE"

Some caveats to consider:
- `contact_probs` and `pae` above are defined per "token" pair, where a token is either a full residue (for standard amino and nucleic acids) or a single atom otherwise. In AF3, the per-residue tokens have a well-defined "token centre atom" (CA for standard amino acids, C1' for standard nucleotides), which could be used in per-atom scores, but this may be confusing.
- … `label_asym_id` and do not have a `label_seq_id`, and one could also give them separate `label_asym_id` values in ModelCIF to fix this.

Alternative to the above (which simplifies some things and handles the per-token scores):
- `_ma_qa_metric_local` and `_ma_qa_metric_local_pairwise` to include `label_atom_id` (linked to `_atom_site.label_atom_id`), which can be set to '.' for per-residue scores.
- … `label_comp_id` and `label_seq_id` to be set to '.'.
- … `_ma_qa_metric_local` and `_ma_qa_metric_local_pairwise` tables, and no additional tables or `_ma_qa_metric.mode` values would be necessary.

@brindakv, what are your thoughts on this?
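The AF3 score mapping listed in the issue can be kept as a plain lookup table when converting output. The Python sketch below is illustrative only: it flattens the global, per-chain and per-chain-pairwise scores from a dict keyed by the AF3 score names into (name, mode, type, location, value) rows. It assumes that per-chain lists follow the order of a caller-supplied `chain_ids` list (a hypothetical input), and it skips the per-atom scores because of the token caveats discussed above. The example values are made up.

```python
# (mode, type) per AF3 score name, copied from the mapping proposed in this
# issue; the converter function around it is a hypothetical illustration.
AF3_SCORE_MAPPING = {
    "fraction_disordered": ("global", "normalized score"),
    "has_clash": ("global", "boolean"),
    "iptm": ("global", "ipTM"),
    "ptm": ("global", "pTM"),
    "ranking_score": ("global", "normalized score"),
    "chain_ptm": ("per-chain", "pTM"),
    "chain_iptm": ("per-chain", "ipTM"),
    "chain_pair_iptm": ("per-chain-pairwise", "ipTM"),
    "chain_pair_pae_min": ("per-chain-pairwise", "PAE"),
    "atom_plddts": ("per-atom", "pLDDT to polymer"),
    "contact_probs": ("per-atom-pairwise", "contact probability"),
    "pae": ("per-atom-pairwise", "PAE"),
}


def chain_level_rows(scores: dict, chain_ids: list) -> list:
    """Flatten global and chain-level scores into
    (name, mode, type, location, value) rows."""
    rows = []
    for name, value in scores.items():
        mapping = AF3_SCORE_MAPPING.get(name)
        if mapping is None:
            continue  # not a quality estimate covered by the mapping
        mode, qa_type = mapping
        if mode == "global":
            rows.append((name, mode, qa_type, None, value))
        elif mode == "per-chain":
            # assumes value[i] belongs to chain_ids[i]
            for chain, chain_value in zip(chain_ids, value):
                rows.append((name, mode, qa_type, chain, chain_value))
        elif mode == "per-chain-pairwise":
            # assumes value[i][j] belongs to (chain_ids[i], chain_ids[j])
            for chain_1, row in zip(chain_ids, value):
                for chain_2, pair_value in zip(chain_ids, row):
                    rows.append((name, mode, qa_type, (chain_1, chain_2), pair_value))
        # per-atom and per-atom-pairwise scores need token/atom bookkeeping
        # (see the caveats above) and are deliberately skipped here
    return rows


# Made-up example values, not real AF3 output.
example_scores = {
    "ptm": 0.8,
    "iptm": 0.7,
    "has_clash": False,
    "chain_ptm": [0.9, 0.6],
    "chain_pair_iptm": [[0.9, 0.7], [0.7, 0.6]],
}
for row in chain_level_rows(example_scores, ["A", "B"]):
    print(row)
```

Keeping the mapping as data rather than code means a change to a `mode` or `type` value agreed on here only needs one edit in the table.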