Esmfold #114

Open
wants to merge 14 commits into
base: main

Contributor

a-r-j commented Dec 8, 2022

Description

Adds support for retrieving predicted structures from ESMFold.

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./biopandas/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under biopandas/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./biopandas -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./biopandas


@@ -14,7 +27,8 @@ The CHANGELOG for the current development version is available at

##### New Features

-- Added support for [AlphaFolds 200M+ structures](https://www.deepmind.com/blog/alphafold-reveals-the-structure-of-the-protein-universe) via `PandasMmcif().fetch_mmcif(uniprot_id='Q5VSL9', source='alphafold2-v3')` and `PandasPdb().fetch_pdb(uniprot_id='Q5VSL9', source='alphafold2-v3')`.
+- Added support for [AlphaFolds 200M+ structures](https://www.deepmind.com/blog/alphafold-reveals-the-structure-of-the-protein-universe) via `PandasMmcif().fetch_mmcif(uniprot_id='Q5VSL9', source='alphafold2-v3')` and `PandasPdb().fetch_pdb(uniprot_id='Q5VSL9', source='alphafold2-v3')`. (Via [Arian Jamasb](https://github.com/a-r-j), PR #[102](https://github.com/rasbt/biopandas/pull/102/files))
Member

Awesome PR! Btw in the changelog here, it should mention the new ESMFold support alongside alphafold, correct?

Contributor Author

I suppose we could, but I think the AF2 support has already been released, so I kept them separate.

Member

Ah sorry, I didn't mean to change the existing entry but to add the ESMFold note below it.

"""Test retrieving a structure from ESMFold."""
sequence = "MTYGLY"
res_ids: Set[str] = {"A:MET:1", "A:THR:2", "A:TYR:3", "A:GLY:4", "A:LEU:5", "A:TYR:6"}
ppdb = PandasPdb().fetch_pdb(sequence=sequence)
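
For reference, the expected labels in `res_ids` follow a `chain:RESNAME:position` pattern with 1-based positions. The snippet below is only a sketch of how such a set could be derived from the input sequence; `expected_residue_ids` is a hypothetical helper, not part of this PR, and chain `A` is assumed because that is what the test above uses.

```python
from typing import Set

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def expected_residue_ids(sequence: str, chain: str = "A") -> Set[str]:
    """Build 'chain:RESNAME:position' labels (1-based) for each residue.

    Hypothetical helper for illustration; the chain ID is an assumption.
    """
    return {
        f"{chain}:{THREE_LETTER[aa]}:{position}"
        for position, aa in enumerate(sequence, start=1)
    }


print(sorted(expected_residue_ids("MTYGLY")))
```

Deriving the set from the sequence would keep the test input and the expected labels from drifting apart if the sequence is ever changed.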
Member

Should this include ESMFold as the source? Just as a contingency for when new versions of ESMFold come out, or if alternatives come up. (But I suppose fetching a structure by sequence is not currently possible via AlphaFold, right?)

Contributor Author

Yeah, no API for AF2 inference AFAIK

There is a version arg for the ESMFold function, but it remains to be seen exactly how the API will change for subsequent releases, so it may be okay to leave it for the time being.

Member

rasbt commented Jan 7, 2023

@a-r-j It all looks good to me! I just added source='esmfold-v1' as a requirement if someone fetches a sequence, so that people know where they are fetching it from.
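
A minimal sketch of what such a requirement could look like, for illustration only. `validate_sequence_fetch` and `SEQUENCE_SOURCES` are hypothetical names, not the actual BioPandas code, and the set of accepted sources is assumed to contain only the value named in this thread.

```python
from typing import Optional

# Assumption: 'esmfold-v1' is the only sequence-based source named in this thread.
SEQUENCE_SOURCES = frozenset({"esmfold-v1"})


def validate_sequence_fetch(sequence: Optional[str], source: Optional[str]) -> None:
    """Raise ValueError unless a sequence fetch names a known source.

    Hypothetical helper; not the BioPandas implementation.
    """
    if sequence is None:
        return  # Not a sequence fetch; nothing to check here.
    if source is None:
        raise ValueError(
            "fetching by sequence requires an explicit source, "
            "e.g. source='esmfold-v1'"
        )
    if source not in SEQUENCE_SOURCES:
        raise ValueError(f"unsupported source for sequence fetch: {source!r}")


validate_sequence_fetch("MTYGLY", "esmfold-v1")  # passes silently
```

Requiring the source explicitly, rather than defaulting to it, means that a later model version or an alternative predictor can be added without silently changing what existing calls return.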

I noticed that the PDB Jupyter notebook is corrupted, though. I couldn't open it on my computer, and I also see that it's the same in GitHub's rendering: https://github.com/rasbt/biopandas/blob/4515e39861a422c10a8aff7677ecd5aacb031508/docs/tutorials/Working_with_PDB_Structures_in_DataFrames.ipynb

Do you still have a working version on your computer? Otherwise I will restore it from an earlier commit.

Contributor Author

a-r-j commented Jan 8, 2023

Oh gosh, let me see what I can do. Keeping notebooks under version control is always a nightmare.

Contributor Author

a-r-j commented Jan 8, 2023

Seems I hadn't resolved all the conflicts when merging. Should be fixed now :)
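
As a rough illustration of how leftover conflicts in a notebook can be caught before committing: a `.ipynb` file is JSON, so unresolved Git conflict markers usually make it unparseable. The sketch below checks for both; the function names are hypothetical and the marker detection is a simple heuristic, not a complete validator.

```python
import json
from typing import List

# Git writes these at the start of a line around conflicting hunks.
CONFLICT_PREFIXES = ("<<<<<<< ", ">>>>>>> ")


def conflict_marker_lines(text: str) -> List[int]:
    """Return 1-based line numbers that look like Git conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(CONFLICT_PREFIXES) or line.rstrip() == "=======":
            hits.append(number)
    return hits


def is_loadable_notebook(text: str) -> bool:
    """True if the text parses as JSON with a top-level 'cells' list."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("cells"), list)


broken = '{\n<<<<<<< HEAD\n "cells": [],\n=======\n "cells": [],\n>>>>>>> esmfold\n "nbformat": 4\n}'
print(is_loadable_notebook(broken), conflict_marker_lines(broken))
```

Running a check like this over the notebooks under `docs/` before pushing would flag a half-merged file without needing to open it in Jupyter.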

Member

rasbt commented Jan 8, 2023

Thanks, I will try to check that out today! I will also take care of the docs. In case you are curious, there are a few notes here: https://biopandas.github.io/biopandas/CONTRIBUTING/#notes-for-the-developers

PS: I transferred the repo to a new BioPandas org. Looks like everything has worked pretty smoothly so far.

Member

rasbt commented Jan 10, 2023

I think you undid my commits where I added source='esmfold-v1' as a requirement including unit tests 😢

Contributor Author

a-r-j commented Jan 10, 2023

Oh gosh, my bad. Looks like I showed up to amateur hour 😓

I can't find your changes in the commit history - any chance you have a local copy?

Member

rasbt commented Jan 10, 2023

No worries, things happen. Unfortunately, I destroyed them when I pulled your recent branch ... argh
