Create Uniprot -> Ensembl ID mapping file #157

Merged: jaclynbeck-sage merged 4 commits into dev from jbeck/AG-1388/uniprot_mapping_file on Nov 20, 2024

@jaclynbeck-sage (Contributor) commented Nov 15, 2024

This PR addresses AG-1389 and creates a notebook in data_analysis/agora that:

  1. Collects all Ensembl IDs used in ADT, except for the druggability data and the gene_metadata file (which is excluded because it includes druggability genes)
  2. Queries UniProt for the corresponding accession numbers
  3. Saves the results in a 2-column table with columns UniProtKB_accession and RESOURCE_IDENTIFIER, as requested in the JIRA ticket.
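For illustration, steps 1 and 3 might look roughly like the Python sketch below. It assumes step 2 (the UniProt query) has already produced (Ensembl ID, UniProt accession) pairs, that the output is tab-separated, and that RESOURCE_IDENTIFIER holds the Ensembl ID. `collect_ensembl_ids`, `write_mapping`, and the dataset names are hypothetical and are not taken from the notebook.

```python
import csv
import io
from typing import Iterable, TextIO


def collect_ensembl_ids(datasets: dict[str, Iterable[str]],
                        excluded: set[str]) -> set[str]:
    """Union the Ensembl IDs from every dataset not listed in `excluded`.

    `datasets` maps a (hypothetical) dataset name to the Ensembl IDs it uses.
    """
    ids: set[str] = set()
    for name, ensembl_ids in datasets.items():
        if name not in excluded:
            ids.update(ensembl_ids)
    return ids


def write_mapping(pairs: Iterable[tuple[str, str]], out: TextIO) -> None:
    """Write (ensembl_id, uniprot_accession) pairs as a 2-column TSV.

    Assumes RESOURCE_IDENTIFIER holds the Ensembl ID. Duplicate pairs are
    dropped and rows are sorted by accession, then Ensembl ID, so the
    output is deterministic.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["UniProtKB_accession", "RESOURCE_IDENTIFIER"])
    for ensembl_id, accession in sorted(set(pairs), key=lambda p: (p[1], p[0])):
        writer.writerow([accession, ensembl_id])


# Usage with illustrative inputs (not the real ADT data or query results).
datasets = {
    "rna_differential_expression": ["ENSG00000142192", "ENSG00000080815"],
    "druggability": ["ENSG00000000003"],
    "gene_metadata": ["ENSG00000000003", "ENSG00000142192"],
}
ids = collect_ensembl_ids(datasets, excluded={"druggability", "gene_metadata"})
buf = io.StringIO()
write_mapping([("ENSG00000142192", "P05067"), ("ENSG00000080815", "P49768")], buf)
print(sorted(ids))
print(buf.getvalue())
```

Excluding whole datasets by name, rather than filtering individual IDs, mirrors the description: anything that only appears in the druggability data or gene_metadata never enters the query set.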

The output file can be accessed here on Synapse.
The human-readable notebook can be found here on GitHub.

@jaclynbeck-sage requested a review from a team as a code owner on November 15, 2024 at 20:28

@beatrizsaldana (Member) left a comment:

Looks good to me! I also found that multiple UniProt IDs map to a single Ensembl ID, and that not all Ensembl IDs have UniProt IDs. I appreciated the sanity check!

@jaclynbeck-sage (Contributor, Author):

Yeah, I think the many-to-many relationship should be OK, and multiple UniProt IDs -> a single Ensembl ID at least makes sense biologically. I'm less sure about multiple Ensembl IDs -> a single UniProt ID, but I think that has to do with how Ensembl labels things like pseudogenes and predicted genes, so it's probably all OK.
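The sanity check discussed in this thread can be sketched as a cardinality summary over the mapped pairs: how many Ensembl IDs map to several UniProt accessions, how many accessions map back to several Ensembl IDs, and how many queried Ensembl IDs got no accession at all. This is a hypothetical Python sketch, not the notebook's actual check; `summarize_mapping` and the toy IDs are illustrative.

```python
from collections import defaultdict
from typing import Iterable


def summarize_mapping(queried: Iterable[str],
                      pairs: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Summarize the cardinality of (ensembl_id, uniprot_accession) pairs."""
    uniprot_per_ensembl: dict[str, set[str]] = defaultdict(set)
    ensembl_per_uniprot: dict[str, set[str]] = defaultdict(set)
    for ensembl_id, accession in pairs:
        uniprot_per_ensembl[ensembl_id].add(accession)
        ensembl_per_uniprot[accession].add(ensembl_id)

    queried_ids = set(queried)
    return {
        "queried_ensembl_ids": len(queried_ids),
        # Queried IDs that never appear on the Ensembl side of a pair.
        "unmapped_ensembl_ids": len(queried_ids.difference(uniprot_per_ensembl)),
        # One Ensembl ID -> several UniProt accessions.
        "ensembl_with_multiple_uniprot": sum(
            1 for accs in uniprot_per_ensembl.values() if len(accs) > 1),
        # One UniProt accession -> several Ensembl IDs.
        "uniprot_with_multiple_ensembl": sum(
            1 for ids in ensembl_per_uniprot.values() if len(ids) > 1),
    }


# Usage with made-up IDs: E1 maps to two accessions, P2 maps back to two
# Ensembl IDs, and E3 has no UniProt accession.
summary = summarize_mapping(
    ["E1", "E2", "E3"], [("E1", "P1"), ("E1", "P2"), ("E2", "P2")])
print(summary)
```

Counting each direction separately keeps the many-to-many relationship visible, so a sudden change in either count between reruns could be spotted quickly.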

@JessterB (Contributor) left a comment:

lgtm!

@jaclynbeck-sage merged commit 7014d21 into dev on Nov 20, 2024
9 checks passed
@jaclynbeck-sage deleted the jbeck/AG-1388/uniprot_mapping_file branch on November 20, 2024 at 00:46
3 participants