Our goal is to reproduce the ontology pretraining on a protein-related task. For this, we have already implemented a GO dataset (see #36). The next step is to add an ontology-pretraining task for proteins. This would give us the following alignment:
| stage | chemistry | proteins |
|---|---|---|
| unsupervised pretraining | mask pretraining (ELECTRA) | mask pretraining (ESM2, optional) |
| ontology pretraining | ChEBI | SCOPe |
| finetuning task | Toxicity, Solubility, ... | GO (MF, BP, CC branches) |
SCOPe is a good fit since it is mostly structure-based (unlike GO, which has more complex functional classes). It also has a manageable size (~140,000 entries, similar to ChEBI).
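SCOPe distributes its classification as flat text files. As a starting point for the dataset implementation, here is a minimal sketch of a parser for the classification ("cla") line format, assuming the commonly documented tab-separated layout (domain sid, PDB id, chain/region, sccs code such as `a.1.1.1`, sunid, ancestor sunids) — the field layout should be verified against the actual SCOPe release we use:

```python
def parse_cla_line(line: str) -> dict:
    """Parse one SCOPe classification (cla) line into its fields.

    Assumed tab-separated layout:
    sid  pdb_id  chain/region  sccs  sunid  ancestor-sunids
    """
    sid, pdb_id, region, sccs, sunid, _ancestors = line.rstrip("\n").split("\t")
    # sccs encodes class.fold.superfamily.family, e.g. "a.1.1.1"
    cls, fold, superfamily, _family = sccs.split(".")
    return {
        "sid": sid,
        "pdb_id": pdb_id,
        "region": region,
        "class": cls,
        "fold": f"{cls}.{fold}",
        "superfamily": f"{cls}.{fold}.{superfamily}",
        "family": sccs,
        "sunid": int(sunid),
    }

# Example line in the assumed format
example = ("d1dlwa_\t1dlw\tA:\ta.1.1.1\t14982\t"
           "cl=46456,cf=46457,sf=46458,fa=46459,dm=46460,sp=46461,px=14982")
record = parse_cla_line(example)
```

The sccs prefixes (class, fold, superfamily, family) give us a natural choice of label granularity for the pretraining task.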
Goal
Add a SCOPe dataset to our pipeline. The data should be processed so that it can be used in the same way as, e.g., the GO data (just with different labels).
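One way to make the SCOPe data a drop-in replacement for the GO data is to expose each protein domain with a multi-hot vector over the selected SCOPe labels, in the same shape the GO dataset produces. The class and method names below are illustrative, not the actual interface of our pipeline:

```python
from typing import Dict, List

class SCOPeLabelEncoder:
    """Turn per-domain SCOPe labels (e.g. fold ids) into fixed-size
    multi-hot vectors, mirroring how the GO labels are consumed."""

    def __init__(self, label_vocab: List[str]):
        # Fixed label ordering so vectors are comparable across samples
        self.index: Dict[str, int] = {lbl: i for i, lbl in enumerate(label_vocab)}

    def encode(self, labels: List[str]) -> List[int]:
        vec = [0] * len(self.index)
        for lbl in labels:
            if lbl in self.index:  # ignore labels outside the chosen subset
                vec[self.index[lbl]] = 1
        return vec

# Hypothetical vocabulary of SCOPe fold ids
encoder = SCOPeLabelEncoder(["a.1", "a.2", "b.1"])
```

With this shape, the downstream model code should not need to distinguish between GO and SCOPe targets.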
Links