Build a 3-star ChEBI dataset #61

Open
sfluegel05 opened this issue Oct 18, 2024 · 0 comments

sfluegel05 (Collaborator) commented Oct 18, 2024

Status

Currently, we use all of ChEBI in our dataset. However, not all ChEBI data is equal: ChEBI distinguishes between 2-star and 3-star entities (see the ChEBI user manual). 3-star entities are manually added by the ChEBI team, while other entities have been added by external parties.
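As a rough sketch of how the star rating could be read from the data, the snippet below scans the text of an OBO export and maps each term ID to its star level. It assumes that every `[Term]` stanza carries a line such as `subset: 3_STAR` or `subset: 2_STAR`; the function name `star_levels` is hypothetical and not part of any existing loader in this project.

```python
import re
from typing import Dict, Optional

# Assumed format of the star annotation inside a [Term] stanza, e.g. "subset: 3_STAR"
STAR_RE = re.compile(r"^subset:\s*(\d)_STAR\s*$")


def star_levels(obo_text: str) -> Dict[str, int]:
    """Map each [Term] id to the star rating found in its `subset: N_STAR` line."""
    levels: Dict[str, int] = {}
    current_id: Optional[str] = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas describe ChEBI entities
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and current_id is not None:
            match = STAR_RE.match(line)
            if match:
                levels[current_id] = int(match.group(1))
    return levels
```

The set of 3-star classes would then be `{cid for cid, star in star_levels(text).items() if star == 3}`.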

Goal

The goal is to investigate what effect using only 3-star data has on the classification task. The hypothesis is that, for some classes, the 2-star entities are not classified correctly or completely. For example, tripeptide has 220 subclasses, 195 of which are 3-star. However, there are about 8,000 peptides that should be classified as tripeptides but are not. Most (if not all) of them are 2-star.

Task

Create a ChEBI dataset that selects only 3-star classes. Selecting the classes should be fairly simple. The complicated part is the relations. Since we don't know which relations are 2-star (or whether a relation between two 3-star classes can be trusted if it runs through a 2-star class), we need to make compromises. The easiest solution would be to treat all relations as 3-star.
