[challenge] Aspergillus terpenoids #8

Open
Adafede opened this issue Nov 25, 2021 · 16 comments
Labels
serendipity: you start without knowing where it leads to

Comments

@Adafede
Contributor

Adafede commented Nov 25, 2021

Hi again!

Small question in the form of a challenge:

Would yaccl be able to perform a query that allows reproducing the listed compounds in https://doi.org/10.1016/j.phytochem.2021.113011 (without the need of npclassifier or classyfire)?

As starting point those two existing queries might help:
https://w.wiki/4ShY
https://w.wiki/3HMD

Best,

@rwst
Owner

rwst commented Nov 25, 2021

In general, yaccl is taxon-agnostic. What we do is take a list of compounds and make sure they are classified correctly. In this case we don't have access to the paper, and the paper is also not in Wikidata. But the following query gets all InChI strings of compounds from Aspergillus species:

SELECT DISTINCT ?inchi
WHERE 
{
  ?item wdt:P703 ?tax.
  ?tax wdt:P171 wd:Q335130.
  ?item wdt:P234 ?inchi.
}
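
A minimal Python sketch of running this query programmatically, assuming the public Wikidata SPARQL endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The function names and the User-Agent string are illustrative and not part of yaccl.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# The query from the comment above, unchanged.
QUERY = """
SELECT DISTINCT ?inchi
WHERE
{
  ?item wdt:P703 ?tax.
  ?tax wdt:P171 wd:Q335130.
  ?item wdt:P234 ?inchi.
}
"""


def extract_inchis(results):
    """Pull the ?inchi values out of a SPARQL JSON results document."""
    return [row["inchi"]["value"] for row in results["results"]["bindings"]]


def fetch_inchis(query=QUERY, endpoint=ENDPOINT):
    """Send the query and return the InChI strings (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "aspergillus-inchi-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_inchis(json.load(response))


def save_lines(lines, path="aspergillus.txt"):
    """Write one InChI per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Calling `save_lines(fetch_inchis())` would then write the list to aspergillus.txt, one InChI per line.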

Saving the list of 4,349 compounds in a file aspergillus.txt, we want to check whether any are not recognized as natural products. There is no ready-made option for this, but the bash script
while IFS= read -r i; do python3 classify.py -d ./ -j -m "$i"; done < aspergillus.txt
gives a first impression: about half of the compounds are classified by the current version. The list contains all Aspergillus metabolites, so if your question is specifically which of the compounds in that paper are recognized as terpenoids, we would first need that list. Can you help?
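
The same loop can be sketched in Python, which makes it easier to keep the output per compound. This assumes classify.py sits in the working directory and accepts the -d, -j and -m flags exactly as in the bash line above. The helper names are hypothetical, and the output is kept as raw text because its JSON layout is not described in this thread.

```python
import subprocess
import sys


def read_inchis(path):
    """Return the non-empty lines of a file, one InChI per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_command(inchi, script="classify.py", data_dir="./"):
    """Mirror the bash call: python3 classify.py -d ./ -j -m "$i"."""
    return [sys.executable, script, "-d", data_dir, "-j", "-m", inchi]


def classify_all(inchis):
    """Run the classifier once per InChI and yield (inchi, raw stdout) pairs."""
    for inchi in inchis:
        # Arguments are passed as a list, so no shell expansion touches the InChI.
        done = subprocess.run(build_command(inchi), capture_output=True, text=True)
        yield inchi, done.stdout
```

Iterating over `classify_all(read_inchis("aspergillus.txt"))` would then give one result per compound for further filtering.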

@rwst
Owner

rwst commented Nov 25, 2021

I may have misunderstood. Did you mean to extract all terpenoids from the list of 4,349 compounds? That should be doable.

@Adafede
Contributor Author

Adafede commented Nov 25, 2021

Sorry, my question was badly formulated. I did not really want to reproduce what is in the article, but rather to generate the WD+yaccl equivalent. So... yes, you were faster.

@Adafede
Contributor Author

Adafede commented Nov 25, 2021

Here is a slightly adapted query:

https://w.wiki/4T7k

@rwst
Owner

rwst commented Nov 25, 2021

I have pushed a new version of the classify script such that JSON output also includes the molecule. This would allow more comfortable processing of the output of my small bash script given above.

@rwst
Owner

rwst commented Nov 25, 2021

For the sake of speed there should be better handling when both -j and -t are given. Noted.

@Adafede
Contributor Author

Adafede commented Nov 25, 2021

Beautiful. Thanks!

@rwst
Owner

rwst commented Nov 25, 2021

Alternatively, if you are satisfied with what is already in WD, going without yaccl should work too:
https://w.wiki/4T9V
But it times out...

@rwst
Owner

rwst commented Nov 25, 2021

Pushed the addition of InChI key too...

@Adafede
Contributor Author

Adafede commented Nov 25, 2021

https://w.wiki/4T9n
no time out :)

Wooops... forgot to filter:

https://w.wiki/4T9t

@rwst
Owner

rwst commented Nov 25, 2021

These are only the subclasses; you need to include P31/P279* as well in order to get all of them, either with UNION or by using the pipe symbol in the property path.
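
The two ways of combining both paths can be sketched as follows. The helpers only assemble query text. The class QID is left as a parameter because the class item used in the linked queries is not shown here, and the function names are illustrative.

```python
# Triple patterns restricting ?item to compounds found in Aspergillus species,
# taken from the query earlier in this thread.
TAXON_PATTERNS = [
    "?item wdt:P703 ?tax .",
    "?tax wdt:P171 wd:Q335130 .",
]


def query_with_pipe(class_qid):
    """Instances and subclasses in one pattern, using the '|' path alternative."""
    lines = ["SELECT DISTINCT ?item WHERE {"]
    lines += ["  " + pattern for pattern in TAXON_PATTERNS]
    # (P31|P279)/P279* covers both P31/P279* and P279+.
    lines.append("  ?item (wdt:P31|wdt:P279)/wdt:P279* wd:%s ." % class_qid)
    lines.append("}")
    return "\n".join(lines)


def query_with_union(class_qid):
    """The same set written as two branches joined by UNION."""
    lines = ["SELECT DISTINCT ?item WHERE {"]
    lines += ["  " + pattern for pattern in TAXON_PATTERNS]
    lines.append("  { ?item wdt:P31/wdt:P279* wd:%s . }" % class_qid)
    lines.append("  UNION")
    lines.append("  { ?item wdt:P279+ wd:%s . }" % class_qid)
    lines.append("}")
    return "\n".join(lines)
```

Either string can be pasted into the Wikidata Query Service once `class_qid` is set to the item of the compound class of interest.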

@Adafede
Contributor Author

Adafede commented Nov 25, 2021

Oh, indeed, nice catch:
https://w.wiki/4T9y

@rwst
Owner

rwst commented Nov 25, 2021

I knew because the yaccl run found 360, as well. Now, for the interpretation, the subclasses might contain duplicates where the stereochemistry is unspecified.
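
One way to look for such duplicates, sketched under the assumption that an InChIKey is available for each hit: the first 14-character block of an InChIKey encodes the molecular skeleton (formula and connectivity), so entries that share this block differ only in other layers such as stereochemistry. The function name is made up for illustration.

```python
from collections import defaultdict


def group_by_skeleton(inchikeys):
    """Group InChIKeys by their first block (the connectivity hash).

    Returns only the groups with more than one distinct key, i.e. candidate
    duplicates that differ in stereochemistry or other non-skeleton layers.
    """
    groups = defaultdict(set)
    for key in inchikeys:
        skeleton = key.split("-", 1)[0]
        groups[skeleton].add(key)
    return {s: sorted(keys) for s, keys in groups.items() if len(keys) > 1}
```

Each returned group still needs a manual look, since a shared skeleton can also mean genuinely distinct stereoisomers rather than an underspecified duplicate.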

@rwst
Owner

rwst commented Nov 25, 2021

This one is interesting https://www.wikidata.org/wiki/Q77573987
a cyclic farnesan that was misrecognized as a macrolide (not trivial).

@Adafede
Contributor Author

Adafede commented Nov 25, 2021

Well...this one is in the end a real challenge! x)

I am not sure a lot of humans would do better

@rwst added the "serendipity" label (you start without knowing where it leads to) on Nov 25, 2021
@rwst
Owner

rwst commented Nov 25, 2021

So, you see, I nearly always add P31/P279 to WD compounds at the same time I add SMARTS to classes. Exceptions: I still need to add P31/P279 for unspecified alkaloids and macrolides in WD.

Having it all in WD simplifies searches such as this one. The downside is that, as a follow-up, the WD entries need to be maintained, e.g. by frequent scanning.

This issue also demands improvements in yaccl/WD integration. I'll leave it open until I think it is resolved.
