Hi,
I was checking the code behind the PubChem fingerprint generation.
I compared fingerprints calculated with your code against those calculated with PyFingerprint, which uses the CDK library, and noticed some differences.
I noticed that for bits in the range 0-98, SMARTS patterns are not used. As a result, when carbons are counted, for example, only aliphatic carbons are considered, because the corresponding key is C.
The counts, and therefore the encoded bits, are incorrect.
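To illustrate the first point, here is a minimal sketch in plain Python. It is not the code from this repository: the tokenizer is a toy that only handles SMILES written with the organic subset (no bracket atoms), and the function name `count_carbons` and its `include_aromatic` flag are hypothetical. It only shows how a count keyed on the symbol C alone misses aromatic carbons, which SMILES writes as lowercase c.

```python
import re

# Toy tokenizer: organic-subset SMILES only, no bracket atoms (assumption).
# "Cl" and "Br" are listed first so they are not split into "C" + "l".
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_carbons(smiles, include_aromatic=True):
    """Count carbon atoms in a SMILES string (hypothetical helper)."""
    atoms = _ATOM.findall(smiles)
    # Aliphatic carbon is "C"; aromatic carbon is "c".
    symbols = {"C", "c"} if include_aromatic else {"C"}
    return sum(1 for atom in atoms if atom in symbols)


toluene = "Cc1ccccc1"
print(count_carbons(toluene, include_aromatic=False))  # 1
print(count_carbons(toluene))                          # 7
```

With the aliphatic-only count, toluene would fail a threshold such as ">= 4 C", although the molecule has seven carbons.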
The second point concerns the bits in the range 115-231, where two conditions must be met. For example, bits 116 and 117 are defined as ">= 1 saturated or aromatic carbon-only ring size 3" and ">= 1 saturated or aromatic nitrogen-containing ring size 3", respectively. A cyclopropane ring should therefore set bit 116 but not bit 117. With your code, however, both bits are set.
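The behaviour I would expect for these two bits can be sketched as follows. This is only an illustration under simplifying assumptions: ring perception is taken as already done, each ring is given as a tuple of element symbols, and the "saturated or aromatic" part of the condition is ignored. All function names are hypothetical.

```python
def is_carbon_only_ring(ring, size):
    """True if the ring has the given size and contains only carbon."""
    return len(ring) == size and all(symbol == "C" for symbol in ring)


def is_nitrogen_containing_ring(ring, size):
    """True if the ring has the given size and contains at least one nitrogen."""
    return len(ring) == size and "N" in ring


def ring_bits(rings, size=3):
    """Evaluate both conditions (size and composition) for every ring."""
    return {
        "carbon_only": any(is_carbon_only_ring(r, size) for r in rings),
        "nitrogen_containing": any(
            is_nitrogen_containing_ring(r, size) for r in rings
        ),
    }


cyclopropane = [("C", "C", "C")]
aziridine = [("C", "C", "N")]

print(ring_bits(cyclopropane))
# {'carbon_only': True, 'nitrogen_containing': False}
print(ring_bits(aziridine))
# {'carbon_only': False, 'nitrogen_containing': True}
```

If only the ring size were checked, both entries would be set for cyclopropane, which matches what I observe.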
I hope the bugs I reported can be corrected; otherwise, I would be glad to have an explanation of my mistake.
Thank you for your help.
Salvatore
> I did some comparisons between fingerprints calculated with your code and those calculated with PyFingerprint which uses the cdk library and noticed some differences.
If this is your observation, consider providing the data used to perform the
test so that others can attempt to replicate your findings. The outputs of
pubchempy and pyfingerprint are then easier to compare with each other (e.g.,
a diff view of the corresponding logs) to resolve discrepancies and correct errors.
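As a rough sketch of such a comparison, assuming both tools can give you the fingerprint as a string of 0 and 1 characters of equal length, the differing bit positions can be listed in a few lines of plain Python. The helper name `differing_bits` and the two short bit strings are made up for illustration; they are not real output of either library.

```python
def differing_bits(fp_a, fp_b):
    """Return the 0-based positions at which two bit strings differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return [i for i, (a, b) in enumerate(zip(fp_a, fp_b)) if a != b]


# Hypothetical 6-bit examples, not real fingerprints.
print(differing_bits("110100", "100110"))  # [1, 4]
```

A list like this, attached together with the input structures, makes it much easier to see which bit definitions are affected.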
When raising an issue on GitHub, you may substantiate your findings by attaching
a file; to get familiar with this option, hover the mouse over the lower rim of
the frame of the input box. This may be a text file, a log, or, e.g., a Python
script; as long as it has the file extension .txt, GitHub will permit it.
Especially for a larger file (e.g., an .sdf container file with many
molecular structures) or a collection of files, a .zip archive is often a
useful alternative. As a courtesy, also include a brief descriptive readme
(which setup was used: OS, and the versions of Python, pubchempy, and
pyfingerprint, etc.).