A Python library and script collection for identifying, quantifying and comparing interrelations between arbitrary boolean features (e.g. presence of structural motifs within a molecule, the molecule exhibiting a specific type of biological activity) from their co-occurrences in feature vectors (e.g. features of individual chemical structures) within a given feature vector set (e.g. a chemical database or its subset).
The rationale behind feature interrelation profiling, along with an example application, is described in more detail in the article "Profiling and analysis of chemical compounds using pointwise mutual information".
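Pointwise mutual information (PMI) compares how often two features actually co-occur with how often they would co-occur if they were independent. The standard definition is given below. The base-2 logarithm is an assumption here, chosen because it reproduces the values printed in the usage example in this README: features a and b each occur in 2 of the 3 dummy feature vectors and co-occur in both of them.

$$
\operatorname{PMI}(x, y) = \log_2 \frac{p(x, y)}{p(x)\,p(y)}, \qquad
\operatorname{PMI}(a, b) = \log_2 \frac{2/3}{(2/3)\cdot(2/3)} = \log_2 1.5 \approx 0.584963
$$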
- Pandas: needed for core functionality as well as any subsequent interrelation profile analysis
- RDKit: needed for all chemistry-related functionality
- Recommended:
  - Jupyter Notebook for interactive work
  - Seaborn for visualization
  - NetworkX for interrelation network representation
- The Sphinx documentation is available on the project's GitHub page. It can also be generated by running the `make html` command in the `docs/source` folder.
- A Jupyter notebook with example usage is also available.
- This library is pure Python, so installing the core dependencies into your environment, cloning this repository and adding it to the `PYTHONPATH` should be all that is needed.
- Building a co-occurrence profile is then simply a matter of:
```python
>>> from fip.profiles import CooccurrenceProfile
# Some dummy feature sets
>>> FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))
# Create a co-occurrence profile instance
>>> p = CooccurrenceProfile.from_feature_lists(FEATURE_TUPLES)
# Unlike prior implementations of interrelation profiling, FIP3 is a full rework
# that uses a sparse data representation, with lazy, on-demand imputation of missing
# or insignificant pair values. Explicit interrelations are stored pairwise
# in a MultiIndex pandas DataFrame and can be accessed and handled as such:
>>> p.df
                  value
feature1 feature2
a        a            2
         b            2
         c            1
         d            1
b        b            2
         c            1
         d            1
c        c            2
         d            2
d        d            2
a        x            1
b        x            1
x        x            1
```
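For illustration, the explicit counts in `p.df` above can be reproduced with a few lines of plain Python. This is a minimal sketch of the counting idea only, not FIP's implementation: `count_cooccurrences` is a hypothetical helper, and keying each unordered pair in sorted order is an assumption made for this sketch.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Same dummy feature sets as in the example above
FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))


def count_cooccurrences(feature_tuples):
    """Count, for every unordered feature pair (including a feature paired
    with itself), the number of feature vectors containing both features."""
    counts = Counter()
    for features in feature_tuples:
        # Deduplicate and sort so each pair is keyed in one canonical order
        unique = sorted(set(features))
        for pair in combinations_with_replacement(unique, 2):
            counts[pair] += 1
    return counts


counts = count_cooccurrences(FEATURE_TUPLES)
print(counts[('a', 'b')])
# Pairs that never co-occur are simply absent; Counter reports them as 0
print(counts[('c', 'x')])
```

Only pairs that actually co-occur are stored, which mirrors the sparse, pairwise layout described in the comments of the example.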
```python
>>> from fip.profiles import CooccurrenceProbabilityProfile
>>> q = CooccurrenceProbabilityProfile.from_cooccurrence_profile(p)
>>> q
<fip.profiles.CooccurrenceProbabilityProfile object at 0x7f3222e05290>
```
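The probability profile `q` is only shown through its object repr here. Assuming co-occurrence probabilities are relative frequencies over the feature vectors (an assumption, although it is consistent with the PMI values printed below), they could be sketched as follows; `cooccurrence_probabilities` is a hypothetical helper, not part of FIP.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement

FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))


def cooccurrence_probabilities(feature_tuples):
    """Estimate p(x, y) as the fraction of feature vectors containing both x and y.

    Diagonal entries (x, x) therefore equal the single-feature probability p(x).
    """
    counts = Counter()
    for features in feature_tuples:
        for pair in combinations_with_replacement(sorted(set(features)), 2):
            counts[pair] += 1
    n_vectors = len(feature_tuples)
    # Fractions keep this example exact; floats would work just as well
    return {pair: Fraction(c, n_vectors) for pair, c in counts.items()}


probs = cooccurrence_probabilities(FEATURE_TUPLES)
print(probs[('a', 'b')])
print(probs[('x', 'x')])
```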
```python
>>> from fip.profiles import PointwiseMutualInformationProfile
>>> r = PointwiseMutualInformationProfile.from_cooccurrence_probability_profile(q)
>>> r
<fip.profiles.PointwiseMutualInformationProfile object at 0x7f3212047210>
>>> r.df
                      value
feature1 feature2
a        a         0.000000
         b         0.584963
         c        -0.415037
         d        -0.415037
b        b         0.000000
         c        -0.415037
         d        -0.415037
c        c         0.000000
         d         0.584963
d        d         0.000000
a        x         0.584963
b        x         0.584963
x        x         0.000000
```
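As a cross-check of the off-diagonal values in `r.df` above, the following sketch computes base-2 PMI directly from the dummy feature sets, assuming probabilities are relative frequencies over the feature vectors. It is not FIP's code: `pmi_profile` is a hypothetical helper, and it skips the diagonal, which FIP reports as 0.

```python
import math
from collections import Counter
from itertools import combinations

FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))


def pmi_profile(feature_tuples):
    """Return base-2 PMI for every unordered pair of distinct features that co-occur.

    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with probabilities estimated
    as relative frequencies over the feature vectors.
    """
    n = len(feature_tuples)
    pair_counts = Counter()
    single_counts = Counter()
    for features in feature_tuples:
        unique = sorted(set(features))
        single_counts.update(unique)
        # combinations() excludes self-pairs, so the diagonal is skipped
        pair_counts.update(combinations(unique, 2))
    return {
        (x, y): math.log2((c / n) / ((single_counts[x] / n) * (single_counts[y] / n)))
        for (x, y), c in pair_counts.items()
    }


for pair, value in sorted(pmi_profile(FEATURE_TUPLES).items()):
    print(pair, round(value, 6))
```

Pairs that never co-occur (such as c and x) have no finite PMI under this estimate and are left out, which is why some form of imputation is needed for an explicit matrix.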
```python
>>> r.select_raw_interrelations_involving('c')
                      value
feature1 feature2
c        d         0.584963
a        c        -0.415037
b        c        -0.415037
```
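The selection above returns every stored pair in which `c` appears at either index level, without the `c`–`c` self-pair, ordered from the highest value down. A comparable selection on a plain MultiIndex DataFrame could be sketched with pandas as below; the input table and the `select_pairs_involving` helper are hypothetical stand-ins, not FIP's implementation.

```python
import pandas as pd

# Hypothetical pairwise table mirroring the layout of r.df (a subset of its rows)
index = pd.MultiIndex.from_tuples(
    [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'),
     ('c', 'c'), ('c', 'd'), ('a', 'x'), ('b', 'x')],
    names=['feature1', 'feature2'],
)
df = pd.DataFrame(
    {'value': [0.584963, -0.415037, -0.415037, -0.415037, -0.415037,
               0.0, 0.584963, 0.584963, 0.584963]},
    index=index,
)


def select_pairs_involving(profile_df, feature):
    """Return rows where `feature` appears at either index level, excluding the
    feature's pair with itself, sorted from the highest to the lowest value."""
    level1 = profile_df.index.get_level_values('feature1')
    level2 = profile_df.index.get_level_values('feature2')
    mask = ((level1 == feature) | (level2 == feature)) & (level1 != level2)
    # A stable sort keeps tied rows in their original order
    return profile_df[mask].sort_values('value', ascending=False, kind='stable')


print(select_pairs_involving(df, 'c'))
```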
```python
# Export to an explicit matrix DataFrame is also possible, with imputation:
>>> r.to_explicit_matrix()
          a         b         c         d         x
a       0.0  0.584963 -0.415037 -0.415037  0.584963
b  0.584963       0.0 -0.415037 -0.415037  0.584963
c -0.415037 -0.415037       0.0  0.584963 -1.754888
d -0.415037 -0.415037  0.584963       0.0 -1.754888
x  0.584963  0.584963 -1.754888 -1.754888       0.0
# Much more in the documentation :)
```
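The explicit matrix above fills never-observed pairs (such as c and x) by imputation, and how FIP derives those imputed values is not described here. The sketch below therefore shows only the reshaping step with pandas, assuming each unordered pair is stored once as in `r.df`, and deliberately leaves unobserved pairs as NaN instead of guessing the imputation. The input table and `to_symmetric_matrix` are hypothetical, not part of FIP.

```python
import pandas as pd

# Hypothetical pairwise input in the same MultiIndex layout as r.df;
# the pair (b, x) is intentionally missing
index = pd.MultiIndex.from_tuples(
    [('a', 'a'), ('a', 'b'), ('b', 'b'), ('a', 'x'), ('x', 'x')],
    names=['feature1', 'feature2'],
)
pairs = pd.DataFrame({'value': [0.0, 0.584963, 0.0, 0.584963, 0.0]}, index=index)


def to_symmetric_matrix(pairs_df):
    """Pivot a (feature1, feature2) -> value table into a square, symmetric
    matrix. Pairs absent from the input stay NaN (no imputation)."""
    upper = pairs_df['value'].unstack()
    # Mirror the stored half onto the other triangle; stored values take precedence
    full = upper.combine_first(upper.T)
    labels = sorted(set(full.index) | set(full.columns))
    return full.reindex(index=labels, columns=labels)


print(to_symmetric_matrix(pairs))
```

Keeping the gaps as NaN makes it explicit which cells an imputation step would have to fill.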
Available under the MIT License.
Supported by the Junior Internal Grant of UCT Prague (2021, #2103).