FIP3 - feature interrelation profiling

A Python library and script collection for identifying, quantifying and comparing interrelations between arbitrary boolean features (e.g. the presence of a structural motif within a molecule, or the molecule exhibiting a specific type of biological activity) from their co-occurrences in feature vectors (e.g. the features of individual chemical structures) within a given feature vector set (e.g. a chemical database or a subset of it).
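To make the terminology concrete: a feature vector of boolean features can be stored either as a dense True/False vector over the whole feature vocabulary or, equivalently, as the set of features that are present. The from_feature_lists constructor used in the Getting started example takes the set form. The following sketch is illustrative only; the helper to_boolean_vectors is hypothetical and not part of FIP3.

```python
# The dummy feature sets from the Getting started example, as sets of present features.
FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))


def to_boolean_vectors(feature_tuples):
    """Hypothetical helper: return the sorted vocabulary and one True/False vector per feature set."""
    vocabulary = sorted({f for features in feature_tuples for f in features})
    vectors = []
    for features in feature_tuples:
        present = set(features)
        vectors.append([f in present for f in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = to_boolean_vectors(FEATURE_TUPLES)
print(vocabulary)
for row in vectors:
    print(row)
```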

The rationale behind feature interrelation profiling, together with an example application, is described further in "Profiling and analysis of chemical compounds using pointwise mutual information".
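For orientation, pointwise mutual information (PMI) compares how often two features actually co-occur with how often they would co-occur if they were independent. With the common definition below, where p(x) is the fraction of feature vectors containing x and p(x, y) the fraction containing both x and y, the off-diagonal values in the Getting started example are reproduced with a base-2 logarithm. The diagonal self-pair entries are reported there as 0.0, so they evidently follow a separate convention. Features a and b each occur in 2 of the 3 example vectors, always together:

```latex
\operatorname{PMI}(x, y) = \log_2 \frac{p(x, y)}{p(x)\,p(y)}

\operatorname{PMI}(a, b) = \log_2 \frac{2/3}{(2/3)\cdot(2/3)} = \log_2 1.5 \approx 0.584963

\operatorname{PMI}(a, c) = \log_2 \frac{1/3}{(2/3)\cdot(2/3)} = \log_2 0.75 \approx -0.415037
```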

Dependencies

Core:
- Pandas: needed for core functionality as well as any subsequent interrelation profile analysis
- RDKit: needed for all chemistry-related functionality
Recommended:
- Jupyter notebook for interactive work
- Seaborn for visualization
- NetworkX for interrelation network representation

Getting started

The Sphinx documentation is available at the project GitHub page (cmeloi/fip3). It can also be generated locally by running make html in the docs/source folder.
A Jupyter notebook with example use is also available.
As the library is pure Python, installing the core dependencies into your environment, cloning this repository and adding it to the PYTHONPATH should be all that is needed.
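In a shell, adding the clone to the PYTHONPATH usually means exporting that variable. The sketch below does the equivalent for a single Python session and reports which of the dependencies listed above can be found. It assumes a hypothetical clone location ~/fip3 (adjust it to your checkout) and the usual import names pandas, rdkit, seaborn and networkx; fip is the package imported in the Getting started example. The helper check_dependencies is hypothetical.

```python
import importlib.util
import sys
from pathlib import Path

# Hypothetical clone location; replace with the path of your own checkout.
FIP3_CLONE_DIR = Path.home() / "fip3"

# Session-only equivalent of adding the clone to PYTHONPATH.
if str(FIP3_CLONE_DIR) not in sys.path:
    sys.path.insert(0, str(FIP3_CLONE_DIR))


def check_dependencies(names=("pandas", "rdkit", "seaborn", "networkx", "fip")):
    """Hypothetical helper: map each top-level module name to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(check_dependencies())
```

Modules reported as False still need to be installed, or the clone path corrected, before the example below will run.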
Building a co-occurrence profile is then simply a matter of:

>>> from fip.profiles import CooccurrenceProfile

# Some dummy feature sets
>>> FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))

# Create a co-occurrence profile instance
>>> p = CooccurrenceProfile.from_feature_lists(FEATURE_TUPLES)

# Unlike prior implementations of interrelation profiling, FIP3 is a full rework
# that uses a sparse data representation, with lazy, on-demand imputation of missing
# or insignificant pair values. All explicit interrelations are stored pairwise
# in a MultiIndex pandas DataFrame and can be accessed and handled as such:
>>> p.df
                   value
feature1 feature2       
a        a             2
         b             2
         c             1
         d             1
b        b             2
         c             1
         d             1
c        c             2
         d             2
d        d             2
a        x             1
b        x             1
x        x             1

>>> from fip.profiles import CooccurrenceProbabilityProfile
>>> q = CooccurrenceProbabilityProfile.from_cooccurrence_profile(p)
>>> q
<fip.profiles.CooccurrenceProbabilityProfile object at 0x7f3222e05290>

>>> from fip.profiles import PointwiseMutualInformationProfile
>>> r = PointwiseMutualInformationProfile.from_cooccurrence_probability_profile(q)
>>> r
<fip.profiles.PointwiseMutualInformationProfile object at 0x7f3212047210>

>>> r.df
                      value
feature1 feature2          
a        a         0.000000
         b         0.584963
         c        -0.415037
         d        -0.415037
b        b         0.000000
         c        -0.415037
         d        -0.415037
c        c         0.000000
         d         0.584963
d        d         0.000000
a        x         0.584963
b        x         0.584963
x        x         0.000000

>>> r.select_raw_interrelations_involving('c')
                      value
feature1 feature2          
c        d         0.584963
a        c        -0.415037
b        c        -0.415037


# Export to an explicit matrix DataFrame is also possible, with missing pairs imputed:
>>> r.to_explicit_matrix()
          a         b         c         d         x
a       0.0  0.584963 -0.415037 -0.415037  0.584963
b  0.584963       0.0 -0.415037 -0.415037  0.584963
c -0.415037 -0.415037       0.0  0.584963 -1.754888
d -0.415037 -0.415037  0.584963       0.0 -1.754888
x  0.584963  0.584963 -1.754888 -1.754888       0.0

# ...and much more in the documentation :)
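The numbers in the example above can be re-derived by hand. The following plain-Python sketch (standard library only, not FIP3's implementation) counts co-occurrences, converts them to base-2 PMI, selects the pairs involving one feature and lays the values out as a symmetric matrix. All helper names are hypothetical, and two conventions are assumptions chosen to match the output above: self-pairs are set to 0.0, and pairs that never co-occur are left as None, whereas FIP3 imputes them (e.g. the c-x value of -1.754888) by a rule not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations

# Same dummy feature sets as in the example above.
FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))


def cooccurrence_counts(feature_tuples):
    """Hypothetical helper: count occurrences (stored as (f, f)) and pairwise
    co-occurrences (stored as alphabetically ordered (f1, f2))."""
    counts = Counter()
    for features in feature_tuples:
        present = sorted(set(features))
        for f in present:
            counts[(f, f)] += 1
        for f1, f2 in combinations(present, 2):
            counts[(f1, f2)] += 1
    return counts


def pmi_profile(feature_tuples):
    """Hypothetical helper: base-2 PMI for every pair that co-occurs at least once.
    Self-pairs are set to 0.0 only to mirror the example output (an assumption)."""
    n = len(feature_tuples)
    counts = cooccurrence_counts(feature_tuples)
    pmi = {}
    for (f1, f2), count in counts.items():
        if f1 == f2:
            pmi[(f1, f2)] = 0.0
            continue
        p_joint = count / n
        p1 = counts[(f1, f1)] / n
        p2 = counts[(f2, f2)] / n
        pmi[(f1, f2)] = math.log2(p_joint / (p1 * p2))
    return pmi


def involving(profile, feature):
    """Hypothetical helper: non-self pairs containing `feature`, highest value first."""
    hits = [(pair, value) for pair, value in profile.items()
            if feature in pair and pair[0] != pair[1]]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


def to_matrix(profile, fill=None):
    """Hypothetical helper: symmetric nested-dict matrix; never co-occurring pairs get `fill`."""
    features = sorted({f for pair in profile for f in pair})
    matrix = {f: {g: fill for g in features} for f in features}
    for (f1, f2), value in profile.items():
        matrix[f1][f2] = value
        matrix[f2][f1] = value
    return matrix


pmi = pmi_profile(FEATURE_TUPLES)
# Prints the c-d pair first (0.584963), then a-c and b-c (-0.415037 each).
for pair, value in involving(pmi, 'c'):
    print(pair, round(value, 6))
print(to_matrix(pmi)['c'])
```

Keeping only explicitly observed pairs in a flat dictionary loosely mirrors the sparse, pairwise storage described above; the dense matrix is only materialized on request, which is where some imputation rule becomes necessary.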

Available under the MIT License.

Supported by the Junior Internal Grant of UCT Prague (2021, #2103).
