
cmeloi/fip3


FIP3 - feature interrelation profiling

A Python library and script collection for identifying, quantifying and comparing interrelations between arbitrary boolean features (e.g. presence of structural motifs within a molecule, the molecule exhibiting a specific type of biological activity) from their co-occurrences in feature vectors (e.g. features of individual chemical structures) within a given feature vector set (e.g. a chemical database or its subset).
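
As a rough illustration of what counting co-occurrences means, the following sketch counts, for every unordered pair of features, how many feature vectors contain both. It is plain Python, not FIP3's own code, and count_cooccurrences is a hypothetical helper. Self-pairs such as ('a', 'a') count the vectors that contain that feature at all.

```python
from collections import Counter
from itertools import combinations_with_replacement


def count_cooccurrences(feature_sets):
    """Count how often each unordered pair of features co-occurs (hypothetical helper).

    Self-pairs such as ('a', 'a') record how many vectors contain the feature.
    """
    counts = Counter()
    for features in feature_sets:
        # Deduplicate and sort so each unordered pair gets one canonical key
        unique = sorted(set(features))
        for pair in combinations_with_replacement(unique, 2):
            counts[pair] += 1
    return counts


FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))
counts = count_cooccurrences(FEATURE_TUPLES)
print(counts[('a', 'b')], counts[('c', 'x')])  # 2 0
```

Only pairs that actually co-occur get an entry, so the result is sparse in the same spirit as the co-occurrence profile shown in the example further down.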

The rationale behind feature interrelation profiling, together with an example application, is described in more detail in Profiling and analysis of chemical compounds using pointwise mutual information.
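
For reference, the textbook base-2 pointwise mutual information (PMI) of two features x and y, assuming probabilities are estimated as fractions of the feature vectors, is given below. The off-diagonal values in the r.df output of the example further down are consistent with this form: a and b each occur in 2 of the 3 vectors and together in 2, while a and c occur together in only 1. The diagonal entries in that output are 0.0, so self-pairs are evidently handled separately; the documentation is the authority on FIP3's exact conventions.

```latex
\begin{aligned}
\mathrm{PMI}(x, y) &= \log_2 \frac{p(x, y)}{p(x)\,p(y)} \\
\mathrm{PMI}(a, b) &= \log_2 \frac{2/3}{(2/3)\cdot(2/3)} = \log_2 1.5 \approx 0.584963 \\
\mathrm{PMI}(a, c) &= \log_2 \frac{1/3}{(2/3)\cdot(2/3)} = \log_2 0.75 \approx -0.415037
\end{aligned}
```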

Dependencies

  • Pandas: needed for core functionality as well as any subsequent interrelation profile analysis
  • RDKit: needed for all chemistry-related functionality
  • Recommended:

Getting started

  • The Sphinx documentation is available at the project GitHub page. It can also be generated by running make html in the docs/source folder.
  • A Jupyter notebook with example use is also available.
  • This library is Python-only, so installing the core dependencies into the environment, cloning this repository and adding it to the PYTHONPATH should be sufficient.
  • Building a co-occurrence profile is then simply a matter of:
>>> from fip.profiles import CooccurrenceProfile

# Some dummy feature sets
>>> FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))

# Create a co-occurrence profile instance
>>> p = CooccurrenceProfile.from_feature_lists(FEATURE_TUPLES)

# Unlike prior implementations of interrelation profiling, FIP3 is a full rework
# that uses a sparse data representation, with lazy, on-demand imputation of missing or
# insignificant pair values. Explicit interrelations are stored pairwise
# in a MultiIndex Pandas DataFrame and can be accessed and handled as such:
>>> p.df
                   value
feature1 feature2       
a        a             2
         b             2
         c             1
         d             1
b        b             2
         c             1
         d             1
c        c             2
         d             2
d        d             2
a        x             1
b        x             1
x        x             1

>>> from fip.profiles import CooccurrenceProbabilityProfile
>>> q = CooccurrenceProbabilityProfile.from_cooccurrence_profile(p)
>>> q
<fip.profiles.CooccurrenceProbabilityProfile object at 0x7f3222e05290>

>>> from fip.profiles import PointwiseMutualInformationProfile
>>> r = PointwiseMutualInformationProfile.from_cooccurrence_probability_profile(q)
>>> r
<fip.profiles.PointwiseMutualInformationProfile object at 0x7f3212047210>

>>> r.df
                      value
feature1 feature2          
a        a         0.000000
         b         0.584963
         c        -0.415037
         d        -0.415037
b        b         0.000000
         c        -0.415037
         d        -0.415037
c        c         0.000000
         d         0.584963
d        d         0.000000
a        x         0.584963
b        x         0.584963
x        x         0.000000

>>> r.select_raw_interrelations_involving('c')
                      value
feature1 feature2          
c        d         0.584963
a        c        -0.415037
b        c        -0.415037


# Exporting to an explicit matrix DataFrame is also possible, with missing pairs imputed:
>>> r.to_explicit_matrix()
          a         b         c         d         x
a       0.0  0.584963 -0.415037 -0.415037  0.584963
b  0.584963       0.0 -0.415037 -0.415037  0.584963
c -0.415037 -0.415037       0.0  0.584963 -1.754888
d -0.415037 -0.415037  0.584963       0.0 -1.754888
x  0.584963  0.584963 -1.754888 -1.754888       0.0

# Much more in the documentation :)
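
To make the numbers above easier to check, here is a minimal standalone sketch using Pandas that computes base-2 PMI for every co-occurring pair and pivots the result into a square matrix. It is not FIP3's implementation: pmi_long_form and to_symmetric_matrix are hypothetical helpers, probabilities are assumed to be fractions of the feature vectors, and pairs that never co-occur are left as NaN, because the imputation scheme FIP3 uses (which produced the -1.754888 entries) is not described here.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

import pandas as pd

FEATURE_TUPLES = (('a', 'b', 'c', 'd'), ('a', 'b', 'x'), ('c', 'd'))


def pmi_long_form(feature_sets):
    """Hypothetical helper: off-diagonal base-2 PMI as a (feature1, feature2) MultiIndex frame.

    Only pairs that actually co-occur are stored, so the result stays sparse.
    """
    n = len(feature_sets)
    counts = Counter()
    for features in feature_sets:
        for pair in combinations_with_replacement(sorted(set(features)), 2):
            counts[pair] += 1
    rows = []
    for (f1, f2), joint in counts.items():
        if f1 == f2:
            continue  # self-pairs are skipped; the example output shows them as 0.0
        p_joint = joint / n
        p1 = counts[(f1, f1)] / n  # fraction of vectors containing f1
        p2 = counts[(f2, f2)] / n  # fraction of vectors containing f2
        rows.append((f1, f2, math.log2(p_joint / (p1 * p2))))
    df = pd.DataFrame(rows, columns=['feature1', 'feature2', 'value'])
    return df.set_index(['feature1', 'feature2']).sort_index()


def to_symmetric_matrix(long_df):
    """Hypothetical helper: pivot the long form into a square matrix, no imputation."""
    mat = long_df['value'].unstack()
    labels = sorted(set(mat.index) | set(mat.columns))
    mat = mat.reindex(index=labels, columns=labels)
    # Mirror the upper triangle onto the lower one; never co-occurring pairs stay NaN
    mat = mat.combine_first(mat.T)
    for label in labels:
        mat.loc[label, label] = 0.0  # match the 0.0 diagonal in the example output
    return mat


long_df = pmi_long_form(FEATURE_TUPLES)
matrix = to_symmetric_matrix(long_df)
print(long_df.round(6))
print(matrix.round(6))
```

Keeping the long MultiIndex form as the primary structure and only pivoting on request mirrors the sparse-first design described in the comments above; the NaN cells mark exactly the pairs for which FIP3 would impute a value.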

Available under the MIT License.

Supported by the Junior Internal Grant of UCT Prague (2021, #2103)
