ProteinGNN

This project is under active development.

[Figure: a protein structure in PDB format and the same protein in graph representation]

This package aims to facilitate graph neural network projects on protein structures. It converts protein structures from PDB files into PyTorch graph data and optionally streamlines model building, training and inference.

The project depends on PyRosetta to recognize bonds in proteins and other biomolecules. Although setup.cfg declares the pip dependencies, users are recommended to install PyRosetta, PyTorch, PyTorch Geometric (PyG) and PyTorch Lightning < 1.4 separately for the most control over versions and builds.
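Because the PyTorch Lightning requirement is an upper bound, it can help to check an existing environment before installing. The sketch below is a hypothetical helper (not part of ProteinGNN) that compares a version string against the `< 1.4` constraint using only the standard library. It assumes Lightning was installed under the distribution name `pytorch-lightning`.

```python
import re
from importlib import metadata
from typing import Optional

# Upper bound stated in this README: PyTorch Lightning < 1.4
MAX_LIGHTNING = (1, 4)


def lightning_version_ok(version: str) -> bool:
    """Return True if a version string such as '1.3.8' satisfies < 1.4."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor < MAX_LIGHTNING


def installed_lightning_version() -> Optional[str]:
    """Return the installed pytorch-lightning version, or None if it is absent."""
    try:
        return metadata.version("pytorch-lightning")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_lightning_version()
    if found is None:
        print("pytorch-lightning is not installed")
    else:
        print(found, "OK" if lightning_version_ok(found) else "too new")
```

Only the major and minor components are compared, so pre-release suffixes such as `1.3.0rc1` are accepted as 1.3.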

An example script is available as example.py.

Installation

pip install .

Basic usage

Default PyRosetta initialization is straightforward and handles most protein structures.

import pyrosetta
pyrosetta.init()

ProteinGNN builds graph data flexibly through the DatasetFactory class, which accepts user-defined node filtering, node featurization and edge featurization.

import proteingnn as pnn
from proteingnn.data import BaseNodeFilter, AtomNameNodeFilter, AtomtypeNodeFeaturizer, DistanceEdgeFeaturizer

fa_factory = pnn.data.DatasetFactory(name='CADatasetFactory')

# define node filter
fa_factory.node_filter = BaseNodeFilter()  # no filtering

# define node featurization
fa_factory.node_featurizer = AtomtypeNodeFeaturizer()  # by default includes a set of PyRosetta atom names

# define edge featurization
fa_factory.edge_featurizer = DistanceEdgeFeaturizer(max_distance=3, is_edge_only=True)  # no edge features
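Conceptually, the node filter decides which atoms become nodes and the node featurizer maps each kept atom to a feature vector. The following pure-Python sketch illustrates that idea with an atom-name filter and a one-hot encoding over a small vocabulary. It does not call ProteinGNN or PyRosetta; the `Atom` record and the `ATOM_TYPES` list are hypothetical stand-ins, not the package's data structures or its default atom-name set.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass
class Atom:
    # Hypothetical stand-in for an atom parsed from a PDB file
    residue_index: int
    atom_name: str


# Hypothetical vocabulary; AtomtypeNodeFeaturizer uses its own default set
ATOM_TYPES = ["N", "CA", "C", "O", "CB"]


def filter_atoms(atoms: Iterable[Atom], atom_name_pass: Optional[Sequence[str]] = None) -> List[Atom]:
    """Keep every atom when atom_name_pass is None, else only the listed names."""
    if atom_name_pass is None:
        return list(atoms)
    allowed = set(atom_name_pass)
    return [atom for atom in atoms if atom.atom_name in allowed]


def one_hot_atom_type(atom: Atom, vocabulary: Sequence[str] = ATOM_TYPES) -> List[float]:
    """One-hot encode the atom name; names outside the vocabulary give all zeros."""
    return [1.0 if atom.atom_name == name else 0.0 for name in vocabulary]


atoms = [Atom(1, "N"), Atom(1, "CA"), Atom(1, "C"), Atom(2, "CA")]
nodes = filter_atoms(atoms, atom_name_pass=["CA"])  # CA-only, like AtomNameNodeFilter(atom_name_pass=['CA'])
node_features = [one_hot_atom_type(atom) for atom in nodes]
```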

Your protein graph is just one line away.

# construct your protein graph!
graph_data = fa_factory.process_graph(path_to_pdb)

# or simply save with save_graph method
fa_factory.save_graph(path_to_pdb, path_to_graph)
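A `DistanceEdgeFeaturizer` with `is_edge_only=True` presumably connects nodes closer than `max_distance` without attaching edge features. The sketch below builds that kind of cutoff graph in plain Python and returns a PyG-style `edge_index`: two parallel lists of source and target indices (shape `[2, num_edges]`), with both directions stored. It is an O(n²) illustration, assuming a strict cutoff and distances in ångströms as in PDB coordinates; it is not the package's implementation.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_edges(coords: Sequence[Coord], max_distance: float) -> List[List[int]]:
    """Return [sources, targets] for every node pair closer than max_distance."""
    sources: List[int] = []
    targets: List[int] = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < max_distance:
                # Store both directions, the usual PyG layout for undirected graphs
                sources.extend([i, j])
                targets.extend([j, i])
    return [sources, targets]


# Three hypothetical atoms on a line, 1.5 Å apart
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
edge_index = distance_edges(coords, max_distance=3.0)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

The outer atoms are exactly 3.0 Å apart, so the strict comparison leaves them unconnected; production code would typically use a spatial index or tensor operations instead of the double loop.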

Advanced usage

DatasetFactory supports parallelized graph data generation for batch processing, with a tqdm progress bar.

fa_factory.predataset_path = pdb_directory
fa_factory.dataset_path = graph_directory
fa_factory.create_dataset(n_processes=NUM_PROCESSES)
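Batch conversion is embarrassingly parallel: each PDB file maps to one graph file. The sketch below shows that pattern with `concurrent.futures.ProcessPoolExecutor`. The helpers `plan_jobs` and `convert_all`, the `.pt` output suffix, and the `convert` callable (which in practice might wrap `save_graph`) are hypothetical; `create_dataset` may name its outputs and schedule work differently.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Callable, List, Tuple

Job = Tuple[Path, Path]


def plan_jobs(pdb_dir: str, graph_dir: str, suffix: str = ".pt") -> List[Job]:
    """Pair every *.pdb file in pdb_dir with an output path in graph_dir."""
    out_dir = Path(graph_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        (pdb_path, out_dir / (pdb_path.stem + suffix))
        for pdb_path in sorted(Path(pdb_dir).glob("*.pdb"))
    ]


def convert_all(jobs: List[Job], convert: Callable[[Path, Path], None], n_processes: int = 1) -> int:
    """Run convert(pdb_path, graph_path) for each job and return the job count."""
    if n_processes <= 1:
        for pdb_path, graph_path in jobs:
            convert(pdb_path, graph_path)
        return len(jobs)
    pdb_paths = [pdb_path for pdb_path, _ in jobs]
    graph_paths = [graph_path for _, graph_path in jobs]
    # convert must be a picklable top-level function to run in worker processes
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        list(pool.map(convert, pdb_paths, graph_paths))  # list() re-raises worker errors
    return len(jobs)
```

Processes rather than threads suit this workload because structure parsing and featurization are CPU-bound.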

To support non-standard molecules, PyRosetta requires additional flags and parameter files.

flags_str = pnn.data.get_pyrosetta_flags(path_to_your_flags)
pyrosetta.init(flags_str)
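Rosetta flags files typically list one option per line (for example `-extra_res_fa LIG.params` to load a parameter file for a non-standard residue), with `#` starting a comment. The sketch below is a hypothetical reader that joins such a file into the single option string that `pyrosetta.init` accepts. It only illustrates the idea and may differ from what `get_pyrosetta_flags` actually does.

```python
from pathlib import Path
from typing import List


def read_flags_file(path: str) -> str:
    """Join the options in a Rosetta-style flags file into one string."""
    options: List[str] = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line:
            options.append(line)
    return " ".join(options)
```

The resulting string can be passed to `pyrosetta.init` in the same way as `flags_str` above.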

A library of node and edge featurizers is available under proteingnn.data. Custom node filters and node or edge featurizers can also be defined by subclassing BaseNodeFilter, BaseNodeFeaturizer or BaseEdgeFeaturizer.

from proteingnn.data import AtomNameNodeFilter, SeqEmbNodeFeaturizer, BondedEdgeFeaturizer, \
    DistanceEdgeFeaturizer, HbondEdgeFeaturizer, CompositeEdgeFeaturizer

# only accept CA atoms
fa_factory.node_filter = AtomNameNodeFilter(atom_name_pass=['CA'], name='CA_filter')
  
# assign sequence embedding on each residue with SeqEmbNodeFeaturizer
fa_factory.node_featurizer = SeqEmbNodeFeaturizer(emb_dir=embedding_directory)

# concatenate edge features with CompositeEdgeFeaturizer
featurizers = [
    BondedEdgeFeaturizer(is_edge_only=False),
    DistanceEdgeFeaturizer(max_distance=3, is_edge_only=False),
    HbondEdgeFeaturizer(is_edge_only=False),
]
edge_featurizer = CompositeEdgeFeaturizer(
    name='CompositeEdgeFeaturizer',
    featurizers=featurizers,
    all_is_edge=False,
)
fa_factory.edge_featurizer = edge_featurizer
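CompositeEdgeFeaturizer concatenates the features produced by several edge featurizers. The sketch below illustrates one way to do that: each featurizer's output is modeled as a hypothetical dict mapping an (i, j) node pair to its feature vector, the union of proposed pairs becomes the edge set, and a featurizer that does not propose a pair contributes zeros of its feature width. Reading `all_is_edge=False` as "keep an edge if any featurizer proposes it" is an assumption here, not documented behavior.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]
EdgeFeatures = Dict[Pair, List[float]]


def concat_edge_features(
    feature_maps: Sequence[EdgeFeatures], widths: Sequence[int]
) -> Tuple[List[Pair], List[List[float]]]:
    """Concatenate per-featurizer edge features over the union of edges."""
    edges = sorted(set().union(*(fm.keys() for fm in feature_maps)))
    rows: List[List[float]] = []
    for pair in edges:
        row: List[float] = []
        for feature_map, width in zip(feature_maps, widths):
            # Zero-fill when this featurizer does not connect the pair
            row.extend(feature_map.get(pair, [0.0] * width))
        rows.append(row)
    return edges, rows


bonded = {(0, 1): [1.0]}                   # hypothetical bond indicator
distance = {(0, 1): [1.5], (1, 2): [2.8]}  # hypothetical distances
hbond = {(1, 2): [-0.6]}                   # hypothetical H-bond energy
edges, edge_attr = concat_edge_features([bonded, distance, hbond], widths=[1, 1, 1])
```

Zero-filling keeps every row the same width, which a downstream edge-feature tensor requires.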

License

This package is released under the MIT license and is not intended for commercial use. The creator(s) are not liable for any potential bugs or their implications.

Repository: SimonKitSangChu/ProteinGNN

Folders and files

Latest commit

History

Repository files navigation

ProteinGNN

Installation

Basic usage

Advanced usage

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages