Data

Each dataset has its own directory containing the following:

The dataset .tsv file, a tab-separated dataframe with columns for the variant and its functional score.
A PDB file defining the 3D structure of the protein, used to generate protein structure graphs for graph convolutional networks.
The graphs directory, containing protein structure graphs. Some graphs have been pre-computed and are provided here for reference. You can also generate your own protein structure graphs with different thresholds; see the example notebook structure_graph.ipynb.
A splits directory for train-tune-test splits, which are saved as files for easy reuse and reproducibility. No splits are provided, so the splits directory does not appear on GitHub; it is created when you generate your first train-tune-test split. See the notebook train_test_split.ipynb for a guide on how to generate one.
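
As a rough illustration of the layout above, the sketch below reads a dataset .tsv and writes a reproducible train-tune-test split to a splits directory. It is not the code from train_test_split.ipynb. The column names `variant` and `score`, the 80/10/10 fractions, the output file names, and all function names are assumptions made for this example.

```python
import csv
import io
import random
from pathlib import Path


def read_dataset(handle):
    """Read a tab-separated dataset into a list of (variant, score) tuples.

    Assumes columns named 'variant' and 'score' (hypothetical names).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["variant"], float(row["score"])) for row in reader]


def make_split(num_examples, train_frac=0.8, tune_frac=0.1, seed=0):
    """Shuffle example indices with a fixed seed and cut them into three sets."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * num_examples)
    n_tune = int(tune_frac * num_examples)
    return {
        "train": sorted(indices[:n_train]),
        "tune": sorted(indices[n_train:n_train + n_tune]),
        "test": sorted(indices[n_train + n_tune:]),
    }


def save_split(split, split_dir):
    """Write one index per line to <split_dir>/<set_name>.txt (assumed layout)."""
    split_dir = Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)
    for set_name, idxs in split.items():
        text = "\n".join(str(i) for i in idxs) + "\n"
        (split_dir / f"{set_name}.txt").write_text(text)


# Tiny in-memory stand-in for a dataset .tsv file (made-up values).
example_tsv = "variant\tscore\n" + "".join(
    f"A{i}G\t{i * 0.1:.1f}\n" for i in range(1, 11)
)
dataset = read_dataset(io.StringIO(example_tsv))
split = make_split(len(dataset), seed=0)
```

Calling `save_split(split, "data/my_protein/splits/example")` would then write the three index files. Storing indices rather than copies of the rows keeps the split small and lets the same split be reused across runs.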

There is also a directory for the processed AAindex features. These were generated with parse_aaindex.py and are used for encoding variants. See the source_data directory for references to the original datasets.
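
To make "encoding variants" concrete, here is a minimal sketch of feature-based encoding, in which each residue of the mutated sequence is replaced by a per-amino-acid feature vector. It does not reproduce parse_aaindex.py or the codebase's own encoder. The variant notation (wild-type residue, 1-based position, new residue, comma-separated), the function names, and the two-value feature table are assumptions, and the numbers are placeholders rather than real AAindex values.

```python
def apply_variant(wild_type, variant):
    """Return the sequence after applying substitutions such as 'A2P,G4W'.

    Assumed notation: wild-type residue, 1-based position, new residue.
    """
    seq = list(wild_type)
    for mutation in variant.split(","):
        old, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
        if seq[pos - 1] != old:
            raise ValueError(
                f"{mutation}: expected {old} at position {pos}, found {seq[pos - 1]}"
            )
        seq[pos - 1] = new
    return "".join(seq)


def encode(sequence, features):
    """Map each residue to its feature vector, giving one row per position."""
    return [features[aa] for aa in sequence]


# Placeholder feature table (NOT real AAindex values).
FEATURES = {
    "A": [0.1, 0.2],
    "G": [0.3, 0.4],
    "P": [0.5, 0.6],
    "W": [0.7, 0.8],
}

mutated = apply_variant("AAGG", "A2P,G4W")
encoded = encode(mutated, FEATURES)
```

The result has one row per sequence position and one column per feature, which is the shape a downstream model would typically consume.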

Using your own dataset

We recommend following our format if you want to use your own dataset with this codebase. Create a folder for your dataset and add the corresponding files as described above (see the current datasets for reference). Although not required, we also recommend that you define your dataset in constants.py. Doing so lets you refer to the dataset by name in several places where you would otherwise need to specify paths, file names, etc.
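
One way to picture a dataset definition is a small registry keyed by dataset name. The sketch below is hypothetical and does not show the actual contents of constants.py. The dictionary name, its keys, the `my_protein` entry, and the `resolve_dataset` helper are all assumptions chosen for illustration.

```python
from pathlib import Path

# Hypothetical registry: dataset name -> where its files live.
DATASETS = {
    "my_protein": {
        "directory": "data/my_protein",
        "tsv_file": "data/my_protein/my_protein.tsv",
        "pdb_file": "data/my_protein/my_protein.pdb",
    },
}


def resolve_dataset(name, registry=DATASETS):
    """Look up a dataset by name and return its paths as Path objects."""
    if name not in registry:
        known = ", ".join(sorted(registry))
        raise KeyError(f"unknown dataset '{name}'; known datasets: {known}")
    return {key: Path(value) for key, value in registry[name].items()}


paths = resolve_dataset("my_protein")
```

With a lookup like this, scripts can accept a single dataset name and derive every file location from it, which is the convenience described above.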
