Missing labels in res-del dataset #39

Open
meghana-kshirsagar opened this issue Sep 13, 2021 · 0 comments
meghana-kshirsagar commented Sep 13, 2021

Thanks for the great work on this package!
I downloaded the LMDB dataset for residue deletion, which unzipped to the following folder:
/raw/RES/data/
data.mdb lock.mdb

When I look at the dataframes for each protein structure, the labels are missing.
dataset = da.load_dataset(lmdb_path, 'lmdb')
dataset.get('100d')
{'atoms':      ensemble  subunit structure  model chain hetero insertion_code  ...      x      y       z element  name  fullname  serial_number
0    100d.pdb        0  100d.pdb      0     A                        ... -4.549  5.095   4.262       O   O5'       O5'              1
1    100d.pdb        0  100d.pdb      0     A                        ... -4.176  6.323   3.646       C   C5'       C5'              2

[408 rows x 20 columns], 'id': '100d', 'file_path': '/oak/stanford/groups/rbaltman/aderry/graph-pdb/data/raw/100d.pdb', 'labels': Empty DataFrame
Columns: [subunit, label, x, y, z]
Index: [], 'subunit_indices': [], 'types': {'atoms': "<class 'pandas.core.frame.DataFrame'>", 'id': "<class 'str'>", 'file_path': "<class 'str'>", 'labels': "<class 'pandas.core.frame.DataFrame'>", 'subunit_indices': "<class 'list'>", 'types': "<class 'dict'>"}}
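
To check whether the empty labels only affect 100d or every entry in the file, one could scan the whole dataset. The helper below is a minimal sketch and is not part of the package. `find_unlabeled` is a hypothetical name, and the helper assumes each item is a dict shaped like the one printed above, with 'id' and 'labels' keys. It relies only on len(), so an empty pandas DataFrame and an empty list are treated the same way.

```python
from typing import Any, Iterable, List, Mapping


def find_unlabeled(items: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the ids of entries whose 'labels' field is missing or empty.

    Works for any sized container: an empty pandas DataFrame and an
    empty list both have len() == 0.
    """
    unlabeled = []
    for item in items:
        labels = item.get("labels")
        # Treat a missing key (None) the same as an empty label table.
        if labels is None or len(labels) == 0:
            unlabeled.append(item["id"])
    return unlabeled


# Hypothetical usage with the dataset loaded above (not run here; assumes
# the dataset supports len() and integer indexing like a PyTorch Dataset):
#   dataset = da.load_dataset(lmdb_path, 'lmdb')
#   missing = find_unlabeled(dataset[i] for i in range(len(dataset)))
#   print(f"{len(missing)} of {len(dataset)} entries have no labels")
```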

Is the intended workflow to download this lightly reformatted PDB data and then run separate feature-generation code on top of it (e.g., generating voxels for a 3D CNN)? If so, could you point me to that code? The current code in this repo still seems to use the sharded format rather than LMDB.
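
For context on what such a feature-generation step might look like, here is a minimal voxelization sketch. It is not the repo's featurization code. The function name `voxelize`, the grid size, the resolution, and the C/N/O/S channel layout are all illustrative assumptions. It takes (element, x, y, z) tuples, which could be built from the `element`, `x`, `y`, `z` columns of the 'atoms' DataFrame shown above, and counts atoms per element in a cubic grid centered on a chosen point.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical channel layout: one occupancy channel per element type.
DEFAULT_ELEMENTS = ("C", "N", "O", "S")


def voxelize(
    atoms: Iterable[Tuple[str, float, float, float]],
    center: Sequence[float],
    grid_size: int = 20,
    resolution: float = 1.0,
    elements: Sequence[str] = DEFAULT_ELEMENTS,
) -> List[List[List[List[int]]]]:
    """Count atoms per element in a cubic grid centered on `center`.

    Returns grid[channel][i][j][k]. Atoms outside the box, or whose element
    is not in `elements`, are ignored.
    """
    channel_of: Dict[str, int] = {e: c for c, e in enumerate(elements)}
    grid = [
        [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
        for _ in elements
    ]
    half = grid_size / 2
    for element, x, y, z in atoms:
        c = channel_of.get(element)
        if c is None:
            continue  # skip hydrogens, metals, etc. not in the channel list
        # Shift so that `center` lands in the middle of the box, then bin.
        i, j, k = (
            math.floor((coord - origin) / resolution + half)
            for coord, origin in zip((x, y, z), center)
        )
        if all(0 <= idx < grid_size for idx in (i, j, k)):
            grid[c][i][j][k] += 1
    return grid
```

As a hypothetical usage, with `df = item['atoms']`, one could call `voxelize(zip(df.element, df.x, df.y, df.z), center=(df.x.mean(), df.y.mean(), df.z.mean()))`, and then convert the result with `numpy.array(grid, dtype=numpy.float32)` before feeding it to a 3D CNN. For a residue-level task, centering on the residue of interest rather than on the whole structure would likely be more appropriate.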

Thanks.
