Hi,
I am working on the LBA dataset and trying to reproduce your results.
I downloaded your LBA dataset in LMDB format. Downloading and loading the dataset both work fine, but the 'seq' value is an empty list ('[]') for every protein.
Why is that?
I tried to generate the sequences myself using your get_chain_sequences function from sequence.py in the protein folder:
def get_chain_sequences(df):
    """Return list of tuples of (id, sequence) for different chains of monomers in a given dataframe."""
    # Keep only CA of standard residues
    df = df[df['name'] == 'CA'].drop_duplicates()
    df = df[df['resname'].apply(lambda x: Poly.is_aa(x, standard=True))]
    df['resname'] = df['resname'].apply(Poly.three_to_one)
    chain_sequences = []
    for c, chain in df.groupby(['ensemble', 'subunit', 'structure', 'model', 'chain']):
        seq = ''.join(chain['resname'])
        chain_sequences.append((tuple([str(x) for x in c]), seq))
    return chain_sequences
It also returns an empty list of sequences, so I think there is a bug here.
After modifying the function slightly, I can get the protein sequences. However, some proteins have multiple chains. How should multiple chains be processed for training, or which chain should be paired with the ligand SMILES?
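For illustration, here is a minimal sketch of the kind of modification I mean. It is not the library's code and may not be the right fix: it assumes the dataframe has the same columns as above (name, resname, ensemble, subunit, structure, model, chain), and it swaps the Poly calls for a hand-written three-letter to one-letter table. The name chain_sequences_from_df and the THREE_TO_ONE table are hypothetical.

```python
import pandas as pd

# Hypothetical lookup table for the 20 standard amino acids (not part of the library).
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}

GROUP_COLS = ['ensemble', 'subunit', 'structure', 'model', 'chain']


def chain_sequences_from_df(df):
    """Return [(chain_id_tuple, sequence), ...] built from CA atoms of standard residues."""
    work = df.copy()
    # Assumption: atom and residue names may carry padding or mixed case, so normalize them.
    work['name'] = work['name'].astype(str).str.strip()
    work['resname'] = work['resname'].astype(str).str.strip().str.upper()
    # Keep one CA row per standard residue; this also drops ions named 'CA'.
    work = work[(work['name'] == 'CA') & work['resname'].isin(list(THREE_TO_ONE))]
    work = work.drop_duplicates()
    work['resname'] = work['resname'].map(THREE_TO_ONE)
    chain_sequences = []
    # sort=False keeps chains in the order they first appear in the dataframe.
    for key, chain in work.groupby(GROUP_COLS, sort=False):
        chain_sequences.append((tuple(str(x) for x in key), ''.join(chain['resname'])))
    return chain_sequences
```

With this version each chain comes back as its own (id, sequence) tuple, which is how I end up with several sequences for some proteins.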
Thanks for your help.