KeyError: '7xas' #12

Open
BinhongLiu opened this issue Mar 6, 2023 · 3 comments

Comments

@BinhongLiu

BinhongLiu commented Mar 6, 2023

Hi,
I tested with my own .pdb file today and hit a similar error, which seems to be caused by a mapping mismatch between the embedding database and the pdb_metadata.csv file.

Code 1:
python search_site.py bai/4is3.pdb B K161 data/datasets/pdb_embeddings.pkl --cutoff 1e-3 --verbose --num_iter 3

Then I chose a different central residue, and a similar error appeared with a different key.
Code 2:
python search_site.py bai/4is3.pdb B G94 data/datasets/pdb_embeddings.pkl --cutoff 1e-3 --verbose --num_iter 3

Neither 7xas nor 5esy, the two IDs that triggered these errors, is present in the pdb_metadata.csv file.
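
One way to confirm this kind of mismatch is to compare the four-character PDB IDs in the results against the metadata index. The sketch below is only illustrative: the `PDB` column and the `x[:4]` prefix are taken from the traceback, while `find_missing_ids` and the sample frames are hypothetical stand-ins for the real results table and pdb_metadata.csv.

```python
import pandas as pd

def find_missing_ids(results: pd.DataFrame, pdb_meta: pd.DataFrame) -> list:
    """Return sorted PDB IDs that occur in results['PDB'] but not in pdb_meta's index."""
    ids = results["PDB"].str[:4]              # '7xas_A' -> '7xas'
    missing = ids[~ids.isin(pdb_meta.index)]  # keep only IDs with no metadata row
    return sorted(missing.unique())

# Made-up data for illustration only (not the real database contents)
results = pd.DataFrame({"PDB": ["1a0h_B", "7xas_A", "5esy_C", "7xas_B"]})
pdb_meta = pd.DataFrame({"title": ["example title"]}, index=["1a0h"])

print(find_missing_ids(results, pdb_meta))
```

An empty list would mean every hit has a metadata row, so the `.loc` lookup in the traceback would not raise.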

I also reran the script on the demo data, and no error appeared:
python search_site.py data/examples/1a0h.pdb B H363 data/datasets/pdb_embeddings.pkl --cutoff 1e-3 --verbose --num_iter 3

I'm a little confused about why this error occurs when all I changed was the .pdb structure or the central residue.

Log for code 1:

search_site.py:43: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
pdb_meta = pdb_meta.append(pd.Series(data=['N/A'] * pdb_meta.shape[1], index=pdb_meta.columns, name=query_pdb))
['103l_A' '103l_A' '103l_A' '103l_A' '103l_A']
59445
Database size: 873863
Iteration 1: 458 new results
Iteration 2: 1023 new results
Iteration 3: 41687 new results
Traceback (most recent call last):
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '7xas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "search_site.py", line 126, in <module>
main(args)
File "search_site.py", line 99, in main
results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/series.py", line 4771, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1123, in apply
return self.apply_standard()
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1174, in apply_standard
mapped = lib.map_infer(
File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
File "search_site.py", line 99, in <lambda>
results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1067, in __getitem__
return self._getitem_tuple(key)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1247, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 967, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1312, in _getitem_axis
return self._get_label(key, axis=axis)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1260, in _get_label
return self.obj.xs(label, axis=axis)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/generic.py", line 4056, in xs
loc = index.get_loc(key)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: '7xas'

Log for code 2:
search_site.py:43: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
pdb_meta = pdb_meta.append(pd.Series(data=['N/A'] * pdb_meta.shape[1], index=pdb_meta.columns, name=query_pdb))
['103l_A' '103l_A' '103l_A' '103l_A' '103l_A']
59445
Database size: 1100118
Iteration 1: 435 new results
Iteration 2: 2979 new results
Iteration 3: 51569 new results
Traceback (most recent call last):
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '5esy'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "search_site.py", line 126, in <module>
main(args)
File "search_site.py", line 99, in main
results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/series.py", line 4771, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1123, in apply
return self.apply_standard()
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/apply.py", line 1174, in apply_standard
mapped = lib.map_infer(
File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
File "search_site.py", line 99, in <lambda>
results[cols] = results['PDB'].apply(lambda x: pdb_meta.loc[x[:4], cols])
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1067, in __getitem__
return self._getitem_tuple(key)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1247, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 967, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1312, in _getitem_axis
return self._get_label(key, axis=axis)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexing.py", line 1260, in _get_label
return self.obj.xs(label, axis=axis)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/generic.py", line 4056, in xs
loc = index.get_loc(key)
File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: '5esy'
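
Separately from the KeyError, both logs start with a FutureWarning about `DataFrame.append` at search_site.py line 43. A minimal sketch of the `pandas.concat` replacement the warning suggests is shown below; `add_placeholder_row` and the sample metadata are hypothetical, and the query label `4is3` is assumed only for illustration.

```python
import pandas as pd

def add_placeholder_row(pdb_meta: pd.DataFrame, query_pdb: str, fill: str = "N/A") -> pd.DataFrame:
    """Append one row of `fill` values labelled `query_pdb` without using DataFrame.append."""
    placeholder = pd.DataFrame(
        [[fill] * pdb_meta.shape[1]],   # one row, one fill value per metadata column
        columns=pdb_meta.columns,
        index=[query_pdb],
    )
    return pd.concat([pdb_meta, placeholder])

# Made-up metadata for illustration only
pdb_meta = pd.DataFrame(
    {"title": ["example title"], "source": ["example source"]},
    index=["1a0h"],
)
pdb_meta = add_placeholder_row(pdb_meta, "4is3")
print(pdb_meta.loc["4is3"].tolist())
```

This only silences the deprecation warning; it does not change the lookup that raises the KeyError.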

@BinhongLiu
Author

It seems that not every residue can be chosen as the central residue, right? Sorry, I'm not very familiar with this field.

@awfderry
Owner

Hi @BinhongLiu, it seems like this error appears because there are some deprecated PDBs (such as 7xas) that are in the embedding database but not the PDB metadata. This issue has been fixed so that PDB IDs not in the metadata don't cause an error.
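
For readers on an older checkout, a lookup that tolerates missing IDs could look roughly like the sketch below. This is not the actual change made in the repository, just one possible approach under stated assumptions: `attach_metadata` is a hypothetical helper, the metadata index is assumed to hold unique four-character PDB IDs, and the sample data is made up.

```python
import pandas as pd

def attach_metadata(results: pd.DataFrame, pdb_meta: pd.DataFrame, cols: list, fill: str = "N/A") -> pd.DataFrame:
    """Copy `cols` from pdb_meta onto results, filling IDs absent from pdb_meta with `fill`."""
    ids = results["PDB"].str[:4]
    # reindex yields one row per requested ID; IDs missing from pdb_meta become NaN rows
    # (assumes pdb_meta's index has no duplicate IDs, otherwise reindex raises)
    meta = pdb_meta[cols].reindex(index=ids).fillna(fill)
    out = results.copy()
    for col in cols:
        out[col] = meta[col].to_numpy()   # positional assignment, ignores index labels
    return out

# Made-up data for illustration only
results = pd.DataFrame({"PDB": ["1a0h_B", "7xas_A"]})
pdb_meta = pd.DataFrame(
    {"title": ["example title"], "source": ["example source"]},
    index=["1a0h"],
)
annotated = attach_metadata(results, pdb_meta, ["title", "source"])
print(annotated)
```

Unlike the `.loc`-based lambda in the traceback, `reindex` does not raise on labels that are absent from the index, so deprecated entries simply end up with placeholder values.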

@awfderry
Owner

Also, in these examples you are getting a very large number of results (>40,000) by iteration 3, which is likely very slow to run and will give low specificity. Unless this is what you're looking for, I would generally suggest running with a stricter cutoff (e.g. 1e-4 instead of 1e-3) or fewer iterations to improve performance.
