
Data preprocessing breaks because of missing PDBs in SAbDab database #4

Open
kmnis opened this issue Nov 2, 2020 · 0 comments

kmnis commented Nov 2, 2020

Sometimes the SAbDab database is missing PDBs that are listed in sabdab_summary.tsv, for example 5mhr. When the pipeline tries to download such a PDB from the URL used in the code,

http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/pdb/5mhr/?scheme=chothia

the request fails and 5mhr.pdb is populated with the HTTP error page instead of the structure data. The file looks something like this:

```
Not Found
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
```
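Another option, not part of the original report, is to avoid writing the error page in the first place by checking the response before saving it. The sketch below assumes the server answers missing entries with an HTTP error status such as 404 (the "Not Found" page suggests this, but it is not confirmed here); `download_pdb` is a hypothetical helper, not a function from the pipeline.

```python
import urllib.error
import urllib.request


def download_pdb(url, out_path, timeout=30):
    """Hypothetical helper: save `url` to `out_path` only if the request succeeds.

    Returns True if the file was written, False otherwise.
    """
    # Assumption: the server returns an HTTP error status (e.g. 404) for missing
    # PDBs. urlopen raises HTTPError (a subclass of URLError) for such statuses,
    # so the "Not Found" page is never written to disk.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except urllib.error.URLError as exc:
        print("Skipping {}: {}".format(url, exc))
        return False
    # Only reached on success: write the raw bytes to the target file.
    with open(out_path, "wb") as fh:
        fh.write(data)
    return True
```

For example, `download_pdb("http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/pdb/5mhr/?scheme=chothia", "5mhr.pdb")` would return False and leave no 5mhr.pdb behind, provided the server really does respond with an error status.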

The pipeline does not handle such files, so the code breaks when it tries to create the antibody.h5 file. This can be fixed with a simple function that deletes these files; it can be called right after the PDBs are truncated:

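```python
import os


def delete_invalid_pdbs(pdb_file):
    # Read the downloaded file; treat a missing file as empty.
    try:
        with open(pdb_file, 'r') as fh:
            contents = fh.read()
    except FileNotFoundError:
        contents = ''
    # A failed download leaves the server's "Not Found" page instead of PDB data.
    if "The requested URL was not found on the server" in contents:
        os.remove(pdb_file)
        print("Deleted invalid pdb file: {}".format(pdb_file))
```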
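The function above checks one file at a time. Below is a minimal sketch of running a similar check over every truncated PDB, assuming they all sit in a single directory. The names `is_valid_pdb` and `delete_invalid_pdbs_in_dir` are hypothetical, and the extra requirement that a file contain at least one ATOM or HETATM record is a heuristic added here (it also catches empty downloads), not something the pipeline is known to require.

```python
import os

# Text of the error page that a failed SAbDab download leaves behind.
NOT_FOUND_MARKER = "The requested URL was not found on the server"
# Heuristic (assumption): a usable coordinate file has ATOM or HETATM records.
COORD_RECORDS = ("ATOM", "HETATM")


def is_valid_pdb(pdb_file):
    """Hypothetical check: True if the file has coordinate records and no error page."""
    try:
        with open(pdb_file, "r") as fh:
            contents = fh.read()
    except (FileNotFoundError, UnicodeDecodeError):
        return False
    if NOT_FOUND_MARKER in contents:
        return False
    # str.startswith accepts a tuple, so this matches either record type.
    return any(line.startswith(COORD_RECORDS) for line in contents.splitlines())


def delete_invalid_pdbs_in_dir(pdb_dir):
    """Hypothetical helper: delete every *.pdb in pdb_dir that fails is_valid_pdb.

    Returns the deleted file names in sorted order.
    """
    deleted = []
    for name in sorted(os.listdir(pdb_dir)):
        path = os.path.join(pdb_dir, name)
        if name.endswith(".pdb") and os.path.isfile(path) and not is_valid_pdb(path):
            os.remove(path)
            deleted.append(name)
            print("Deleted invalid pdb file: {}".format(path))
    return deleted
```

Usage, with a hypothetical path: `delete_invalid_pdbs_in_dir("data/truncated_pdbs")`. Because the coordinate-record check is stricter than the original string match, it may remove files the original function would keep, so the returned list is worth reviewing before relying on it.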