BUG: Cannot request dataset zip bundle via wget/curl, get 502 error #744

Closed
corneliusroemer opened this issue Mar 2, 2022 · 6 comments · Fixed by nextstrain/nextclade_data#24
Labels
needs triage Mark for review and label assignment nextclade_data Concerning dataset part: reference tree, qc definitions etc t:bug Type: bug, error, something isn't working

@corneliusroemer
Member

corneliusroemer commented Mar 2, 2022

When requesting datasets manually via wget/curl as a workaround for #726 and #552, something seems to go wrong.

When requesting a zip file

wget https://data.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-02-07T12:00:00Z/zip-bundle/nextclade_dataset_sars-cov-2_MN908947_2022-02-07T12:00:00Z.zip

the server responds with 502 Bad Gateway:

❯ wget https://data.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-02-07T12:00:00Z/zip-bundle/nextclade_dataset_sars-cov-2_MN908947_2022-02-07T12:00:00Z.zip
--2022-03-02 20:12:07--  https://data.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-02-07T12:00:00Z/zip-bundle/nextclade_dataset_sars-cov-2_MN908947_2022-02-07T12:00:00Z.zip
Resolving data.clades.nextstrain.org (data.clades.nextstrain.org)... 13.224.89.47, 13.224.89.70, 13.224.89.112, ...
Connecting to data.clades.nextstrain.org (data.clades.nextstrain.org)|13.224.89.47|:443... connected.
HTTP request sent, awaiting response... 502 Bad Gateway
2022-03-02 20:12:07 ERROR 502: Bad Gateway.
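The failing request can also be reproduced with curl while printing only the HTTP status code. This is a minimal sketch assuming a POSIX shell with curl installed; the helper names zip_bundle_url and http_status are hypothetical, and the URL layout is inferred from the single example URL above rather than from any documented scheme.

```shell
#!/bin/sh
# Host taken from the URLs in this issue.
DATA_HOST="https://data.clades.nextstrain.org"

# zip_bundle_url NAME REFERENCE TAG
# Hypothetical helper: rebuilds the zip-bundle URL. The path layout is
# inferred from the one example URL in this issue and may not hold for
# every dataset.
zip_bundle_url() {
  printf '%s/datasets/%s/references/%s/versions/%s/zip-bundle/nextclade_dataset_%s_%s_%s.zip\n' \
    "$DATA_HOST" "$1" "$2" "$3" "$1" "$2" "$3"
}

# http_status URL
# Hypothetical helper: prints only the HTTP status code (e.g. 200 or 502)
# and discards the response body.
http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' "$1"
}
```

Usage: `http_status "$(zip_bundle_url sars-cov-2 MN908947 2022-02-07T12:00:00Z)"`. At the time of this report the server answered 502 Bad Gateway, as the wget output above shows.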

When requesting

wget https://data.clades.nextstrain.org/index.json

the result is compressed rather than readable JSON.

First reported in the ncov repository (nextstrain/ncov#875) by @jacaravas.

@corneliusroemer corneliusroemer added t:bug Type: bug, error, something isn't working needs triage Mark for review and label assignment nextclade_data Concerning dataset part: reference tree, qc definitions etc labels Mar 2, 2022
@corneliusroemer
Member Author

Workarounds:

  1. Use nextclade dataset get if you can.
  2. Download datasets from GitHub instead of data.clades.nextstrain.org. All dataset files are available here: https://github.com/nextstrain/nextclade_data/tree/master/data/datasets

@tsibley
Member

tsibley commented Mar 3, 2022

The JSON data is not encrypted; it is gzip-compressed. Use curl's --compressed flag or wget's --compression=auto option on the command line to decompress it automatically.
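A minimal sketch of both routes, assuming a POSIX shell with curl, gzip, head and od available. fetch_json and gunzip_if_needed are hypothetical helper names: the first lets curl decode the gzip-encoded response via --compressed, and the second repairs a file such as index.json that was already saved in compressed form, by checking for the gzip magic bytes 1f 8b before decompressing.

```shell
#!/bin/sh
# fetch_json URL OUTFILE
# Hypothetical helper: --compressed asks for a compressed response and makes
# curl decompress it; --fail turns HTTP errors into a non-zero exit status.
fetch_json() {
  curl --fail --silent --show-error --compressed --output "$2" "$1"
}

# gunzip_if_needed FILE
# Hypothetical helper: decompresses FILE in place only if it starts with the
# gzip magic bytes 0x1f 0x8b; plain files are left untouched.
gunzip_if_needed() {
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    # Read via stdin so gzip does not complain about the missing .gz suffix.
    gzip -dc < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
  fi
}
```

Usage: `fetch_json https://data.clades.nextstrain.org/index.json index.json`, or, for a file already downloaded with plain `wget`, `gunzip_if_needed index.json`.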

@tsibley
Member

tsibley commented Mar 3, 2022

The 502 error is a separate problem, likely a bug in whatever AWS Lambda is servicing the request; i.e., it's likely "our" fault, and we need to look at the Lambda's logs and fix it. (I don't have context here, but I could spin up on it if it's urgent and @ivan-aksamentov isn't around.)

@corneliusroemer
Member Author

Thanks for the explanation, @tsibley!

I can confirm that the bug seems to be restricted to the zip bundle, which is not used by nextclade dataset get and is therefore not well tested.

This works:

wget --compression=auto https://data.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-02-07T12:00:00Z/files/genemap.gff
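The same per-file download can be scripted with curl. This is a minimal sketch assuming a POSIX shell with curl; dataset_file_url and fetch_dataset_file are hypothetical helper names, and the .../files/FILE path layout is inferred only from the genemap.gff URL above.

```shell
#!/bin/sh
# Host taken from the URLs in this issue.
DATA_HOST="https://data.clades.nextstrain.org"

# dataset_file_url NAME REFERENCE TAG FILE
# Hypothetical helper; path layout inferred from the genemap.gff URL above.
dataset_file_url() {
  printf '%s/datasets/%s/references/%s/versions/%s/files/%s\n' \
    "$DATA_HOST" "$1" "$2" "$3" "$4"
}

# fetch_dataset_file NAME REFERENCE TAG FILE [OUTDIR]
# Hypothetical helper: downloads one dataset file into OUTDIR (default ".").
# --compressed decodes a gzip-encoded response; --fail makes HTTP errors
# such as 502 produce a non-zero exit status instead of saving an error page.
fetch_dataset_file() {
  outdir="${5:-.}"
  mkdir -p "$outdir"
  curl --fail --silent --show-error --compressed \
    --output "$outdir/$4" "$(dataset_file_url "$1" "$2" "$3" "$4")"
}
```

Usage: `fetch_dataset_file sars-cov-2 MN908947 2022-02-07T12:00:00Z genemap.gff data/sars-cov-2`.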

@corneliusroemer corneliusroemer changed the title BUG: Cannot request datasets via wget/curl, get encrypted response or 502 error BUG: Cannot request zip bundle via wget/curl, get 502 error Mar 3, 2022
@corneliusroemer corneliusroemer changed the title BUG: Cannot request zip bundle via wget/curl, get 502 error BUG: Cannot request dataset zip bundle via wget/curl, get 502 error Mar 3, 2022
@ivan-aksamentov
Member

ivan-aksamentov commented Mar 7, 2022

The zip bundles were considered for implementing downloads in the early days of Nextclade datasets. However, that feature was dropped, and they are currently not used or exposed anywhere other than in index.json and in the datasets repo. The index.json is not meant to be used by end users directly either.

But fixing this issue may help users behind proxies, in blocked countries, and in otherwise challenging network conditions. I investigated and, as @tsibley noted, it was indeed a bug in a Lambda@Edge function. Fixed in nextstrain/nextclade_data#24 and nextstrain/nextclade_data#25.

This is not related to #726 or #552 at all. It just happened that the HTTP clients reported similar errors.

@corneliusroemer Please double-check that it works now when you have a second.
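One way to do this double-check is to download the bundle and confirm that the result really is a zip archive rather than an error page. This is a minimal sketch assuming a POSIX shell with curl, head and od; is_zip and check_zip_bundle are hypothetical helper names, and the test only looks at the ZIP local-file-header signature "PK\003\004" (hex 50 4b 03 04), not at the archive's full integrity.

```shell
#!/bin/sh
# is_zip FILE
# Hypothetical helper: succeeds if FILE starts with the ZIP local-file-header
# signature 50 4b 03 04. An HTML/JSON error body would not match.
is_zip() {
  sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "504b0304" ]
}

# check_zip_bundle URL OUTFILE
# Hypothetical helper: downloads the bundle (failing on HTTP errors such as
# 502) and then checks the file signature.
check_zip_bundle() {
  curl --fail --silent --show-error --location --compressed \
    --output "$2" "$1" || return 1
  is_zip "$2" || { echo "not a zip archive: $2" >&2; return 1; }
  echo "ok: $2"
}
```

Usage: `check_zip_bundle https://data.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-02-07T12:00:00Z/zip-bundle/nextclade_dataset_sars-cov-2_MN908947_2022-02-07T12:00:00Z.zip bundle.zip`. Where unzip is installed, `unzip -t bundle.zip` gives a more thorough integrity check.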

Repository owner moved this from Backlog to Done in Nextstrain planning (archived) Mar 7, 2022
@corneliusroemer
Member Author

Excellent, it works now.

Yes, I know this is not related to #726 or #552, but it does help with workarounds. I had suggested it in one of those issues, which is why someone noticed that the zip download threw an error.
