Suppress or override rule "prepare_nextclade"? "CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK." #875

jacaravas · 2022-03-01T19:07:46Z

I'm having difficulty with the latest ncov workflow due to the addition of the automatic update of the nextclade dataset in the "prepare nextclade" rule. The "nextclade dataset get" command does not use the correct certificate store in our conda environment and fails. This issue is documented on the nextclade issues page and no fix is forthcoming.

To bypass this issue in other workflows, I have a script that identifies the latest data release and downloads the files without calling the nextclade datasets get command. It is a showstopper here, though, because that command kills the workflow.

Short of maintaining a separate copy of "main_workflow.smk" that has a customized version of this rule, is there any way to bypass or replace this step?

For searchability, this is the specific error message

[ERROR] Nextclade: When fetching a file "https://data.clades.nextstrain.org/index.json": CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK. Please verify the correctness of the address, check your internet connection and try again later. Make sure that you have a permission to access the requested resource.
[Tue Mar 1 14:00:29 2022]
Error in rule prepare_nextclade:
jobid: 31
output: data/sars-cov-2-nextclade-defaults
shell:

    nextclade dataset get --name sars-cov-2 --output-dir data/sars-cov-2-nextclade-defaults

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

The text was updated successfully, but these errors were encountered:

Replaces a reference to the _rule_ that produces Nextclade data needed for alignment with the _literal path_ to the corresponding output. This approach allows users to define custom rules that generate the same output and override the workflow's default rules. This kind of functionality is important, as it allows users to easily modify the workflow DAG from their custom build configuration and avoid maintaining a fork of the main workflow that can get outdated easily. Related to #875

corneliusroemer · 2022-03-02T19:18:33Z

There's a working workaround: download datasets from our Github dataset repo: https://github.com/nextstrain/nextclade_data/tree/master/data/datasets

I know this is not user friendly, but at least it should work until we've fixed nextstrain/nextclade#744

The general issue of nextclade dataset get may be addressed in a future version of Nextclade, but not in the near term since it's difficult with the current C++ CLI implementation.

huddlej · 2022-03-02T19:25:33Z

Thank you for the detailed issue description, @jacaravas! While we consider a solution from the Nextclade side of the problem (as @corneliusroemer mentions above), I do have a workaround that should allow you to define your own Snakemake rule that overrides the main workflow's prepare_nextclade rule with your own logic. This custom Snakemake logic can live in your build profile directory (or anywhere relative to the top-level of the ncov directory), so you don't need to maintain a separate fork of the repo.

If you pull the latest version of the ncov workflow (including the change from #877), you can add the following lines to your builds YAML to include custom rules into the workflow:

custom_rules:
  - "get_nextclade_data.smk"

Then, you can define get_nextclade_data.smk with a custom rule that overrides the main workflow’s rule to get Nextclade data (replacing the shell command below with your custom script):

# Tell Snakemake to prefer the custom rule in this file over the rule in the main workflow
# that produces the same output.
ruleorder: custom_prepare_nextclade > prepare_nextclade

rule custom_prepare_nextclade:
    output:
        nextclade_dataset = directory("data/sars-cov-2-nextclade-defaults"),
    params:
        name = "sars-cov-2",
    conda: config["conda_environment"]
    shell:
        """
        echo "This is a custom rule!"
        nextclade dataset get --name {params.name} --output-dir {output.nextclade_dataset}
        """

Theoretically, you can use this approach to rewire almost any parts of the workflow with your own custom rules. In practice, the workflow still defines inputs using variables that point to the output of specific rules instead of using strings that point to literal paths. This implementation choice on our end prevents easy overriding of rules in the way shown above and is what #877 addressed for this specific Nextclade rule.

Do you want to give this a try and let us know here if the proposed solution above works?

jacaravas · 2022-03-02T19:32:58Z

@huddlej Fantastic! That is exactly what I was looking for. I already have an edited version of the rule in place in main_workflow that does curl based download of the data files, but it will be much easier to maintain the solution as an external custom rule. I was not looking forward to editing "main_workflow.smk" every time I updated.

jacaravas · 2022-03-02T21:07:35Z

I made a quick repo for the script(s) I'm using to download nextclade datasets in case anyone else needs help with this error.
https://github.com/jacaravas/update-nextclade-dataset

corneliusroemer · 2022-03-02T21:18:58Z

Very cool! @jacavaras How did you manage to work around this curl/wget bug both @huddlej and I encountered when trying the manual route?

I'll link to your script from Nextclade, too, as a workaround. This is fab.

huddlej · 2022-03-02T22:45:02Z

Excellent! I'm glad this solution works for your setup, @jacaravas. Thank you also for the example downloads scripts. This does feel like a place that shell + curl is better-suited to the problem at hand than C++.

ivan-aksamentov · 2022-03-07T11:32:50Z

Related: the problem with zip bundles (nextstrain/nextclade#744) was fixed in nextstrain/nextclade_data#24.

These zips are not used by nextclade and neither recommended or supported for end users. Neither parsing index.json is supported. However if nothing else works, zip bundles might be a decent workaround.

The proxy support may be coming, depending on prioritization of other features and dev resources available. Contributions are welcome!

jacaravas added the enhancement New feature or request label Mar 1, 2022

nextstrain-bot added this to Nextstrain planning (archived) Mar 1, 2022

nextstrain-bot moved this to New in Nextstrain planning (archived) Mar 1, 2022

huddlej added bug Something isn't working and removed enhancement New feature or request labels Mar 2, 2022

huddlej self-assigned this Mar 2, 2022

corneliusroemer mentioned this issue Mar 2, 2022

BUG: Cannot request dataset zip bundle via wget/curl, get 502 error nextstrain/nextclade#744

Closed

huddlej mentioned this issue Mar 2, 2022

Use a literal path for Nextclade data input #877

Merged

1 task

huddlej moved this from New to In Progress in Nextstrain planning (archived) Mar 2, 2022

jacaravas closed this as completed Mar 2, 2022

Repository owner moved this from In Progress to Done in Nextstrain planning (archived) Mar 2, 2022

ktmeaton mentioned this issue Jun 1, 2022

Error in nextclade_dataset ktmeaton/ncov-recombinant#16

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suppress or override rule "prepare_nextclade"? "CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK." #875

Suppress or override rule "prepare_nextclade"? "CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK." #875

jacaravas commented Mar 1, 2022

corneliusroemer commented Mar 2, 2022

huddlej commented Mar 2, 2022

jacaravas commented Mar 2, 2022

jacaravas commented Mar 2, 2022

corneliusroemer commented Mar 2, 2022 •

edited

Loading

huddlej commented Mar 2, 2022

ivan-aksamentov commented Mar 7, 2022 •

edited

Loading

Suppress or override rule "prepare_nextclade"? "CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK." #875

Suppress or override rule "prepare_nextclade"? "CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK." #875

Comments

jacaravas commented Mar 1, 2022

corneliusroemer commented Mar 2, 2022

huddlej commented Mar 2, 2022

jacaravas commented Mar 2, 2022

jacaravas commented Mar 2, 2022

corneliusroemer commented Mar 2, 2022 • edited Loading

huddlej commented Mar 2, 2022

ivan-aksamentov commented Mar 7, 2022 • edited Loading

corneliusroemer commented Mar 2, 2022 •

edited

Loading

ivan-aksamentov commented Mar 7, 2022 •

edited

Loading