Suppress or override rule "prepare_nextclade"? "CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK." #875
Replaces a reference to the _rule_ that produces Nextclade data needed for alignment with the _literal path_ to the corresponding output. This approach allows users to define custom rules that generate the same output and override the workflow's default rules. This kind of functionality is important because it allows users to easily modify the workflow DAG from their custom build configuration and avoid maintaining a fork of the main workflow that can easily become outdated. Related to #875
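The pattern this change enables can be sketched roughly as follows. This is an illustration, not the actual diff; the consuming rule's name (`align`) and input name are assumptions, while the output path and `prepare_nextclade` come from the thread:

```snakemake
# Before: the input references the rule object directly, so Snakemake binds
# the dependency to that specific rule and no custom rule can take its place.
rule align:
    input:
        nextclade_dataset = rules.prepare_nextclade.output.nextclade_dataset,

# After: the input is the literal output path. Any rule that produces this
# path -- including a user-defined rule promoted via `ruleorder` -- can now
# satisfy the dependency.
rule align:
    input:
        nextclade_dataset = "data/sars-cov-2-nextclade-defaults",
```

(The two `rule align:` definitions are shown side by side only for comparison; a real Snakefile would contain one or the other.)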
There's a working workaround: download datasets from our GitHub dataset repo: https://github.com/nextstrain/nextclade_data/tree/master/data/datasets. I know this is not user friendly, but at least it should work until we've fixed nextstrain/nextclade#744. The general issue of …
Thank you for the detailed issue description, @jacaravas! While we consider a solution from the Nextclade side of the problem (as @corneliusroemer mentions above), I do have a workaround that should allow you to define your own Snakemake rule that overrides the main workflow's `prepare_nextclade` rule.

If you pull the latest version of the ncov workflow (including the change from #877), you can add the following lines to your builds YAML to include custom rules in the workflow:

```yaml
custom_rules:
  - "get_nextclade_data.smk"
```

Then, you can define your custom rule in `get_nextclade_data.smk`:

```snakemake
# Tell Snakemake to prefer the custom rule in this file over the rule in the main workflow
# that produces the same output.
ruleorder: custom_prepare_nextclade > prepare_nextclade

rule custom_prepare_nextclade:
    output:
        nextclade_dataset = directory("data/sars-cov-2-nextclade-defaults"),
    params:
        name = "sars-cov-2",
    conda: config["conda_environment"]
    shell:
        """
        echo "This is a custom rule!"
        nextclade dataset get --name {params.name} --output-dir {output.nextclade_dataset}
        """
```

Theoretically, you can use this approach to rewire almost any part of the workflow with your own custom rules. In practice, the workflow still defines inputs using variables that point to the output of specific rules instead of using strings that point to literal paths. This implementation choice on our end prevents easy overriding of rules in the way shown above and is what #877 addressed for this specific Nextclade rule.

Do you want to give this a try and let us know here if the proposed solution above works?
@huddlej Fantastic! That is exactly what I was looking for. I already have an edited version of the rule in place in `main_workflow.smk` that does a curl-based download of the data files, but it will be much easier to maintain the solution as an external custom rule. I was not looking forward to editing `main_workflow.smk` every time I updated.
I made a quick repo for the script(s) I'm using to download nextclade datasets in case anyone else needs help with this error.
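The manual route amounts to fetching dataset files straight from the `nextclade_data` GitHub repository instead of calling `nextclade dataset get`. A minimal Python sketch of the URL construction is below; the per-dataset directory layout and the file name `tree.json` are assumptions for illustration, so check the repository for the actual structure before relying on them:

```python
# Hypothetical sketch: build raw-file URLs for a dataset hosted in the
# nextstrain/nextclade_data GitHub repo, bypassing `nextclade dataset get`
# (and therefore its certificate handling) entirely.
RAW_BASE = "https://raw.githubusercontent.com/nextstrain/nextclade_data/master"

def dataset_file_url(dataset_path: str, filename: str) -> str:
    """Return the raw.githubusercontent.com URL for one file of a dataset."""
    return f"{RAW_BASE}/{dataset_path}/{filename}"

# Hypothetical example path; the real per-dataset directory includes
# additional reference/version subdirectories.
url = dataset_file_url("data/datasets/sars-cov-2", "tree.json")
print(url)
```

The resulting URLs can then be fetched with any HTTP client that uses the system (or conda) certificate store, e.g. `curl -O <url>`.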
Very cool! @jacaravas How did you manage to work around this curl/wget bug that both @huddlej and I encountered when trying the manual route? I'll link to your script from Nextclade, too, as a workaround. This is fab.
Excellent! I'm glad this solution works for your setup, @jacaravas. Thank you also for the example download scripts. This does feel like a place where shell + curl is better suited to the problem at hand than C++.
Related: the problem with zip bundles (nextstrain/nextclade#744) was fixed in nextstrain/nextclade_data#24. These zips are not used by nextclade and are neither recommended nor supported for end users. Parsing index.json is not supported either. However, if nothing else works, zip bundles might be a decent workaround. Proxy support may be coming, depending on prioritization of other features and available dev resources. Contributions are welcome!
I'm having difficulty with the latest ncov workflow due to the addition of the automatic update of the Nextclade dataset in the `prepare_nextclade` rule. The `nextclade dataset get` command does not use the correct certificate store in our conda environment and fails. This issue is documented on the Nextclade issues page, and no fix is forthcoming.
To bypass this issue in other workflows, I have a script that identifies the latest data release and downloads the files without calling the `nextclade dataset get` command. It is a showstopper here, though, because that command kills the workflow.
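A common mitigation for certificate-store problems with curl-based tools in conda environments is to point them at the environment's own CA bundle. Whether this helps here depends on whether Nextclade's bundled curl honors these variables (per the linked issue, it may not), so treat this as a hedged sketch rather than a confirmed fix; the bundle path is the conventional conda location:

```shell
#!/usr/bin/env bash
# Hypothetical mitigation: export the conda environment's CA bundle.
# CURL_CA_BUNDLE and SSL_CERT_FILE are standard variables honored by curl
# and many HTTP clients, but a statically linked curl inside nextclade
# may ignore them.
CA_BUNDLE="${CONDA_PREFIX:-/opt/conda}/ssl/cacert.pem"
if [ -f "$CA_BUNDLE" ]; then
    export CURL_CA_BUNDLE="$CA_BUNDLE"
    export SSL_CERT_FILE="$CA_BUNDLE"
    echo "Using CA bundle: $CA_BUNDLE"
else
    echo "No conda CA bundle found; falling back to system defaults"
fi
```

If the tool ignores these variables, the remaining options are the manual download route discussed above or a custom rule that replaces the `nextclade dataset get` step.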
Short of maintaining a separate copy of `main_workflow.smk` with a customized version of this rule, is there any way to bypass or replace this step?
For searchability, this is the specific error message:
```
[ERROR] Nextclade: When fetching a file "https://data.clades.nextstrain.org/index.json": CURLE_PEER_FAILED_VERIFICATION: SSL peer certificate or SSH remote key was not OK. Please verify the correctness of the address, check your internet connection and try again later. Make sure that you have a permission to access the requested resource.

[Tue Mar  1 14:00:29 2022]
Error in rule prepare_nextclade:
    jobid: 31
    output: data/sars-cov-2-nextclade-defaults
    shell:
```