Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

petsc rerender error #2561

Closed
minrk opened this issue May 9, 2024 · 10 comments
Closed

petsc rerender error #2561

minrk opened this issue May 9, 2024 · 10 comments

Comments

@minrk
Copy link
Contributor

minrk commented May 9, 2024

rerenders of petsc have started to fail, I believe starting with the merge of conda-forge/petsc-feedstock#193, which adds cuda builds.

The bot error doesn't show any actual errors and a manual rerender with the migrator succeeds, so I suspect it could be a memory exhaustion issue.

bot error
bot error (
[bot CI job](https://github.com/regro/cf-scripts/actions/runs/8986224807)
): main: Traceback (most recent call last):
  File "/home/runner/work/cf-scripts/cf-scripts/cf-scripts/conda_forge_tick/auto_tick.py", line 1271, in _run_migrator
    migrator_uid, pr_json = run(
                            ^^^^
  File "/home/runner/work/cf-scripts/cf-scripts/cf-scripts/conda_forge_tick/auto_tick.py", line 284, in run
    rerender_msg = rerender_feedstock(feedstock_dir, timeout=900)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/cf-scripts/cf-scripts/cf-scripts/conda_forge_tick/rerender_feedstock.py", line 49, in rerender_feedstock
    return rerender_feedstock_containerized(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/cf-scripts/cf-scripts/cf-scripts/conda_forge_tick/rerender_feedstock.py", line 102, in rerender_feedstock_containerized
    data = run_container_task(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/cf-scripts/cf-scripts/cf-scripts/conda_forge_tick/utils.py", line 229, in run_container_task
    raise RuntimeError(
RuntimeError: Error running rerender-feedstock in container - error RuntimeError("Failed to rerender.\noutput: Adding in variants from internal_defaults\nAdding in variants from /tmp/tmp76lv6jme/conda-smithy/conda_build_config.yaml\nAdding in variants from /tmp/tmpiph3pba7/petsc-feedstock/recipe/conda_build_config.yaml\nAdding in variants from argument_variants\nINFO:conda_smithy.configure_feedstock:Downloading conda-forge-pinning-2024.05.07.12.03.23\nINFO:conda_smithy.configure_feedstock:Extracting conda-forge-pinning to /tmp/tmp76lv6jme/conda-smithy\nINFO:conda_smithy.configure_feedstock:__pycache__ rendering is skipped\nINFO:conda_smithy.configure_feedstock:README rendering is skipped\nINFO:conda_smithy.configure_feedstock:suitesparse7.yaml from feedstock is ignored and upstream version is used\nINFO:conda_smithy.configure_feedstock:mumps_mpi571.yaml from feedstock is ignored and upstream version is used\nWARNING: Setting build platform. This is only useful when pretending to be on another platform, such as for rendering necessary dependencies on a non-native platform. I trust that you know what you're doing.\nWARNING: Setting build arch. This is only useful when pretending to be on another arch, such as for rendering necessary dependencies on a non-native arch. I trust that you know what you're doing.\nWARNING: No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.23\nINFO:conda_smithy.configure_feedstock:Applying migrations: /tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\n/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' is being passed through with value 'x86_64-linux'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.\n  warnings.warn(\nINFO:conda_smithy.configure_feedstock:Applying migrations: /tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\n/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' is being passed through with value 'sbsa-linux'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.\n  warnings.warn(\nINFO:conda_smithy.configure_feedstock:Applying migrations: /tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\n/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' is being passed through with value 'ppc64le-linux'.  If you are splitting build and test phases with --no-test, please ensure that this value is also set similarly at test time.\n  warnings.warn(\n\n") raised:
cmd: ['docker',
 'run',
 '-e',
 'CF_TICK_IN_CONTAINER=true',
 '--security-opt=no-new-privileges',
 '--read-only',
 '--cap-drop=all',
 '--mount',
 'type=tmpfs,destination=/tmp,tmpfs-mode=1777,tmpfs-size=6000000000',
 '-m',
 '6000m',
 '--cpus',
 '1',
 '--ulimit',
 'nofile=1024:1024',
 '--ulimit',
 'nproc=2048:2048',
 '--rm',
 '-i',
 '--mount',
 'type=bind,source=/tmp/tmp2sh_59zq,destination=/cf_tick_dir',
 'ghcr.io/regro/conda-forge-tick:master',
 '/opt/conda/envs/cf-scripts/bin/python',
 '-u',
 '/opt/autotick-bot/docker/run_bot_task.py',
 'rerender-feedstock',
 '--timeout',
 '900',
 '--log-level',
 'info']
stdout: {'data': None,
 'error': 'RuntimeError("Failed to rerender.\\noutput: Adding in variants from '
          'internal_defaults\\nAdding in variants from '
          '/tmp/tmp76lv6jme/conda-smithy/conda_build_config.yaml\\nAdding in '
          'variants from '
          '/tmp/tmpiph3pba7/petsc-feedstock/recipe/conda_build_config.yaml\\nAdding '
          'in variants from '
          'argument_variants\\nINFO:conda_smithy.configure_feedstock:Downloading '
          'conda-forge-pinning-2024.05.07.12.03.23\\nINFO:conda_smithy.configure_feedstock:Extracting '
          'conda-forge-pinning to '
          '/tmp/tmp76lv6jme/conda-smithy\\nINFO:conda_smithy.configure_feedstock:__pycache__ '
          'rendering is skipped\\nINFO:conda_smithy.configure_feedstock:README '
          'rendering is '
          'skipped\\nINFO:conda_smithy.configure_feedstock:suitesparse7.yaml '
          'from feedstock is ignored and upstream version is '
          'used\\nINFO:conda_smithy.configure_feedstock:mumps_mpi571.yaml from '
          'feedstock is ignored and upstream version is used\\nWARNING: '
          'Setting build platform. This is only useful when pretending to be '
          'on another platform, such as for rendering necessary dependencies '
          "on a non-native platform. I trust that you know what you're "
          'doing.\\nWARNING: Setting build arch. This is only useful when '
          'pretending to be on another arch, such as for rendering necessary '
          'dependencies on a non-native arch. I trust that you know what '
          "you're doing.\\nWARNING: No numpy version specified in "
          'conda_build_config.yaml.  Falling back to default numpy value of '
          '1.23\\nINFO:conda_smithy.configure_feedstock:Applying migrations: '
          '/tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\\n/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: '
          "UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' is "
          "being passed through with value 'x86_64-linux'.  If you are "
          'splitting build and test phases with --no-test, please ensure that '
          'this value is also set similarly at test time.\\n  '
          'warnings.warn(\\nINFO:conda_smithy.configure_feedstock:Applying '
          'migrations: '
          '/tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\\n/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: '
          "UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' is "
          "being passed through with value 'sbsa-linux'.  If you are splitting "
          'build and test phases with --no-test, please ensure that this value '
          'is also set similarly at test time.\\n  '
          'warnings.warn(\\nINFO:conda_smithy.configure_feedstock:Applying '
          'migrations: '
          '/tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\\n/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: '
          "UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' is "
          "being passed through with value 'ppc64le-linux'.  If you are "
          'splitting build and test phases with --no-test, please ensure that '
          'this value is also set similarly at test time.\\n  '
          'warnings.warn(\\n\\n")',
 'timeout': 900,
 'traceback': 'Traceback (most recent call last):\n'
              '  File "/opt/autotick-bot/docker/run_bot_task.py", line 118, in '
              '_run_bot_task\n'
              '    data = func(**kwargs)\n'
              '           ^^^^^^^^^^^^^^\n'
              '  File "/opt/autotick-bot/docker/run_bot_task.py", line 239, in '
              '_rerender_feedstock\n'
              '    msg = rerender_feedstock_local(fs_dir, **kwargs)\n'
              '          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n'
              '  File '
              '"/opt/autotick-bot/conda_forge_tick/rerender_feedstock.py", '
              'line 239, in rerender_feedstock_local\n'
              '    raise RuntimeError(f"Failed to rerender.\\noutput: '
              '{ret.stdout}\\n")\n'
              'RuntimeError: Failed to rerender.\n'
              'output: Adding in variants from internal_defaults\n'
              'Adding in variants from '
              '/tmp/tmp76lv6jme/conda-smithy/conda_build_config.yaml\n'
              'Adding in variants from '
              '/tmp/tmpiph3pba7/petsc-feedstock/recipe/conda_build_config.yaml\n'
              'Adding in variants from argument_variants\n'
              'INFO:conda_smithy.configure_feedstock:Downloading '
              'conda-forge-pinning-2024.05.07.12.03.23\n'
              'INFO:conda_smithy.configure_feedstock:Extracting '
              'conda-forge-pinning to /tmp/tmp76lv6jme/conda-smithy\n'
              'INFO:conda_smithy.configure_feedstock:__pycache__ rendering is '
              'skipped\n'
              'INFO:conda_smithy.configure_feedstock:README rendering is '
              'skipped\n'
              'INFO:conda_smithy.configure_feedstock:suitesparse7.yaml from '
              'feedstock is ignored and upstream version is used\n'
              'INFO:conda_smithy.configure_feedstock:mumps_mpi571.yaml from '
              'feedstock is ignored and upstream version is used\n'
              'WARNING: Setting build platform. This is only useful when '
              'pretending to be on another platform, such as for rendering '
              'necessary dependencies on a non-native platform. I trust that '
              "you know what you're doing.\n"
              'WARNING: Setting build arch. This is only useful when '
              'pretending to be on another arch, such as for rendering '
              'necessary dependencies on a non-native arch. I trust that you '
              "know what you're doing.\n"
              'WARNING: No numpy version specified in '
              'conda_build_config.yaml.  Falling back to default numpy value '
              'of 1.23\n'
              'INFO:conda_smithy.configure_feedstock:Applying migrations: '
              '/tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\n'
              '/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: '
              "UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' "
              "is being passed through with value 'x86_64-linux'.  If you are "
              'splitting build and test phases with --no-test, please ensure '
              'that this value is also set similarly at test time.\n'
              '  warnings.warn(\n'
              'INFO:conda_smithy.configure_feedstock:Applying migrations: '
              '/tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\n'
              '/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: '
              "UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' "
              "is being passed through with value 'sbsa-linux'.  If you are "
              'splitting build and test phases with --no-test, please ensure '
              'that this value is also set similarly at test time.\n'
              '  warnings.warn(\n'
              'INFO:conda_smithy.configure_feedstock:Applying migrations: '
              '/tmp/tmpiph3pba7/petsc-feedstock/.ci_support/migrations/cuda120.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/suitesparse7.yaml,/tmp/tmp76lv6jme/conda-smithy/share/conda-forge/migrations/mumps_mpi571.yaml\n'
              '/opt/conda/envs/cf-scripts/lib/python3.11/site-packages/conda_build/environ.py:558: '
              "UserWarning: The environment variable 'CUDA_CONDA_TARGET_NAME' "
              "is being passed through with value 'ppc64le-linux'.  If you are "
              'splitting build and test phases with --no-test, please ensure '
              'that this value is also set similarly at test time.\n'
              '  warnings.warn(\n'
              '\n'
              '\n'}
@beckermr
Copy link
Contributor

beckermr commented May 9, 2024

How long does the local rerender take? I wonder if the timeout is the issue. The bot kills the job at 900 seconds.

@minrk
Copy link
Contributor Author

minrk commented May 9, 2024

timeout makes sense, it takes 600 seconds on my m1 mac.

@minrk
Copy link
Contributor Author

minrk commented May 14, 2024

petsc suitesparse migration rerender succeeded after failing for a while, and it looks like it took 630 seconds. mumps also unstuck, so not sure if there's still an action to take here?

It could be really useful if there were a way to manually issue PRs that the migrator recognizes as part of the migration when the migrator fails.

@beckermr
Copy link
Contributor

Yes, or we could have the bot just issue the PR and make a rerender comment in the PR. I don't want to do that in general so that we don't overload GHA, but it might be ok here.

@minrk
Copy link
Contributor Author

minrk commented Jun 21, 2024

FWIW, this is happening again in petsc4py, which currently has 260(!) variants and takes over half an hour on my machine to rerender (it takes less time to build all variants in parallel on CI than to rerender with no changes). Unfortunately, ~all of the time is in render_recipe which happens before evaluating meta.skip(), so adding skips doesn't meaningfully affect the rerender time (I started investigating thinning the matrix for conda-forge/petsc4py-feedstock#96, but learned that doing this via skip doesn't save any rerender time).

petsc4py variants have 5 dimensions now:

  • mpi (2)
  • scalar (2)
  • cuda (3, linux only)
  • platform (5)
  • python (6)

so each linux target has 72 variants to render.

Surprisingly (to me, at least), profiling with pyinstrument suggests that it spends a huge fraction (80% of render_recipe) in copy.deepcopy copying the MetaData objects for the variants. render_recipe for linux-64 spends fully 10 minutes in MetaData.copy() alone, and less than a minute everywhere else combined.

Rendering just 24 variants results in 24 calls to MetaData.copy, but 82,229,145 calls to copy.deepcopy.

So if copying metadata for variants can be optimized, I think overall a massive amount of rerender time can be saved across conda-forge.

@beckermr
Copy link
Contributor

Yep. See this PR: conda/conda-build#5281

We should get that in.

@minrk
Copy link
Contributor Author

minrk commented Jun 21, 2024

Nice! That should help. On investigation, I doubt variants needs to be deep-copied at all, but the only way to be safe there would be to switch each variant it to an immutable data type, e.g. frozendict of tuples instead of dict of lists. I suspect that's probably the way to go in the long run.

conda/conda-build#5281 reduces petsc4py rerender from over 30 minutes to under 5. The next limiting factor is in get_logger which completely reinitializes the logger on every call and accounts for 30% of overall render time (logger.setLevel is more than the new pickle-based deepcopy).

@beckermr
Copy link
Contributor

The next limiting factor is in get_logger which completely reinitializes the logger on every call and accounts for 30% of overall render time (logger.setLevel is more than the new pickle-based deepcopy).

Yes and this function has other odd side effects which make conda-build hard to use. We should open an issue.

@minrk
Copy link
Contributor Author

minrk commented Jun 21, 2024

conda/conda-build#5382

@minrk
Copy link
Contributor Author

minrk commented Jun 28, 2024

With the optimized PRs and conda-forge/conda-smithy#1967 petsc4py renders in under 5 minutes.

The next limiting factor is in the calls to get_loop_vars because it's being rendered with 13,824 variants. I guess because all possible variants are exploded before used_vars are calculated. I think that's unnecessary and the fix may be small (🤞 ).

@minrk minrk closed this as completed Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants