Seeing timeouts with https://conda.anaconda.org/rapidsai-nightly #906

Open
2 tasks done
jakirkham opened this issue Apr 2, 2024 · 15 comments
Labels
type::bug describes erroneous operation, use severity::* to classify the type

Comments

@jakirkham
Member

Checklist

  • I added a descriptive title
  • I searched open reports and couldn't find a duplicate

What happened?

#11 142.8     Traceback (most recent call last):
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/conda/exception_handler.py", line 17, in __call__
#11 142.8         return func(*args, **kwargs)
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 959, in exception_converter
#11 142.8         raise e
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 952, in exception_converter
#11 142.8         exit_code = _wrapped_main(*args, **kwargs)
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 898, in _wrapped_main
#11 142.8         result = do_call(parsed_args, p)
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 763, in do_call
#11 142.8         exit_code = install(args, parser, "install")
#11 142.8       File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 558, in install
#11 142.8         transaction.fetch_extract_packages()
#11 142.8     RuntimeError: Download error (28) Timeout was reached [https://conda.anaconda.org/rapidsai-nightly/linux-64/librmm-24.04.00a39-cuda12_240402_g0651edf0_39.tar.bz2]
#11 142.8     Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds
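The last line of this error is libcurl's low-speed abort message (error 28 is `CURLE_OPERATION_TIMEDOUT`). It fires when a transfer stays below `CURLOPT_LOW_SPEED_LIMIT` bytes/sec for `CURLOPT_LOW_SPEED_TIME` seconds; the log shows mamba running with 30 bytes/sec over 60 seconds. The sketch below is a simplified model of that rule, assuming one byte-count sample per second. `should_abort` is a hypothetical helper, not libcurl code; libcurl's real check uses its own running speed estimate rather than raw per-second samples.

```python
from typing import Iterable, Optional


def should_abort(samples: Iterable[int], limit_bps: int = 30,
                 window_s: int = 60) -> Optional[int]:
    """Return the second at which the transfer would be aborted, or None.

    Simplified model (assumption): `samples` holds the bytes received in each
    one-second interval, and the transfer is aborted once the rate has stayed
    below `limit_bps` for `window_s` consecutive seconds.
    """
    slow_streak = 0
    for second, received in enumerate(samples, start=1):
        if received < limit_bps:
            slow_streak += 1
            if slow_streak >= window_s:
                return second
        else:
            slow_streak = 0  # any fast-enough second resets the timer
    return None


# 10 healthy seconds followed by a full minute of stall.
print(should_abort([1000] * 10 + [0] * 60))
```

Under this model a stalled connection can hang for the full 60 seconds before the download fails, which matches the long gaps visible in the CI timestamps.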

Conda Info

     active environment : None
       user config file : /home/rapids/.condarc
 populated config files : /opt/conda/.condarc
          conda version : 24.1.2
    conda-build version : not installed
         python version : 3.9.19.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=sapphirerapids
                          __conda=24.1.2=0
                          __glibc=2.35=0
                          __linux=6.5.0=0
                          __unix=0=0
       base environment : /opt/conda  (writable)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/rapidsai-nightly/linux-64
                          https://conda.anaconda.org/rapidsai-nightly/noarch
                          https://conda.anaconda.org/dask/label/dev/linux-64
                          https://conda.anaconda.org/dask/label/dev/noarch
                          https://conda.anaconda.org/pytorch/linux-64
                          https://conda.anaconda.org/pytorch/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/nvidia/linux-64
                          https://conda.anaconda.org/nvidia/noarch
          package cache : /opt/conda/pkgs
                          /home/rapids/.conda/pkgs
       envs directories : /opt/conda/envs
                          /home/rapids/.conda/envs
               platform : linux-64
             user-agent : conda/24.1.2 requests/2.31.0 CPython/3.9.19 Linux/6.5.0-1016-aws ubuntu/22.04.4 glibc/2.35 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8
                UID:GID : 1001:1000
             netrc file : None
           offline mode : False

Conda Config

==> /opt/conda/.condarc <==
auto_update_conda: False
channels:
  - rapidsai-nightly
  - dask/label/dev
  - pytorch
  - conda-forge
  - nvidia
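The `channel URLs` reported by `conda info` above follow from this `channels` list: each name is expanded against `https://conda.anaconda.org/` for the platform subdir (`linux-64`) and for `noarch`, in channel order. A minimal sketch of that expansion, assuming the default channel alias and no mirrors; `expand_channels` is a hypothetical helper, not conda's API.

```python
from typing import List


def expand_channels(channels: List[str], subdir: str = "linux-64",
                    alias: str = "https://conda.anaconda.org") -> List[str]:
    """Expand channel names into per-subdir URLs, preserving channel order."""
    urls = []
    for name in channels:
        base = f"{alias.rstrip('/')}/{name}"
        # Each channel contributes its platform subdir first, then noarch.
        urls.append(f"{base}/{subdir}")
        urls.append(f"{base}/noarch")
    return urls


channels = ["rapidsai-nightly", "dask/label/dev", "pytorch", "conda-forge", "nvidia"]
for url in expand_channels(channels):
    print(url)
```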

Conda list

# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
archspec                  0.2.3              pyhd8ed1ab_0    conda-forge
boltons                   24.0.0             pyhd8ed1ab_0    conda-forge
brotli-python             1.1.0            py39h3d6467e_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0           py39h7a31438_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
conda                     24.1.2           py39hf3d152e_0    conda-forge
conda-libmamba-solver     24.1.0             pyhd8ed1ab_0    conda-forge
conda-package-handling    2.2.0              pyh38be061_0    conda-forge
conda-package-streaming   0.9.0              pyhd8ed1ab_0    conda-forge
cryptography              42.0.5           py39hd4f0224_0    conda-forge
distro                    1.9.0              pyhd8ed1ab_0    conda-forge
fmt                       10.2.1               h00ab1b0_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
jsonpatch                 1.33               pyhd8ed1ab_0    conda-forge
jsonpointer               2.4              py39hf3d152e_3    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libarchive                3.7.2                h2aa1ff5_1    conda-forge
libcurl                   8.7.1                hca28451_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_5    conda-forge
libgomp                   13.2.0               h807b86a_5    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libmamba                  1.5.8                had39da4_0    conda-forge
libmambapy                1.5.8            py39h10defb6_0    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libsolv                   0.7.28               hfc55251_2    conda-forge
libsqlite                 3.45.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_5    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.6               h232c23b_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mamba                     1.5.8            py39hc5d2bb1_0    conda-forge
menuinst                  2.0.2            py39hf3d152e_0    conda-forge
ncurses                   6.4.20240210         h59595ed_0    conda-forge
openssl                   3.2.1                hd590300_1    conda-forge
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
platformdirs              4.2.0              pyhd8ed1ab_0    conda-forge
pluggy                    1.4.0              pyhd8ed1ab_0    conda-forge
pybind11-abi              4                    hd8ed1ab_3    conda-forge
pycosat                   0.6.6            py39hd1e30aa_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pyopenssl                 24.0.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.19          h0755675_0_cpython    conda-forge
python_abi                3.9                      4_cp39    conda-forge
readline                  8.2                  h8228510_1    conda-forge
reproc                    14.2.4.post0         hd590300_1    conda-forge
reproc-cpp                14.2.4.post0         h59595ed_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
ruamel.yaml               0.18.6           py39hd1e30aa_0    conda-forge
ruamel.yaml.clib          0.2.8            py39hd1e30aa_0    conda-forge
setuptools                69.2.0             pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toolz                     0.12.1             pyhd8ed1ab_0    conda-forge
tqdm                      4.66.2             pyhd8ed1ab_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml-cpp                  0.8.0                h59595ed_0    conda-forge
zstandard                 0.22.0           py39h6e5214e_0    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Additional Context

Here is the full log

@jakirkham jakirkham added the type::bug describes erroneous operation, use severity::* to classify the type label Apr 2, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in 🧭 Planning Apr 2, 2024
@jakirkham
Member Author

cc @raydouglass @mmccarty (for visibility)

@jakirkham
Member Author

@dholth could you please help us look at this?

Looks like this issue started last week: #899 (comment) (right before a company break)

@raydouglass
Member

Just trying to rule out possible causes: we also encounter network issues when switching from mamba to conda invocations.

https://github.com/rapidsai/docker/actions/runs/8528278564/job/23361587494?pr=650#step:11:236

#11 60.18 CondaHTTPError: HTTP 520 CONNECTION FAILED for url <https://conda.anaconda.org/dask/label/dev/noarch/repodata.json>

https://github.com/rapidsai/docker/actions/runs/8528278564/job/23361586014?pr=650#step:9:1193

#12 156.8 Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='conda.anaconda.org', port=443): Read timed out. (read timeout=60.0)")': /rapidsai-nightly/linux-64/dask-cudf-24.04.00a586-cuda11_py311_240402_g35f818b3e4_586.tar.bz2
#12 343.0 CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/rapidsai-nightly/linux-64/dask-cudf-24.04.00a586-cuda11_py311_240402_g35f818b3e4_586.tar.bz2>

@raydouglass
Member

raydouglass commented Apr 2, 2024

Also, I cannot say with 100% confidence, but so far all of the errors I've seen while reviewing logs from the past week involved the rapidsai-nightly or dask/label/dev channels. That could be a coincidence, since we download a lot of large packages from rapidsai-nightly, so the odds of a network error there are higher.

I have not checked every single error though.
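One way to check this across many logs is to tally the channel named in each failing `conda.anaconda.org` URL. A minimal sketch, assuming error lines contain the full URL as in the snippets above; the regex, the subdir list, and `tally_failing_channels` are illustrative and not part of conda.

```python
import re
from collections import Counter
from typing import Iterable

# The channel is everything between the host and the first subdir segment.
URL_RE = re.compile(
    r"https://conda\.anaconda\.org/(?P<channel>[^\s<>\[\]]+?)/"
    r"(?:noarch|linux-64|linux-aarch64|linux-ppc64le|osx-64|osx-arm64|win-64)/"
)


def tally_failing_channels(log_lines: Iterable[str]) -> Counter:
    """Count how often each channel appears in lines that report an error."""
    counts: Counter = Counter()
    for line in log_lines:
        if "error" not in line.lower():
            continue  # only consider lines that report a failure
        for match in URL_RE.finditer(line):
            counts[match.group("channel")] += 1
    return counts


log = [
    "RuntimeError: Download error (28) Timeout was reached "
    "[https://conda.anaconda.org/rapidsai-nightly/linux-64/"
    "librmm-24.04.00a39-cuda12_240402_g0651edf0_39.tar.bz2]",
    "CondaHTTPError: HTTP 520 CONNECTION FAILED for url "
    "<https://conda.anaconda.org/dask/label/dev/noarch/repodata.json>",
]
print(tally_failing_channels(log))
```

Raw counts cannot separate a channel-specific problem from the higher download volume of rapidsai-nightly; dividing by the number of requests per channel would be needed for that.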

@jakirkham
Member Author

cc @mariusvniekerk (for awareness)

@jakirkham
Member Author

Also saw this with the rapidsai channel in this CI job:

error    libmamba ZSTD decompression error: Unknown frame descriptor
Download error (23) Failed writing received data to disk/application [https://conda.anaconda.org/rapidsai/noarch/repodata.json.zst]
Failure writing output to destination, passed 689 returned 690
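`Unknown frame descriptor` is the message zstd reports when its input does not start with a valid zstd frame header, which typically means the received body was truncated, corrupted, or not zstd at all (for example an error page). A minimal diagnostic sketch, assuming you have the raw bytes of the downloaded `repodata.json.zst`: it only checks the 4-byte magic numbers from the Zstandard format specification (RFC 8878) and does not decompress anything. `is_zstd_frame` is a hypothetical helper.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian (RFC 8878)


def is_zstd_frame(data: bytes) -> bool:
    """Return True if `data` begins with a zstd frame magic number.

    Skippable frames (magic 0x184D2A50..0x184D2A5F) are accepted as well.
    """
    if len(data) < 4:
        return False
    if data[:4] == ZSTD_MAGIC:
        return True
    magic = int.from_bytes(data[:4], "little")
    return 0x184D2A50 <= magic <= 0x184D2A5F


print(is_zstd_frame(b"<html><body>524</body></html>"))
```

To check a cached file, read its first four bytes in binary mode and pass them to `is_zstd_frame`.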

@jakirkham
Member Author

Have also seen this with cf-staging. Snippet below from this GHA job:

E           binstar_client.errors.ServerError: ('?: Undefined error ([GET] https://api.anaconda.org/dist/cf-staging/blah-2696ff/2024.04.03.03.11.49/noarch%2Fblah-2696ff-2024.04.03.03.11.49-py_0.tar.bz2 -> 524)', 524)
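The status codes in these reports are not all standard HTTP statuses. 520 and 524 are non-standard codes that Cloudflare returns when the origin sends an unexpected response or times out, and `HTTP 000` appears to be how conda reports a connection that produced no response at all. Assuming that reading applies here (it is not confirmed in this thread), a CI script could treat all three as transient and retry them. The mapping and `is_transient` below are hypothetical.

```python
# Hypothetical classification of the failure codes reported in this issue.
TRANSIENT_STATUS = {
    0: "no HTTP response (reported by conda as HTTP 000)",
    520: "Cloudflare: web server returned an unknown error",
    524: "Cloudflare: a timeout occurred",
}


def is_transient(status: int) -> bool:
    """Return True if a failed request with this status is worth retrying."""
    return status in TRANSIENT_STATUS


def describe(status: int) -> str:
    """Human-readable reason, or a note that the code is not retried."""
    return TRANSIENT_STATUS.get(status, "not treated as transient")


for code in (524, 404):
    print(code, is_transient(code), describe(code))
```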

@jezdez
Member

jezdez commented Apr 4, 2024

Just noting that we're still investigating this.

@morremeyer

Hey everyone, quick info from the infrastructure side of Anaconda: We're on this and have found an issue in the underlying infrastructure that is likely to cause this. We're going to implement a fix for this suspected cause in the next few hours.

I'll let you know when this has been rolled out.

@morremeyer

We have rolled out the configuration changes that should fix these timeouts. Please let us know if you continue to see these issues.

@raydouglass
Member

@jezdez & @morremeyer Thanks for the update!

We were consistently encountering the networking errors in one of our workflows over the past week or more.

I triggered a rerun and it was able to successfully download all of the conda packages, so seems like things are improved!

@traversaro

We have rolled out the configuration changes that should fix these timeouts. Please let us know if you continue to see these issues.

I am still seeing these kinds of problems in the robotology or robostack-staging channels (posting here as I guess the problem is similar; if you prefer that I open a different issue, just let me know). Example CI failure:

Restarting the CI typically solves these issues.
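Since a rerun usually succeeds, the failures look transient, and retrying the install step with a backoff can work around them in CI. A minimal sketch, assuming the install is a single command; `run_with_retries` and its parameters are illustrative, not a conda or mamba feature.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 3,
                     base_delay: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `cmd`, retrying on a non-zero exit with exponential backoff.

    Returns the exit code of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    code = 1
    for attempt in range(1, attempts + 1):
        code = subprocess.run(list(cmd)).returncode
        if code == 0 or attempt == attempts:
            break
        delay = base_delay * 2 ** (attempt - 1)  # 10 s, 20 s, 40 s, ...
        print(f"attempt {attempt} failed with exit code {code}; "
              f"retrying in {delay:.0f} s")
        sleep(delay)
    return code
```

For example, `run_with_retries(["mamba", "install", "-y", "librmm"])` would rerun the install up to three times. Blind retries also repeat genuine failures such as an unsolvable environment, so a small `attempts` keeps the wasted time bounded.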

@jakirkham
Copy link
Member Author

Are others still seeing this? If so, a fresh reproducer would help.

If not, I would suggest we close this to clean up the issue tracker and focus efforts on current issues.

@traversaro

Based on https://github.com/robotology/icub-models/actions/workflows/cxx-ci.yml, it seems to me that it is still happening, at a rough rate of 1 in ~20 builds, but much less frequently than in early April, when the failure rate was roughly 1 in 3 to 4 builds.

I'm OK with closing it if keeping the issue open is not useful.

@jakirkham
Member Author

The current examples seem worthy of discussion.

No strong feelings as to whether that stays in this issue or is moved to a new one.
