Operator fails to down-scale cluster because of "asynchronous connection attempt underway" #100

Closed
chaudum opened this issue Nov 3, 2020 · 4 comments

Comments

Contributor

chaudum commented Nov 3, 2020

I attempted to down-scale a 5-node cluster to 3 nodes.
The cluster was empty (no tables).

[2020-11-03 10:52:51,596] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:54:51,660] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:56:51,713] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:58:51,766] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:00:51,865] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:02:51,923] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:04:52,035] kopf.objects         [ERROR   ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to scale cluster
Traceback (most recent call last):
  File "main.py", line 472, in cluster_update
    await with_timeout(
  File "main.py", line 107, in with_timeout
    await awaitable
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 491, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 609, in scale_cluster
    total_nodes = await scale_cluster_data_nodes(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 377, in scale_cluster_data_nodes
    new_total_nodes = await scale_up_statefulset(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 226, in scale_up_statefulset
    await wait_for_healthy_cluster(conn_factory, new_total_nodes, logger)
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 189, in wait_for_healthy_cluster
    raise e
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:04:52,037] kopf.objects         [INFO    ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] POSTing to https://bregenz.a1.cratedb.cloud/mgmt/v1/clusters/operator/ because of WebhookEvent.SCALE on cluster 60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402
[2020-11-03 11:04:58,317] kopf.objects         [INFO    ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Successfully POSTed to https://bregenz.a1.cratedb.cloud/mgmt/v1/clusters/operator/ because of WebhookEvent.SCALE on cluster 60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402. Status: 200, Body: '{"success":true}\n'
[2020-11-03 11:04:58,319] kopf.objects         [ERROR   ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Handler 'cluster_update' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/kopf/reactor/handling.py", line 259, in execute_handler_once
    result = await invoke_handler(
  File "/usr/local/lib/python3.8/site-packages/kopf/reactor/handling.py", line 358, in invoke_handler
    result = await invocation.invoke(
  File "/usr/local/lib/python3.8/site-packages/kopf/reactor/invocation.py", line 125, in invoke
    result = await fn(*args, **kwargs)  # type: ignore
  File "main.py", line 472, in cluster_update
    await with_timeout(
  File "main.py", line 107, in with_timeout
    await awaitable
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 491, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 609, in scale_cluster
    total_nodes = await scale_cluster_data_nodes(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 377, in scale_cluster_data_nodes
    new_total_nodes = await scale_up_statefulset(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 226, in scale_up_statefulset
    await wait_for_healthy_cluster(conn_factory, new_total_nodes, logger)
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 189, in wait_for_healthy_cluster
    raise e
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
...
Contributor

MarkusH commented Nov 3, 2020

This could be related to aio-libs/aiopg#275

Contributor Author

chaudum commented Nov 3, 2020

> This could be related to aio-libs/aiopg#275

🤔 I have looked at that issue before, but I cannot remember the context.

Replacing localhost with 127.0.0.1 would at least mitigate that issue, even if the root cause of this issue is a different one.

Contributor Author

chaudum commented Nov 3, 2020

> > This could be related to aio-libs/aiopg#275
>
> 🤔 I have looked at that issue before, but I cannot remember the context.
>
> Replacing localhost with 127.0.0.1 would at least mitigate that issue, even if the root cause of this issue is a different one.

We don't use localhost anywhere, though (only when using the crash command). However, we have the same conditions as those described in aio-libs/aiopg#579.
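
The log in the issue description shows a retry pattern: each failed connection attempt is logged as a warning, and the last error is re-raised once the operator gives up. A minimal sketch of that pattern follows, assuming the connection error is transient. `connect_with_retry` and all of its parameters are hypothetical names for illustration only; this is not the operator's `wait_for_healthy_cluster` and not part of aiopg. The exception types to treat as transient are passed in by the caller, for example `psycopg2.OperationalError`.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]],
    *,
    transient: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
    logger: Optional[logging.Logger] = None,
) -> T:
    """Await ``connect()`` until it succeeds or ``attempts`` is exhausted.

    Hypothetical helper: only exceptions listed in ``transient`` are
    retried; anything else propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    log = logger or logging.getLogger(__name__)
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return await connect()
        except transient as exc:
            # Remember the error so it can be re-raised after the last try.
            last_error = exc
            log.warning("Connect attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                await asyncio.sleep(delay)
    assert last_error is not None
    raise last_error
```

A caller would pass a zero-argument coroutine function that opens the connection, plus the driver's error type in `transient`. Bounding the number of attempts keeps a persistent failure visible instead of retrying forever.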

Contributor

SStorm commented Oct 12, 2021

I don't think we've seen this happen for a long time; closing due to inactivity.

SStorm closed this as completed Oct 12, 2021