Ensure agents improvements #7612

Closed
wants to merge 22 commits into from
10 changes: 10 additions & 0 deletions changelogs/unreleased/ensure-agents-improvements.yml
@@ -0,0 +1,10 @@
+description: "Made various improvements to the AutostartedAgent._ensure_agents method"
+sections:
+  bugfix: "Fixed a race condition where autostarted agents might become unresponsive for 30s when restarted"
+issue-nr: 7612
+change-type: patch
+destination-branches:
+  - master
+  - iso7
+  - iso6

25 changes: 15 additions & 10 deletions src/inmanta/agent/agent.py
@@ -977,16 +977,18 @@ async def _init_agent_map(self) -> None:
             self.agent_map = dict(cfg.agent_map.get())
 
     async def _init_endpoint_names(self) -> None:
-        if self.hostname is not None:
-            await self.add_end_point_name(self.hostname)
-        else:
-            # load agent names from the config file
-            agent_names = cfg.agent_names.get()
-            if agent_names is not None:
-                for name in agent_names:
-                    if "$" in name:
-                        name = name.replace("$node-name", self.node_name)
-                    await self.add_end_point_name(name)
+        assert self.agent_map is not None
+        endpoints: Iterable[str] = (
+            [self.hostname]
+            if self.hostname is not None
+            else (
+                self.agent_map.keys()
+                if cfg.use_autostart_agent_map.get()
+                else (name if "$" not in name else name.replace("$node-name", self.node_name) for name in cfg.agent_names.get())
+            )
+        )
+        for endpoint in endpoints:
+            await self.add_end_point_name(endpoint)
 
     async def stop(self) -> None:
         await super().stop()
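
The precedence that the new `_init_endpoint_names` implements can be sketched as a standalone function. This is a minimal sketch, not the agent's code: `resolve_endpoint_names` and its parameters are hypothetical stand-ins for `self.hostname`, `self.agent_map`, `cfg.use_autostart_agent_map`, `cfg.agent_names` and `self.node_name`. Unlike the new code, it keeps the old code's guard for an unset `agent_names` option (`None`).

```python
from collections.abc import Mapping, Sequence
from typing import Optional


def resolve_endpoint_names(
    hostname: Optional[str],
    agent_map: Mapping[str, str],
    use_autostart_agent_map: bool,
    configured_names: Optional[Sequence[str]],
    node_name: str,
) -> list[str]:
    """Return the endpoint names an agent process would register (hypothetical helper).

    Precedence, following the new _init_endpoint_names:
    1. an explicit hostname wins;
    2. otherwise, if the autostart agent map is in use, every agent in that map;
    3. otherwise, the configured agent names, with "$node-name" substituted.
    """
    if hostname is not None:
        return [hostname]
    if use_autostart_agent_map:
        return list(agent_map.keys())
    # Same guard as the old code: an unset agent_names option yields no endpoints.
    if configured_names is None:
        return []
    # str.replace is a no-op when "$node-name" is absent, so no "$" check is needed.
    return [name.replace("$node-name", node_name) for name in configured_names]


if __name__ == "__main__":
    print(resolve_endpoint_names(None, {}, False, ["agent-$node-name", "web"], "node1"))
```

The `"$" not in name` check in the diff is an optimization only: `str.replace` leaves a name without `$node-name` unchanged, so both forms produce the same names.
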
@@ -1069,6 +1071,9 @@ async def update_agent_map(self, agent_map: dict[str, str]) -> None:
         await self._update_agent_map(agent_map)
 
     async def _update_agent_map(self, agent_map: dict[str, str]) -> None:
+        if "internal" not in agent_map:
+            raise ValueError("The internal agent must be present in the agent map")
+
Contributor

Do we really want to hard fail on this? Shouldn't we take a more robust approach, i.e. 1) add the internal agent to the agent_map if it's missing and 2) log a warning?

Contributor Author

I'm not sure. The current behavior was to drop the internal agent altogether, so I don't think we have to worry too much about it being a breaking change (the previous behavior is just breaking in a different way). In that sense I think we can afford it, and if we can afford it I think it's the sensible thing to do. It also corresponds with the environment setting update behavior.

Contributor Author

@wouterdb what's your opinion on this?

Contributor (wouterdb, May 13, 2024)

I think it is strange for the agent to reject the entire agent map when an entry is missing (one that it can easily add itself).

I would understand if we rejected the user input on the server side; here, I find it tricky, in that it can make existing, working setups stop working altogether (this is code on the agent startup path).

(The more I think about it, the more I am convinced this can't go into a patch release: it is breaking.)

Contributor Author

Ok, I'll change this.

Contributor Author

  1. Added a warning; I think this is definitely good.
  2. Auto-added it when it's missing. I could also keep the old behavior for this part if you prefer.
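
The approach suggested in this thread (add the internal agent when it is missing and log a warning, rather than raising `ValueError`) could look roughly like the sketch below. `ensure_internal_agent` is a hypothetical helper, and mapping the internal agent to `"local:"` is an assumption made for illustration; the value the actual code uses may differ.

```python
import logging

LOGGER = logging.getLogger(__name__)

# Assumption for illustration: the URI the internal agent is mapped to.
INTERNAL_AGENT_URI = "local:"


def ensure_internal_agent(agent_map: dict[str, str]) -> dict[str, str]:
    """Return an agent map that is guaranteed to contain the internal agent (hypothetical helper).

    Instead of rejecting the whole map, a missing entry is added and a warning is
    logged, so a setup that worked before keeps working on the agent startup path.
    The input dict is never mutated.
    """
    if "internal" in agent_map:
        return agent_map
    LOGGER.warning(
        "The internal agent is missing from the agent map %s, adding it with uri %r",
        agent_map,
        INTERNAL_AGENT_URI,
    )
    return {**agent_map, "internal": INTERNAL_AGENT_URI}


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(ensure_internal_agent({"agent1": "local:"}))
```

Returning a new dict instead of mutating the argument leaves the caller's copy (for example the value read from the configuration) unchanged.
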

         async with self._instances_lock:
             self.agent_map = agent_map
             # Add missing agents
10 changes: 6 additions & 4 deletions src/inmanta/data/__init__.py
@@ -29,7 +29,7 @@
 import warnings
 from abc import ABC, abstractmethod
 from collections import abc, defaultdict
-from collections.abc import Awaitable, Callable, Iterable, Sequence
+from collections.abc import Awaitable, Callable, Iterable, Sequence, Set
 from configparser import RawConfigParser
 from contextlib import AbstractAsyncContextManager
 from itertools import chain
@@ -1219,7 +1219,7 @@ def get_connection(
         """
         if connection is not None:
             return util.nullcontext(connection)
-        # Make pypi happy
+        # Make mypy happy
         assert cls._connection_pool is not None
         return cls._connection_pool.acquire()
 
@@ -3344,10 +3344,12 @@ def get_valid_field_names(cls) -> list[str]:
         return super().get_valid_field_names() + ["process_name", "status"]
 
     @classmethod
-    async def get_statuses(cls, env_id: uuid.UUID, agent_names: set[str]) -> dict[str, Optional[AgentStatus]]:
+    async def get_statuses(
+        cls, env_id: uuid.UUID, agent_names: Set[str], *, connection: Optional[asyncpg.connection.Connection] = None
+    ) -> dict[str, Optional[AgentStatus]]:
         result: dict[str, Optional[AgentStatus]] = {}
         for agent_name in agent_names:
-            agent = await cls.get_one(environment=env_id, name=agent_name)
+            agent = await cls.get_one(environment=env_id, name=agent_name, connection=connection)
             if agent:
                 result[agent_name] = agent.get_status()
             else:
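
The two changes to `get_statuses` (accepting any `collections.abc.Set` and threading an optional keyword-only `connection` through to each lookup) can be illustrated without a database. This is a simplified sketch: `AgentStore`, `FakeConnection` and the string statuses are hypothetical stand-ins for the `Agent` model, an asyncpg connection and `AgentStatus`, and the `environment` filter is left out.

```python
import asyncio
from collections.abc import Set
from typing import Optional


class FakeConnection:
    """Hypothetical stand-in for an asyncpg connection."""

    def __init__(self, name: str) -> None:
        self.name = name


class AgentStore:
    """Hypothetical in-memory stand-in for the Agent table (environment filter omitted)."""

    def __init__(self, statuses: dict[str, str]) -> None:
        self._statuses = statuses
        # Records the connection passed to every lookup, to show it is threaded through.
        self.used_connections: list[Optional[FakeConnection]] = []

    async def get_one(self, name: str, *, connection: Optional[FakeConnection] = None) -> Optional[str]:
        self.used_connections.append(connection)
        return self._statuses.get(name)

    async def get_statuses(
        self, agent_names: Set[str], *, connection: Optional[FakeConnection] = None
    ) -> dict[str, Optional[str]]:
        # collections.abc.Set accepts set, frozenset and dict key views alike.
        result: dict[str, Optional[str]] = {}
        for agent_name in agent_names:
            # Reuse the caller's connection for every lookup instead of letting
            # each lookup acquire its own; unknown agents map to None.
            result[agent_name] = await self.get_one(agent_name, connection=connection)
        return result


async def main() -> None:
    store = AgentStore({"internal": "up", "agent1": "paused"})
    agent_map = {"internal": "local:", "agent1": "local:", "agent2": "local:"}
    conn = FakeConnection("transaction")
    # A dict key view is a Set, so no conversion to set() is needed.
    statuses = await store.get_statuses(agent_map.keys(), connection=conn)
    print(statuses)


if __name__ == "__main__":
    asyncio.run(main())
```

Because `Set` covers dictionary key views, a caller can pass `agent_map.keys()` directly instead of building a `set` first. A caller that already holds a connection (for example inside a transaction) also avoids having each lookup acquire another connection from the pool, which is what `get_connection` does when no connection is passed.
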