Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create connector record in agent component #2861

Merged
merged 7 commits into from
Oct 7, 2024

Conversation

jedrazb
Copy link
Member

@jedrazb jedrazb commented Oct 1, 2024

Closes https://github.com/elastic/search-team/issues/8265

This change introduces the ConnectorRecordManager, which ensures that connectors associated with a given agent are created on startup using the ensure_connector_records_exist function.

The check-in handler has been updated to inspect both input and output units passed to the component from the agent protocol. The assumption is that each connector component will receive no more than one output of type ES and one input of type connectors-py. I've added logging and additional logic to detect and handle edge cases.

If more then one connector component is defined in the policy user will see a log: Multiple connector inputs detected. The first connector input defined in the agent policy will be used.

Note

We could change this and allow adding multiple connector inputs to the same policy, hence allowing to run multiple connectors on a same agent(less) deployment. This could be more cost efficient, while leaving an option to control what policy is deployed where to manage resource.

Raising this now, but we can revisit later. For now we constrain single connector input <-> single policy <-> single deployment.

Additionally, I implemented logic in the config wrapper to identify changes in the connector configurations. I also removed the configuration that forced the connector service to start in native mode, as this is no longer necessary given that we now have access to the IDs of the connectors we manage.

Validation

E2e tests + unit tests

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)

@jedrazb jedrazb marked this pull request as ready for review October 3, 2024 14:40
@jedrazb jedrazb requested a review from a team as a code owner October 3, 2024 14:40
)

if not connector_name:
connector_name = f"[Agentless] {service_type} connector"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried if I create two gdrive connectors, they'll have duplicate names. Is anything preventing that?

Also, I'm not sure we want "Agentless" to leak out into public-facing terminology. Or at least, not in Search pages (it might in Integrations pages). I'd suggest "Elastic-managed" or "Agent", but don't love either and am open to alternatives.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use Elastic-managed for now then :) this should be compliant with new terminology

Copy link
Member Author

@jedrazb jedrazb Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm appending random 4 chars at the end of autogenerate connector name ... this should be fine to stay unique (IDs present in agent policy are toooo long to put in UI)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-10-04 at 13 33 10

if not elasticsearch_config:
return False

if "host" not in elasticsearch_config or "api_key" not in elasticsearch_config:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be checking for cloud_id instead of api_key?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe in addition?
We must have either host or cloud_id, and we MUST have api_key. Right?

Copy link
Member Author

@jedrazb jedrazb Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are checking elasticsearch config as seen by the connector service.

cloud_id is a first for me, I don't see a single reference to it in the repo https://github.com/search?q=repo%3Aelastic%2Fconnectors%20cloud_id&type=code (also in the example config I don't see it)

Also, agent doesn't specify the cloud_id anywhere https://www.elastic.co/guide/en/fleet/current/elasticsearch-output.html

Can you explain what do you mean with cloud id? I don't think we must check for it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cloud ID is present for all cloud deployments and serverless projects.

Screenshot 2024-10-04 at 2 47 59 PM

And it should be the defacto way to connect to Elasticsearch, see: https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-ec

When connecting to Elastic Cloud with the Python Elasticsearch client you should always use the cloud_id parameter to connect.

I'm surprised we don't support an elasticsearch.cloud_id in our YAML config, IDK how I never noticed that was missing. We should fix that, but I suppose it's a problem for another day. We can disregard it for this PR I suppose, and leave the logic as is, but understand now you're checking to make sure that both host and api key are provided.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explaining. I had no idea we could use cloud id for connection. I will file an issue to support the cloud id in the config YML!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connectors/agent/connector_record_manager.py Outdated Show resolved Hide resolved
…om:elastic/connectors-python into create-connector-record-in-agent-component
@jedrazb jedrazb requested a review from seanstory October 4, 2024 13:49
@jedrazb jedrazb merged commit ce61f05 into main Oct 7, 2024
2 checks passed
@jedrazb jedrazb deleted the create-connector-record-in-agent-component branch October 7, 2024 07:40
Comment on lines +53 to +58
await self.connector_index.connector_put(
connector_id=connector_id,
service_type=service_type,
connector_name=connector_name,
)
logger.info(f"Created connector record for {connector_id}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel on also filling up RCFs here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be nice especially on serverless as afaik it would take a bit of time until this gets autopopulated, will file an enhancement issue to track it 👍 we might want to test this thoroughly with new connector api _configuration endpoint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +36 to +37
if not self.connector_index:
self.connector_index = ConnectorIndex(agent_config.get("elasticsearch"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it a cached property instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, we don't have access to agent_config when instantiating ConnectorRecordManager, the ensure_connector_records_exist is called from checking handler with up to date config, so cached_property can't be used.

jedrazb added a commit that referenced this pull request Oct 25, 2024
@jedrazb jedrazb mentioned this pull request Oct 25, 2024
jedrazb added a commit that referenced this pull request Nov 7, 2024
Co-authored-by: Artem Shelkovnikov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants