-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port ErrorMonitor class from ruby connectors #2671
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it.
I'm fine to have the configuration be a follow-up. Just as long as it isn't used before it's configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few little nites, but largely! looks good!
except ForceCanceledError: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also re-raise aio CanceledError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure it's needed. I'll double check and add if it is
connectors/source.py
Outdated
@@ -767,12 +780,14 @@ async def download_and_extract_file( | |||
doc = await self.handle_file_content_extraction( | |||
doc, source_filename, temp_filename | |||
) | |||
self.error_monitor.track_success() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't this end up double-counting? Since sink.py
also uses the monitor for the lazy downloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, will fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know what to do with that, we have several potential points of failure/success:
Extraction:
- File was successfully downloaded
- File was not successfully downloaded
Ingestion:
- File was successfully ingested
- File was not successfully ingested
It's possible to have all the permutations!
For example, file was not downloaded successfully, but we still choose to upload its metadata to Elasticsearch and everything worked out - is it a failure?
I'll open this up to Slack
connectors/source.py
Outdated
def with_error_monitoring(self): | ||
try: | ||
yield | ||
self.error_monitor.track_success() | ||
except Exception as ex: | ||
self._logger.error(ex) | ||
self.error_monitor.track_error(ex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something we had in ent-search was the concept of "fatal exceptions". See: https://github.com/elastic/ent-search/blob/a1ddc884ad1543f61a91da7b9511a01090adcac5/connectors/lib/connectors/content_sources/base/extractor.rb#L132-L139
it might be worth adding that here too, for things like aio CanceledError, OOME, etc, to make sure we fail-fast when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good call. OOME can't get caught, but stuff like CancelledError + having a custom fatal exception class should help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, since we don't have code that uses this I'd address it later once we actually use this code somewhere, for now this doesn't make any difference.
I'd also like to separately add handlers for terminal exceptions - for example our download function and some other do catch Exception
that can catch and "handle" cancelled errors in the weird way
Looks like this may need a rebase - I think you've pulled in other commits you may not have meant to. |
Co-authored-by: Sean Story <[email protected]>
Co-authored-by: Sean Story <[email protected]>
4a92c61
to
7dba43e
Compare
Related: #1582 (maybe even "fixes"?) |
Porting the ErrorMonitor class from https://github.com/elastic/connectors-ruby/blob/main/lib/utility/error_monitor.rb
ErrorMonitor class is there to add transient error handling and ignore errors until certain threshold.
For example, default error monitor class can be used from any class implementing
BaseDataSource
with the following code:Alternative way of usage:
What it will do:
What does "too many errors" mean?
with self.with_error_monitoring()
- default number is 10with self.with_error_monitoring()
- by default it's 15%This PR already makes use of ErrorMonitor for generic framework operations.
1 error monitor is created and passed to: Connector, Sink and Extractor. Sink and Extractor already track errors, connectors can track errors on per connector basis.
Sink uses error monitor to track errors while ingesting data - each bulk operation is checked and each document that was not ingested into Elasticsearch triggers an "error" tracking, each successfully ingested document produces "success mapping". In practice with default settings it means, for example, that if 100 documents are sent to Elasticsearch and 16 of them fail, sync will fail.
Same happens for downloads - if a download succeeds, we track is as "success" in Error Monitor, if it fails, it's a "failure" and Error Monitor will transiently skip it unless too many errors happened.
If sync is terminated due to this error, Kibana UI will show reason (too many errors in total, too many consequent errors, error rate is too high) along with the last error that happened.
So, as an example, if during a sync 200 documents fail to be downloaded and 801 documents fail to be ingested to Elasticsearch, sync will fail.