Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎨 Enhanced error handling and troubleshooting logs helpers #6531

Merged
merged 27 commits into from
Oct 16, 2024

Conversation

pcrespov
Copy link
Member

@pcrespov pcrespov commented Oct 14, 2024

What do these changes do?

This pull request (PR) introduces a maintainable mechanism for handling domain-specific errors that propagate through system interfaces (such as REST or RPC APIs) and converting them into API errors.

In this approach, domain errors are represented by custom exceptions, which are raised throughout the codebase. These exceptions need to be caught at service interfaces, where they are converted into appropriate API errors. The proposed mechanism ensures consistency and maintainability in handling this workflow.

Key changes in this PR:

  • ♻️ Moved error_codes from servicelib to models_library to better align with the domain architecture and centralize error handling.
  • ♻️ Enhanced models_library.error_classes.OsparcErrorMixin:
    • Provides automatic support for the error_code and error_context attributes.
    • All domain-specific exceptions must inherit from OsparcErrorMixin to leverage this functionality.

Workflow:

  1. Raise Domain-Specific Errors: When an error occurs in the code, raise a domain-specific exception (inheriting from OsparcErrorMixin). Capture any relevant information in the error_context for later use.

  2. Handle Errors at Service Interfaces: At the boundaries where domain exceptions meet external interfaces (REST or RPC), catch the exceptions and convert them into the appropriate API error type (e.g., HTTPError).

  3. Logging and Troubleshooting:

    • When logging an exception, if it's a domain-specific error, use create_troubleshooting_log_kwargs to gather all error context information and log the error with _logger.exception.
    • The OsparcErrorMixin ensures that all context data related to the error is logged automatically, and additional context can be added at the logging point if necessary.
    • Furthermore, the logging mechanism automatically extracts extra logging information (e.g. log_uid) from the error context that can be parsed and presented as columns in graylog.

This PR provides a clear, structured approach to handle and log errors in a way that enhances maintainability and transparency in API-level error handling across the system.

Usage example

Assume you raise a domain exception in your code. How is it defined, raised and finally handled to produce a log for troubleshooting (optional) and converted to an rest http error.

  • Define your errors
# my/errors.py
class MyError(OsparcErrorMixin, RuntimeError):
    template_msg = "My error {value}"
  • Raising them and reaching REST API
#my/services.py
def get_my_stuff(id: str):
   ... 
   # HERE add all context error you like, not only the one you need for your message!
   raise MyError( value="stuff", id=id, user_id=123, product_name=product_name)

# my/rest.py

@router.get("/stuff/{id}")
@api_error_handler
def get_stuf(request):
     ...
     get_my_stuff(request["id"])
  • Handling the errors at the rest interface. NOTE: this part will be simplified and moved to a middleware in the coming PRs
# my/rest_errors.py
def api_error_handler(request: web.Request):
    ... 
    except MyError as err:
        # I want to log this error because it is a server error
        user_error_msg = "This stuff is not available right now, sorry!"
         logger.exception(**create_troubleshooting_log_kwargs(
             user_error_msg, 
             error=exc
            error_context = {"request": request}
             # can also add extra context and tips here (optional)
             )
         )         
         # NOTE that we convert the domain error MyError into a REST API error `webHTTPServiceUnavailable`
         raise web.HTTPServiceUnavailable(reason=user_error_msg)
  • If this error gets raised, the the front-end will received a 504 with "This stuff is not available right now, sorry!" and in the bacckend it will log something like
ERROR XXXX.py:123 This stuff is not available right now, sorry!
        {
         "exception_details": "My error 123",
        "context": {
            "user_id": 123,
            "product_name": "foo"
        },
     }
     .. 
     # all the traceback here

Related issue/s

How to test

cd packages/service-library
make install-dev
pytest -vv tests/test_logging_errors.py

Dev-ops checklist

None

@pcrespov pcrespov self-assigned this Oct 14, 2024
Copy link

codecov bot commented Oct 14, 2024

Codecov Report

Attention: Patch coverage is 71.96262% with 30 lines in your changes missing coverage. Please review.

Project coverage is 88.0%. Comparing base (cafbf96) to head (8d0bb1a).
Report is 641 commits behind head on master.

Files with missing lines Patch % Lines
...c/simcore_service_webserver/login/_registration.py 25.0% 6 Missing ⚠️
...e_service_webserver/login/handlers_registration.py 37.5% 5 Missing ⚠️
...ce_webserver/studies_dispatcher/_studies_access.py 40.0% 3 Missing ⚠️
...r/src/simcore_service_webserver/users/_handlers.py 25.0% 3 Missing ⚠️
...odels-library/src/models_library/errors_classes.py 50.0% 2 Missing ⚠️
...mcore_service_api_server/services/log_streaming.py 50.0% 2 Missing ⚠️
...dules/dynamic_sidecar/scheduler/_core/_observer.py 71.4% 2 Missing ⚠️
...er/src/simcore_service_webserver/login/_2fa_api.py 50.0% 2 Missing ⚠️
...pi_server/exceptions/handlers/_handlers_factory.py 87.5% 1 Missing ⚠️
.../src/simcore_service_dynamic_sidecar/core/utils.py 66.6% 1 Missing ⚠️
... and 3 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #6531      +/-   ##
=========================================
+ Coverage    84.5%   88.0%    +3.4%     
=========================================
  Files          10    1391    +1381     
  Lines         214   59792   +59578     
  Branches       25    1557    +1532     
=========================================
+ Hits          181   52650   +52469     
- Misses         23    6893    +6870     
- Partials       10     249     +239     
Flag Coverage Δ
integrationtests 64.7% <43.5%> (?)
unittests 85.9% <71.9%> (+1.3%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...s/models-library/src/models_library/error_codes.py 100.0% <ø> (ø)
...library/src/servicelib/aiohttp/rest_middlewares.py 75.2% <100.0%> (ø)
...s/service-library/src/servicelib/logging_errors.py 100.0% <100.0%> (ø)
...es/service-library/src/servicelib/logging_utils.py 78.9% <100.0%> (ø)
...e_director_v2/models/dynamic_services_scheduler.py 98.4% <100.0%> (ø)
...imcore_service_payments/services/notifier_email.py 95.0% <100.0%> (ø)
...rc/simcore_service_webserver/invitations/errors.py 100.0% <100.0%> (ø)
...rver/src/simcore_service_webserver/login/errors.py 100.0% <100.0%> (ø)
.../web/server/src/simcore_service_webserver/utils.py 65.4% <100.0%> (ø)
...src/simcore_service_webserver/wallets/_handlers.py 96.9% <100.0%> (ø)
... and 13 more

... and 1353 files with indirect coverage changes

@pcrespov pcrespov changed the title WIP: Mai/error handling 🎨 Enhanced error handling helper functions Oct 15, 2024
@pcrespov pcrespov changed the title 🎨 Enhanced error handling helper functions 🎨 Enhanced error handling and troubleshooting logs helpers Oct 15, 2024
@pcrespov pcrespov added t:enhancement Improvement or request on an existing feature a:services-library issues on packages/service-libs labels Oct 15, 2024
@pcrespov pcrespov added the t:maintenance Some planned maintenance work label Oct 15, 2024
@pcrespov pcrespov added this to the MartinKippenberger milestone Oct 15, 2024
@pcrespov pcrespov marked this pull request as ready for review October 15, 2024 17:11
@pcrespov pcrespov enabled auto-merge (squash) October 15, 2024 17:24
Copy link
Contributor

@matusdrobuliak66 matusdrobuliak66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Thanks.

Copy link
Member

@odeimaiz odeimaiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍

Copy link

@pcrespov pcrespov merged commit d0e66f1 into ITISFoundation:master Oct 16, 2024
57 checks passed
@pcrespov pcrespov deleted the mai/error-handling branch October 16, 2024 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:services-library issues on packages/service-libs t:enhancement Improvement or request on an existing feature t:maintenance Some planned maintenance work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants