Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(opensearch): capture logs from Dask cluster pods #616

Merged
merged 1 commit into from
Jan 22, 2025

Conversation

Alputer
Copy link
Member

@Alputer Alputer commented Nov 26, 2024

This PR adds the collection of logs for Dask pods, namely scheduler and workers.

Closes #610

@Alputer Alputer self-assigned this Nov 26, 2024
Copy link

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 68.42105% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.44%. Comparing base (afd1400) to head (51fad95).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
reana_workflow_controller/opensearch.py 64.28% 5 Missing ⚠️
reana_workflow_controller/rest/utils.py 80.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #616      +/-   ##
==========================================
- Coverage   75.57%   75.44%   -0.13%     
==========================================
  Files          17       17              
  Lines        1883     1898      +15     
==========================================
+ Hits         1423     1432       +9     
- Misses        460      466       +6     
Files with missing lines Coverage Δ
reana_workflow_controller/rest/utils.py 82.50% <80.00%> (-0.09%) ⬇️
reana_workflow_controller/opensearch.py 89.79% <64.28%> (-10.21%) ⬇️

Alputer added a commit to Alputer/reana-workflow-controller that referenced this pull request Nov 26, 2024
Alputer added a commit to Alputer/reana-workflow-controller that referenced this pull request Jan 20, 2025
Alputer added a commit to Alputer/reana-workflow-controller that referenced this pull request Jan 21, 2025
@@ -63,11 +63,13 @@ def __init__(
os_client: OpenSearch | None = None,
job_index: str = "fluentbit-job_log",
workflow_index: str = "fluentbit-workflow_log",
dask_index: str = "fluentbit-dask_log",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: ‏For the release notes, please amend the commit log headline scope to opensearch and capitalise Dask:

feat(opensearch): capture logs from Dask cluster pods (#616)


def fetch_dask_worker_logs(self, workflow_id: str) -> str | None:
"""
Fetch logs of the workers of a Dask cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can document -- here or elsewhere in the code documentation or in the main commit log message body -- that the logs collected from Dask workers are "propagated" to all REANA job logs, so that if there are N jobs using the same cluster, the Dask logs are currently "multiplicated" and they show under each job log. So that we don't forget this by-product of the current approach. (And we could open an issues about this later.)

Alputer added a commit to Alputer/reana-workflow-controller that referenced this pull request Jan 22, 2025
Currently, the logs collected from Dask workers and scheduler are propogated to all REANA job logs. It is not ideal since Dask logs are multiplicated for different steps which use the same Dask cluster and the multiplicated logs are shown for each step which might be confusing and not user-friendly. This change requires a larger architectural change and is deferred to a future commit.

Closes reanahub#610
@Alputer Alputer changed the title feat(dask): capture logs from dask pods feat(opensearch): capture logs from Dask cluster pods Jan 22, 2025
Alputer added a commit to Alputer/reana-workflow-controller that referenced this pull request Jan 22, 2025
Currently, the logs collected from Dask workers and scheduler are propogated to all REANA job logs. It is not ideal since Dask logs are multiplicated for different steps which use the same Dask cluster and the multiplicated logs are shown for each step which might be confusing and not user-friendly. Seperating Dask logs and job logs requires a larger architectural change and is deferred to a future commit.

Closes reanahub#610
Alputer added a commit to Alputer/reana-workflow-controller that referenced this pull request Jan 22, 2025
This commit collects logs from Dask scheduler and workers and propagates\nthem to all REANA jobs that are using the same Dask cluster. This is not\nideal, since Dask logs can become thusly duplicated for different\nworkflow steps of the workflow, which could be confusing for the user.\n\nHowever, when a user uses Dask to parallelise the workflow jobs, usually\nthe workflow steps are defined only within Dask, so this situation does\nnot occur. Hence we can afford doing this in usual real-life conditions.\n\nSeparating Dask scheduler and worker logs from regular Kubernetes job\nlogs would require a larger architectural change and is therefore\ndeferred to a future commit.

Closes reanahub#610
This commit collects logs from Dask scheduler and workers and propagates
them to all REANA jobs that are using the same Dask cluster. This is not
ideal, since Dask logs can become thusly duplicated for different
workflow steps of the workflow, which could be confusing for the user.

However, when a user uses Dask to parallelise the workflow jobs, usually
the workflow steps are defined only within Dask, so this situation does
not occur. Hence we can afford doing this in usual real-life conditions.

Separating Dask scheduler and worker logs from regular Kubernetes job
logs would require a larger architectural change and is therefore
deferred to a future commit.

Closes reanahub#610
Copy link
Member

@tiborsimko tiborsimko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works nicely 👍

@tiborsimko tiborsimko merged commit 51fad95 into reanahub:master Jan 22, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

dask: capture live logs from running Dask scheduler and worker pods
2 participants