Add memray plugin #2875

Merged
merged 31 commits into from
Nov 12, 2024

fiedlerNr9 (Contributor) commented Oct 29, 2024

Why are the changes needed?

  • Enables memray profiling at the Flyte task level
  • Renders the memray report into a Flyte Deck

What changes were proposed in this pull request?

  • Adds a memray flytekit plugin

How was this patch tested?

  • Unit tests
  • Tested local and remote runs

Setup process

from flytekit import workflow, task, ImageSpec
from flytekitplugins.memray import memray_profiling
import time


flytekit_hash = "82d5ac739f5f02998edb9538c58cf93c8f6e501b"
flytekitplugins_memray = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}#subdirectory=plugins/flytekit-memray"

image = ImageSpec(
    name="memray_demo",
    python_version="3.11.10",
    apt_packages=["git"],
    packages=[flytekitplugins_memray],
    registry="ghcr.io/fiedlernr9",
)


def generate_data(n: int):
    leak_list = []
    for _ in range(n):  # n iterations, one ~1 MB string each
        large_data = " " * 10**6  # 1 MB string
        leak_list.append(large_data)  # Keeps appending without releasing
        time.sleep(0.1)  # Slow down the loop to observe memory changes


@task(container_image=image, enable_deck=True)
@memray_profiling(memray_html_reporter="table")
def memory_usage(n: int) -> str:
    generate_data(n=n)

    return "Well"


@task(container_image=image, enable_deck=True)
@memray_profiling(trace_python_allocators=True, memray_reporter_args=["--leaks"])
def memory_leakage(n: int) -> str:
    generate_data(n=n)

    return "Well"


@workflow
def wf(n: int = 500):
    memory_usage(n=n)
    memory_leakage(n=n)
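
The stacked `@task` / `@memray_profiling` decorators above wrap the task body so that allocations made while it runs are tracked. As a rough, stdlib-only sketch of that wrapping pattern (not the plugin's code), the block below swaps memray for Python's built-in `tracemalloc`. The decorator name `profile_memory`, its `top_n` option and the `last_report` attribute are hypothetical, and the summary dict only stands in for the HTML report the plugin renders into a Flyte Deck. Like `memray_profiling`, it works both bare and with keyword arguments.

```python
import functools
import tracemalloc
from typing import Any, Callable, Dict, Optional


def profile_memory(func: Optional[Callable] = None, *, top_n: int = 3):
    """Hypothetical memory-profiling decorator (tracemalloc stand-in, not memray)."""

    def decorator(f: Callable) -> Callable:
        @functools.wraps(f)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            tracemalloc.start()  # begin tracing allocations made by the wrapped body
            try:
                result = f(*args, **kwargs)
                _, peak = tracemalloc.get_traced_memory()
                stats = tracemalloc.take_snapshot().statistics("lineno")[:top_n]
            finally:
                tracemalloc.stop()  # always stop tracing, even if the body raised
            # Stand-in for a rendered report: keep a small summary on the wrapper.
            report: Dict[str, Any] = {"peak_bytes": peak, "top": [str(s) for s in stats]}
            wrapper.last_report = report
            return result

        wrapper.last_report = None
        return wrapper

    if func is not None:  # used bare: @profile_memory
        return decorator(func)
    return decorator  # used with arguments: @profile_memory(top_n=1)


def generate_data(n: int, size: int = 10**6) -> int:
    leak_list = []
    for _ in range(n):
        leak_list.append(" " * size)  # ~1 MB string per iteration, all kept alive
    return len(leak_list)


@profile_memory
def memory_usage(n: int) -> int:
    return generate_data(n)


@profile_memory(top_n=1)
def memory_usage_top1(n: int) -> int:
    return generate_data(n)
```

Calling `memory_usage(5)` returns 5, and afterwards `memory_usage.last_report["peak_bytes"]` is at least 5 MB because all five strings are alive at the same time. A memray-based decorator would presumably write a capture file instead and run a reporter (flamegraph or table) over it.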

Screenshots

Flamegraph

[screenshot]

Table

[screenshot]

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link


codecov bot commented Oct 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.89%. Comparing base (3fc51af) to head (3541f5e).
Report is 31 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2875      +/-   ##
==========================================
- Coverage   45.53%   43.89%   -1.65%     
==========================================
  Files         196      199       +3     
  Lines       20418    20820     +402     
  Branches     2647     2676      +29     
==========================================
- Hits         9298     9138     -160     
- Misses      10658    11448     +790     
+ Partials      462      234     -228     
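
The figures in the coverage diff are internally consistent, and the headline percentages can be reproduced by hand. The worked check below assumes Codecov computes coverage as hits / (hits + misses + partials), so partially covered lines count as uncovered, and that it truncates to two decimals (we believe `precision: 2` and `round: down` are its defaults). The helper names are illustrative only.

```python
import math

# Figures copied from the Codecov diff above.
base = {"hits": 9298, "misses": 10658, "partials": 462, "lines": 20418}
head = {"hits": 9138, "misses": 11448, "partials": 234, "lines": 20820}


def raw_coverage(r: dict) -> float:
    """Coverage in percent; partial lines count as not covered (assumption)."""
    # Every line is exactly one of hit, miss or partial.
    assert r["hits"] + r["misses"] + r["partials"] == r["lines"]
    return 100 * r["hits"] / r["lines"]


def rounded_down(pct: float, digits: int = 2) -> float:
    """Truncate to `digits` decimals (assumed Codecov default: round down)."""
    factor = 10**digits
    return math.floor(pct * factor) / factor


base_pct = rounded_down(raw_coverage(base))  # 45.53
head_pct = rounded_down(raw_coverage(head))  # 43.89
# The delta appears to be the unrounded difference rounded to the nearest hundredth.
delta = round(raw_coverage(head) - raw_coverage(base), 2)  # -1.65
# The +402 new lines split into +790 misses, -160 hits and -228 partials.
line_change = (
    (head["misses"] - base["misses"])
    + (head["hits"] - base["hits"])
    + (head["partials"] - base["partials"])
)
```

In other words, the drop from 45.53% to 43.89% comes from 402 added lines that are mostly uncovered (the new plugin's source counted against the whole project), plus 160 fewer hits.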


eapolinario (Collaborator) left a comment

@fiedlerNr9
Contributor Author

This is wicked cool!

Can you add flytekit-memray to https://github.com/flyteorg/flytekit/blob/master/.github/workflows/pythonbuild.yml#L319-L364 ?

image = ImageSpec(
    name="memray_demo",
    packages=["flytekitplugins_memray"],
    env={"PYTHONMALLOC": "malloc"},
Member


Instead of hard coding this into the environment, can we now use trace_python_allocators=True?

Contributor Author


I tested that, and it isn't throwing any warnings, but the results look different:
This is the task I tested without the env variable set:

@task(container_image=image, enable_deck=True)
@memray_profiling(trace_python_allocators=True, memray_reporter_args=["--leaks"])
def memory_leakage(n: int) -> str:
    generate_data(n=n)

    return "Well"

This is the result:
[screenshot]

Not sure if this is expected.

Member


For completeness, what do you see when you set trace_python_allocators=False?

Contributor Author


This, which is expected, I guess:
[screenshot]

Member


It's weird that it gives two different flamegraphs. The new flamegraph makes more sense to me because the tracker is wrapping the user code and you can clearly see generate_data.

I can't really see where generate_data is on your original flamegraph.

Contributor Author


Oh man, I mixed up my demos. It looks exactly the same with trace_python_allocators=False and the env variable set. Sorry for the confusion; I will update the README shortly.
[screenshot]

Comment on lines +7 to +8
@task(enable_deck=True)
@memray_profiling
Member


Not actionable for this PR, but I wish there was a way to ensure that enable_deck=True when using memray_profiling. Otherwise, we just add overhead without producing any reports.

@eapolinario @pingsutw What do you think of making deck_fields=None and setting enable_deck=True?

https://github.com/flyteorg/flytekit/blob/master/flytekit/core/task.py#L203-L210

Collaborator


What if we added a way to carry over task kwargs from successive flytekit decorators? Something like #2911.
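
To make the idea of carrying task kwargs over from an inner decorator concrete, here is a purely hypothetical sketch: the profiling-style decorator records the task settings it needs on the function it returns, and the outer task decorator merges them in. `needs_deck`, `fake_task`, `TaskConfig` and the `_task_kwargs` attribute are made-up names, not flytekit APIs, and nothing here claims to reflect how #2911 works.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class TaskConfig:
    """Hypothetical stand-in for the settings a task decorator would receive."""
    enable_deck: bool = False


def needs_deck(func: Callable) -> Callable:
    """Hypothetical inner decorator that declares it needs enable_deck=True."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    # Keep requirements declared by decorators further inside, then add ours.
    required = dict(getattr(func, "_task_kwargs", {}))
    required["enable_deck"] = True
    wrapper._task_kwargs = required
    return wrapper


def fake_task(func: Optional[Callable] = None, **task_kwargs: Any):
    """Hypothetical outer decorator; carried-over requirements win over explicit kwargs."""

    def decorator(f: Callable) -> Callable:
        merged: Dict[str, Any] = {**task_kwargs, **getattr(f, "_task_kwargs", {})}
        f.task_config = TaskConfig(**merged)
        return f

    return decorator(func) if func is not None else decorator


@fake_task
@needs_deck
def memory_usage(n: int) -> str:
    return "Well"


@fake_task(enable_deck=False)  # explicit False is overridden by the inner requirement
@needs_deck
def memory_usage_explicit(n: int) -> str:
    return "Well"


@fake_task
def plain_task() -> int:
    return 1
```

Letting the inner requirement override an explicit `enable_deck=False` is a deliberate choice in this sketch, since the goal in the thread is to guarantee a deck whenever profiling runs; a real design might warn instead of silently overriding.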

Comment on lines 2 to 10
.. currentmodule:: flytekitplugins.wandb

This package contains things that are useful when extending Flytekit.

.. autosummary::
   :template: custom.rst
   :toctree: generated/

   wandb_init
Collaborator


Replace mentions of wandb.

Contributor Author


What exactly is supposed to go in this comment section of __init__.py?
I was actually planning to just get rid of it.

Collaborator


This is exposed as the landing page for the plugin in the docs, e.g. https://docs.flyte.org/en/latest/api/flytekit/plugins/wandb.html

Collaborator


In fact, in order to show this new plugin in https://docs.flyte.org/en/latest/api/flytekit/plugins/index.html we need to update the index and add the corresponding file under like the one for wandb.

Contributor Author

@fiedlerNr9 fiedlerNr9 Nov 8, 2024


Good catch, updated!
Edit: argh, it looks like changes in flytesnacks and flyte are necessary to get the monodocs build to succeed. Will follow up.

Contributor Author


I removed the docs from this PR, since we need a published version of this plugin on PyPI before we can continue adding docs to flytesnacks and flyte. I will follow up with docs as soon as this is merged.
Is that OK with you, @eapolinario? And can you please do a final review if time allows?

ppiegaze previously approved these changes Nov 11, 2024
Signed-off-by: Jan Fiedler <[email protected]>
fiedlerNr9 and others added 21 commits November 11, 2024 14:38
This reverts commit 621756e.

Signed-off-by: Jan Fiedler <[email protected]>
@eapolinario eapolinario enabled auto-merge (squash) November 12, 2024 00:33
@eapolinario eapolinario merged commit 3f0ab84 into master Nov 12, 2024
28 of 29 checks passed
@fiedlerNr9 fiedlerNr9 deleted the add-memray-plugin branch November 12, 2024 00:51
katrogan pushed a commit that referenced this pull request Nov 15, 2024
* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* wip

Signed-off-by: Jan Fiedler <[email protected]>

* rename memray_profiling

Signed-off-by: Jan Fiedler <[email protected]>

* finish readme

Signed-off-by: Jan Fiedler <[email protected]>

* adjust memray_reporter_args type

Signed-off-by: Jan Fiedler <[email protected]>

* ruff check --fix

Signed-off-by: Jan Fiedler <[email protected]>

* ruff format

Signed-off-by: Jan Fiedler <[email protected]>

* codespell

Signed-off-by: Jan Fiedler <[email protected]>

* add flytekit-memray to pythonbuild workflows

Signed-off-by: Jan Fiedler <[email protected]>

* allow memray.Tracker arguments in profiling

Signed-off-by: Jan Fiedler <[email protected]>

* extend memray_profiling args description

Signed-off-by: Jan Fiedler <[email protected]>

* spelling

Signed-off-by: Jan Fiedler <[email protected]>

* move tests

Signed-off-by: Jan Fiedler <[email protected]>

* move tests again 🤡

Signed-off-by: Jan Fiedler <[email protected]>

* adjust README.md to not use PYMALLOC env variable

Signed-off-by: Jan Fiedler <[email protected]>

* Update plugins/flytekit-memray/flytekitplugins/memray/profiling.py

Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>

* Update plugins/flytekit-memray/flytekitplugins/memray/profiling.py

Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>

* Update plugins/flytekit-memray/flytekitplugins/memray/profiling.py

Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>

* add import sys

Signed-off-by: Jan Fiedler <[email protected]>

* adjust memray __init__.py

Signed-off-by: Jan Fiedler <[email protected]>

* plugin docs

Signed-off-by: Jan Fiedler <[email protected]>

* Revert "plugin docs"

This reverts commit 621756e.

Signed-off-by: Jan Fiedler <[email protected]>

* support from python 3.9

Signed-off-by: Jan Fiedler <[email protected]>

---------

Signed-off-by: Jan Fiedler <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Katrina Rogan <[email protected]>
400Ping pushed a commit to 400Ping/flytekit that referenced this pull request Nov 22, 2024
Signed-off-by: 400Ping <[email protected]>
4 participants