Optimise memory consumption, CPU usage and disk writes #381

Merged

nikita-tkachenko-datadog (Collaborator) commented Jan 24, 2024

Requirements for Contributing to this repository

  • Fill out the template below. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • The pull request must only fix one issue at a time.
  • The pull request must update the test suite to demonstrate the changed functionality.
  • After you create the pull request, all status checks must pass before a maintainer reviews your contribution. For more details, please see CONTRIBUTING.

What does this PR do?

This PR optimises the plugin's resource consumption:

  • heap memory
  • CPU usage
  • disk I/O

The motivation is that several users have complained that the plugin's overhead is too high.

The overhead is mostly caused by how Jenkins' durability mechanism interacts with the plugin. By default, Jenkins saves the state of the build to disk at every step (e.g. after a pipeline stage finishes) and on every change (e.g. after an Action is added to or removed from the build or one of its steps). This is needed for durability: if the master node dies, it can be restarted, and the job resumes execution because its up-to-date state is deserialised from disk.

The problem is that the state saved to disk includes all the data that the plugin associates with the build. This data is added to the build as it executes and is retained until the very end of the build execution. It includes step/stage execution data, build metadata, etc. It is retained until the end of the build because the trace for the entire build is submitted only once the build has finished.

The problem is aggravated by the fact that the data for individual stages and steps is stored in the build object, so every step and every change causes all of that data to be saved again. As a result, the same data, including the data of stages that have already completed, is written and re-written to disk many times.
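To make the growth concrete, here is a rough model (not code from the plugin) under two simplifying assumptions: every step adds the same amount of plugin data, and Jenkins saves the build exactly once after each step. When all step data lives in the build object, save number k rewrites the data of all k steps recorded so far, so the total is quadratic in the number of steps; when each step's data lives in its own node, it is written roughly once. The class and method names are hypothetical.

```java
// Rough model of plugin bytes written to disk under two storage layouts.
// Hypothetical assumptions: every step adds the same amount of data and
// Jenkins saves the build exactly once after each step.
public class SaveCostModel {

    // Before: all step data lives in the build object, so save number k
    // rewrites the data of all k steps recorded so far.
    static long bytesWrittenInBuildObject(int steps, long bytesPerStep) {
        long total = 0;
        for (int k = 1; k <= steps; k++) {
            total += k * bytesPerStep;
        }
        return total; // equals bytesPerStep * steps * (steps + 1) / 2
    }

    // After: each step's data lives in its own node and is written only
    // when that node changes (modelled here as once per step).
    static long bytesWrittenPerNode(int steps, long bytesPerStep) {
        return steps * bytesPerStep;
    }

    public static void main(String[] args) {
        int steps = 1_000;
        long bytesPerStep = 2_048;
        System.out.println("build object: " + bytesWrittenInBuildObject(steps, bytesPerStep)); // build object: 1025024000
        System.out.println("per node: " + bytesWrittenPerNode(steps, bytesPerStep)); // per node: 2048000
    }
}
```

With 1,000 steps of 2 KiB each, the first layout writes roughly 1 GB of plugin data over the build's lifetime versus roughly 2 MB for the second.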

This is how it contributes to the overhead:

  • heap memory - for pipelines that take a long time to finish, all of their data sits in the heap until the very end of the build, increasing pressure on the GC (made worse by the fact that the data of long-lived pipelines is likely to survive multiple garbage collections and be promoted to the old generation).
  • CPU usage - the data has to be serialised. By default, Jenkins uses reflection-based serialisers that have to examine class metadata to determine which fields to serialise and how. This is very CPU-intensive.
  • disk I/O - as explained above, the data for the entire build is written and re-written to disk many times. In addition, the data contains a lot of duplication, e.g. every step includes its own copy of the pipeline's environment variables.
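The CPU cost of reflection-based serialisation comes from discovering, at runtime, which fields an object has and whether each one should be written. The following is a simplified, hypothetical illustration using only `java.lang.reflect`; it is not XStream's or Jenkins' implementation, and production serialisers may cache parts of this metadata, so treat it as a picture of the kind of work involved rather than a measurement.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

// Simplified, hypothetical reflection-based serialiser. It is NOT XStream's
// or Jenkins' code; it only shows the runtime metadata work involved.
public class ReflectiveSerializer {

    static String serialize(Object obj) {
        StringJoiner out = new StringJoiner(",", obj.getClass().getSimpleName() + "{", "}");
        // Walk the class hierarchy, because fields may be declared in superclasses
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                // Decide at runtime which fields take part in serialisation
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                    continue;
                }
                try {
                    f.setAccessible(true); // bypass access checks to read private state
                    out.add(f.getName() + "=" + f.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return out.toString();
    }

    // Hypothetical span-like payload
    static class StepData {
        private String name = "build";
        private long durationMillis = 1200L;
        private transient Object cache = new Object(); // skipped: transient
    }

    public static void main(String[] args) {
        // Field order from getDeclaredFields() is not guaranteed by the JDK
        System.out.println(serialize(new StepData()));
    }
}
```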

The points above have been corroborated with several JFR profiles obtained from different customers.

To address the issue, the following is changed:

  • As soon as a stage or a step in a pipeline finishes, its span is serialised and added to the next batch that will be submitted. Once this is done, all of the data associated with that stage/step is removed.
  • Stage/step-specific data is saved in the corresponding build nodes rather than in the build object. As a result, it is serialised to disk only when there are changes related to those specific build nodes; unrelated changes to the build or to other build nodes do not cause that data to be serialised.
  • Custom converters are implemented for all serialised data. They are more performant than the standard reflection-based converters, since there is no need to examine class metadata to determine what needs to be serialised or deserialised.
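A minimal sketch of the first and third points, assuming a hypothetical `StepSpanBatcher` that keeps per-step data in a map keyed by node id (all names are invented for illustration). When a step finishes, its data is removed from the map, written by a hand-written converter whose fields are fixed at compile time, and appended to the next batch. In Jenkins, on-disk persistence goes through XStream, whose custom converters follow the same "write these fields explicitly" idea; the sketch simply builds a JSON string instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class, method and field names are invented for illustration.
public class StepSpanBatcher {

    // Data kept for a step only while it is running
    static final class StepData {
        final String nodeId;
        final String name;
        final long startMillis;

        StepData(String nodeId, String name, long startMillis) {
            this.nodeId = nodeId;
            this.name = name;
            this.startMillis = startMillis;
        }
    }

    private final Map<String, StepData> running = new HashMap<>();
    private final List<String> nextBatch = new ArrayList<>();

    void onStepStarted(String nodeId, String name, long startMillis) {
        running.put(nodeId, new StepData(nodeId, name, startMillis));
    }

    void onStepFinished(String nodeId, long endMillis) {
        // Remove the step's data: nothing about it is retained after this call
        StepData data = running.remove(nodeId);
        if (data == null) {
            return; // unknown or already finished step
        }
        nextBatch.add(toJson(data, endMillis));
    }

    // Hand-written converter: the fields are fixed at compile time, so no class
    // metadata is inspected at runtime. Assumes values need no JSON escaping.
    static String toJson(StepData d, long endMillis) {
        return "{\"id\":\"" + d.nodeId + "\",\"name\":\"" + d.name
                + "\",\"duration\":" + (endMillis - d.startMillis) + "}";
    }

    // Hands over the serialised spans for submission and starts a new batch
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>(nextBatch);
        nextBatch.clear();
        return batch;
    }

    int retainedSteps() {
        return running.size();
    }

    public static void main(String[] args) {
        StepSpanBatcher batcher = new StepSpanBatcher();
        batcher.onStepStarted("3", "sh", 1_000L);
        batcher.onStepStarted("4", "echo", 1_100L);
        batcher.onStepFinished("3", 1_250L);
        System.out.println(batcher.drainBatch()); // [{"id":"3","name":"sh","duration":250}]
        System.out.println("retained: " + batcher.retainedSteps()); // retained: 1
    }
}
```

The point of the design is that memory held for a step is bounded by the step's own lifetime, not the build's.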

Since the plugin now retains step/stage data for the shortest possible period, the way data (for example, execution node or Git metadata) is propagated from steps to stages to builds has been reworked as well.
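Because a step's data is discarded as soon as the step finishes, anything its parent stage or the build still needs has to be handed upward at that moment. The following sketches one possible way to do this, with a hypothetical `MetadataPropagation` class and made-up tag keys; the first-value-wins policy (`putIfAbsent`) is an assumption, not necessarily how the plugin resolves conflicts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names, tag keys and the merge policy are assumptions.
public class MetadataPropagation {

    static final class NodeInfo {
        final String parentId; // null for the build itself
        final Map<String, String> tags = new HashMap<>();

        NodeInfo(String parentId) {
            this.parentId = parentId;
        }
    }

    private final Map<String, NodeInfo> nodes = new HashMap<>();

    void register(String id, String parentId) {
        nodes.put(id, new NodeInfo(parentId));
    }

    void tag(String id, String key, String value) {
        nodes.get(id).tags.put(key, value);
    }

    // On completion, hand the node's tags to its parent (keeping any value the
    // parent already has), then drop the node's own data.
    void onFinished(String id) {
        NodeInfo node = nodes.remove(id);
        if (node == null || node.parentId == null) {
            return; // unknown node, or the build itself: nothing to propagate to
        }
        NodeInfo parent = nodes.get(node.parentId);
        if (parent != null) {
            node.tags.forEach(parent.tags::putIfAbsent);
        }
    }

    Map<String, String> tagsOf(String id) {
        return nodes.get(id).tags;
    }

    public static void main(String[] args) {
        MetadataPropagation p = new MetadataPropagation();
        p.register("build", null);
        p.register("stage-1", "build");
        p.register("step-1", "stage-1");
        p.tag("step-1", "node.name", "agent-1");
        p.tag("step-1", "git.commit.sha", "abc123");
        p.onFinished("step-1"); // step -> stage
        p.onFinished("stage-1"); // stage -> build
        System.out.println(p.tagsOf("build").get("node.name")); // agent-1
    }
}
```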

Description of the Change

Alternate Designs

Possible Drawbacks

Verification Process

Since the changes in behaviour were minimal, existing tests were used to verify that there are no regressions.
Some of the tests had to be adjusted because of the way Git metadata is gathered now: the tests now perform an actual SCM checkout to better emulate real-life pipelines.

In addition, manual tests were executed in a dockerized Jenkins instance, covering the following:

  • Freestyle build trace submitted via webhooks
  • Freestyle build trace submitted via EVP proxy
  • Freestyle build trace submitted via APM track
  • Pipeline trace submitted via webhooks
  • Pipeline trace submitted via EVP proxy
  • Pipeline trace submitted via APM track

Additional Notes

Release Notes

Review checklist (to be filled by reviewers)

  • Feature or bug fix MUST have appropriate tests (unit, integration, etc.)
  • PR title must be written as a CHANGELOG entry (see why)
  • File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have one changelog/ label attached. If applicable, it should have the backward-incompatible label attached.
  • PR should not have do-not-merge/ label attached.
  • If applicable, the issue must have at least kind/ and severity/ labels attached.

nikita-tkachenko-datadog changed the title from "Nikita tkachenko/memory consumption optimization" to "Optimise memory consumption and disk writes." Jan 24, 2024
nikita-tkachenko-datadog changed the title from "Optimise memory consumption and disk writes." to "Optimise memory consumption and disk writes" Jan 24, 2024
nikita-tkachenko-datadog changed the title from "Optimise memory consumption and disk writes" to "Optimise memory consumption, CPU usage and disk writes" Jan 24, 2024
nikita-tkachenko-datadog marked this pull request as ready for review January 24, 2024 11:47
nikita-tkachenko-datadog added the changelog/Fixed label Jan 24, 2024
nikita-tkachenko-datadog force-pushed the nikita-tkachenko/memory-consumption-optimization branch from d691588 to 11d62f5 January 25, 2024 11:57
nikita-tkachenko-datadog force-pushed the nikita-tkachenko/memory-consumption-optimization branch from 11d62f5 to 7eefea6 January 25, 2024 12:28
nikita-tkachenko-datadog force-pushed the nikita-tkachenko/memory-consumption-optimization branch from 9199d18 to 52371d7 January 26, 2024 10:37

drodriguezhdez (Collaborator) left a comment

Dropped some minor comments. Huge refactor, nice work!

drodriguezhdez previously approved these changes Jan 30, 2024
Base automatically changed from nikita-tkachenko/ci-visibility-batching to master January 30, 2024 14:21
nikita-tkachenko-datadog dismissed drodriguezhdez’s stale review January 30, 2024 14:21

The base branch was changed.

drodriguezhdez previously approved these changes Jan 30, 2024
drodriguezhdez previously approved these changes Jan 30, 2024
nikita-tkachenko-datadog merged commit da5bf62 into master Jan 31, 2024
16 checks passed
nikita-tkachenko-datadog deleted the nikita-tkachenko/memory-consumption-optimization branch January 31, 2024 09:38