Add pipeline metrics to Node Stats API #16839

Merged
3 commits merged into elastic:main on Jan 3, 2025

Conversation

@kaisecheng kaisecheng commented Dec 27, 2024

Release notes

Added three pipeline metrics (workers, batch size, and batch delay) to the Node Stats API.

What does this PR do?

Add pipeline metrics to Node Stats API

This commit introduces three new metrics under each pipeline in the Node Stats API:

  • Number of workers
  • Batch size
  • Batch delay

Example response from curl localhost:9600/_node/stats/

{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}

Why is it important/What is the impact to the user?

Prior to this change, the /_node/stats API only exposed the default pipeline settings in the global namespace. Pipelines with customized settings were not represented.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

case 1

  • bin/logstash
  • curl localhost:9600/_node/stats/
  • each pipeline should report workers, batch_size, and batch_delay (see the verification sketch after case 2)

case 2

  • bin/logstash -r
  • update the number of workers in a pipeline's settings to trigger a pipeline reload
  • the node stats should show the updated worker count (see the verification sketch below)
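
A minimal verification sketch for both cases, assuming jq and watch are installed locally (neither is required by this PR) and the API is on the default localhost:9600:

```
# case 1: extract only the new per-pipeline block (workers, batch_size, batch_delay)
curl -s "localhost:9600/_node/stats/pipelines" | jq '.pipelines | map_values(.pipeline)'

# case 2: with bin/logstash -r running, poll the worker count while editing pipeline.workers
watch -n 2 'curl -s "localhost:9600/_node/stats/pipelines" | jq ".pipelines | map_values(.pipeline.workers)"'
```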

Related issues

Use cases

Screenshots

Logs

Add fields to display the number of workers, batch size, and batch delay for each pipeline
@kaisecheng kaisecheng marked this pull request as ready for review December 30, 2024 16:36

📃 DOCS PREVIEW: https://logstash_bk_16839.docs-preview.app.elstc.co/diff

@donoghuc donoghuc self-requested a review January 3, 2025 00:25

@donoghuc donoghuc left a comment


In general, I don't understand the top-level pipeline in the response. I verified that setting pipeline-specific values reports them correctly, but it is unclear to me what the top-level one indicates. Can you point me to what it is actually meant to describe? Essentially, my question is: why do we have the top-level one, when it seems to me that the pipeline-specific ones would be the only ones users care about?

➜  /tmp curl "localhost:9600/_node/stats/pipelines?pretty"

{
  "host" : "cass-MacBook-Pro.local",
  "version" : "9.0.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "452565a1-ab4f-4e85-8fcc-320d88338092",
  "name" : "cass-MacBook-Pro.local",
  "ephemeral_id" : "de2cb434-ccf7-48fa-994b-d347aed1107a",
  "snapshot" : null,
  "status" : "green",
  "pipeline" : {
    "workers" : 12,
    "batch_size" : 125,
    "batch_delay" : 50
  },


@kaisecheng (Contributor, Author)

@donoghuc The top-level pipeline reflects the default pipeline settings in logstash.yml. Prior to this change, it was meaningful for knowing the default number of workers set by the user. You might think that, now that we have metrics per pipeline, the top-level pipeline could be removed; however, we need to keep it for backward compatibility with Kibana stack monitoring.
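
For illustration, a minimal sketch of the distinction (assuming jq is available; it is not part of this PR): the top-level block carries the logstash.yml defaults, while the new per-pipeline blocks carry each pipeline's effective settings, including pipelines.yml overrides.

```
# Top-level block: default settings from logstash.yml (kept for Kibana stack monitoring)
curl -s "localhost:9600/_node/stats/pipelines" | jq '.pipeline'

# Per-pipeline blocks added by this PR: each pipeline's effective settings
curl -s "localhost:9600/_node/stats/pipelines" | jq '.pipelines | map_values(.pipeline)'
```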

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

github-actions bot commented Jan 3, 2025

📃 DOCS PREVIEW: https://logstash_bk_16839.docs-preview.app.elstc.co/diff

@donoghuc donoghuc left a comment


Thanks for the background info!

Local test summary:

Set the global defaults and pipeline-specific settings:

myconfig git:(7756d471b2) ✗ tree
.
├── logstash.yml
├── pipeline1.conf
├── pipeline2.conf
└── pipelines.yml

1 directory, 4 files
myconfig git:(7756d471b2) ✗ cat logstash.yml
pipeline.batch.size: 50   
pipeline.batch.delay: 10  
pipeline.workers: 1
myconfig git:(7756d471b2) ✗ cat pipeline1.conf
input {
  generator {
    count => -1
    message => "pipeline 1"
  }
}
output {
  null { }
}
myconfig git:(7756d471b2) ✗ cat pipeline2.conf
input {
  generator {
    count => -1
    message => "pipeline 2"
  }
}
output {
  null { }
}
myconfig git:(7756d471b2) ✗ cat pipelines.yml
- pipeline.id: pipeline1
  pipeline.workers: 2
  pipeline.batch.size: 100
  pipeline.batch.delay: 25
  path.config: "myconfig/pipeline1.conf"

- pipeline.id: pipeline2
  pipeline.workers: 4
  pipeline.batch.size: 200
  pipeline.batch.delay: 50
  path.config: "myconfig/pipeline2.conf"

Verified the new objects in the response:

/tmp curl "localhost:9600/_node/stats/pipelines?pretty"

{
  "host" : "cass-MacBook-Pro.local",
  "version" : "9.0.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "452565a1-ab4f-4e85-8fcc-320d88338092",
  "name" : "cass-MacBook-Pro.local",
  "ephemeral_id" : "aeae5a67-85dd-4a8d-b2c9-d16843a3c9eb",
  "snapshot" : null,
  "status" : "green",
  "pipeline" : {
    "workers" : 1,
    "batch_size" : 50,
    "batch_delay" : 10
  },
  "pipelines" : {
    "pipeline2" : {
      "events" : {
        "filtered" : 4000,
        "queue_push_duration_in_millis" : 13,
        "out" : 4000,
        "duration_in_millis" : 5,
        "in" : 4712
      },
      "flow" : {
        "filter_throughput" : {
          "current" : 10500.0,
          "lifetime" : 10500.0
        },
        "worker_utilization" : {
          "current" : 0.3721,
          "lifetime" : 0.3721
        },
        "output_throughput" : {
          "current" : 11890.0,
          "lifetime" : 11890.0
        },
        "worker_concurrency" : {
          "current" : 0.01487,
          "lifetime" : 0.01487
        },
        "input_throughput" : {
          "current" : 12180.0,
          "lifetime" : 12180.0
        },
        "queue_backpressure" : {
          "current" : 0.03219,
          "lifetime" : 0.03219
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "7e09fb2522de763eb167fe43ecd7a192d82416aede0fd485c1820d57abffdc7b",
          "events" : {
            "queue_push_duration_in_millis" : 13,
            "out" : 4932
          },
          "flow" : {
            "throughput" : {
              "current" : 12230.0,
              "lifetime" : 12230.0
            }
          },
          "name" : "generator"
        } ],
        "codecs" : [ {
          "id" : "plain_80d41067-014a-446c-ab5f-694e12cfa279",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 4951,
            "out" : 4951,
            "duration_in_millis" : 131
          },
          "name" : "plain"
        }, {
          "id" : "plain_a6cf75ca-c0e4-4abc-a88c-66ec2d3201c4",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 0,
            "out" : 0,
            "duration_in_millis" : 0
          },
          "name" : "plain"
        } ],
        "filters" : [ ],
        "outputs" : [ {
          "id" : "69d48260d0e48d9a8577a91b9ad999ed17deeaf4b3b1c169eea3dd32cc927a35",
          "events" : {
            "out" : 4800,
            "duration_in_millis" : 3,
            "in" : 4800
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 0.1858,
              "lifetime" : 0.1858
            },
            "worker_millis_per_event" : {
              "current" : 6.25E-4,
              "lifetime" : 6.25E-4
            }
          },
          "name" : "null"
        } ]
      },
      "reloads" : {
        "last_success_timestamp" : null,
        "last_error" : null,
        "last_failure_timestamp" : null,
        "successes" : 0,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory",
        "events_count" : 0,
        "queue_size_in_bytes" : 0,
        "max_queue_size_in_bytes" : 0
      },
      "pipeline" : {
        "workers" : 4,
        "batch_size" : 200,
        "batch_delay" : 50
      },
      "hash" : "ff6948030e96ca0426aff281ee72d82345076a1c513cffb24e69b2989ffd880b",
      "ephemeral_id" : "bfe7776e-d25a-4e5c-8396-41709a7c3536"
    },
    "pipeline1" : {
      "events" : {
        "filtered" : 4800,
        "queue_push_duration_in_millis" : 13,
        "out" : 4800,
        "duration_in_millis" : 11,
        "in" : 4984
      },
      "flow" : {
        "filter_throughput" : {
          "current" : 11850.0,
          "lifetime" : 11850.0
        },
        "worker_utilization" : {
          "current" : 1.359,
          "lifetime" : 1.359
        },
        "output_throughput" : {
          "current" : 12090.0,
          "lifetime" : 12090.0
        },
        "worker_concurrency" : {
          "current" : 0.02716,
          "lifetime" : 0.02716
        },
        "input_throughput" : {
          "current" : 12310.0,
          "lifetime" : 12310.0
        },
        "queue_backpressure" : {
          "current" : 0.03208,
          "lifetime" : 0.03208
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "c76ae1baefa2272a5504e9d9e2d12552d18e6416190ed95fd645dd97b6b9ac0d",
          "events" : {
            "queue_push_duration_in_millis" : 12,
            "out" : 5001
          },
          "flow" : {
            "throughput" : {
              "current" : 12360.0,
              "lifetime" : 12360.0
            }
          },
          "name" : "generator"
        } ],
        "codecs" : [ {
          "id" : "plain_d1ad6a62-949f-4ccb-8e76-12adb2529bbd",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 0,
            "out" : 0,
            "duration_in_millis" : 0
          },
          "name" : "plain"
        }, {
          "id" : "plain_9e7004b7-26e8-4c14-ba82-1e85e77014a2",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 5006,
            "out" : 5006,
            "duration_in_millis" : 128
          },
          "name" : "plain"
        } ],
        "filters" : [ ],
        "outputs" : [ {
          "id" : "d4efe72c7aa3e2243d1e28cee8ac128ded032b501094f62c7977e936ad66e583",
          "events" : {
            "out" : 4900,
            "duration_in_millis" : 6,
            "in" : 4900
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 0.7414,
              "lifetime" : 0.7414
            },
            "worker_millis_per_event" : {
              "current" : 0.001224,
              "lifetime" : 0.001224
            }
          },
          "name" : "null"
        } ]
      },
      "reloads" : {
        "last_success_timestamp" : null,
        "last_error" : null,
        "last_failure_timestamp" : null,
        "successes" : 0,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory",
        "events_count" : 0,
        "queue_size_in_bytes" : 0,
        "max_queue_size_in_bytes" : 0
      },
      "pipeline" : {
        "workers" : 2,
        "batch_size" : 100,
        "batch_delay" : 25
      },
      "hash" : "60845191f973ae51eb5c7e8a85f2e94dbb00704ec22a7e3c299c0a6a966ffdc8",
      "ephemeral_id" : "5e765276-65aa-4441-84b8-406990dca509"
    }
  }
}

@kaisecheng kaisecheng merged commit de6a6c5 into elastic:main Jan 3, 2025
7 checks passed
@kaisecheng (Contributor, Author)

@logstashmachine backport 8.x

github-actions bot pushed a commit that referenced this pull request Jan 3, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)
kaisecheng added a commit that referenced this pull request Jan 3, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)

Co-authored-by: kaisecheng <[email protected]>
@kaisecheng (Contributor, Author)

@logstashmachine backport 8.17

github-actions bot pushed a commit that referenced this pull request Jan 7, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)
kaisecheng added a commit that referenced this pull request Jan 7, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)

Co-authored-by: kaisecheng <[email protected]>

Successfully merging this pull request may close these issues.

Misleading pipeline information in 'node_stats' API