Add pipeline metrics to Node Stats API #16839

Merged
3 commits merged into elastic:main on Jan 3, 2025

Conversation

@kaisecheng kaisecheng commented Dec 27, 2024

Release notes

Added three pipeline metrics (workers, batch size, and batch delay) to the Node Stats API.

What does this PR do?

Add pipeline metrics to Node Stats API

This commit introduces three new metrics under each pipeline in the Node Stats API:

  • Number of workers
  • Batch size
  • Batch delay

Example response from curl localhost:9600/_node/stats/

{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}

Why is it important/What is the impact to the user?

Prior to this change, the /_node/stats API only exposed the default pipeline settings in the global namespace. Pipelines with customized settings were not represented.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

case 1

  • bin/logstash
  • curl localhost:9600/_node/stats/
  • each pipeline should report workers, batch_size, and batch_delay (see the verification sketch after case 2)

case 2

  • bin/logstash -r
  • update the number of workers in a pipeline's settings to trigger a pipeline reload
  • the node stats should show the updated worker count (see the verification sketch below)
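
A minimal verification sketch for both cases, assuming jq and watch are installed locally (neither is required by this PR) and the API is on the default localhost:9600:

```
# case 1: extract only the new per-pipeline block (workers, batch_size, batch_delay)
curl -s "localhost:9600/_node/stats/pipelines" | jq '.pipelines | map_values(.pipeline)'

# case 2: with bin/logstash -r running, poll the worker count while editing pipeline.workers
watch -n 2 'curl -s "localhost:9600/_node/stats/pipelines" | jq ".pipelines | map_values(.pipeline.workers)"'
```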

Related issues

Use cases

Screenshots

Logs

Add fields to display the number of workers, batch size, and batch delay for each pipeline
@kaisecheng kaisecheng marked this pull request as ready for review December 30, 2024 16:36

📃 DOCS PREVIEW: https://logstash_bk_16839.docs-preview.app.elstc.co/diff

@donoghuc donoghuc self-requested a review January 3, 2025 00:25

@donoghuc donoghuc left a comment


In general, I don't understand the top-level pipeline in the response. I verified that setting pipeline-specific values reports them correctly, but it is unclear to me what the top-level one indicates. Can you point me to what it is actually meant to describe? Essentially, my question is: why do we have the top-level one, when it seems to me that the pipeline-specific ones would be the only ones users care about?

➜  /tmp curl "localhost:9600/_node/stats/pipelines?pretty"

{
  "host" : "cass-MacBook-Pro.local",
  "version" : "9.0.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "452565a1-ab4f-4e85-8fcc-320d88338092",
  "name" : "cass-MacBook-Pro.local",
  "ephemeral_id" : "de2cb434-ccf7-48fa-994b-d347aed1107a",
  "snapshot" : null,
  "status" : "green",
  "pipeline" : {
    "workers" : 12,
    "batch_size" : 125,
    "batch_delay" : 50
  },


@kaisecheng (Contributor, Author)

@donoghuc The top-level pipeline reflects the default pipeline settings in logstash.yml. Prior to this change, it was meaningful for knowing the default number of workers set by the user. You might think that, now that we have metrics per pipeline, the top-level pipeline could be removed; however, we need to keep it for backward compatibility with Kibana stack monitoring.
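
For illustration, a minimal sketch of the distinction (assuming jq is available; it is not part of this PR): the top-level block carries the logstash.yml defaults, while the new per-pipeline blocks carry each pipeline's effective settings, including pipelines.yml overrides.

```
# Top-level block: default settings from logstash.yml (kept for Kibana stack monitoring)
curl -s "localhost:9600/_node/stats/pipelines" | jq '.pipeline'

# Per-pipeline blocks added by this PR: each pipeline's effective settings
curl -s "localhost:9600/_node/stats/pipelines" | jq '.pipelines | map_values(.pipeline)'
```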

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

github-actions bot commented Jan 3, 2025

📃 DOCS PREVIEW: https://logstash_bk_16839.docs-preview.app.elstc.co/diff

@donoghuc donoghuc left a comment


Thanks for the background info!

Local test summary:

Set the global defaults and pipeline-specific settings:

myconfig git:(7756d471b2) ✗ tree
.
├── logstash.yml
├── pipeline1.conf
├── pipeline2.conf
└── pipelines.yml

1 directory, 4 files
myconfig git:(7756d471b2) ✗ cat logstash.yml
pipeline.batch.size: 50   
pipeline.batch.delay: 10  
pipeline.workers: 1
myconfig git:(7756d471b2) ✗ cat pipeline1.conf
input {
  generator {
    count => -1
    message => "pipeline 1"
  }
}
output {
  null { }
}
myconfig git:(7756d471b2) ✗ cat pipeline2.conf
input {
  generator {
    count => -1
    message => "pipeline 2"
  }
}
output {
  null { }
}
myconfig git:(7756d471b2) ✗ cat pipelines.yml
- pipeline.id: pipeline1
  pipeline.workers: 2
  pipeline.batch.size: 100
  pipeline.batch.delay: 25
  path.config: "myconfig/pipeline1.conf"

- pipeline.id: pipeline2
  pipeline.workers: 4
  pipeline.batch.size: 200
  pipeline.batch.delay: 50
  path.config: "myconfig/pipeline2.conf"

Verified the new objects in the response:

/tmp curl "localhost:9600/_node/stats/pipelines?pretty"

{
  "host" : "cass-MacBook-Pro.local",
  "version" : "9.0.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "452565a1-ab4f-4e85-8fcc-320d88338092",
  "name" : "cass-MacBook-Pro.local",
  "ephemeral_id" : "aeae5a67-85dd-4a8d-b2c9-d16843a3c9eb",
  "snapshot" : null,
  "status" : "green",
  "pipeline" : {
    "workers" : 1,
    "batch_size" : 50,
    "batch_delay" : 10
  },
  "pipelines" : {
    "pipeline2" : {
      "events" : {
        "filtered" : 4000,
        "queue_push_duration_in_millis" : 13,
        "out" : 4000,
        "duration_in_millis" : 5,
        "in" : 4712
      },
      "flow" : {
        "filter_throughput" : {
          "current" : 10500.0,
          "lifetime" : 10500.0
        },
        "worker_utilization" : {
          "current" : 0.3721,
          "lifetime" : 0.3721
        },
        "output_throughput" : {
          "current" : 11890.0,
          "lifetime" : 11890.0
        },
        "worker_concurrency" : {
          "current" : 0.01487,
          "lifetime" : 0.01487
        },
        "input_throughput" : {
          "current" : 12180.0,
          "lifetime" : 12180.0
        },
        "queue_backpressure" : {
          "current" : 0.03219,
          "lifetime" : 0.03219
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "7e09fb2522de763eb167fe43ecd7a192d82416aede0fd485c1820d57abffdc7b",
          "events" : {
            "queue_push_duration_in_millis" : 13,
            "out" : 4932
          },
          "flow" : {
            "throughput" : {
              "current" : 12230.0,
              "lifetime" : 12230.0
            }
          },
          "name" : "generator"
        } ],
        "codecs" : [ {
          "id" : "plain_80d41067-014a-446c-ab5f-694e12cfa279",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 4951,
            "out" : 4951,
            "duration_in_millis" : 131
          },
          "name" : "plain"
        }, {
          "id" : "plain_a6cf75ca-c0e4-4abc-a88c-66ec2d3201c4",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 0,
            "out" : 0,
            "duration_in_millis" : 0
          },
          "name" : "plain"
        } ],
        "filters" : [ ],
        "outputs" : [ {
          "id" : "69d48260d0e48d9a8577a91b9ad999ed17deeaf4b3b1c169eea3dd32cc927a35",
          "events" : {
            "out" : 4800,
            "duration_in_millis" : 3,
            "in" : 4800
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 0.1858,
              "lifetime" : 0.1858
            },
            "worker_millis_per_event" : {
              "current" : 6.25E-4,
              "lifetime" : 6.25E-4
            }
          },
          "name" : "null"
        } ]
      },
      "reloads" : {
        "last_success_timestamp" : null,
        "last_error" : null,
        "last_failure_timestamp" : null,
        "successes" : 0,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory",
        "events_count" : 0,
        "queue_size_in_bytes" : 0,
        "max_queue_size_in_bytes" : 0
      },
      "pipeline" : {
        "workers" : 4,
        "batch_size" : 200,
        "batch_delay" : 50
      },
      "hash" : "ff6948030e96ca0426aff281ee72d82345076a1c513cffb24e69b2989ffd880b",
      "ephemeral_id" : "bfe7776e-d25a-4e5c-8396-41709a7c3536"
    },
    "pipeline1" : {
      "events" : {
        "filtered" : 4800,
        "queue_push_duration_in_millis" : 13,
        "out" : 4800,
        "duration_in_millis" : 11,
        "in" : 4984
      },
      "flow" : {
        "filter_throughput" : {
          "current" : 11850.0,
          "lifetime" : 11850.0
        },
        "worker_utilization" : {
          "current" : 1.359,
          "lifetime" : 1.359
        },
        "output_throughput" : {
          "current" : 12090.0,
          "lifetime" : 12090.0
        },
        "worker_concurrency" : {
          "current" : 0.02716,
          "lifetime" : 0.02716
        },
        "input_throughput" : {
          "current" : 12310.0,
          "lifetime" : 12310.0
        },
        "queue_backpressure" : {
          "current" : 0.03208,
          "lifetime" : 0.03208
        }
      },
      "plugins" : {
        "inputs" : [ {
          "id" : "c76ae1baefa2272a5504e9d9e2d12552d18e6416190ed95fd645dd97b6b9ac0d",
          "events" : {
            "queue_push_duration_in_millis" : 12,
            "out" : 5001
          },
          "flow" : {
            "throughput" : {
              "current" : 12360.0,
              "lifetime" : 12360.0
            }
          },
          "name" : "generator"
        } ],
        "codecs" : [ {
          "id" : "plain_d1ad6a62-949f-4ccb-8e76-12adb2529bbd",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 0,
            "out" : 0,
            "duration_in_millis" : 0
          },
          "name" : "plain"
        }, {
          "id" : "plain_9e7004b7-26e8-4c14-ba82-1e85e77014a2",
          "encode" : {
            "writes_in" : 0,
            "duration_in_millis" : 0
          },
          "decode" : {
            "writes_in" : 5006,
            "out" : 5006,
            "duration_in_millis" : 128
          },
          "name" : "plain"
        } ],
        "filters" : [ ],
        "outputs" : [ {
          "id" : "d4efe72c7aa3e2243d1e28cee8ac128ded032b501094f62c7977e936ad66e583",
          "events" : {
            "out" : 4900,
            "duration_in_millis" : 6,
            "in" : 4900
          },
          "flow" : {
            "worker_utilization" : {
              "current" : 0.7414,
              "lifetime" : 0.7414
            },
            "worker_millis_per_event" : {
              "current" : 0.001224,
              "lifetime" : 0.001224
            }
          },
          "name" : "null"
        } ]
      },
      "reloads" : {
        "last_success_timestamp" : null,
        "last_error" : null,
        "last_failure_timestamp" : null,
        "successes" : 0,
        "failures" : 0
      },
      "queue" : {
        "type" : "memory",
        "events_count" : 0,
        "queue_size_in_bytes" : 0,
        "max_queue_size_in_bytes" : 0
      },
      "pipeline" : {
        "workers" : 2,
        "batch_size" : 100,
        "batch_delay" : 25
      },
      "hash" : "60845191f973ae51eb5c7e8a85f2e94dbb00704ec22a7e3c299c0a6a966ffdc8",
      "ephemeral_id" : "5e765276-65aa-4441-84b8-406990dca509"
    }
  }
}

@kaisecheng kaisecheng merged commit de6a6c5 into elastic:main Jan 3, 2025
7 checks passed
@kaisecheng (Contributor, Author)

@logstashmachine backport 8.x

github-actions bot pushed a commit that referenced this pull request Jan 3, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)
kaisecheng added a commit that referenced this pull request Jan 3, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)

Co-authored-by: kaisecheng <[email protected]>
@kaisecheng (Contributor, Author)

@logstashmachine backport 8.17

github-actions bot pushed a commit that referenced this pull request Jan 7, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)
kaisecheng added a commit that referenced this pull request Jan 7, 2025
This commit introduces three new metrics per pipeline in the Node Stats API:
- workers
- batch_size
- batch_delay

```
{
  ...
  pipelines: {
    main: {
      events: {...},
      flow: {...},
      plugins: {...},
      reloads: {...},
      queue: {...},
      pipeline: {
        workers: 12,
        batch_size: 125,
        batch_delay: 5,
      },
    }
  }
  ...
}
```

(cherry picked from commit de6a6c5)

Co-authored-by: kaisecheng <[email protected]>

Successfully merging this pull request may close these issues.

Misleading pipeline information in 'node_stats' API