prep release: v1.58.0 #6331

Merged
merged 5 commits into from
Nov 26, 2024

@abernix (Member) commented Nov 26, 2024

Note

When approved, this PR will merge into the 1.58.0 branch which will — upon being approved itself — merge into main.

Things to review in this PR:

  • Changelog correctness (There is a preview below, but it is not necessarily the most up to date. See the Files Changed for the true reality.)
  • Version bumps
  • That it targets the right release branch (1.58.0 in this case!).

🚀 Features

Support DNS resolution strategy configuration (PR #6109)

The router now supports a configurable DNS resolution strategy for the URLs of coprocessors and subgraphs.
The new option is called dns_resolution_strategy and supports the following values:

  • ipv4_only - Only query for A (IPv4) records.
  • ipv6_only - Only query for AAAA (IPv6) records.
  • ipv4_and_ipv6 - Query for both A (IPv4) and AAAA (IPv6) records in parallel.
  • ipv6_then_ipv4 - Query for AAAA (IPv6) records first; if that fails, query for A (IPv4) records.
  • ipv4_then_ipv6 (default) - Query for A (IPv4) records first; if that fails, query for AAAA (IPv6) records.

You can change the DNS resolution strategy applied to a subgraph's URL:

traffic_shaping:
  all:
    dns_resolution_strategy: ipv4_then_ipv6

You can also change the DNS resolution strategy applied to a coprocessor's URL:

coprocessor:
  url: http://coprocessor.example.com:8081
  client:
    dns_resolution_strategy: ipv4_then_ipv6
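
The fallback order of the five strategies can be illustrated with a small Python sketch. This is not the router's implementation: the `resolve` function and the injected `lookup` callable are hypothetical, a failed lookup is assumed to surface as an `OSError` or an empty result, and the parallel `ipv4_and_ipv6` query is modeled sequentially for simplicity.

```python
from typing import Callable, List

# Hypothetical lookup signature: (hostname, record_type) -> list of addresses.
Lookup = Callable[[str, str], List[str]]


def resolve(host: str, strategy: str, lookup: Lookup) -> List[str]:
    """Return addresses for `host` according to a dns_resolution_strategy value."""

    def query(record_type: str) -> List[str]:
        # Assumption: a failed lookup raises OSError; treat it as "no records".
        try:
            return lookup(host, record_type)
        except OSError:
            return []

    if strategy == "ipv4_only":
        return query("A")
    if strategy == "ipv6_only":
        return query("AAAA")
    if strategy == "ipv4_and_ipv6":
        # The router queries both in parallel; this sketch does it sequentially.
        return query("A") + query("AAAA")
    if strategy == "ipv6_then_ipv4":
        return query("AAAA") or query("A")
    if strategy == "ipv4_then_ipv6":
        return query("A") or query("AAAA")
    raise ValueError(f"unknown strategy: {strategy}")
```

A real `lookup` could wrap `socket.getaddrinfo` with `socket.AF_INET` or `socket.AF_INET6`; `socket.gaierror` is a subclass of `OSError`, so the fallback above would still apply.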

By @IvanGoncharov in #6109

Configuration options for HTTP/1 max headers and buffer limits (PR #6194)

This update introduces configuration options that allow you to adjust the maximum number of HTTP/1 request headers and the maximum buffer size allocated for headers.

By default, the router accepts HTTP/1 requests with up to 100 headers and allocates ~400 KiB of buffer space to store them. If you need to handle requests with more headers or require a different buffer size, you can now configure these limits in the router's configuration file:

limits:
  http1_request_max_headers: 200
  http1_request_max_buf_size: 200kib

If you are using the router as a Rust crate, the http1_request_max_buf_size option requires the hyper_header_limits feature and also necessitates using Apollo's fork of the Hyper crate until the changes are merged upstream.
You can include this fork by adding the following patch to your Cargo.toml file:

[patch.crates-io]
"hyper" = { git = "https://github.com/apollographql/hyper.git", tag = "header-customizations-20241108" }

By @IvanGoncharov in #6194

Compress subgraph operations by generating fragments (PR #6013)

The router now compresses operations sent to subgraphs by default by generating fragment
definitions and using them in the operation.

This change enables generate_query_fragments by default while disabling experimental_reuse_query_fragments. When enabled, experimental_reuse_query_fragments attempts to intelligently reuse the fragment definitions from the original operation. However, fragment generation with generate_query_fragments is much faster and also produces better outputs in most cases.

If you are relying on the shape of fragments in your subgraph operations or tests, you can opt out of the new algorithm with the configuration below.

Note: The subgraph operations generated by the query planner are not guaranteed to be consistent from release to release. We strongly recommend against relying on the shape of planned subgraph operations, as new router features and optimizations will continuously affect it. We plan to remove experimental_reuse_query_fragments in a future release.

supergraph:
  generate_query_fragments: false
  experimental_reuse_query_fragments: true

By @lrlna in #6013

Add subgraph request id (PR #5858)

The router now supports a subgraph request ID that is a unique string identifying a subgraph request and response. It allows plugins and coprocessors to keep some state per subgraph request by matching on this ID. It's available in coprocessors as subgraphRequestId and Rhai scripts as request.subgraph.id and response.subgraph.id.
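
As a rough illustration of keeping per-request state, the Python sketch below matches a subgraph response to its request by `subgraphRequestId` and measures the elapsed time. The `handle` function, the in-memory dictionary, and the assumption that payloads carry a `stage` of `SubgraphRequest` or `SubgraphResponse` are illustrative choices, not a definitive coprocessor implementation.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical in-memory state, keyed by the subgraph request ID.
_started_at: Dict[str, float] = {}


def handle(payload: dict, now: Callable[[], float] = time.monotonic) -> Optional[float]:
    """Record the request time, then return elapsed seconds on the matching response."""
    request_id = payload.get("subgraphRequestId")
    if request_id is None:
        return None

    stage = payload.get("stage")  # assumed stage names, see lead-in
    if stage == "SubgraphRequest":
        _started_at[request_id] = now()
        return None
    if stage == "SubgraphResponse":
        started = _started_at.pop(request_id, None)
        if started is None:
            return None  # no matching request was seen
        return now() - started
    return None
```

Popping the entry on the response keeps the dictionary from growing without bound; a production service would likely also expire entries whose response never arrives.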

By @Geal in #5858

Add extensions.service for all subgraph errors (PR #6191)

For improved debuggability, the router now supports adding a subgraph's name as an extension to all errors originating from the subgraph.

If include_subgraph_errors is true for a particular subgraph, all errors originating in this subgraph will have the subgraph's name exposed as a service extension.

You can enable subgraph errors with the following configuration:

include_subgraph_errors:
  all: true # Propagate errors from all subgraphs

Note: This option is enabled by default by the router's dev mode.

Consequently, when a subgraph returns an error, it will have a service extension with the subgraph name as its value. The following example shows the extension for a products subgraph:

{
  "data": null,
  "errors": [
    {
      "message": "Invalid product ID",
      "path": [],
      "extensions": {
        "service": "products"
      }
    }
  ]
}
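
A client could use the extension to attribute errors to their originating subgraph. The following Python sketch groups error messages by the `service` extension; the function name and the `"unknown"` fallback for errors without the extension are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def errors_by_service(response: dict) -> Dict[str, List[str]]:
    """Group GraphQL error messages by the subgraph named in extensions.service."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for error in response.get("errors") or []:
        extensions = error.get("extensions") or {}
        # Errors not originating from a subgraph have no service extension.
        service = extensions.get("service", "unknown")
        grouped[service].append(error.get("message", ""))
    return dict(grouped)
```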

By @IvanGoncharov in #6191

Add @context support in the native query planner (PR #6310)

The @context feature is now available in the native query planner.
This brings the native query planner to feature parity with the legacy query planner for all Federation v2 graphs.

By @clenfest, @TylerBloom in #6310

🐛 Fixes

Remove noisy demand control logs (PR #6192)

Demand control no longer logs warnings when a subgraph response is missing a requested field.

By @tninesling in #6192

Renamed headers' original values can again be propagated (PR #6281)

PR #4535 introduced a regression where the following header propagation config would not work:

headers:
- propagate:
    named: a
    rename: b
- propagate:
    named: a
    rename: c

The goal of the original PR was to prevent multiple headers from being mapped to a single target header. However, it did not consider renames and instead prevented multiple mappings from the same source header.
The router now propagates headers properly and ensures that a target header is only propagated to once.
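
The intended semantics can be sketched in Python as follows. This is a simplified model, not the router's code: the `propagate` function is hypothetical, rules are plain dictionaries mirroring the `named` and `rename` keys above, and the first rule to write a target header is assumed to win.

```python
from typing import Dict, List


def propagate(incoming: Dict[str, str], rules: List[dict]) -> Dict[str, str]:
    """Apply propagation rules so that each target header is written at most once."""
    outgoing: Dict[str, str] = {}
    for rule in rules:
        source = rule["named"]
        target = rule.get("rename", source)
        if source not in incoming:
            continue
        if target in outgoing:
            # Deduplicate on the target name, not on the source name.
            continue
        outgoing[target] = incoming[source]
    return outgoing
```

With the configuration shown above, an incoming header `a` is propagated as both `b` and `c`, because the check is on the target header rather than the source header.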

By @BrynCooke in #6281

Introspection response deduplication should always produce results (Issue #6249)

To reduce CPU usage, query planning and introspection queries are deduplicated. In some cases, deduplicated introspection queries were not receiving their result. This issue has been fixed, and the router now sends results in all cases.

By @Geal in #6257

Don't log response data upon notification failure for subgraph batching (PR #6150)

For a subgraph batching operation, the router no longer logs the entire subgraph response when it fails to notify a waiting batch participant. This prevents the router from logging the large amount of data (PII and non-PII) that a subgraph response may contain.

By @garypen in #6150

Move heavy computation to a thread pool with a priority queue (PR #6247)

The router now avoids blocking threads when executing asynchronous code by using a thread pool with a priority queue.

This improves the performance of the following components, which can take non-trivial amounts of CPU time:

  • GraphQL parsing
  • GraphQL validation
  • Query planning
  • Schema introspection

To avoid blocking threads that execute asynchronous code, these components now run in a new thread pool with a priority queue. The size of the thread pool is based on the number of available CPU cores.

The thread pool replaces the router's prior implementation that used Tokio’s spawn_blocking.
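
The general pattern of a worker pool fed by a priority queue can be sketched in Python. This only illustrates the technique; the router is written in Rust, and the `ComputePool` class, its methods, and the convention that a lower number means higher priority are all assumptions of this sketch.

```python
import itertools
import os
import queue
import threading
from concurrent.futures import Future


class ComputePool:
    """Fixed-size worker pool that runs queued jobs in priority order."""

    def __init__(self, workers=None):
        self._queue = queue.PriorityQueue()
        # A unique sequence number breaks ties, keeping FIFO order per priority.
        self._sequence = itertools.count()
        for _ in range(workers or os.cpu_count() or 1):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, priority, fn, *args):
        """Queue fn(*args); a lower priority number runs first."""
        future = Future()
        self._queue.put((priority, next(self._sequence), future, fn, args))
        return future

    def queued(self):
        """Approximate number of jobs waiting, similar in spirit to a gauge metric."""
        return self._queue.qsize()

    def _run(self):
        while True:
            _, _, future, fn, args = self._queue.get()
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                future.set_exception(exc)
```

Callers wait on the returned future instead of occupying a thread that drives asynchronous I/O, which is the motivation described above.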

apollo.router.compute_jobs.queued is a new gauge metric for the number of items in the thread pool's priority queue.

Note: when the native query planner is enabled, the dedicated queue of the legacy query planner is no longer used, so the apollo.router.query_planning.queued metric is no longer emitted.

By @SimonSapin in #6247

Limit the amount of GraphQL validation errors returned per response (PR #6187)

When an invalid query is submitted, the router now returns at most 100 GraphQL parsing and validation errors in a response. This prevents the router from generating an excessively large response for a nonsensical document.
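
The effect of the cap is simple to model. In this Python sketch, `cap_errors` is a hypothetical helper that stops consuming errors once the limit is reached, so a pathological document does not force every error to be materialized.

```python
from itertools import islice
from typing import Iterable, List

MAX_ERRORS = 100  # the limit stated in this changelog entry


def cap_errors(errors: Iterable[dict], limit: int = MAX_ERRORS) -> List[dict]:
    """Return at most `limit` errors, without consuming the rest of the iterable."""
    return list(islice(errors, limit))
```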

By @goto-bus-stop in #6187

Remove placeholders from file upload query variables (PR #6293)

Previously, file upload query variables in subgraph requests incorrectly contained internal placeholders.
According to the GraphQL Multipart Request Spec, these variables should be set to null.
The router now complies with the specification, which improves compatibility with subgraphs that handle file uploads.
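
For reference, the shape the specification expects can be sketched in Python: file variables are set to null in the `operations` field, and the `map` field points each uploaded part at its variable path. The `build_upload_fields` helper and the restriction to top-level variables are simplifying assumptions of this sketch.

```python
import json
from typing import Dict, List


def build_upload_fields(query: str, variables: dict, upload_names: List[str]) -> Dict[str, str]:
    """Build the `operations` and `map` form fields for a multipart upload request."""
    cleaned = dict(variables)
    file_map = {}
    for index, name in enumerate(upload_names):
        # The spec requires null where the file would be, not a placeholder.
        cleaned[name] = None
        file_map[str(index)] = [f"variables.{name}"]
    return {
        "operations": json.dumps({"query": query, "variables": cleaned}),
        "map": json.dumps(file_map),
    }
```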

By @IvanGoncharov in #6293

Overhead processing metrics should exclude subgraph response time when deduplication is enabled (PR #6207)

The router's calculated overhead processing time no longer includes the time spent waiting for the subgraph response of a deduplicated request. Previously, this time was incorrectly counted as router overhead.

By @Geal in #6207

Fix demand control panic for custom scalars that represent non-GraphQL-compliant JSON (PR #6288)

Previously, a panic could be triggered in the router's demand control plugin with the following schema:

scalar ArbitraryJson

type MyInput {
    json: ArbitraryJson
}

type Query {
    fetch(args: MyInput): Int
}

The panic occurred when the following query was submitted:

query FetchData($myJsonValue: ArbitraryJson) {
    fetch(args: {
        json: $myJsonValue
    })
}

with these variables:

{
    "myJsonValue": {
        "field.with.dots": 1
    }
}

During scoring, the demand control plugin would attempt to convert the variable structure into a GraphQL-compliant structure requiring valid GraphQL names as keys. However, the dot characters in the keys would cause a panic.

With this fix, only the GraphQL-compliant part of the input object is scored, and the arbitrary JSON marked by the custom scalar is scored as an opaque scalar (similar to how built-ins like Int or String are processed).

By @tninesling in #6288

Fix incorrect overriding of concrete type names with interface names when merging responses (PR #6250)

When using @interfaceObject, differing pieces of data can come back with either concrete types or interface types depending on the source. Previously, receiving the data in a particular order could incorrectly result in the interface name of a type overwriting its concrete name.

To make the response merging order-agnostic, the router now checks the schema to ensure concrete types are not overwritten with interfaces or less specific types.
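
A simplified Python model of the order-agnostic rule is shown below. It is not the router's merging code: the helper names are hypothetical, the schema check is reduced to a set of interface names, and the example types `Book` and `Product` are invented for illustration.

```python
from typing import Optional, Set


def pick_typename(existing: Optional[str], incoming: Optional[str], interfaces: Set[str]) -> Optional[str]:
    """Prefer a concrete type name over an interface name, whatever the arrival order."""
    if existing is None:
        return incoming
    if incoming is None:
        return existing
    if incoming in interfaces and existing not in interfaces:
        return existing  # never overwrite a concrete type with an interface
    return incoming


def merge_objects(target: dict, source: dict, interfaces: Set[str]) -> dict:
    """Shallow-merge two partial responses for the same entity."""
    merged = dict(target)
    for key, value in source.items():
        if key == "__typename":
            merged[key] = pick_typename(merged.get(key), value, interfaces)
        else:
            merged[key] = value
    return merged
```

Merging in either order yields the concrete `__typename`, which is the property the fix restores.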

By @tninesling in #6250

🛠 Maintenance

Query planner cache key improvements (Issue #5160)

Important

If you have enabled Distributed query plan caching, this release changes the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

Several performance improvements have been implemented for query plan cache key generation. In particular, the distributed cache's key format has changed, which adds prefixes to the different key segments to help in debugging.

By @Geal in #6206

Add entity caching invalidation configuration metrics (PR #6286)

We've added metrics for our analytics to know if entity caching invalidation is enabled.

By @bnjjj in #6286

Avoid creating stub span for supergraph events if current span exists (PR #6096)

The router's telemetry no longer creates a redundant stub span for supergraph events when a current span already exists. Instead, it uses the existing span's extensions.

By @bnjjj in #6096

📚 Documentation

Clarify docs for authorization directive composition (PR #6216)

The docs for authorization directive composition have been clarified, including corrected code examples.

By @Meschreiber in #6216

@svc-apollo-docs (Collaborator) commented Nov 26, 2024

✅ Docs Preview Ready

No new or changed pages found.

@router-perf (bot) commented Nov 26, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

Co-authored-by: Renée <[email protected]>

@BrynCooke (Contributor) left a comment

Blocking while perf is checked

@BrynCooke BrynCooke self-requested a review November 26, 2024 10:30
BrynCooke previously approved these changes Nov 26, 2024
@abernix abernix merged commit eeb63ec into 1.58.0 Nov 26, 2024
8 of 12 checks passed
@abernix abernix deleted the prep-1.58.0 branch November 26, 2024 14:44
abernix added a commit that referenced this pull request Nov 26, 2024
abernix added a commit that referenced this pull request Nov 27, 2024
abernix added a commit that referenced this pull request Nov 27, 2024
Also includes re-applying changelog editorial from
#6331.

5 participants