Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 2.x] Refactor Compressors from CompressorFactory to CompressorRegistry for extensibility (#9262) #9367

Merged
merged 1 commit into from
Aug 21, 2023

Conversation

nknize
Copy link
Collaborator

@nknize nknize commented Aug 15, 2023

Backport c6b67e1 from #9262.

@nknize nknize added enhancement Enhancement or improvement to existing feature or request backport PRs or issues specific to backporting features or enhancments v2.10.0 labels Aug 15, 2023
@peternied
Copy link
Member

A breaking change was merged into 2.x a couple of hours ago [1] - how can we give impacted teams time to stabilize before this change is merged?

@peternied peternied added the >breaking Identifies a breaking change. label Aug 15, 2023
@nknize
Copy link
Collaborator Author

nknize commented Aug 15, 2023

...how can we give impacted teams time to stabilize before this change is merged?

Stabilize what? Main? Or 2.x? The original change was merged to main five days ago. Backport PRs usually come quicker than that.

@nknize
Copy link
Collaborator Author

nknize commented Aug 15, 2023

@peternied peternied added the >breaking label 7 minutes ago

@peternied Out of curiosity, why did you label this breaking?

I agree. The new compress library does technically make this breaking. Thx for labeling accordingly!

@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Components skipped due to git failures: [https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Compatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 25m 34s

@nknize
Copy link
Collaborator Author

nknize commented Aug 15, 2023

Actually, it doesn't look like this PR breaks anything in security repo?

It looks like compileOnly "org.opensearch:opensearch:${opensearch_version}" in the build.gradle properly pulls all of the opensearch artifacts which includes things like the new compress library.

image

Then I don't see security directly calling CompressorType or CompressorFactory (which is now replaced by CompressorRegistry). So compile time shouldn't be impacted. The only possible impact I can see by quickly looking would be at test runtime if XContentHelper (e.g., XContentHelper.createParser(...)) is ever called (looks like it's called in a few places like AbstractAuditLog? This will now use the CompressorRegistry. But even this is handled by the Gradle: org.opensearch:opensearch:3.0.0-SNAPSHOT dependency.

So what does this backport break?

@peternied
Copy link
Member

Thanks for looking into the security repo - doing that by hand seems painful. Seems like the compatibility check was broken, how would you feel about waiting till that is operational before we merge changes like this? Maybe we could invest a little their to build confidence in changes like this rather than starting at zero (such as I am).

@nknize
Copy link
Collaborator Author

nknize commented Aug 15, 2023

Seems like the compatibility check was broken,...

It doesn't look broken? The 2.x incompatibility is unrelated to this change. It reported incompatible because the Action class changes were only just merged to 2.x after compatibility checker was run on this PR.

@nknize
Copy link
Collaborator Author

nknize commented Aug 15, 2023

I'm refiring check compatibility just in case.

@peternied
Copy link
Member

Are you wanting to hold this backport until the security repo backports fixes to 2.x?

I don't have the data, so let not hold off because I say so. What do you think about coming up with a prediction the downstream impact of this change, then so long as we [1] are ok with that impact, then lets move forward [2]?

  • [1] who is we? I think that decision making process should be bigger than 1 contributor and 1 approver in OpenSearch, but it shouldn't bog this all down 🤷
  • [2] Move forward might change very little, e.g. merge and downstream's handle if there are breaking changes, maybe there is breaking change days for predictability

How does apache foundation handle cases like this?

@nknize
Copy link
Collaborator Author

nknize commented Aug 15, 2023

How does apache foundation handle cases like this?

It's varies by project. I can only speak for Lucene and Solr (and Elasticsearch, and even OpenSearch). Solr (and Elasticsearch, and OpenSearch core) absorbs upstream changes "on demand" the same way we do it here with Lucene. This is what I have been suggesting for a while.

Instead of upstream core OpenSearch repo force feeding per-merge SNAPSHOT artifacts to all downstreams, the downstreams should manually (or on some automated, e.g., weekly, cron period) trigger a snapshot build and update their ${opensearch_version} in build.gradle to their own created sha (e.g., 2.10.0-SNAPSHOT-4373c3b). This would keep downstream builds stable. The unpleasant side affect would depend on how often those snapshots are updated, but that would be under the control of the downstream repo community. It's not perfect (breaking changes never are) but least then the downstream controls their own pain. This happens a lot on Solr, OpenSearch, Elasticsearch when large changes are made to lucene (e.g., JPMS, gradle change from ant, Endianness changes)

@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer.git]
Components skipped due to git failures: [https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 22m 19s

@peternied
Copy link
Member

@nknize Thanks - can you speak to this during our next operations review for discussion and next steps?

While I prefer the structure we currently have in place to prevent 'debt build up' and point in time alignment, the approach you called out to addresses stability that might be the change the project needs.

@nknize
Copy link
Collaborator Author

nknize commented Aug 15, 2023

While I prefer the structure we currently have in place to prevent 'debt build up' and point in time alignment, the approach you called out to addresses stability that might be the change the project needs.

I wonder if we could invent a "toggle" mechanism to allow downstreams to switch between the two approaches at will? Maybe there's a GHA to enable/disable "force fed" snapshots? This could prove useful during large scale refactors to clot the bleeding? Downstreams cut over to the manual snapshot consumption model, and switch back when the change storm has passed?

@nknize nknize force-pushed the backport/backport-9262-to-2.x branch from a5ebcea to ed5e9cc Compare August 17, 2023 14:56
@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change 4cb5e66

Incompatible components

Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.translog.RemoteFSTranslogTests.testSimpleOperationsUpload

@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Merging #9367 (ed5e9cc) into 2.x (1fdc08a) will decrease coverage by 0.06%.
The diff coverage is 79.31%.

❗ Current head ed5e9cc differs from pull request most recent head 00fb53c. Consider uploading reports for the commit 00fb53c to get more accurate results

@@             Coverage Diff              @@
##                2.x    #9367      +/-   ##
============================================
- Coverage     70.92%   70.86%   -0.06%     
+ Complexity    57634    57605      -29     
============================================
  Files          4762     4764       +2     
  Lines        272085   272063      -22     
  Branches      40110    40109       -1     
============================================
- Hits         192975   192800     -175     
- Misses        62564    62751     +187     
+ Partials      16546    16512      -34     
Files Changed Coverage Δ
...n/java/org/opensearch/compress/ZstdCompressor.java 70.58% <ø> (ø)
...a/org/opensearch/core/compress/NoneCompressor.java 42.85% <ø> (ø)
...ensearch/core/compress/NotCompressedException.java 0.00% <ø> (ø)
...opensearch/core/compress/NotXContentException.java 100.00% <ø> (ø)
...epositories/blobstore/ChecksumBlobStoreFormat.java 97.61% <ø> (ø)
...java/org/opensearch/transport/TransportLogger.java 39.32% <0.00%> (ø)
...rg/opensearch/transport/TransportDecompressor.java 74.02% <50.00%> (-1.30%) ⬇️
...g/opensearch/core/compress/CompressorRegistry.java 73.07% <73.07%> (ø)
...opensearch/common/compress/CompressedXContent.java 85.93% <80.00%> (ø)
...org/opensearch/common/xcontent/XContentHelper.java 56.17% <80.00%> (+1.12%) ⬆️
... and 9 more

... and 444 files with indirect coverage changes

@nknize nknize added stalled Issues that have stalled needs-review Label for PRs seeking review. labels Aug 18, 2023
@nknize
Copy link
Collaborator Author

nknize commented Aug 18, 2023

With the need to cut a 2.9.1 patch release we should lean into getting this backport merged so we can easily and quickly track 2.x changes against the BlobStoreRepository Zstd compression logic.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Aug 18, 2023
@reta
Copy link
Collaborator

reta commented Aug 18, 2023

@peternied are you OK with that?

@peternied
Copy link
Member

@reta Thanks for checking in - speaking for the security plugin team, looks fine for us

… extensibility (opensearch-project#9262)

This commit refactors the CompressorFactory static singleton
class and CompressorType enum to a formal CompressorRegistry and enables
downstream implementations to register their own compression
implementations for use in compressing Blob stores and MediaType data.
This is different from Lucene's Codec compression extension points which
expose different compression implementations for Lucene's Stored Fields.

---------

Signed-off-by: Nicholas Walter Knize <[email protected]>
(cherry picked from commit c6b67e1)
@nknize nknize force-pushed the backport/backport-9262-to-2.x branch from ed5e9cc to 00fb53c Compare August 21, 2023 13:55
@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change 1fdc08a

Incompatible components

Incompatible components: [https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git]

@nknize
Copy link
Collaborator Author

nknize commented Aug 21, 2023

These CI/CD actions are not at all reliable. It's crazy how much time is wasted re-firing all of these checks... It only seems to be getting worse. Can we simplify all of this? Or find something more reliable?

315.00 MiB / 315.00 MiB (100.00%) ? p/stime="2023-08-21T14:22:21Z" level=info msg="Downloaded the image from \"[https://github.com/abiosoft/alpine-lima/releases/download/colima-v0.5.5/alpine-lima-clm-3.18.0-x86_64.iso\](https://github.com/abiosoft/alpine-lima/releases/download/colima-v0.5.5/alpine-lima-clm-3.18.0-x86_64.iso/)""
time="2023-08-21T14:22:25Z" level=info msg="[hostagent] Starting QEMU (hint: to watch the boot progress, see \"/Users/runner/.lima/colima/serial*.log\")"
time="2023-08-21T14:22:25Z" level=info msg="SSH Local Port: 49356"
time="2023-08-21T14:22:25Z" level=info msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
time="2023-08-21T14:22:26Z" level=info msg="[hostagent] Driver stopped due to error: \"exit status 255\""
time="2023-08-21T14:22:26Z" level=info msg="[hostagent] Shutting down the host agent"
time="2023-08-21T14:22:26Z" level=warning msg="[hostagent] failed to exit SSH master"
time="2023-08-21T14:22:26Z" level=info msg="[hostagent] Shutting down QEMU with ACPI"
time="2023-08-21T14:22:26Z" level=warning msg="[hostagent] failed to open the QMP socket \"/Users/runner/.lima/colima/qmp.sock\", forcibly killing QEMU"
time="2023-08-21T14:22:26Z" level=info msg="[hostagent] QEMU has already exited"
time="2023-08-21T14:22:26Z" level=fatal msg="exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see \"/Users/runner/.lima/colima/ha.stderr.log\")"
time="2023-08-21T14:22:26Z" level=fatal msg="error starting vm: error at 'creating and starting': exit status 1"

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport PRs or issues specific to backporting features or enhancments >breaking Identifies a breaking change. enhancement Enhancement or improvement to existing feature or request needs-review Label for PRs seeking review. v2.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants