Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crypto plugin abstractions, encrypted repositories and metadata mgmt #7284

Conversation

vikasvb90
Copy link
Contributor

@vikasvb90 vikasvb90 commented Apr 24, 2023

Description

This change is to add crypto plugin abstractions, add required fieds in put repository request and validations for the same and cluster state changes to manage crypto metadata against a repository.

Issues Resolved

This is one of the tasks of meta issue #7229

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from ff21b8a to c6d4735 Compare April 24, 2023 05:40
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from c6d4735 to e9aa5e1 Compare April 24, 2023 14:22
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from e9aa5e1 to 0f14430 Compare April 24, 2023 14:34
@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from 20d2261 to fb3912b Compare June 3, 2023 04:18
@vikasvb90 vikasvb90 requested a review from dbwiddis as a code owner June 3, 2023 04:18
@github-actions
Copy link
Contributor

github-actions bot commented Jun 3, 2023

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from fb3912b to cfb2b32 Compare June 12, 2023 04:33
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from cfb2b32 to c395256 Compare June 14, 2023 13:42
@vikasvb90 vikasvb90 requested a review from sachinpkale as a code owner June 14, 2023 13:42
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from c395256 to aff2fb8 Compare June 21, 2023 10:34
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from aff2fb8 to 6c263a6 Compare June 21, 2023 11:32
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@vikasvb90 vikasvb90 force-pushed the crypto_plugin_abstractions_and_metadata_mgmt branch from 6c263a6 to 66d34f3 Compare June 21, 2023 12:55
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.indices.replication.RemoteStoreReplicationSourceTests.classMethod
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu
      1 org.opensearch.indices.replication.RemoteStoreReplicationSourceTests.testGetSegmentFiles

public PutRepositoryRequest(StreamInput in) throws IOException {
super(in);
name = in.readString();
type = in.readString();
settings = readSettingsFromStream(in);
verify = in.readBoolean();

if (in.getVersion().onOrAfter(Version.V_3_0_0)) {
encrypted = in.readOptionalBoolean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this an optional boolean? An optional boolean has three states, so what does null mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is marked as optional to ensure backward compatibility with existing integrations. We don't want existing repository registrations to start failing from this version. We can treat a null encrypted field as a non encrypted repo. If we move it to readBoolean then it will become a breaking change. To add more info, stream based constructors are typically used in node to node communication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional doesn't give you backwards compatibility; when invoking writeOptionalBoolean you're still unconditionally writing a byte to the stream and the reader must unconditionally read this byte as well. The onOrAfter version check gives you backwards compatibility. I think you can just write a regular boolean and default the field to false if not explicitly set when constructing an instance. I think you can also use a primitive boolean type in the class and avoid the null case all together.

Copy link
Contributor Author

@vikasvb90 vikasvb90 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't referring to version backwards compatibility but the ability for an existing api to work without any change with the new field introduction during let's say upgrading the cluster. Let me check this further if I can move it to non optional field without breaking anything and without enforcing the user to pass encrypted field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and following are the two cases :

  1. We keep encrypted field Boolean and use writeBoolean or readBoolean methods instead of their optional counterparts. In this case lets say you have 2 nodes cluster. You trigger a put repository request on non active cluster manager node (say node2) without encrypted field. Node2 will try to form the request for cluster manager node using writeBoolean and it will fail with NPE as it expects a non-null boolean value. If we had used an optional Boolean then it would have done a null check first, written a byte value of 2 and sent request to cluster manager node. On cluster manager node, again it would require optional method to read since it needs to handle the possibility of null value. Basically, to keep a non-primitive field and to use non-optional methods, users will have to change their contracts.
  2. We keep primitive version this field. If you do this then you don't have a null case anymore which means that in all the get calls users will start seeing encrypted field. Again, if there are strict contracts on user side to read these responses, then they will start failing as well because of the unknown field. By keeping non-primitive fields users can selectively change contracts for encryption and other users continue to stay unchanged.

Correct me if I am missing something here which you are trying to point!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, you're right that in order to preserve the "not set" state in the repository settings then we'll need the optional boolean to communicate that third possible state.

if (repository.getMetadata() != null
&& (repository.getMetadata().encrypted() != null || repository.getMetadata().cryptoMetadata() != null)) {
listener.onFailure(
new RepositoryException(repository.getMetadata().name(), "Snapshot creation for an encrypted repository is not supported")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit strange to me. This PR introduces client side encryption as a property of the repository, so shouldn't everything that interacts with a repository (via the repository interface) use that encryption? Looking at the next PR (#7337), the logic for using the CryptoPlugin is baked into the remote store business logic, which means that all features using the repository interface will have to do the same integration. It also looks like data is conditionally encrypted in that PR (metadata is left in plain text). That also seems strange given that encryption is a property of the repository. Shouldn't all data in an "encrypted" repository be encrypted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, with the approach shown in #7337, I think "encryption" there is a property of the remote store feature. Then it could make sense that the remote store feature would decide which data is encrypted and which is left in plain text. If encryption is a property of the repository, then I think all data in that repository should be encrypted, and also the encryption should be transparent to all users of that repository (i.e. baked into the implementation of the repository interface).

Copy link
Contributor Author

@vikasvb90 vikasvb90 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In this PR, we don't intend to enable encryption for a snapshot repository. To avoid giving an impression of an encrypted repository during a snapshot repo registration, this validation is explicitly failing such request. We will plan to put this into OS documentation as well. Enabling encryption for snapshot repos will require decent amount of changes. We can surely take it up as a separate PR though.
  2. Encryption is a heavy compute operation and has a direct impact on the overall performance. Metadata encryption was intentionally left out for this reason. In fact, I was wondering if we can skip checkpoint encryption as well. Customer data is only exposed in translog and segment files so it makes sense to secure those instead of encrypting everything. Basically, we will take a good hit on performance if we want to ensure transparency. We can probably add this in our OS documentation as well to make it known what exactly is being encrypted (secured).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I didn't put it under remote_store block is because enabling encryption requires few key provider settings. Crypto clients are created using those settings. Now, if we move it to remote_store block, every time we create an index, we have to explicitly pass those settings. Unless we have a use case to encrypt one index on one repo and keep another index in plain text on same repo, this seems a duplicated effort. If user wants to reuse same s3 bucket for instance, we anyways have this abstraction of repository where we can create a new repo out of the same bucket.
So, now the only option left for us is to do it on a repository. Otherwise we have to expose another REST api to configure encryption and then use its identifier/name during index creation.

Copy link
Contributor Author

@vikasvb90 vikasvb90 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @Bukhtawar @sachinpkale @gbbafna as well for their views on point 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decision is data driven only. We have already established that encryption takes huge amount of CPU. Adding more such compute will naturally affect performance. The path we should in fact take is to minimize these computes. We have carried out multiple perf runs and scenarios where CPU is already at its peak, adding any compute had a visible impact on throughput.

Point of adding "encrypted" at repository level is exactly that. Every feature using repository must check if the repository is encrypted or not. Having said that, today we don't have encryption enabled snapshots. With this we will have a way to build or forced to build encryption on top of snapshots as well. Repository is an abstraction and remote store and snapshots are two of its stakeholders. Adding encryption context on top of repository makes sure that any new feature building on top of repository makes sure that it accounts for encryption and if it doesn't support it then it clearly calls out that it produces insecure data. The point of argument here is that whether we should encrypt entire data in specific case of remote store.
I believe that we should be highly cognizant when we think of encrypting data. There are specialized hardware instances available just to minimize the impact of encryption.
We can think of it like this that when a repository used in conjunction with remote store, encrypts only user data. Snapshot has its own way of using a repository. When a file exceeds certain capacity, it doesn't maintain a single file on repository and instead maintains smaller blocks of it. So, the only guy who can make sense of these files is snapshot. Similarly, remote store can take the decision of what should be encrypted.

Why should encryption affect the behavior of BlobContainer? Both are two completely independent tasks. You can take a look at the second PR. There is no change in BlobContainer interface. As proposed in RFC also, I have built it as a loosely coupled module which can be tied to any type of content. It is not just limited to repository features. We are starting to use it for repository but it can be extended to any other content type as well.

Proposal I am looking for is a place to maintain the metadata for encryption. Also, this metadata will go in cluster state so wherever you propose to put it, should have a respective cluster manager action (existing or new) responsible for maintaining it.

Copy link
Member

@andrross andrross Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm advocating for implementing client-side repository encryption at the repository layer so that features that use the repository do not need to change based on whether the repository is using client-side encryption or not. I'm not suggesting that we change any interfaces or tightly couple the encryption logic to any repository implementation. I believe this could be implemented as wrapper of BlobStore and/or BlobContainer that does the encryption and decryption logic before delegating to a concrete repository implementation. The only reason I can see for not doing that is so that features can exempt certain files from encryption for performance reasons. I don't believe it is sufficient to make this call by saying that encryption is CPU intensive. It matters how much this impacts performance. How much overhead does encrypting the metadata add?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I can see for not doing that is so that features can exempt certain files from encryption for performance reasons.

Just want to make sure my assumption here is correct. Is exempting metadata from encryption the only reason not to make encryption transparent to the users of the repository layer? PR #7337 shows all the undifferentiated work that each feature will have to do regarding wiring the CryptoClient, creating the encrypting stream, etc and it just seems to me that it would be a big win to do this once instead of per-feature.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good analogy to this is dm-crypt that encrypts the block device and all files using the device get to be encrypted as against fs-crypt that allows encryption at a filesystem level meaning each file can be encrypted differently and also choose not be encrypted at all.

So, now we can argue in favour of more flexibility and control, to leave the choice to the consumers of the repo, however we also need to understand if that is something that is needed now or can we build it later based on specific use cases, while keeping the contracts open. As @andrross pointed, this doesn't come for free, the changes on the integration PRs are too deep integrations and would need significant effort for consumers to integrate. Having said that if you can ensure that the integrations are light-weight and the new interfaces do most of the heavy lifting, I would still be glad considering moving on this approach. Essentially what I am looking for is moving what we call the CryptoClient to still be bootstrapped in the repository module/service. Then at the BlobContainer provide mechanisms to encypt and controls to opt-out as needed.

Having said that, I usually would advocate for simplicity and maintainability wherever possible. I am not sure if security/compliance needs metadata to be encrypted as well. Given metadata is usually really small, I doubt that there would be a substantial CPU implication. Even if there is 1-2% moderate CPU increase, it might still be worth the encryption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can run some perfs to record the difference between selective encryption and repo level encryption. Depending upon the results we can take a call.

Then at the BlobContainer provide mechanisms to encypt and controls to opt-out as needed.

I would like to bring into notice few things so that we can make an informed decision. Repository and encryption have very different behaviors. For e.g. methods like delete, listBlobs, blobExists and similar methods have no relation with encryption. Also, read and writes cannot be used as they are and would require an offset adjustment and file header reads for ops acting on partial content. On top of that, it would require a completely new plugin to be built (encrypted repo plugin) if we push it at repository level. This is because the current layout of repository implementation is RepositoryPlugin -> Repository -> BlobStore -> BlobContainer. For adding crypto around BlobContainer, it would take a completely new delegation layer right to the top till RepositoryPlugin which will force us to build a new plugin unless we think of a different repository design. Please note that the new plugin we are currently building here has no relation with repository as of now. Check this PR.

If we are saying that when we talk about data in OpenSearch, we are always going to associate it with repository then I too agree with the whole approach and in this case we will have to modify existing repository abstractions to accommodate partial encryption and decryption use cases. Otherwise, we will not be able to account for non-repository encryption use cases. For e.g., if tomorrow we want to provide encryption and decryption support for local indexes as well then we will need a non-repo flavor of crypto. This means we would need both encrypted repository plugin to support repository encryption and a separate crypto plugin to support local index encryption. Crypto plugin can probably be forked out for reusability and encrypted repository plugin can depend on crypto plugin. Again, I still think that we are trying to fit two incompatible pieces together where repository is a storage layer and encryption is a transformation layer.

Still, keeping an open mind to hear your thoughts around this. Nevertheless, I think PRs #8465 and #8466 would still be required irrespective of any route we take.

Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces client side encryption as a property of the repository

Enabling encryption for snapshot repos will require decent amount of changes. We can surely take it up as a separate PR though.

+1 what @andrross said. In general the whole design is confusing. The feature proposal and design proposal work backwards from" ...an encryption plugin which can provide capabilities to support encryption/decryption". But this PR is all about encrypting repository data.

I honestly can't grok what the overall objective is? It seems you are focused on encryption at rest? If so. What files are you wanting to encrypt? (feel free to enumerate here). Are you wanting encryption over the wire as well? This PR is just focuses on snapshots, yet it's introducing an entire plugin abstraction framework without a concrete application to anything other than snapshot files and no clear use cases or examples how a downstream plugin leverages this framework. Why introduce a whole new CryptoPlugin and not just use SPI w/ a simple library interface for downstreams to define their encryption mechanism? It seems that makes more sense if the objective is to enable downstreams to have different encryption strategies.

/**
* Crypto plugin interface used for encryption and decryption.
*/
public interface CryptoClient extends RefCounted {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange naming. This isn't a client. It's an interface to provide concrete stream encryption methods? Why not put this in a :libs:opensearch-security library instead of server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will rename this! Implementation is there in libs, I can move this as well if it makes more sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel more like a CryptoProvider

* @param repositoryMetadata repository metadata for which crypto client needs to be returned.
* @return crypto client
*/
public CryptoClient cryptoClient(RepositoryMetadata repositoryMetadata) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method isn't used anywhere.

Copy link
Contributor Author

@vikasvb90 vikasvb90 Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is being used in other integration PR. Let me move this over there.

@nknize nknize added discuss Issues intended to help drive brainstorming and decision making feature New feature or request stalled Issues that have stalled v2.10.0 labels Jul 13, 2023
@nknize
Copy link
Collaborator

nknize commented Jul 13, 2023

From this comment:

I believe this is a duplicate of #5800

Linked issue has specific mentions of Aiven and proposes to use that as a solution. Proposal here though is open for feedback.

@reta would Aiven consider promoting encrypted-repository-opensearch as a core plugin to offer index/repository encryption as a default security option? If so, then I'd like to see us improve that to fill some of the random access encryption gaps needed by this proposal guided by the coming "OnThe Fly Encryption Feature Proposal" and PoC in #3469. Perhaps this PR just gets trimmed to providing the key registry?

@reta
Copy link
Collaborator

reta commented Jul 14, 2023

@reta would Aiven consider promoting encrypted-repository-opensearch as a core plugin to offer index/repository encryption as a default security option?

@nknize absolutely

If so, then I'd like to see us improve that to fill some of the random access encryption gaps needed by this proposal guided by the coming #3469 and PoC in #3469. Perhaps this PR just gets trimmed to providing the key registry?

Makes perfect sense, I will bring that to the team, thank you!

@vikasvb90
Copy link
Contributor Author

vikasvb90 commented Jul 14, 2023

@nknize Thanks for the review!
To clear some confusions, this PR is just an integration PR to add encryption support for repositories but the actual framework implementation is done in other PRs which like you said are built in libs and expose a spi to provide key providers support. PRs are #8465 and #8466 . This is the meta issue which encapsulates all PRs.

If you look at other PRs, there is no context of snapshot, remote store or s3 plugin. It also provides the random access support as also proposed in the feature you linked.

Structure of the plugin is as follows :

  1. Core plugin - which provides actual encryption and decryption support - including partial encryption or decryption capabilities. This is located in libs.
  2. SPI for key providers - To allow users to define their own key providers. For e.g. users can define their own KMS based key management implementations.

I honestly can't grok what the overall objective is?

Overall objective it to provide encryption/decryption support irrespective of the source and target location of the content. Repository encryption is just one of the use cases.

What files are you wanting to encrypt?

For remote backed indexes, plan is to encrypt only the user data - translog and segment files.

Are you wanting encryption over the wire as well?

No, objective of this is to provide encryption of data at rest.

I don't think encrypted-repository-opensearch can be extended as it is tightly coupled with repository plugin and doesn't provide partial encryption or decryption support.

If we want to extend support for local disk as well, then we should be able to do that as well with current framework.

@vikasvb90
Copy link
Contributor Author

no clear use cases or examples how a downstream plugin leverages this framework

This is the spi which downstream plugin will implement to provide custom key providers.
Here is a sample implementation of the same.
I can add more details on exactly how these plugins can be installed. For now, this framework is only integrated with repository and therefore, I can add steps on how it can be used for a repository. Let me know this.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jul 14, 2023
@Bukhtawar Bukhtawar mentioned this pull request Jul 20, 2023
6 tasks
@vikasvb90 vikasvb90 added the Storage:Durability Issues and PRs related to the durability framework label Jul 24, 2023
@andrross
Copy link
Member

Closing this as it has been crossed out on the meta issue. @vikasvb90 Please reopen if this shouldn't be closed.

@andrross andrross closed this Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues intended to help drive brainstorming and decision making feature New feature or request Storage:Durability Issues and PRs related to the durability framework v2.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants