[RFC] Separation of compute & storage #14637
Comments
Thanks @bryanlb for starting this RFC. It's a well-structured proposal, highlighting the significant benefits of separating compute and storage. It aligns with the Reader and Writer Separation RFC, which also advocates for dedicated node roles, moving us in the same direction.

The high-level goals, such as traffic segregation, separation of concerns for resilience, and independent scalability, are substantial. The ability to scale independently adds significant value from an infrastructure perspective, allowing the use of heterogeneous instance types for different node roles. Additionally, this architecture enables us to tackle more complex problems going forward, such as implementing independent sharding schemes for readers and writers based on traffic patterns (or shard heat). Post-processing tasks, like creating rollups or high-level pre-computed caches/indices for improved read performance, can also be done in better isolation.

The use of object storage for indexed data and a persistent queue like Apache Kafka for unindexed data ensures durability and scalability, and it addresses today's indexing scale problem. With a pull-based indexing approach, we can dynamically allocate resources based on workload characteristics, which will help handle varying query loads and ingest rates.

Furthermore, revamping the metadata store should be broadly considered in both proposals. It is also an opportunity to segregate the cluster state into more concise and relevant information based on node roles.

One thing to consider is the potential increase in read-after-write latency, especially when fetching indexes from object storage. It might be worth thinking about what strategies we can employ to optimize the performance of real-time queries in this new architecture.
I think that is called out in the question from the proposal:
With #9065 (currently in progress), the OpenSearch core would provide request / response streaming out of the box (it is already available as an experimental feature). Having said that, it is entirely feasible today to have a plugin (deployed on an index node) that would stream the documents to the object store (or basically anywhere).
Introduction
Modern distributed search engines like OpenSearch, Elasticsearch, and Apache Solr were not designed from the ground up to truly take advantage of the cloud's elasticity, nor were they built to leverage building blocks that have become foundational pieces of public cloud provider offerings, such as object storage.
Our proposal is to modify OpenSearch to adopt a cloud native architecture, separating compute from durability and storage. The durability of unindexed data would be provided by a persistent queue like Apache Kafka, and the storage for indexed data would be provided by object storage like Amazon S3.
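To make the split concrete, here is a minimal sketch of what the ingest path could look like under this proposal: a document is appended to a Kafka topic that serves as the write-ahead log for unindexed data. This is illustrative only; the broker address, topic name, key scheme, and JSON payload are assumptions and not part of the proposal.

```java
// Hypothetical ingest path: append a bulk document to a Kafka topic acting as
// the durable write-ahead log for unindexed data.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class IngestToWal {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumed broker address
        props.put("acks", "all");                                      // wait for full replication for durability
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String indexName = "logs-2024.06";                          // hypothetical target index
            String docJson = "{\"message\":\"hello\",\"@timestamp\":\"2024-06-01T00:00:00Z\"}";

            // Key by index name so all documents for one index land on the same
            // partition and are replayed in order by the indexing nodes.
            producer.send(new ProducerRecord<>("unindexed-docs", indexName, docJson));
            producer.flush();
        }
    }
}
```

Once the record is acknowledged by the event stream, the ingest request can be acknowledged to the client; indexing happens asynchronously downstream.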
This also enables alternate architectures, such as a deployment that does not keep a hot tier of data nodes but instead uses cold storage, streaming results directly from object storage.
Goals
Non-Goals
Proposed architecture
We propose moving from the existing cluster model architecture to a stateless node architecture, using an event stream / write ahead log for unindexed data, and using object storage as the durable storage.
Ingest nodes - accept bulk ingest requests and submit them to the event stream
Event stream - Apache Kafka, Apache Pulsar, etc., used as a write-ahead log
Indexing nodes - consume from the event stream to create indexes and upload them to object storage
Data nodes - fetch indexes from object storage and make them available to query. Optionally can stream indexes directly from object storage.
Coordinating nodes - perform scatter / gather across data nodes
Metadata store - Apache ZooKeeper, etc., used as a centralized store for node and index discovery
Manager node - manages operation of the cluster
Indexers and data nodes all communicate via a cluster manager and do not replicate any data between themselves.
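As a rough illustration of the indexing-node role described above, the sketch below consumes documents from the Kafka write-ahead log, indexes them locally with Lucene, and uploads the resulting segment files to S3 before committing offsets. The topic, consumer group, bucket, and field names are assumptions made for illustration; an actual implementation would hook into OpenSearch's own indexing and segment-upload machinery rather than raw Lucene.

```java
// Hypothetical indexer loop: pull from the event stream, build a local Lucene
// segment, push it to object storage, then acknowledge the consumed offsets.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class IndexerNode {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");           // assumed broker address
        props.put("group.id", "indexer-nodes");                     // one consumer group per indexer fleet
        props.put("enable.auto.commit", "false");                   // commit only after upload succeeds
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Path indexDir = Files.createTempDirectory("segment");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir),
                                                  new IndexWriterConfig(new StandardAnalyzer()));
             S3Client s3 = S3Client.create()) {

            consumer.subscribe(List.of("unindexed-docs"));
            ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(5));

            // Index the raw JSON of each record; a real indexer would map fields.
            for (ConsumerRecord<String, String> record : batch) {
                Document doc = new Document();
                doc.add(new TextField("source", record.value(), Field.Store.YES));
                writer.addDocument(doc);
            }
            writer.commit();

            // Upload the committed segment files so data nodes (or cold-tier
            // queries) can fetch them straight from object storage.
            try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
                for (Path file : files) {
                    if (file.getFileName().toString().equals("write.lock")) continue;
                    s3.putObject(PutObjectRequest.builder()
                                    .bucket("opensearch-segments")   // hypothetical bucket
                                    .key("logs-2024.06/" + file.getFileName())
                                    .build(),
                            RequestBody.fromFile(file));
                }
            }

            // Acknowledge offsets only once the segments are durable in S3,
            // giving at-least-once indexing semantics.
            consumer.commitSync();
        }
    }
}
```

Committing the event-stream offsets only after the upload completes ties durability to object storage rather than to local disk; registering the new segments so data nodes can discover them would be the job of the metadata store described above.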
Discussion
Summary
We believe moving towards a stateless node architecture will enable operators of OpenSearch deployments to more quickly adapt to changing workload requirements, improve cluster resource utilization, and enable scaling to larger deployments.
References
Slack Astra Search Engine - https://slackhq.github.io/astra/architecture.html#system-overview
The Snowflake Elastic Data Warehouse - https://dl.acm.org/doi/10.1145/2882903.2903741
Proposal co-authored by @vthacker and @bryanlb for @slackhq