OpenSearch Engine 2022 Themes #2095

CEHENKLE · 2022-02-11T23:09:50Z

Hello!

We’re the team at Amazon that maintains the OpenSearch repo. The OpenSearch search & analytics engine is the centerpiece product of the OpenSearch project. As part of planning for 2022, we wanted to share with you how we’re thinking about the upcoming year.

We'd love to work with you shoulder to shoulder to build out these ideas. This isn’t designed to be a comprehensive list of everything getting built in the next year. OpenSearch is a shared community project, so we're excited to see the ideas and projects that come from the community. These are just the areas where this group of people plans to put its time and attention given what we know now. So if there’s something missing that you want to see, that’s great! Let everyone know by writing issues/feature proposals/code for the thing you want to see.

1. Introduction

OpenSearch is a community-driven, open source (Apache 2.0 licensed) search engine. Our goal is to make OpenSearch secure, fast, scalable, extensible and always open source.

Our focus includes:

Search and Indexing: This includes search and indexing, performance and results quality, incorporating Lucene improvements, field mapping and query builder.
Aggregation: We look after the performance, stability and memory tuning of aggregations, covering metrics, bucket and pipeline aggregations.
Distributed Framework: We ensure that OpenSearch can scale in a distributed manner. This includes node types (e.g. Data, etc.), replication, merging policies, snapshots and restore, sharding, and circuit breakers.
Extensions Framework: We enable others to extend the functionality of OpenSearch by making sure the architecture is modular and extensible.
Clients & Libraries: We make sure communication mechanisms and APIs are built to work with external connections.
Packaging and Installation: This includes making sure that packaging, distribution, and installation of the core engine are easy.

2. Themes

These are the themes we’ve identified for 2022. As the year goes on, we’ll be adding issues/features that address these themes, but I’ve added a few issues that have already been opened so far as examples.

Security — Easily secure by default

Today, users need to invest significant time to understand and configure the right level of security for their use case. Also, since security is only available with a plugin, this leaves the minimum distribution with no security features enabled by default. Our goal is to make sure that it’s easy to get started with OpenSearch while also being secure by default. Finally, because software is only as secure as its dependencies, we will focus in 2022 on being able to proactively identify and mitigate vulnerabilities in our dependencies.

OpenSearch inbuilt security

Efficiency — Getting the most out of your resources

When we talk about efficiency, we mean improving search/indexing performance and throughput and reducing cluster costs. We are looking at segment replication, which will copy Lucene segment files from primary to its replicas so that replicas will not have to re-execute operations. Additionally, if we decouple reader and writer JVMs, it should allow for better memory tuning and optimization, especially in cases where the cluster is heavy on read or heavy on write. And finally, we're looking at how to better utilize multi-core CPUs to parallelize searches over multiple Lucene segments.
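To make the last idea concrete, here is a minimal sketch of searching several segments in parallel and merging the per-segment results. It is not OpenSearch or Lucene code: the segment representation, the term-frequency scoring, and the names `search_segment` and `search_shard` are all hypothetical, chosen only to show the fan-out and merge shape. Python threads also will not give real CPU parallelism for this toy workload, so treat it as an illustration of the structure rather than of the speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_segment(segment, term, k):
    """Score one segment on its own and return its top-k (score, doc_id) hits.

    `segment` is a hypothetical stand-in for a Lucene segment: a list of
    (doc_id, text) pairs. The score is a toy term frequency.
    """
    term = term.lower()
    hits = []
    for doc_id, text in segment:
        score = text.lower().split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)


def search_shard(segments, term, k, workers=4):
    """Fan the query out over all segments, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda seg: search_segment(seg, term, k), segments))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged)


segments = [
    [(1, "lucene lucene search"), (2, "index only")],
    [(3, "lucene"), (4, "lucene lucene lucene")],
]
top_hits = search_shard(segments, "lucene", k=2)
```

Because each segment only has to return its own top k hits, the merge step stays small no matter how many documents a segment holds.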

Durability — Leveraging purpose-built storage to improve durability

OpenSearch provides durability via the transaction log (a.k.a. translog) and a materialized view of indexed data (i.e. Lucene segments), which it replicates across multiple hosts in a cluster. To improve durability, we want to decouple storage from compute by extending the disk-based durability guarantees of the current architecture. If enabled, pluggable Remote Storage will ensure that every write has been committed to a durable remote storage system (e.g. Azure Blob Storage, Amazon S3, etc.) before acknowledging the write to the user. In addition, we will introduce the concept of storage tiers and a pluggable translog, so that the disk-based implementation could be replaced with cloud-based alternatives.

Add Remote Storage Options for Improved Durability
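As a rough illustration of "commit to remote storage before acknowledging", the sketch below acknowledges a write only after a remote store has accepted it. Everything here is hypothetical: `InMemoryRemoteStore` stands in for a durable blob store, `DurableWriter` and the `translog/<seq_no>` key layout are invented for the example, and returning `False` when the remote store fails is an assumption, since the post does not say how failures would be handled.

```python
class RemoteStoreError(Exception):
    """Raised by the hypothetical remote store when a put cannot be committed."""


class InMemoryRemoteStore:
    """Hypothetical stand-in for a durable remote blob store."""

    def __init__(self, available=True):
        self.objects = {}
        self.available = available

    def put(self, key, data):
        if not self.available:
            raise RemoteStoreError("remote store unavailable")
        self.objects[key] = data


class DurableWriter:
    """Acknowledges a write only once the remote store has committed it."""

    def __init__(self, remote):
        self.remote = remote
        self.local_translog = []  # disk-based translog, modeled as a list

    def write(self, seq_no, operation):
        # Record the operation locally first, as the current architecture does.
        self.local_translog.append((seq_no, operation))
        try:
            # Assumed key layout; the real naming scheme is not specified.
            self.remote.put(f"translog/{seq_no}", operation)
        except RemoteStoreError:
            return False  # assumption: no acknowledgement without remote commit
        return True  # acknowledged: the write is durable remotely


store = InMemoryRemoteStore()
writer = DurableWriter(store)
acked = writer.write(1, b"index doc 1")

offline_writer = DurableWriter(InMemoryRemoteStore(available=False))
not_acked = offline_writer.write(1, b"index doc 1")
```

The point of the ordering is that an acknowledged write can survive the loss of the local disk, because the remote copy already exists by the time the caller sees success.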

Extensibility — Make it easier to add new capabilities to OpenSearch.

Our goal is to provide a delightful experience for community members to build and distribute features and for cluster administrators to find and use those community-built features. We will do this by taking on projects that make the core smaller and more modular, and by hardening interfaces such that extensions can be authored and work reliably and securely with multiple versions of OpenSearch.

Features — Improve the end user experience

To help anyone make the most out of OpenSearch, we will add new features and capabilities to solve end user problems. We will add features like Flattened field types, Schema on Read, and improve the Out of the Box Experience (OOBE) to remove friction points.

Engagement — Improve the developer experience

To help anyone participate in and contribute to the project, we will improve the developer experience. We will fix “broken windows” like exclusive naming and make it easier to automate and add new tests. We will invest time to capture metrics to better understand the health of the project.

In addition to working on these themes, we will also reserve 30% of our team’s time for ad hoc requests, bug fixes, and oncall support.

3. Next Steps

There’s a lot on this list of goals, and realistically they may take multiple years to fully achieve. But we look forward to making big leaps ahead on all of them in 2022 with you. Over the year, I will also continue to add links to this post for projects, so please watch the post if one of them interests you.

seanneumann · 2022-02-14T17:27:35Z

Exciting year! Thanks for writing this up @CEHENKLE !!

seanneumann · 2022-03-30T20:31:53Z

Corresponding OpenSearch Dashboards 2022 issue: opensearch-project/OpenSearch-Dashboards#1405

CEHENKLE added the "discuss" label (issues intended to help drive brainstorming and decision making) Feb 11, 2022

CEHENKLE pinned this issue Feb 11, 2022

seanneumann mentioned this issue Mar 30, 2022

OpenSearch Dashboards 2022 Initiatives opensearch-project/OpenSearch-Dashboards#1405

Closed

kartg mentioned this issue May 6, 2022

[RFC] Modeling writes as an extensible workflow #3237

Open

dblock mentioned this issue Sep 6, 2022

Socket hang up opensearch-project/opensearch-js#285

Closed

CEHENKLE unpinned this issue Mar 15, 2023
