elastic · georgewallace · Oct 31, 2024 · Nov 1, 2024 · Nov 1, 2024 · Nov 8, 2024
@@ -0,0 +1,139 @@
+[[hot-frozen-architecture]]
+== Hot / Frozen - High Availability
+
+The Hot / Frozen – High Availability architecture is cost optimized for large time-series datasets. All data remains fully indexed and searchable in every tier, including the data held in object storage. In this architecture, the hot tier is primarily used for indexing and for searching the most recent data (<1 day). https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots.html[Searchable snapshots] are taken from the hot tier to the object store using index lifecycle management and snapshot lifecycle management. The snapshots are then automatically cached in the frozen tier for search. Because the hot tier data is moved to an object store, the cost of keeping all of the data searchable is dramatically reduced.
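+
As a minimal sketch of the lifecycle described above, the following Python builds an index lifecycle management policy body with a hot phase that rolls over, a frozen phase that mounts a searchable snapshot, and a delete phase. The repository name, the ages, and the shard size are placeholder assumptions, not values recommended by this architecture.

```python
import json


def build_hot_frozen_policy(repository: str,
                            hot_max_age: str = "1d",
                            frozen_min_age: str = "0ms",
                            delete_after: str = "365d") -> dict:
    """Return an ILM policy body: roll over in hot, mount in frozen, then delete.

    All default values are illustrative assumptions.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new backing index once it is a day old or large enough.
                        "rollover": {
                            "max_age": hot_max_age,
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                "frozen": {
                    # min_age is measured from rollover time.
                    "min_age": frozen_min_age,
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": repository
                        }
                    },
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


# "my-object-store-repo" is a hypothetical, already registered snapshot repository.
policy = build_hot_frozen_policy("my-object-store-repo")
print(json.dumps(policy, indent=2))
```

A body like this could be sent with `PUT _ilm/policy/<policy-name>`. Retention and rollover thresholds depend on your own data volume and requirements.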
+
+This architecture is ideal for time-series use cases, such as Observability or Security, that do not require updating data after it is indexed. All the necessary components of the Elastic Stack are included. This architecture is not intended for sizing workloads, but rather as a basis to ensure that your cluster is ready to handle any desired workload with resiliency. A very high level representation of data flow is included. For more detail about ingest architecture, see our https://www.elastic.co/guide/en/ingest/current/use-case-arch.html[ingest architectures] documentation.
+
+[discrete]
+[[hot-frozen-use-case]]
+=== Use case
+
+This Hot / Frozen – High Availability architecture is intended for organizations that:
+
+* Have a requirement for cost effective long term data storage (many months or years), in particular those that use Elastic as an archival system.
+* Provide insights and alerts using logs, metrics, traces, or various event types to ensure optimal performance and quick issue resolution for applications.
+* Apply Machine Learning and Search AI to assist in dealing with the large amount of data.
+* Deploy an architecture model that allows for maximum flexibility between storage cost and performance.
+
+[IMPORTANT]
+====
+* Security use cases with many event rules that have lookback requirements may require an additional cold tier. 
+* This architecture is intended for ingest streams that are mostly immutable (not changed or updated after indexing). Other architectures allow for mutable data, which needs to be changed after indexing.
+====
+
+[discrete]
+[[hot-frozen-architecture-diagram]]
+=== Architecture
+
+image::images/hot-frozen.png["A Hot/Frozen highly available architecture"]
+
+TIP: We use an Availability Zone (AZ) concept in the architecture above. When running in your own data center (DC), you can equate AZs to failure zones within a data center, racks, or even separate physical machines, depending on your constraints.
+
+The diagram illustrates an Elasticsearch cluster deployed across 3 availability zones (AZ). For production, we recommend a minimum of 2 availability zones, and 3 availability zones for mission critical applications. See https://www.elastic.co/guide/en/cloud/current/ec-planning.html[Plan for Production] for more details. Even if the cluster is deployed across only 2 AZs, a third master node is still required for quorum voting and will be created automatically in the third AZ. True real-time high availability cannot be achieved without 3 zones.
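+
The quorum requirement above can be illustrated with a simplified model: a cluster can elect a master only while a strict majority of its master-eligible nodes is reachable. The sketch below checks whether a given placement of master-eligible nodes survives the loss of any single zone. The zone names are hypothetical, and the model deliberately ignores how Elasticsearch adjusts its voting configuration over time.

```python
from collections import Counter


def has_quorum(surviving: int, total: int) -> bool:
    """A strict majority of master-eligible nodes must remain."""
    return surviving > total // 2


def survives_any_zone_loss(master_zones: list[str]) -> bool:
    """Return True if quorum is kept no matter which single zone fails.

    master_zones holds one entry per master-eligible node, naming its zone.
    """
    total = len(master_zones)
    per_zone = Counter(master_zones)
    return all(has_quorum(total - lost, total) for lost in per_zone.values())


# One master-eligible node in each of three zones.
print(survives_any_zone_loss(["az1", "az2", "az3"]))

# Three master-eligible nodes squeezed into two zones.
print(survives_any_zone_loss(["az1", "az1", "az2"]))
```

With three zones, any single failure leaves two of three voters. With only two zones, losing the zone that holds two voters leaves a single node, which is not a majority.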
+
+The number of data nodes shown for each tier (hot and frozen) is illustrative and should be scaled according to ingest volume and retention period. Hot nodes contain both primary and replica shards. By default, primary and replica shards are always guaranteed to be in different availability zones in Elasticsearch Service, but when self-deploying, https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-cluster.html#shard-allocation-awareness[shard allocation awareness] needs to be configured. Frozen nodes act as a large high-speed cache and retrieve data from the snapshot store as needed.
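+
For self-managed clusters, shard allocation awareness relies on a custom node attribute plus a cluster-level awareness setting. The following sketch prints the two `elasticsearch.yml` lines for a node in each zone. The attribute name `zone` and the zone values are assumptions chosen for illustration.

```python
def awareness_settings(zone: str, attribute: str = "zone") -> str:
    """Return elasticsearch.yml lines that tag a node with its zone.

    The attribute name is arbitrary; it must match on every node.
    """
    lines = [
        f"node.attr.{attribute}: {zone}",
        f"cluster.routing.allocation.awareness.attributes: {attribute}",
    ]
    return "\n".join(lines)


# Hypothetical zone names; use whatever identifies your failure domains.
for zone in ("az1", "az2", "az3"):
    print(f"# elasticsearch.yml for a node in {zone}")
    print(awareness_settings(zone))
    print()
```

With these settings in place, Elasticsearch tries to keep a primary shard and its replicas on nodes with different attribute values.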
+
+Machine learning nodes are optional but highly recommended for large scale time series use cases, since the amount of data quickly becomes too large to analyze manually. Applying techniques such as machine learning based anomaly detection or Search AI with large language models helps to dramatically speed up problem identification and resolution.
+
+[discrete]
+[[hot-frozen-hardware]]
+=== Recommended Hardware Specifications
+
+Elastic Cloud allows you to deploy clusters in AWS, Azure and Google Cloud.  Available hardware types and configurations vary across all three cloud providers but each provides instance types that meet our recommendations for the node types used in this architecture. For more details on these instance types, see our documentation on Elastic Cloud hardware for https://www.elastic.co/guide/en/cloud/current/ec-default-aws-configurations.html[AWS], https://www.elastic.co/guide/en/cloud/current/ec-default-azure-configurations.html[Azure], and https://www.elastic.co/guide/en/cloud/current/ec-default-gcp-configurations.html[GCP]. The **Physical** column below is guidance, based on the cloud node types, when self-deploying Elasticsearch in your own data center.
+
+For the instance types described in the links above, Elastic has performance tested hardware for each of the cloud providers to find the optimal hardware for each node type. We use ratios to represent the best mix of CPU, RAM, and disk for each type. In some cases the CPU to RAM ratio is key; in others, the disk to memory ratio and the type of disk are critical. Significantly deviating from these ratios may look like a way to save on hardware costs, but it almost always results in an Elasticsearch cluster that does not scale and perform well.
+
+The following table shows our specific recommendations for nodes in a Hot / Frozen architecture.
+
+[cols="10, 10, 10, 10, 10"]
+|===
+| **Type** | **AWS** | **Azure** | **GCP** | **Physical**
+|image:images/hot.png["Hot data node"]
+|c6gd
+|f32sv2
+|N2
+|16-32 vCPU +
+64 GB RAM +
+2-6 TB NVMe SSD
+
+|image:images/frozen.png["Frozen data node"]
+|i3en
+|e8dsv4
+|N2
+|8 vCPU +
+64 GB RAM +
+6-20+ TB NVMe SSD +
+Depending on days cached
+
+|image:images/machine-learning.png["Machine learning node"]
+|m6gd
+|f16sv2
+|N2
+|16 vCPU +
+64 GB RAM +
+256 GB SSD
+
+|image:images/master.png["Master node"]
+|c5d
+|f16sv2
+|N2
+|8 vCPU +
+16 GB RAM +
+256 GB SSD
+
+|image:images/kibana.png["Kibana node"]
+|c6gd
+|f16sv2
+|N2
+|8-16 vCPU +
+8 GB RAM +
+256 GB SSD
+|===
+
+[discrete]
+[[hot-frozen-considerations]]
+=== Important considerations
+
+* Updating Data:
+** Typically, time series logging use cases are append-only and there is rarely a need to update documents. The frozen tier is read-only.
+* Multi-AZ Frozen Tier:
+** Three availability zones are ideal, but at least two availability zones are recommended to ensure that there will be data nodes available in the event of an AZ failure.
+* Shard Management: 
+** The most important foundational step to maintaining performance as you scale is proper shard management. This includes even shard distribution amongst nodes, shard size, and shard count. For a complete understanding of what shards are and how they should be used, review the documentation on https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html[sizing your shards].
+* Snapshots:
+** If auditable or business critical events are being logged, a backup is necessary. The choice to back up data depends on each business's needs and requirements. See this documentation if you need to create a https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-register-repository.html[snapshot repository].
+** To automate snapshots and attach them to Index Lifecycle Management policies, see https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-take-snapshot.html#automate-snapshots-slm[SLM (Snapshot Lifecycle Management)].
+* Kibana:
+** If self-deploying outside of Elasticsearch Service, ensure Kibana is configured for https://www.elastic.co/guide/en/kibana/current/production.html#high-availability[high availability].
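+
For the Snapshots consideration above, a snapshot lifecycle management policy is one way to automate backups. The sketch below builds a policy body with a schedule, a target repository, and retention rules. The repository name, index pattern, schedule, and retention values are placeholder assumptions.

```python
import json


def build_slm_policy(repository: str,
                     indices: list[str],
                     schedule: str = "0 30 1 * * ?",
                     expire_after: str = "30d") -> dict:
    """Return an SLM policy body for scheduled snapshots.

    The default schedule (01:30 every day) and retention are illustrative only.
    """
    return {
        "schedule": schedule,
        # Date math in the name gives each snapshot a unique, dated name.
        "name": "<nightly-snap-{now/d}>",
        "repository": repository,
        "config": {
            "indices": indices,
            "include_global_state": False,
        },
        "retention": {
            "expire_after": expire_after,
            "min_count": 5,
            "max_count": 50,
        },
    }


# Hypothetical repository and index pattern.
slm_policy = build_slm_policy("my-backup-repo", ["logs-*"])
print(json.dumps(slm_policy, indent=2))
```

A body like this could be sent with `PUT _slm/policy/<policy-id>` once the repository has been registered.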
+
+[discrete]
+[[hot-frozen-estimate]]
+=== How many nodes of each type do you need?
+It depends on:
+
+* The type of data being ingested (e.g., logs, metrics, traces)
+* The retention of searchable data (e.g., 30 days, 90 days, 1 year)
+* The amount of data you need to ingest each day.
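+
To show how these factors interact, here is a rough back-of-the-envelope sketch for the hot tier only. It is not a sizing method: the daily ingest, disk size, usable disk fraction, and replica count are all hypothetical inputs, and index overhead and compression are ignored. The frozen tier would need its own assumption about how much data to keep cached.

```python
import math


def hot_storage_gb(daily_ingest_gb: float, hot_days: float, replicas: int = 1) -> float:
    """Disk needed in the hot tier: primaries plus replicas for the hot window."""
    return daily_ingest_gb * hot_days * (1 + replicas)


def nodes_for_tier(stored_gb: float, disk_per_node_gb: float,
                   usable_fraction: float = 0.8, zones: int = 3) -> int:
    """Node count for a tier, rounded up to a multiple of the zone count."""
    needed = math.ceil(stored_gb / (disk_per_node_gb * usable_fraction))
    # Keep the same number of nodes in every zone, with at least one per zone.
    return max(zones, math.ceil(needed / zones) * zones)


# Hypothetical scenario: 1 TB/day ingested, kept 1 day in hot, 1 replica, 4 TB disks.
hot_gb = hot_storage_gb(daily_ingest_gb=1000, hot_days=1, replicas=1)
print(hot_gb, nodes_for_tier(hot_gb, disk_per_node_gb=4000))
```

Storage is only one constraint. Ingest rate, query load, and data type also drive node counts, which is why a scenario-specific estimate is still recommended.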
+
+You can https://www.elastic.co/contact[contact us] for an estimate and recommended configuration based on your specific scenario.
+
+[discrete]
+[[hot-frozen-resources]]
+=== Resources and references
+
+* https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html[Elasticsearch - Get ready for production]
+
+* https://www.elastic.co/guide/en/cloud/current/ec-prepare-production.html[Elastic Cloud - Preparing a deployment for production]
+
+* https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html[Size your shards]
@@ -0,0 +1,6 @@
+[[reference-architectures]]
+= Reference architectures
+
+include::reference-architecture-overview.asciidoc[]
+
+include::hot-frozen.asciidoc[]
@@ -0,0 +1,35 @@
+[[reference-architecture-overview]]
+== Reference architectures
+
+Elasticsearch reference architectures are blueprints for deploying Elasticsearch clusters tailored to different use cases. Whether you're handling logs or metrics, these reference architectures focus on scalability, reliability, and efficient resource utilization. Use these guidelines to deploy Elasticsearch for your use case.
+
+These architectures are designed by architects and engineers to provide standardized, proven solutions that help users follow best practices when deploying Elasticsearch. Some of the key areas of focus are listed below. 
+
+* High availability 
+* Scalability
+
+TIP: These architectures are specific to running your deployment on-premises or in the cloud. If you are using Elastic serverless, your Elasticsearch clusters are autoscaled and fully managed by Elastic. For all the deployment options, see https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro-deploy.html[Run Elasticsearch].
+
+These reference architectures are recommendations and should be adapted to fit your specific environment and needs. Each solution can vary based on the unique requirements and conditions of your deployment. In these architectures, we discuss how to deploy cluster components. For information about designing ingest architectures to feed content into your cluster, refer to https://www.elastic.co/guide/en/ingest/current/use-case-arch.html[Ingest architectures].
+
+[discrete]
+[[reference-architectures-time-series]]
+=== Architectures
+
+[cols="50, 50"]
+|===
+| *Architecture* | *When to use*
+| <<hot-frozen-architecture>>
+
+A high availability architecture that is cost optimized for large time-series datasets. 
+
+a| 
+* Have a requirement for cost effective long term data storage (many months or years).
+* Provide insights and alerts using logs, metrics, traces, or various event types to ensure optimal performance and quick issue resolution for applications.
+* Apply Machine Learning and Search AI to assist in dealing with the large amount of data.
+* Deploy an architecture model that allows for maximum flexibility between storage cost and performance.
+
+
+|===
+
+include::hot-frozen.asciidoc[]