
[Segment Replication] Higher refresh count for segrep enabled indices #7296

Closed
dreamer-89 opened this issue Apr 24, 2023 · 3 comments
Labels: discuss, distributed framework, Indexing


@dreamer-89 (Member)

As part of benchmarking for the Segment Replication GA release (#5147), the results show a higher number of background refreshes and a higher merge count (the latter possibly a side effect of the former). The difference grows with more data ingestion.

After run completion:
Primary shard: ~34 GB
Total size: ~70 GB

Infra setup:
10 m5.xlarge data nodes, 200 GB storage, 8 GB JVM heap
3 cluster manager nodes

   
| Metric | Unit | Document Replication | Segment Replication | Diff % |
|---:|:---:|---:|---:|---:|
| Cumulative indexing time of primary shards | min | 178.935 | 206.136 | 13.20% |
| Min cumulative indexing time across primary shard | min | 1.49993 | 1.76382 | 14.96% |
| Median cumulative indexing time across primary shard | min | 1.74738 | 2.07351 | 15.73% |
| Max cumulative indexing time across primary shard | min | 2.0436 | 2.35758 | 13.32% |
| Cumulative indexing throttle time of primary shards | min | 0 | 0 | N/A |
| Min cumulative indexing throttle time across primary shard | min | 0 | 0 | N/A |
| Median cumulative indexing throttle time across primary shard | min | 0 | 0 | N/A |
| Max cumulative indexing throttle time across primary shard | min | 0 | 0 | N/A |
| Cumulative merge time of primary shards | min | 30.3864 | 119.524 | 74.58% |
| Cumulative merge count of primary shards | | 77 | 6751 | 98.86% |
| Min cumulative merge time across primary shard | min | 0 | 0.8594 | 100.00% |
| Median cumulative merge time across primary shard | min | 0.30523 | 1.12452 | 72.86% |
| Max cumulative merge time across primary shard | min | 0.85787 | 1.84643 | 53.54% |
| Cumulative merge throttle time of primary shards | min | 8.13938 | 11.7186 | 30.54% |
| Min cumulative merge throttle time across primary shard | min | 0 | 0.04043 | 100.00% |
| Median cumulative merge throttle time across primary shard | min | 0.10155 | 0.1112 | 8.68% |
| Max cumulative merge throttle time across primary shard | min | 0.20813 | 0.28073 | 25.86% |
| Cumulative refresh time of primary shards | min | 19.6532 | 83.47 | 76.45% |
| Cumulative refresh count of primary shards | | 1493 | 37564 | 96.03% |
| Min cumulative refresh time across primary shard | min | 0.15265 | 0.69105 | 77.91% |
| Median cumulative refresh time across primary shard | min | 0.19396 | 0.84068 | 76.93% |
| Max cumulative refresh time across primary shard | min | 0.27413 | 0.97848 | 71.98% |
| Cumulative flush time of primary shards | min | 2.50282 | 0.72735 | -244.10% |
| Cumulative flush count of primary shards | | 200 | 200 | 0.00% |
| Min cumulative flush time across primary shard | min | 0.00327 | 0.00152 | -115.38% |
| Median cumulative flush time across primary shard | min | 0.02454 | 0.00483 | -408.64% |
| Max cumulative flush time across primary shard | min | 0.06605 | 0.04875 | -35.49% |
| Total Young Gen GC time | s | 94.818 | 26.689 | -255.27% |
| Total Young Gen GC count | | 2600 | 1657 | -56.91% |
| Total Old Gen GC time | s | 0 | 0 | N/A |
| Total Old Gen GC count | | 0 | 0 | N/A |
| Store size | GB | 104.083 | 104.158 | 0.07% |
| Translog size | GB | 69.4723 | 73.6275 | 5.64% |
| Heap used for segments | MB | 0 | 0 | N/A |
| Heap used for doc values | MB | 0 | 0 | N/A |
| Heap used for terms | MB | 0 | 0 | N/A |
| Heap used for norms | MB | 0 | 0 | N/A |
| Heap used for points | MB | 0 | 0 | N/A |
| Heap used for stored fields | MB | 0 | 0 | N/A |
| Segment count | | 1713 | 1807 | 5.20% |
| Min Throughput | docs/s | 151404 | 167391 | 9.55% |
| Mean Throughput | docs/s | 158532 | 172685 | 8.20% |
| Median Throughput | docs/s | 154796 | 172995 | 10.52% |
| Max Throughput | docs/s | 166173 | 174655 | 4.86% |
| 50th percentile latency | ms | 407.682 | 444.745 | 8.33% |
| 90th percentile latency | ms | 844.261 | 556.603 | -51.68% |
| 99th percentile latency | ms | 1820.86 | 693.143 | -162.70% |
| 99.9th percentile latency | ms | 3080.93 | 825.769 | -273.10% |
| 99.99th percentile latency | ms | 3919.41 | 893.44 | -338.69% |
| 100th percentile latency | ms | 4200.82 | 897.265 | -368.18% |
| 50th percentile service time | ms | 407.682 | 444.745 | 8.33% |
| 90th percentile service time | ms | 844.261 | 556.603 | -51.68% |
| 99th percentile service time | ms | 1820.86 | 693.143 | -162.70% |
| 99.9th percentile service time | ms | 3080.93 | 825.769 | -273.10% |
| 99.99th percentile service time | ms | 3919.41 | 893.44 | -338.69% |
| 100th percentile service time | ms | 4200.82 | 897.265 | -368.18% |
| error rate | % | 0 | 0 | 0.00% |
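The cumulative refresh and merge counts above can also be spot-checked directly against the cluster via the indices stats API. A hedged sketch (the `so*` index pattern and `$host` endpoint are assumptions, not from the benchmark output):

```sh
# Read cumulative refresh and merge counts for primary shards from the
# indices stats API; the "so*" pattern is an assumed index name.
curl -s "$host:80/so*/_stats/refresh,merge" \
  | jq '._all.primaries | {refresh_count: .refresh.total, merge_count: .merges.total}'
```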
dreamer-89 added the discuss, untriaged, distributed framework, and Indexing labels on Apr 24, 2023
@dreamer-89 (Member, Author)

The benchmark comparison for the `so` dataset also shows p50 and p90 indexing latency degraded by ~20% and ~40% respectively with segment replication.

Cluster setup:

  1. 10 data nodes
  2. 3 cluster manager nodes
  3. m5.xlarge instance type

Steps:

  1. Create the docrep index with the settings below:

```json
{
    "settings": {
        "index": {
            "number_of_shards": 40,
            "number_of_replicas": 1,
            "replication.type": "DOCUMENT"
        }
    }
}
```

  2. Ingest data using:

```sh
opensearch-benchmark execute_test --pipeline benchmark-only --workload so --target-hosts "$host:80" --distribution-version=2.7.0 --include-tasks index-append --results-file=~/.benchmark/so_first_segrep --show-in-results=all --kill-running-processes;
```

  3. Delete the index and repeat for segrep (see the sketch after these steps for the segrep settings).
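A minimal curl sketch of the two index creations (the index name `so` and the `$host:80` endpoint are assumptions matching the benchmark command above); the segrep round uses the same body with `replication.type` switched to `SEGMENT`:

```sh
# Assumed index name "so"; docrep run:
curl -X PUT "$host:80/so" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 40,
      "number_of_replicas": 1,
      "replication.type": "DOCUMENT"
    }
  }
}'

# Segrep run, after deleting the docrep index:
curl -X PUT "$host:80/so" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 40,
      "number_of_replicas": 1,
      "replication.type": "SEGMENT"
    }
  }
}'
```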

One sample comparison below, baseline: docrep, contender: segrep

ubuntu@ip-172-31-49-145:~/OpenSearch-Benchmark$ opensearch-benchmark compare --baseline=8c33cbc6-6f0b-4e77-840e-4b16a0aabcc8  --contender=f615b039-6598-44f5-aab6-467d2671a6ba

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/


Comparing baseline
  TestExecution ID: 8c33cbc6-6f0b-4e77-840e-4b16a0aabcc8
  TestExecution timestamp: 2023-04-25 23:57:26
  TestProcedure: append-no-conflicts
  ProvisionConfigInstance: external

with contender
  TestExecution ID: f615b039-6598-44f5-aab6-467d2671a6ba
  TestExecution timestamp: 2023-04-26 00:13:05
  TestProcedure: append-no-conflicts
  ProvisionConfigInstance: external

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                        Metric |         Task |   Baseline |   Contender |     Diff |   Unit |
|--------------------------------------------------------------:|-------------:|-----------:|------------:|---------:|-------:|
|                    Cumulative indexing time of primary shards |              |    115.845 |     137.057 |  21.2119 |    min |
|             Min cumulative indexing time across primary shard |              |     2.4691 |     2.74135 |  0.27225 |    min |
|          Median cumulative indexing time across primary shard |              |    2.85135 |     3.51008 |  0.65873 |    min |
|             Max cumulative indexing time across primary shard |              |    3.56195 |     4.11007 |  0.54812 |    min |
|           Cumulative indexing throttle time of primary shards |              |          0 |           0 |        0 |    min |
|    Min cumulative indexing throttle time across primary shard |              |          0 |           0 |        0 |    min |
| Median cumulative indexing throttle time across primary shard |              |          0 |           0 |        0 |    min |
|    Max cumulative indexing throttle time across primary shard |              |          0 |           0 |        0 |    min |
|                       Cumulative merge time of primary shards |              |          0 |     103.464 |  103.464 |    min |
|                      Cumulative merge count of primary shards |              |          0 |        1421 |     1421 |        |
|                Min cumulative merge time across primary shard |              |          0 |     1.64495 |  1.64495 |    min |
|             Median cumulative merge time across primary shard |              |          0 |     2.45208 |  2.45208 |    min |
|                Max cumulative merge time across primary shard |              |          0 |      3.7272 |   3.7272 |    min |
|              Cumulative merge throttle time of primary shards |              |          0 |     37.8906 |  37.8906 |    min |
|       Min cumulative merge throttle time across primary shard |              |          0 |    0.315733 |  0.31573 |    min |
|    Median cumulative merge throttle time across primary shard |              |          0 |    0.837842 |  0.83784 |    min |
|       Max cumulative merge throttle time across primary shard |              |          0 |     1.71068 |  1.71068 |    min |
|                     Cumulative refresh time of primary shards |              |    27.2749 |      51.285 |  24.0101 |    min |
|                    Cumulative refresh count of primary shards |              |        459 |        7021 |     6562 |        |
|              Min cumulative refresh time across primary shard |              |   0.529367 |    0.871567 |   0.3422 |    min |
|           Median cumulative refresh time across primary shard |              |   0.693942 |     1.35739 |  0.66345 |    min |
|              Max cumulative refresh time across primary shard |              |   0.839817 |     1.61723 |  0.77742 |    min |
|                       Cumulative flush time of primary shards |              |     3.0851 |    0.900667 | -2.18443 |    min |
|                      Cumulative flush count of primary shards |              |         40 |          92 |       52 |        |
|                Min cumulative flush time across primary shard |              |     0.0184 |      0.0064 |   -0.012 |    min |
|             Median cumulative flush time across primary shard |              |  0.0753083 |     0.01775 | -0.05756 |    min |
|                Max cumulative flush time across primary shard |              |    0.14295 |     0.06245 |  -0.0805 |    min |
|                                       Total Young Gen GC time |              |     30.535 |        7.63 |  -22.905 |      s |
|                                      Total Young Gen GC count |              |       1011 |         531 |     -480 |        |
|                                         Total Old Gen GC time |              |          0 |           0 |        0 |      s |
|                                        Total Old Gen GC count |              |          0 |           0 |        0 |        |
|                                                    Store size |              |    69.5802 |     98.3018 |  28.7217 |     GB |
|                                                 Translog size |              |    30.4832 |     28.3262 | -2.15695 |     GB |
|                                        Heap used for segments |              |          0 |           0 |        0 |     MB |
|                                      Heap used for doc values |              |          0 |           0 |        0 |     MB |
|                                           Heap used for terms |              |          0 |           0 |        0 |     MB |
|                                           Heap used for norms |              |          0 |           0 |        0 |     MB |
|                                          Heap used for points |              |          0 |           0 |        0 |     MB |
|                                   Heap used for stored fields |              |          0 |           0 |        0 |     MB |
|                                                 Segment count |              |        647 |         915 |      268 |        |
|                                                Min Throughput | index-append |    65101.5 |     75422.1 |  10320.6 | docs/s |
|                                               Mean Throughput | index-append |    66831.3 |     78430.6 |  11599.4 | docs/s |
|                                             Median Throughput | index-append |    66893.6 |     78448.2 |  11554.6 | docs/s |
|                                                Max Throughput | index-append |    67664.2 |     80842.5 |  13178.3 | docs/s |
|                                       50th percentile latency | index-append |    400.337 |     465.148 |   64.811 |     ms |
|                                       90th percentile latency | index-append |    555.599 |     762.042 |  206.443 |     ms |
|                                       99th percentile latency | index-append |    7831.66 |     1478.02 | -6353.63 |     ms |
|                                     99.9th percentile latency | index-append |     9041.4 |     2005.12 | -7036.28 |     ms |
|                                      100th percentile latency | index-append |    9376.91 |     2674.49 | -6702.41 |     ms |
|                                  50th percentile service time | index-append |    400.337 |     465.148 |   64.811 |     ms |
|                                  90th percentile service time | index-append |    555.599 |     762.042 |  206.443 |     ms |
|                                  99th percentile service time | index-append |    7831.66 |     1478.02 | -6353.63 |     ms |
|                                99.9th percentile service time | index-append |     9041.4 |     2005.12 | -7036.28 |     ms |
|                                 100th percentile service time | index-append |    9376.91 |     2674.49 | -6702.41 |     ms |
|                                                    error rate | index-append |          0 |           0 |        0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------

@mch2 (Member)

mch2 commented May 22, 2023

The root cause here is the same as #7423 (comment). We intentionally disable shard idle for SR-enabled indices because replicas cannot independently refresh themselves. The mitigation is to bump up the refresh interval with SR, slowing refreshes (and the merges they trigger) and writing larger segments per refresh cycle, at the trade-off of higher replication lag.
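For reference, `index.refresh_interval` is a dynamic setting, so the trade-off can be applied to a live index. A minimal sketch (the index name and the `30s` value are assumptions, not a recommended default):

```sh
# Raise the refresh interval on an SR-enabled index to produce fewer,
# larger segments per refresh cycle, accepting more replication lag.
curl -X PUT "$host:80/so/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "refresh_interval": "30s"
  }
}'
```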

@mch2 (Member)

mch2 commented May 25, 2023

Closing this issue: we know why this is happening, and it is expected behavior with a low refresh interval and segrep. Tracking the zero-replica optimization here.

mch2 closed this as completed on May 25, 2023
github-project-automation bot moved this from In Progress to Done in Segment Replication on May 25, 2023