
Investigate Java 21 and Jruby compatibility #15342

Closed · 4 of 6 tasks
roaksoax opened this issue Sep 22, 2023 · 5 comments · Fixed by #15719

roaksoax commented Sep 22, 2023

Java 21 is now available and we would like to make it the default for Logstash. However, we first need to investigate whether that is possible, in particular whether JRuby supports it.

Deprecation list: https://docs.oracle.com/en/java/javase/21/docs/api/deprecated-list.html
Dependent tasks:

Depending tasks:

Other Tasks

andsel mentioned this issue Dec 21, 2023
andsel linked a pull request Dec 21, 2023 that will close this issue

andsel commented Jan 25, 2024

As reported in jruby/jruby#8061 (comment), JDK 21's LinkedHashMap introduces a new method (map) that is not present in JDK 17 and that interferes with JRuby's map method.
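
To make the interference concrete, here is a hypothetical JRuby snippet (my own illustration, not taken from the linked issue): JRuby normally lets Ruby's Enumerable#map run on java.util collections, and a Java-derived map method on the same class would shadow it. LinkedHashSet is used here because it is the class targeted by the workaround below.

require 'java'

set = java.util.LinkedHashSet.new
set.add(1)
set.add(2)

# Under JDK 17 this dispatches to Ruby's Enumerable#map and returns [2, 4];
# if the JVM class gains its own map method, the call no longer resolves to
# the Ruby block-taking version.
p set.map { |x| x * 2 }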


andsel commented Feb 8, 2024

As reported in jruby/jruby#8061 (comment), the fix will be included in JRuby 9.4.6.0.
The temporary workaround is to add

java.util.LinkedHashSet.remove_method(:map) rescue nil

to the rspec bootstrap script, i.e. the script containing:

require_relative "environment"
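
Putting the two fragments together, the top of the bootstrap script would look roughly like this (placing the guard before the require is my assumption, not something stated in the JRuby issue):

# Temporary workaround for the JDK 21 method clash: drop the Java-derived
# map method so JRuby's Enumerable#map keeps winning; the rescue makes this
# a no-op on JDKs where the method does not exist.
java.util.LinkedHashSet.remove_method(:map) rescue nil

require_relative "environment"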


andsel commented Mar 28, 2024

Analysis of the removal of the preventive GC flag for Logstash

Which problem preventive GC was intended to resolve

JDK 17 introduced the flag G1UsePreventiveGC to address a problem in G1 evacuation when there are many short-lived humongous objects (humongous meaning an object bigger than 1/2 of a region size). As discussed in https://tschatzl.github.io/2021/09/16/jdk17-g1-parallel-gc-changes.html, the problem shows up as 0 objects copied during the evacuation phase: the count of such objects rises so quickly that there are no Eden or Survivor regions available to move them to, so a Full GC (which is Stop The World) is needed to do in-place compaction.
The flag was introduced to run some preventive, unscheduled GC cycles before humongous objects saturate the humongous regions, essentially to preserve space for copying objects during evacuation and to avoid a Full GC.
With JDK 20 the flag was deprecated and defaulted to false; with JDK 21 it has been removed.
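
To make the humongous threshold concrete, here is a small JRuby sketch (the 4 MB allocation and the assumption of a 4 MB G1 region size are illustrative, not measurements from Logstash):

require 'java'

# A single 4 MB Java byte[] allocated from JRuby. With a G1 region size of
# 4 MB (or smaller), this object is >= half a region, so G1 places it in
# humongous regions instead of Eden.
chunk = Java::byte[4 * 1024 * 1024].new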

Elasticsearch use case

Elasticsearch data nodes load a lot of 4 MB byte[] chunks of data to be passed down to the ML node (this also happens in other cases, it is not limited to the ML case). This generates a lot of humongous allocations (humongous objects being objects with size >= 1/2 of the region size). In general such a spike in allocations would generate an OOM error in the JVM, but ES is able to protect against it with a circuit breaker, and that is exactly what showed up: a lot of circuit breaker exceptions, with memory staying high instead of being freed and kept lower by the G1 preventive collection phases.

How ES solved the issue
ES is resolving this by trying to allocate fewer humongous objects.
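
As a hedged illustration of that idea (hypothetical sizes and helper, not the actual ES change): splitting one large payload into slices that stay below the humongous threshold keeps each allocation inside a normal region.

CHUNK_SIZE = 256 * 1024  # hypothetical slice size, well under half a region

# Return the payload as an array of sub-threshold slices instead of one big object.
def slice_payload(payload, chunk_size = CHUNK_SIZE)
  (0...payload.bytesize).step(chunk_size).map do |offset|
    payload.byteslice(offset, chunk_size)
  end
end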

Logstash use case

Logstash has some peculiarities:

  • allocation is governed by the environment: clients push data into the inputs, or data is pulled in by the inputs.
  • there isn't any explicit circuit breaker to avoid memory exhaustion.
  • the limiting mechanism is the in-memory queue: if the upstream is going too fast, the queue acts as a bounding mechanism by blocking (see the sketch after this list).
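
A minimal Ruby sketch of that bounding behaviour, using SizedQueue as a stand-in for the in-memory pipeline queue (the queue size and the receive_event/process helpers are hypothetical):

queue = SizedQueue.new(128)     # bounded, like the in-memory pipeline queue

producer = Thread.new do
  loop do
    event = receive_event       # hypothetical input source
    queue.push(event)           # blocks once 128 events are in flight, which
                                # is what throttles a too-fast upstream
  end
end

consumer = Thread.new do
  loop { process(queue.pop) }   # hypothetical filter/output work
end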

Queue full case
If the queue is full and is limiting the input, then at a certain point the allocation rate is not high: the references sit in the queue for relatively long periods, so those objects likely transition into the tenured regions (old generation) and get no benefit from preventive GCs.

So from this perspective, having preventive GCs or not doesn't provide any improvement.

Queue empty and fast consumers
In this case the queue is almost empty: consumers are able to keep up with producers. When the allocation rate is high and the pipeline queues have enough space to keep all the events live (big objects >= 2MB), and given that there isn't any circuit breaker protection, preventive GCs offer only limited relief: without something preemptively limiting the allocation rate, the JVM hosting Logstash is destined to go OOM anyway.

Also in this case, having preventive GCs or not doesn't provide any improvement.

Considerations

Given the discussion above, preventive GCs don't play an important role in Logstash memory management.

How I ran some tests

Used the following pipeline, which is pretty fast and keeps the queue mostly empty:

input {
  http {
    response_headers => {"Content-Type" => "application/json"}
    ecs_compatibility => disabled
  }
}
output {
  sink {}
}  

Created a 4 MB file containing a single line of text.
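The input file can be produced with a one-liner like the following (the file name matches the one read by the Lua script below; the actual content used in the test is not recorded here, so a repeated character is assumed):

File.write("input_sample.txt", "a" * (4 * 1024 * 1024))
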
Ran wrk with the following Lua script (wrk_send_file.lua):

wrk.method = "POST"
local f = io.open("input_sample.txt", "r")
wrk.body = f:read("*all")

invoked as:

wrk --threads 4 --connections 12 -d10m -s wrk_send_file.lua --latency http://localhost:8080


andsel commented Apr 3, 2024

Reopening because this was inadvertently closed by #15719.

andsel reopened this Apr 3, 2024

roaksoax commented Apr 9, 2024

Closing this issue since Logstash will now support JDK 21. The discussion to decide whether we make it the default is being followed up in a different thread.

roaksoax closed this as completed Apr 9, 2024