Abstract
This issue comes from the discussion in logstash-plugins/logstash-integration-kafka#178 (comment), which took place during the development of a PR adding AWS IAM as a SASL mechanism to the Kafka integration plugin.
Adding that extension raised the following problems:
size of the gem: the transitive dependencies pulled in are not negligible. The gem grows from 13 MB to 65 MB for the AWS service provider alone, and if GCP IAM support is added in the foreseeable future, it could grow even more.
shadowing of class names: some of the transitive dependencies (e.g. Netty) are also bundled by other plugins (like the TCP or Beats inputs), not necessarily at the same version. Because the classpath is flat and shared between core and all plugins, this can cause binary compatibility issues that are hard, or even impossible, to resolve.
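To see the flat-classpath problem in practice, one can ask the JVM which jar actually supplied a contested class. The sketch below relies only on the standard ProtectionDomain and CodeSource APIs; the helper name ClassOrigin is hypothetical, and using it inside a running Logstash (for example on a Netty class) is an illustrative assumption, not an existing Logstash tool.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

/** Hypothetical diagnostic helper: reports which jar or directory supplied a class. */
final class ClassOrigin {
    // Bootstrap (JDK) classes have no CodeSource, so no location can be reported.
    static final String UNKNOWN = "<bootstrap or unknown location>";

    private ClassOrigin() {}

    /** Returns the URL of the jar/directory the class was loaded from, if the JVM exposes it. */
    static String of(Class<?> type) {
        ProtectionDomain domain = type.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return (location == null) ? UNKNOWN : location.toString();
    }

    /** Resolves a class by name (without initializing it) and reports its origin and loader. */
    static String describe(String className, ClassLoader loader) throws ClassNotFoundException {
        Class<?> type = Class.forName(className, false, loader);
        return className + " <- " + of(type) + " (loader: " + type.getClassLoader() + ")";
    }
}
```

On a flat classpath, if two plugins bundle different Netty versions, the location reported for a Netty class shows which copy the JVM picked; the other plugin silently runs against that version.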
Shadowing of class names (pollution of classpath)
This is a classic problem, and it can be resolved in two ways:
shading the conflicting dependencies under a plugin-specific package name.
implementing a child classloader that handles class loading for each plugin.
The following sections discuss the pros and cons of each solution.
Shading dependencies
Shading all transitive dependencies under a package name related to the plugin can be done with the GradleUp/shadow Gradle plugin (https://github.com/GradleUp/shadow).
Before applying shading, keep the following points in mind:
don't shade libraries with fairly stable APIs, i.e. those whose API doesn't change across minor releases. For example, Netty's API shouldn't change much between, say, versions 4.1.n and 4.1.n+20 (the same holds for log4j2-api), so these can be kept out of the shading process.
don't shade class names that can be used in configuration. In the logstash-input-kafka AWS IAM authentication work (logstash-plugins/logstash-integration-kafka#178), a couple of settings (sasl_jaas_config and sasl_client_callback_handler_class) receive public AWS class names (like software.amazon.msk.auth.iam.IAMLoginModule and software.amazon.msk.auth.iam.IAMClientCallbackHandler). Those classes are an implicit interface with the user and cannot be shaded.
a common shading best practice is to create a new library (or submodule) for each shaded tree. For example, if a project depends on software.amazon.msk:aws-msk-iam-auth:2.2.0, a new submodule (like input-kafka-aws-msk-iam-auth-shaded) should be created that contains only the dependency to shade and the shading instructions.
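To make the two exclusion rules above concrete, here is a small model of the decision a relocation step takes for each class name. It is not the Shadow plugin's API (in Shadow this is declared in the build script, via its relocate directive with exclude patterns); the prefix org.logstash.plugins.kafka.shaded is a hypothetical name, and the exclusion lists simply mirror the examples given above.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of per-class relocation decisions; not the Shadow plugin's API. */
final class RelocationRules {
    // Hypothetical plugin-specific prefix for relocated (shaded) classes.
    static final String SHADED_PREFIX = "org.logstash.plugins.kafka.shaded.";

    // Stable APIs intentionally left unshaded. Note: the log4j prefix would also
    // match log4j-core; a real rule would need to be narrower if core were bundled.
    static final List<String> STABLE_API_PACKAGES = List.of(
            "io.netty.",
            "org.apache.logging.log4j.");

    // Classes users name in settings such as sasl_jaas_config and
    // sasl_client_callback_handler_class; relocating them would break user configs.
    static final Set<String> CONFIG_VISIBLE_CLASSES = Set.of(
            "software.amazon.msk.auth.iam.IAMLoginModule",
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler");

    private RelocationRules() {}

    /** Returns the class name as it would appear after shading. */
    static String relocate(String className) {
        if (CONFIG_VISIBLE_CLASSES.contains(className)) {
            return className;
        }
        for (String pkg : STABLE_API_PACKAGES) {
            if (className.startsWith(pkg)) {
                return className;
            }
        }
        return SHADED_PREFIX + className;
    }
}
```

Everything not explicitly excluded moves under the plugin prefix, so two plugins shading the same AWS SDK classes end up with distinct names and cannot collide on the shared classpath.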
Pros & cons
Pros:
easier to implement: there is a Gradle plugin that does this very well.
no need to dig into the classloading mechanisms of Java and JRuby.
Cons:
changes have to be made to each plugin that uses external jars.
care must be taken not to shade fully qualified class names that are also used in configuration strings.
not every dependency has a good reason to be shaded, for example Netty or log4j2-api, which have fairly stable APIs.
size of the uber-jar.
may require handling the publication of the uber-jar.
minimal class reuse and potential class space explosion.
Classloader per plugin
Another way to solve the classpath pollution problem is to isolate each plugin in its own classloader, so that different versions of the same class can coexist in different classloaders, one per plugin.
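A common way to implement this kind of isolation on the JVM is a child-first (parent-last) classloader: each plugin's loader looks in the plugin's own jars before delegating to its parent, except for packages that must be shared with core. The sketch below assumes Java 9+; the class name, the plugin name argument and the shared-package list (for instance org.logstash. or org.jruby.) are hypothetical, and it deliberately leaves out the JRuby integration discussed next.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

/** Hypothetical child-first classloader, one instance per plugin. */
final class PluginClassLoader extends URLClassLoader {
    // Package prefixes that must resolve to the single copy shared with core.
    private final List<String> sharedPrefixes;

    PluginClassLoader(String pluginName, URL[] pluginJars, ClassLoader parent, List<String> sharedPrefixes) {
        super("plugin:" + pluginName, pluginJars, parent);
        this.sharedPrefixes = sharedPrefixes;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (isShared(name)) {
                    // JDK and shared core classes: normal parent-first delegation.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Plugin's own jars first, so its Netty/AWS versions win.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException notInPlugin) {
                        // Not bundled by the plugin: fall back to the parent.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    private boolean isShared(String name) {
        if (name.startsWith("java.") || name.startsWith("javax.") || name.startsWith("jdk.")) {
            return true;
        }
        for (String prefix : sharedPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Resources (getResource) are still resolved parent-first here; a complete implementation would usually apply the same child-first policy to them.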
This is harder to implement because it is deeply intertwined with the way JRuby operates. The classloader segregation also has to work for mixed plugins: Ruby gems that contain some shim Ruby code but bundle the jar classes they need to operate. In this context, the classloader must cooperate with JRuby so that, when the plugin's Ruby code loads a Java class, it goes through the segregated classloader instead of the standard JRuby classloading.
More investigation is needed to understand whether and how this is feasible; moreover, classloading code can be hard to get right.
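One piece of the JRuby cooperation described above is the thread context classloader: Java libraries often resolve class names taken from configuration strings through it rather than through their own loader. As we understand it, JAAS's LoginContext loads login module classes this way, which is where the module named in sasl_jaas_config ends up. A plugin living in its own classloader would therefore need to install that loader as the context classloader around calls into the library. The helper below is a hypothetical sketch of that pattern, not existing Logstash code.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: runs an action with a given thread context classloader. */
final class ContextClassLoaders {
    private ContextClassLoaders() {}

    static <T> T callWith(ClassLoader loader, Callable<T> action) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            // Always restore the previous loader, even if the action throws.
            thread.setContextClassLoader(previous);
        }
    }
}
```

In a plugin, constructing the Kafka client would be wrapped in such a call; whether that is sufficient when the call originates from JRuby is one of the open questions above.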
Pros & cons
Pros:
a more elegant way to solve the problem.
no need to change the plugins' build scripts.
no need to ship an uber-jar and manage its lifecycle.
Cons:
hooking into the classloader hierarchy could be tough.
doesn't solve the problem of large gems with many transitive dependencies, because those still have to be vendored in the gem.
it is still necessary to select which jar dependencies have to be vendored.
Size of the generated plugin gem
The other side of this problem is the size a gem can reach. As in the SASL configuration use case, a gem can grow considerably just to ship transitive dependencies that are needed only for specific settings. In the majority of use cases those dependencies are probably never used, yet users pay the penalty of a gem bundled with everything.
To limit the size of a gem's transitive dependencies, the idea is to offload the non-mandatory ones into a separate artifact, for example:
another gem that can be installed when needed.
an uber-jar that can be downloaded and put into the classloader's path.
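Whichever artifact carries the optional dependencies, the plugin has to cope with their absence at runtime. A minimal sketch, assuming the plugin probes for a marker class before enabling the optional feature and otherwise fails fast with an actionable message; the helper name, the choice of marker class and the message text are illustrative assumptions.

```java
/** Hypothetical runtime probe for optional dependencies. */
final class OptionalDependency {
    private OptionalDependency() {}

    /** True if the marker class can be found by the given loader, without initializing it. */
    static boolean isAvailable(String markerClass, ClassLoader loader) {
        try {
            Class.forName(markerClass, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Fails fast with a message telling the user what to install. */
    static void require(String feature, String markerClass, String installHint, ClassLoader loader) {
        if (!isAvailable(markerClass, loader)) {
            throw new IllegalStateException(feature + " requires optional dependencies that are not installed ("
                    + markerClass + " not found). " + installHint);
        }
    }
}
```

Checking at pipeline startup, rather than when the first connection is attempted, surfaces the missing artifact together with the setting that needs it.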
Gem with optional dependencies
In this case, a plugin with optional dependencies would also generate and publish (how?) on RubyGems a set of additional gems containing the full set of transitive dependencies required only for the optional behaviours. The Logstash plugin documentation would then have to be updated to explain how and when those optional artifacts need to be installed.
The answer to the "when" question is the documentation itself.
As for the "how", the bin/logstash-plugin tool could be extended so that it can also install this kind of extension gem.
Uber-jar that can be downloaded and put into the class loader's path
In this case, a strategy similar to the one used for JDBC plugin drivers could be followed: the user manually downloads an uber-jar containing all the transitive dependencies and sets its path in the plugin configuration, so that the plugin can load the jar explicitly, as in:
require "uber.jar"
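On the JVM side, explicitly loading a user-supplied jar can be sketched with a dedicated URLClassLoader, similar in spirit to the JDBC driver approach mentioned above. The class and method names are hypothetical, and how the configured path travels from the Ruby side of the plugin to Java is left out.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical loader for an uber-jar whose path comes from the plugin configuration. */
final class UberJarLoader {
    private UberJarLoader() {}

    /** Opens a classloader over the jar at the given path; the caller must close it. */
    static URLClassLoader open(Path uberJar, ClassLoader parent) throws IOException {
        if (!Files.isRegularFile(uberJar)) {
            // Fail early with the configured path, instead of a later ClassNotFoundException.
            throw new IllegalArgumentException("uber-jar not found: " + uberJar);
        }
        URL url = uberJar.toUri().toURL();
        return new URLClassLoader(new URL[] {url}, parent);
    }

    /** Loads (and initializes) a class from the uber-jar, or from the parent loader. */
    static Class<?> load(URLClassLoader loader, String className) throws ClassNotFoundException {
        return Class.forName(className, true, loader);
    }
}
```

Keeping the jar behind its own loader, rather than appending it to the shared classpath, also avoids reintroducing the class shadowing problem described earlier.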
This raises two questions: where to publish the uber-jar, and when to update it.
Where to publish?
At first thought, since it is a Java jar, the answer could be a Maven repository, but we have to check whether there are any limits on the size of the jars that can be uploaded. This is closely related to the next question.
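If a Maven repository were chosen, the plugin (or its documentation) could derive the uber-jar's download location from its coordinates, because Maven repositories follow a fixed layout: the groupId with dots turned into slashes, then the artifactId and version directories, then a file named artifactId-version[-classifier].jar. The helper name and the coordinates used below are hypothetical; no such artifact is implied to exist.

```java
/** Builds artifact URLs following the standard Maven repository layout. */
final class MavenLayout {
    private MavenLayout() {}

    static String jarUrl(String repoBase, String groupId, String artifactId, String version, String classifier) {
        String base = repoBase.endsWith("/") ? repoBase : repoBase + "/";
        // Optional classifier, e.g. "all" for a shadowed jar.
        String suffix = (classifier == null || classifier.isEmpty()) ? "" : "-" + classifier;
        String fileName = artifactId + "-" + version + suffix + ".jar";
        return base + groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/" + fileName;
    }
}
```

For example, jarUrl("https://repo1.maven.org/maven2", "org.example.logstash", "kafka-aws-deps", "11.5.0", null) yields https://repo1.maven.org/maven2/org/example/logstash/kafka-aws-deps/11.5.0/kafka-aws-deps-11.5.0.jar. Tying the version segment to the plugin's gem version would turn the "when to update" question into a mechanical rule; the Shadow plugin's default "all" classifier would fit in the classifier slot.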
When to update it?
A new version of the uber-jar should be published whenever the plugin that requires it updates the version of its library dependency and releases a new gem version. Building and publishing the uber-jar could be either a task in the Gradle build script or an external CI pipeline triggered manually. The optimal solution is to automate this step too, so that plugin developers don't have to remember it.
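Because the uber-jar would be released separately from the gem, a user could end up with a jar that doesn't match the installed plugin. A hedged sketch of a guard, assuming the build stamps the standard Implementation-Version attribute into the uber-jar's manifest and the plugin compares it with the version it expects at load time; the helper name and message are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical check that a downloaded uber-jar matches the plugin that needs it. */
final class UberJarVersion {
    private UberJarVersion() {}

    /** Reads Implementation-Version from the jar's manifest, or null if absent. */
    static String read(Path jar) throws IOException {
        try (JarFile file = new JarFile(jar.toFile())) {
            Manifest manifest = file.getManifest();
            return (manifest == null) ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        }
    }

    /** Throws if the jar's version differs from the one the plugin was built against. */
    static void requireMatching(Path jar, String expectedVersion) throws IOException {
        String actual = read(jar);
        if (!expectedVersion.equals(actual)) {
            throw new IllegalStateException("uber-jar " + jar + " has version " + actual
                    + " but the plugin expects " + expectedVersion + "; please download the matching release");
        }
    }
}
```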
I am less concerned about the size of the plugin, and I would try to avoid the complexity of the ClassLoader. Adding support for GCP and Azure may add another 100 MB; with today's network speeds, that increases the download time from a few seconds to minutes.
My concern is the reliability of JRuby and Ruby Maven, which cause build issues from time to time, so I would try to avoid further integration with JRuby class loading. Shading dependencies is a durable solution with less follow-up maintenance work.