Proposed Changes

This is a non-definitive list of the proposed changes between v0 and v1.

- build/config/substation.libsonnet
- cmd/aws/lambda
- development
Hey folks,
Over the past few months I've been tracking changes that are needed to get Substation from pre-release to v1.0. The applications (`cmd/`) were always production-ready, but the project has a significant amount of surface area in the public API that can be reduced and improved. I'm working on these changes in the `release/v1` branch and think it's worth sharing the major breaking changes that may be coming in v1.0:

Simplifying the Public API
The most significant change is that there is no longer any differentiation between `processors`, `transforms`, and `sinks`. These are now all `transforms`. Transforms take one or more events and return zero or more events, and that's it. Nearly any function can be a transform.

To start, I'm considering three types of transforms:
- `meta`: These transforms execute other transforms.
- `proc`: These transforms modify (process) data.
- `send`: These transforms send data to an external system.

Since everything is now a `transform`, every function in the system can be chained together in any order. For example:

- `send` data to AWS S3 for long-term storage
- `proc` data by enriching it with an external service
- `send` data to AWS Kinesis for real-time processing

The current design has tight coupling between `processors`, `transforms`, and `sinks`; with the move to `transforms`, that coupling is gone.

Removing Dependency on Channels
The system no longer provides channels in the public API nor relies on channels in internal packages for communication. The choice to use channels was based on existing data processing systems and recommendations for building in-memory ETL pipelines in Go, but they bring a lot of complexity to the system. Users can bring their own channels if they want to, and most of our pre-built applications will still use a channel for delivering data from a source to transforms.
This directly impacts how the system handles data flow. In the current design, all data goes from a source to `processors` to a `sink` using channels, and the flow stops when the channels close. With the removal of channels, we have to add a different type of flow control mechanism that I'm currently calling a control (`ctrl`) message.

`ctrl` messages are sent through the data transform pipeline and are used to trigger specific behavior in each transform. For example, a `ctrl` message can be used to trigger a transform to flush its internal state or to close a connection to an external system. These messages cannot contain data and are never handled as data by the transform; they are only used to trigger behavior.

Another advantage of this design over channels is that users can directly control the flow of data using the same mechanism they already use for processing data. Instead of waiting for a channel to close to trigger behavior in a transform, users can send any number of `ctrl` messages to trigger that behavior at any time. This is especially useful for long-running data transforms that need to periodically flush their internal state, or for applications running in a stateful environment that need to shut down gracefully and asynchronously.

`substation` Package

The `substation` package is now the only package required to build applications. These new applications are much simpler than what is currently required (mostly due to the removal of channels). See `substation_test.go` for examples.

Conditional Data Transformation
In the current design every `processor` supports an optional `condition`; this is no longer the case. Instead, a `meta` transform exists that allows users to perform conditional data transformation using any number of conditions and transforms. This is more flexible because it supports more logic statements:

- `if` (same as current design)
- `if...else`
- `if...elif...else`

A shortcut for the original behavior will be available as a pattern in the Jsonnet library.
Removing Patterns from substation.libsonnet
Over the past year we've added several configuration patterns to our Jsonnet library. Most of these are gone now, and the majority of them won't come back as code directly supported in the project. We'll continue to add configuration examples as recipes in our documentation.
Bring Your Own Concurrency
Concurrency is required in the current design due to the reliance on channels, but in v1.0 it will be up to the user to decide how to execute transforms. This change also requires that all `transforms` are safe for concurrent use.

In testing, this approach to concurrency makes applications more reliable, more flexible, and faster. The benchmark application shows that simple data transformation configurations (copies, inserts) are now ~50% faster than the current design (about half the time to process 100,000 events):
100000 events in 301.993583ms (331132.86 events/sec)
100000 events in 605.160334ms (165245.46 events/sec)
What's Next
Anyone can test the new design by building the `release/v1` branch. All unit tests are passing and most features have been integration tested. The remaining work is to load test the redesigned applications in a production environment and verify there are no regressions in our Terraform modules. For migration, we'll most likely build a tool that can convert existing configurations to the new design.

There's no time commitment on getting to v1.0, but we may get there by October 2023.