From 59b0c8fa69d78182840bfd71adba8a19b18731dd Mon Sep 17 00:00:00 2001
From: Josh Liburdi
Date: Sat, 9 Dec 2023 18:18:54 +0000
Subject: [PATCH] docs: README
---
README.md | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/README.md b/README.md
index e28532c8..5eaffe80 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
src="https://github.com/brexhq/substation/blob/release/v1/.github/media/substation_logo.png" />
-Substation is a cloud-native, event-driven data pipeline toolkit designed for security teams.
+Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.
@@ -17,13 +17,13 @@ src="https://github.com/brexhq/substation/blob/release/v1/.github/media/substati
Substation is inspired by data pipeline systems such as Logstash and Fluentd, but is built for modern security teams:
-- **Extensible Data Processing**: Build custom data processing pipeline systems and microservices using out-of-the-box applications and 100+ data transformation functions, or create your own written in Go.
+- **Extensible Data Processing**: Build data processing pipeline systems and microservices using out-of-the-box applications and 100+ data transformation functions, or write your own in Go.
- **Route Data Across the Enterprise**: Conditionally route data to, from, and between AWS cloud services, including S3, Kinesis, SQS, and Lambda, or to any HTTP endpoint.
- **Bring Your Own Schema**: Format, normalize, and enrich event logs to comply with the Elastic Common Schema (ECS), Open Cybersecurity Schema Framework (OCSF), or any other schema.
- **Unlimited Data Enrichment**: Use external APIs to enrich event logs affordably and at scale with enterprise and threat intelligence, or build a microservice that reduces spend in expensive security APIs.
- **No Servers, No Maintenance**: Deploys as a serverless application in your AWS account, launches in minutes using Terraform, and requires no maintenance after deployment.
- **Runs Almost Anywhere**: Create applications that run on most platforms supported by Go and transform data consistently across laptops, servers, containers, and serverless cloud functions.
-- **High Performance, Low Cost**: Transform 100,000+ events per second while keeping cloud costs as low as a few cents per GB of data processed. Vendor solutions, like [Cribl](https://cribl.io/cribl-pricing/) and [Datadog](https://www.datadoghq.com/pricing/?product=observability-pipelines#products), can cost up to 10x more.
+- **High Performance, Low Cost**: Transform 100,000+ events per second while keeping cloud costs as low as a few cents per GB. Vendor solutions, like [Cribl](https://cribl.io/cribl-pricing/) and [Datadog](https://www.datadoghq.com/pricing/?product=observability-pipelines#products), can cost up to 10x more.
All of these data pipeline and microservice systems, and many more, can be built with Substation:
@@ -37,7 +37,7 @@ You can run Substation on these platforms:
- [macOS / Linux](https://substation.readme.io/v1.0.0/docs/try-substation-on-macos-linux)
- [AWS](https://substation.readme.io/v1.0.0/docs/try-substation-on-aws)
-When you're ready to deploy Substation in production, use the [AWS Lambda examples](examples/aws/lambda) as a starting point. These examples include common deployment patterns and demonstrate best practices for managing the system using Terraform and Jsonnet.
+When you're ready to deploy Substation in production, use the [AWS examples](examples/build/terraform/aws) as a starting point. These examples include common deployment patterns and demonstrate best practices for managing the system using Terraform and Jsonnet.
## Transforming Event Logs
@@ -162,12 +162,12 @@ Substation excels at formatting, normalizing, and enriching event logs. For exam
Substation can route data to several destinations from a single process and, unlike most other data pipeline systems,
data transformation and routing are functionally equivalent -- this means that data can be transformed or routed in any order.
-Below is an example where:
+In this configuration, data is:
-- Data (JSON array) is backed up to AWS S3
-- Events from the array are extracted into individual events and printed to stdout
-- Events are conditionally removed from the pipeline based on the value of a field
-- All remaining events are sent to an HTTPS endpoint
+- Written to AWS S3 before any transformation
+- Split into individual events that are printed to stdout
+- Conditionally dropped based on the value of a field
+- Sent to an HTTPS endpoint, if not dropped
```jsonnet
// The input is a JSON array of objects, such as:
@@ -183,11 +183,10 @@ local is_false = sub.cnd.str.eq(settings={ obj: { key: 'field3' }, string: 'fals
{
transforms: [
- // Pre-transformed data is backed up to S3.
+ // Pre-transformed data is written to an object in AWS S3 for long-term storage.
sub.tf.send.aws.s3(settings={ bucket_name: 'example-bucket-name' }),
- // The JSON array is split into individual events that are processed
- // individually by the remaining transforms. Each event is printed
- // to stdout.
+ // The JSON array is split into individual events that go through
+ // the remaining transforms. Each event is printed to stdout.
sub.tf.agg.from.array(),
sub.tf.send.stdout(),
// Events where field3 is false are removed from the pipeline.
@@ -224,9 +223,9 @@ local sub = import 'substation.libsonnet';
## Configuring Applications
-Substation applications share a common configuration profile and all transform functions behave identically whether they are run from a laptop, server, container, or AWS Lambda. This makes it easy to develop configuration changes locally, validate them in a build (CI/CD) pipeline, and run integration tests in a staging environment before deploying to production.
+Substation applications share configurations, and all transform functions behave identically whether they run on a laptop, server, container, or AWS Lambda. This makes it easy to develop configuration changes locally, validate them in a build (CI/CD) pipeline, and run integration tests in a staging environment before deploying to production.
-Configurations are written in Jsonnet and can be expressed as functional code; this simplifies version control and lets you build reusable data processing libraries. Compare the configuration below to similar configurations from Logstash and Fluentd:
+Configurations are written in Jsonnet and can be expressed as functional code; this makes configurations easier to manage in version control and lets you build reusable data processing libraries. Compare the configuration below to similar configurations for Logstash and Fluentd:
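The README states that data transformation and routing are functionally equivalent, so data can be transformed or routed in any order. A minimal Go sketch of why that ordering freedom follows: if every step, including "send" steps, implements one shared interface and passes its messages on, steps can be listed in any order. The `Transform` interface, `Message` type, and the step types below are hypothetical illustrations, not Substation's actual Go API.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical event passed between pipeline steps.
type Message struct {
	Data string
}

// Transform is a hypothetical interface shared by every step. Because
// "send" steps implement the same interface as data transforms, a
// pipeline can interleave them in any order.
type Transform interface {
	Transform(msgs []Message) ([]Message, error)
}

// splitLines is a hypothetical stand-in for an array-splitting transform
// (compare sub.tf.agg.from.array in the README example). It splits on
// newlines to keep the sketch dependency-free.
type splitLines struct{}

func (splitLines) Transform(msgs []Message) ([]Message, error) {
	var out []Message
	for _, m := range msgs {
		for _, part := range strings.Split(m.Data, "\n") {
			out = append(out, Message{Data: part})
		}
	}
	return out, nil
}

// dropIf removes every message that matches a predicate (a conditional filter).
type dropIf struct{ match func(Message) bool }

func (d dropIf) Transform(msgs []Message) ([]Message, error) {
	var out []Message
	for _, m := range msgs {
		if !d.match(m) {
			out = append(out, m)
		}
	}
	return out, nil
}

// collect is a hypothetical "send" step: it records each message it sees
// and passes the messages through unchanged so later steps still get them.
type collect struct{ sink *[]string }

func (c collect) Transform(msgs []Message) ([]Message, error) {
	for _, m := range msgs {
		*c.sink = append(*c.sink, m.Data)
	}
	return msgs, nil
}

// run applies each step, in order, to the output of the previous step.
func run(steps []Transform, msgs []Message) ([]Message, error) {
	var err error
	for _, s := range steps {
		if msgs, err = s.Transform(msgs); err != nil {
			return nil, err
		}
	}
	return msgs, nil
}

func main() {
	var backup, final []string
	steps := []Transform{
		collect{sink: &backup}, // send before any transformation
		splitLines{},
		dropIf{match: func(m Message) bool { return strings.Contains(m.Data, "false") }},
		collect{sink: &final}, // send only the events that were not dropped
	}
	in := []Message{{Data: "a true\nb false\nc true"}}
	if _, err := run(steps, in); err != nil {
		panic(err)
	}
	fmt.Println(len(backup), final)
}
```

Moving `collect{sink: &backup}` after `splitLines{}` would back up individual events instead of the original input, with no other change to the pipeline.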