First, thank you for contributing to Vector! The goal of this document is to clearly provide everything you need to start contributing to Vector. The following TOC is sorted in a progressive fashion, starting with the basics and expanding into more specifics.
- Assumptions
- Workflow
- Development
- FAQ
  - What is conventional commits?
  - Do I need to update the changelog?
  - What is a DCO?
  - Why does Vector adopt the DCO?
  - Why a DCO instead of a CLA?
  - What about trivial changes?
  - Granted rights and copyright assignment
  - If I’m contributing while an employee, do I still need my employer to sign something?
  - What if I forgot to sign my commits?
- You are familiar with the docs.
- You know about the Vector community; use it for help.
All changes must be made in a branch and submitted as a pull request. Vector does not enforce any branch naming convention, but please use something descriptive of your changes.
Please ensure your commits are small and focused; they should tell a story of your change. This helps reviewers follow your changes, especially for more complex ones.
Your commits must include a DCO signature. This is simpler than it sounds; it just means that all of your commits must contain:

```
Signed-off-by: Joe Smith <[email protected]>
```

Git makes this easy by adding the `-s` or `--signoff` flags when you commit:

```bash
git commit -sm 'My commit message'
```

We also included a `make signoff` target that handles this for you if you forget.
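The `Signed-off-by` line is built from your Git identity, so make sure it is configured first. This is standard Git configuration; the name and email below are placeholders:

```bash
git config user.name "Joe Smith"
git config user.email "[email protected]"
```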
Once your changes are ready, you must submit your branch as a pull request.

The pull request title must follow the format outlined in the conventional commits spec (see the "What is conventional commits?" FAQ). A list of allowed sub-categories is defined here.
The following are all good examples of pull request titles:
```
feat(new sink): new `xyz` sink
feat(tcp source): add foo bar baz feature
fix(tcp source): fix foo bar baz bug
chore: improve build process
docs: fix typos
```
At least one Vector team member must approve your work before merging. All pull requests are squashed and merged.
Currently, Vector uses CircleCI. The build process is defined in `/.circleci/config.yml`. This delegates heavily to the `distribution/docker` folder, where the Docker images used for all of our testing, building, verifying, and releasing are defined.

Tests are run for all changes, and CircleCI is responsible for releasing updated versions of Vector through various channels.
- Install Rust via `rustup`:

  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```

- Install Docker. Docker containers are used for mocking Vector's integrations.

- Install Ruby and Bundler 2. They are used to build Vector's documentation.
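As a sketch, assuming Ruby is already installed via your platform's package manager, Bundler 2 can be added with RubyGems:

```bash
# Install Bundler 2 into an existing Ruby installation
gem install bundler -v '~> 2.0'
bundler --version   # should report 2.x
```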
- `/benches` - Internal benchmarks.
- `/config` - Public facing Vector config, included in releases.
- `/distribution` - Distribution artifacts for various targets.
- `/docs` - https://docs.vector.dev source.
- `/lib` - External libraries that do not depend on `vector` but are used within the project.
- `/proto` - Protobuf definitions.
- `/scripts` - Scripts used to generate docs and maintain the repo.
- `/tests` - Various high-level test cases.
Vector includes a `Makefile` in the root of the repo. This serves as a high-level interface for common commands. Running `make` will produce a list of make targets with descriptions. These targets will be referenced throughout this document.
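For example (these targets are all referenced later in this document):

```bash
make         # print the list of targets with descriptions
make fmt     # format the code
make test    # run the test suite
```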
We use `rustfmt` on `stable` to format our code, and CI will verify that your code follows this format style. To run the following command, make sure `rustfmt` has been installed on the stable toolchain locally.

```bash
# To install rustfmt
rustup component add rustfmt

# To format the code
make fmt
```
Vector places high importance on documentation. As such, most of the Vector documentation is auto-generated via the `make generate` command. As a developer you do not need to understand the intricate details of the generation system, but you do need to understand how to use it.
- All source, transform, and sink documentation is auto-generated.
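In practice, this mostly means re-running the generator after your change (using the `make generate` target named above):

```bash
make generate   # regenerate the auto-generated docs; commit any resulting diff
```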
Developers do not need to maintain the `Changelog`. This is automatically generated via the `make release` command. This is made possible by the use of conventional commit titles.
Sinks may implement a healthcheck as a means for validating their configuration against the environment and external systems. Ideally, this allows the system to inform users of problems such as insufficient credentials, unreachable endpoints, non-existent tables, etc. They're not perfect, however, since it's impossible to exhaustively check for issues that may happen at runtime.
When implementing healthchecks, we prefer false positives to false negatives. That is, we would rather have a healthcheck pass and the sink subsequently fail than have the healthcheck fail when the sink would have run successfully.
A common cause of false negatives in healthchecks is performing an operation that the sink itself does not need. For example, listing all of the available S3 buckets and checking that the configured bucket is in that list. The S3 sink doesn't need the ability to list all buckets, and a user who knows that may not have given it permission to do so. In that case, the healthcheck will fail due to bad credentials even though its credentials are sufficient for normal operation.
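To make the S3 example concrete, here is the difference in terms of equivalent AWS CLI calls (illustrative only; the bucket name is hypothetical, and the sink does not literally shell out to the CLI):

```bash
# A false-negative-prone check: requires the s3:ListAllMyBuckets
# permission, which the sink itself never needs.
aws s3api list-buckets

# A check that mimics the sink's access pattern: it only touches the
# configured bucket, so no extra permissions are required.
aws s3api head-bucket --bucket my-configured-bucket
```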
This leads to a general strategy of mimicking what the sink itself does. Unfortunately, the fact that healthchecks don't have real events available to them leads to some limitations here. The most obvious example of this is with sinks where the exact target of a write depends on the value of some field in the event (e.g. an interpolated Kinesis stream name). It also pops up for sinks where incoming events are expected to conform to a specific schema. In both cases, random test data is reasonably likely to trigger a potentially false negative result. Even in simpler cases, we need to think about the effects of writing test data and whether the user would find that surprising or invasive. The answer usually depends on the system we're interfacing with.
In some cases, like the Kinesis example above, the right thing to do might be nothing at all. If we require dynamic information to figure out which entity (i.e. which Kinesis stream) we're even dealing with, odds are very low that we'll be able to come up with a way to meaningfully validate that it's in working order. It's perfectly valid to have a healthcheck that falls back to doing nothing when there is a data dependency like this.
With all that in mind, here is a simple checklist to go over when writing a new healthcheck:
- Does this check perform different fallible operations from the sink itself?
- Does this check have side effects the user would consider undesirable (e.g. data pollution)?
- Are there situations where this check would fail but the sink would operate normally?
Not all of the answers need to be a hard "no", but we should think about the likelihood that any "yes" would lead to false negatives and balance that against the usefulness of the check as a whole for finding problems. Because we have the option to disable individual healthchecks, there's an escape hatch for users that fall into a false negative circumstance. Our goal should be to minimize the likelihood of users needing to pull that lever while still making a good effort to detect common problems.
You can run Vector's tests via the `make test` command. Our tests use Docker Compose to spin up mock services for testing, such as localstack.
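If you want to iterate on unit tests without the Docker-backed services, plain Cargo can work too (a sketch; some tests may still expect the mock services to be running):

```bash
make test         # full suite, with Docker Compose mocks
cargo test --lib  # unit tests only, standard Cargo invocation
```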
We use `flog` to build a sample set of log files to test sending logs from a file. This can be done with the following commands on a Mac with Homebrew. Installation instructions for flog can be found here.

```bash
flog --bytes $((100 * 1024 * 1024)) > sample.log
```

This will create a 100MiB sample log file at `sample.log`.
All benchmarks are placed in the `/benches` folder. You can run benchmarks via the `make benchmarks` command.
Conventional commits is a standardized format for commit messages. Vector only requires this format for commits on the `master` branch. Because Vector squashes commits before merging branches, only the pull request title must conform to this format. Vector performs a pull request check to verify the pull request title in case you forget.
Nope! This is one of the primary reasons we use the conventional commits style. Before releasing Vector we'll automatically generate a changelog for the release.
DCO stands for Developer Certificate of Origin and is maintained by the Linux Foundation. It is an attestation attached to every commit made by every developer. It ensures that all committed code adheres to the Vector license (Apache 2.0).
To protect the users of Vector. It ensures that all Vector contributors, and committed code, agree to the Vector license.
It's simpler, clearer, and still protects users of Vector. We believe the DCO more accurately embodies the principles of open-source. More info can be found here:
Trivial changes, such as spelling fixes, do not need to be signed.
It is important to note that the DCO is not a license. The license of the project – in our case the Apache License – is the license under which the contribution is made. However, the DCO in conjunction with the Apache License may be considered an alternate CLA.
The existence of section 5 of the Apache License is proof that the Apache License is intended to be usable without CLAs. Users need the code to be open source, with all the legal rights that implies, but it is the open source license that provides this. The Apache License provides very generous copyright permissions from contributors, and contributors explicitly grant patent licenses as well. These rights are granted to everyone.
Nope! The DCO confirms that you are entitled to submit the code, which assumes that you are authorized to do so. It treats you like an adult and relies on your accurate statement about your rights to submit a contribution.
No probs! We made this simple with the `signoff` Makefile target:

```bash
make signoff
```

If you prefer to do this manually: https://stackoverflow.com/questions/13043357/git-sign-off-previous-commits
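For example, one manual approach using standard Git commands (the commit count below is a placeholder; adjust it to cover your unsigned commits):

```bash
# Re-sign the last 3 commits on your branch
git rebase --signoff HEAD~3

# Or, if only the most recent commit is missing a signoff:
git commit --amend --signoff --no-edit
```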