Merge branch 'main' into vimystic/restart-sidecar
vimystic authored Oct 28, 2024
2 parents c9349d4 + 20b4d27 commit 9425025
Showing 24 changed files with 519 additions and 137 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/go.yaml
@@ -14,7 +14,7 @@ jobs:
- uses: actions/checkout@v3
- uses: actions/setup-go@v4
with:
go-version: '>=1.20.2'
go-version: '>=1.22'
- name: golangci-lint
uses: golangci/golangci-lint-action@v3
with:
@@ -29,6 +29,6 @@ jobs:
- uses: actions/checkout@v3
- uses: actions/setup-go@v4
with:
go-version: '>=1.20.2'
go-version: '>=1.22'
- name: unit tests
run: make test
7 changes: 3 additions & 4 deletions .github/workflows/manifests.yaml
@@ -14,9 +14,8 @@ jobs:
- uses: actions/checkout@v3
- uses: actions/setup-go@v4
with:
go-version: '>=1.20.2'
go-version: '>=1.22'
- run: make generate manifests

- uses: CatChen/check-git-status-action@v1
with:
fail-if-not-clean: true
- name: Ensure no changes
run: git diff --exit-code
25 changes: 0 additions & 25 deletions .github/workflows/strangelove-project-management.yaml

This file was deleted.

3 changes: 2 additions & 1 deletion .gitignore
@@ -26,4 +26,5 @@ testbin/*
*~

# Local temporary files
/tmp
/tmp
.vscode/settings.json
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,7 +1,7 @@
# See rocksdb/README.md for instructions to update rocksdb version
FROM ghcr.io/strangelove-ventures/rocksdb:v7.10.2 AS rocksdb

FROM --platform=$BUILDPLATFORM golang:1.20-alpine AS builder
FROM --platform=$BUILDPLATFORM golang:1.23-alpine AS builder

RUN apk add --update --no-cache\
gcc\
93 changes: 59 additions & 34 deletions README.md
@@ -1,67 +1,88 @@
# Cosmos Operator

[![Conforms to README.lint](https://img.shields.io/badge/README.lint-conforming-brightgreen)](https://github.com/strangelove-ventures/readme-dot-lint)
[![Project Status: Initial Release](https://img.shields.io/badge/repo%20status-active-green.svg?style=flat-square)](https://www.repostatus.org/#active)
[![GoDoc](https://img.shields.io/badge/godoc-reference-blue?style=flat-square&logo=go)](https://pkg.go.dev/github.com/strangelove-ventures/cosmos-operator)
[![Go Report Card](https://goreportcard.com/badge/github.com/strangelove-ventures/cosmos-operator)](https://goreportcard.com/report/github.com/strangelove-ventures/cosmos-operator)
[![License: Apache-2.0](https://img.shields.io/github/license/strangelove-ventures/cosmos-operator.svg?style=flat-square)](https://github.com/strangelove-ventures/cosmos-operator/blob/main/LICENSE)
[![Version](https://img.shields.io/github/tag/strangelove-ventures/cosmos-operator.svg?style=flat-square)](https://github.com/cosmos/strangelove-ventures/cosmos-operator)

Cosmos Operator is a [Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for blockchains built with the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk).
Cosmos Operator is a [Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) primarily for blockchains built with the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk). It also supports [Penumbra](https://github.com/penumbra-zone/penumbra) and other chains which use [CometBFT](https://github.com/cometbft/cometbft) for consensus.

The long-term vision of this operator is to allow you to "configure it and forget it".
🌌 Why use Cosmos Operator?
=============================

## Motivation
Kubernetes ("K8") makes DevOps easier. Cosmos Operator makes Kubernetes easier for use in the Cosmos Ecosystem.

Kubernetes provides a foundation for creating highly-available, scalable, fault-tolerant applications.
Additionally, Kubernetes provides well-known DevOps patterns and abstractions vs.
traditional DevOps which often requires "re-inventing the wheel".
K8 provides a foundation for creating highly-available, scalable, fault-tolerant applications. It provides well-known DevOps patterns and abstractions (as opposed to traditional DevOps which often requires "re-inventing the wheel").

Furthermore, the Operator Pattern allows us to mix infrastructure with business logic,
Furthermore, the [Operator Pattern][] allows us to mix infrastructure with business logic,
thus minimizing human intervention and human error.

# Disclaimers

* Tested on Google's GKE and Bare-metal with Kubeadm. Although kubernetes is portable, we cannot guarantee or provide support for AWS, Azure, or other kubernetes providers.
* Requires a recent version of kubernetes: v1.23+.
* CosmosFullNode: The chain must be built from the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk).
* CosmosFullNode: Validator sentries require a remote signer such as [horcrux](https://github.com/strangelove-ventures/horcrux).
* CosmosFullNode: The controller requires [heighliner](https://github.com/strangelove-ventures/heighliner) images. If you build your own image, you will need a shell `sh` and set the uid:gid to 1025:1025. If running as a validator sentry, you need `sleep` as well.
* CosmosFullNode: May not work for all Cosmos chains. (Some chains diverge from common conventions.) Strangelove has yet to encounter a Cosmos chain that does not work with this operator.
🌌🌌 Who benefits from Cosmos Operator?
=============================

People who'd like to use the [Operator Pattern][] to "configure it and forget it".

> The [operator pattern][] aims to capture the key aim of a human operator who is managing a service or set of services. Human operators who look after specific applications and services have deep knowledge of how the system ought to behave, how to deploy it, and how to react if there are problems.
> People who run workloads on Kubernetes often like to use automation to take care of repeatable tasks. The [operator pattern][] captures how you can write code to automate a task beyond what Kubernetes itself provides.

# CosmosFullNode CRD
🌌🌌🌌 What does Cosmos Operator do?
=============================

Cosmos Operator is a [Kubernetes Operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for blockchains built with the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk). Write your own Custom Resource Definition ("CRD") as a `yaml` file and deploy with ease!


🌌🌌🌌🌌 How do I use Cosmos Operator?
=============================

## Quick Start

See the [quick start guide](./docs/quick_start.md).

Status: v1, stable
## CosmosFullNode CRD

CosmosFullNode is the flagship CRD. Its purpose is to deploy highly-available, fault-tolerant blockchain nodes.
CosmosFullNode is the flagship CRD. Its purpose is to deploy highly-available, fault-tolerant blockchain nodes.

The CosmosFullNode controller is like a StatefulSet for running Cosmos SDK blockchains.

A CosmosFullNode can be configured to run as an RPC node, a validator sentry, or a seed node. All configurations can
be used as persistent peers.
A CosmosFullNode can be configured to run as an RPC node, a validator sentry, or a seed node. All configurations can be used as persistent peers.

As of this writing, Strangelove has been running CosmosFullNode in production for many months.
As of this writing, Strangelove has been running CosmosFullNode in production for over a year.

[Minimal example yaml](./config/samples/cosmos_v1_cosmosfullnode.yaml)
## Samples

[Full example yaml](./config/samples/cosmos_v1_cosmosfullnode_full.yaml)
- [Minimal example yaml](./config/samples/cosmos_v1_cosmosfullnode.yaml)
- [Full example yaml](./config/samples/cosmos_v1_cosmosfullnode_full.yaml)
- [Penumbra example yaml](./config/samples/cosmos_v1_cosmosfullnode_penumbra.yaml)

## Support CRDs

These CRDs are part of the operator and serve to support CosmosFullNodes.

- [ScheduledVolumeSnapshot](./docs/scheduled_volume_snapshot.md)
- [StatefulJob](./docs/stateful_job.md)

### Why not a StatefulSet?

Each pod requires different config, such as peer settings in config.toml and mounted node keys. Therefore, a blanket
template as found in StatefulSet did not suffice.

Additionally, CosmosFullNode gives you more control over individual pod and pvc pairs vs. a StatefulSet to help
the human operator debug and recover from situations such as a corrupted PVCs.
Additionally, CosmosFullNode gives you more control over individual pod and pvc pairs vs. a StatefulSet to help the human operator debug and recover from situations such as a corrupted PVCs.

# Support CRDs

These CRDs are part of the operator and serve to support CosmosFullNodes.
🌌🌌🌌🌌🌌 Extras
=============================

* [ScheduledVolumeSnapshot](./docs/scheduled_volume_snapshot.md)
* [StatefulJob](./docs/stateful_job.md)

# Quick Start
# Disclaimers

See the [quick start guide](./docs/quick_start.md).
- Tested on Google's GKE and Bare-metal with `Kubeadm`. Although kubernetes is portable, we cannot guarantee or provide support for AWS, Azure, or other kubernetes providers.
- Requires a recent version of kubernetes: v1.23+.
- CosmosFullNode: The chain must be built from the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk).
- CosmosFullNode: Validator sentries require a remote signer such as [horcrux](https://github.com/strangelove-ventures/horcrux).
- CosmosFullNode: The controller requires [heighliner](https://github.com/strangelove-ventures/heighliner) images. If you build your own image, you will need a shell `sh` and set the uid:gid to 1025:1025. If running as a validator sentry, you need `sleep` as well.
- CosmosFullNode: May not work for all Cosmos chains. (Some chains diverge from common conventions.) Strangelove has yet to encounter a Cosmos chain that does not work with this operator.

# Contributing

@@ -75,7 +96,7 @@ See the [best practices guide for CosmosFullNode](./docs/fullnode_best_practices

Disclaimer: Strangelove has not committed to these enhancements and cannot estimate when they will be completed.

- [ ] Scheduled upgrades. Set a halt height and image version. The controller performs a rolling update with the new image version after the committed halt height.
- [x] Scheduled upgrades. Set the upgrade height and image version, optionally setting halt height. The controller performs a rolling update with the new image version after the committed height.
- [x] Support configuration suitable for validator sentries.
- [x] Reliable, persistent peer support.
- [x] Quicker p2p discovery using private peers.
@@ -104,3 +125,7 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

---

[Operator Pattern]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/#operators-in-kubernetes
25 changes: 23 additions & 2 deletions api/v1/cosmosfullnode_types.go
@@ -303,14 +303,16 @@ type PodSpec struct {
type FullNodeProbeStrategy string

const (
FullNodeProbeStrategyNone FullNodeProbeStrategy = "None"
FullNodeProbeStrategyNone FullNodeProbeStrategy = "None"
FullNodeProbeStrategyReachable FullNodeProbeStrategy = "Reachable"
FullNodeProbeStrategyInSync FullNodeProbeStrategy = "InSync"
)

// FullNodeProbesSpec configures probes for created pods
type FullNodeProbesSpec struct {
// Strategy controls the default probes added by the controller.
// None = Do not add any probes. May be necessary for Sentries using a remote signer.
// +kubebuilder:validation:Enum:=None
// +kubebuilder:validation:Enum:=None;Reachable;InSync
// +optional
Strategy FullNodeProbeStrategy `json:"strategy"`
}
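
The widened enum marker above admits three strategy values. As a minimal sketch (not part of this commit), the same set can be checked in Go. The helper name `isKnownStrategy` is hypothetical, and the diff does not show what `Reachable` and `InSync` do at runtime, so the sketch only checks membership:

```go
package main

import "fmt"

// FullNodeProbeStrategy and its constants mirror the declarations in the hunk above.
type FullNodeProbeStrategy string

const (
	FullNodeProbeStrategyNone      FullNodeProbeStrategy = "None"
	FullNodeProbeStrategyReachable FullNodeProbeStrategy = "Reachable"
	FullNodeProbeStrategyInSync    FullNodeProbeStrategy = "InSync"
)

// isKnownStrategy is a hypothetical helper (not in the commit). It reports
// whether s is one of the values listed in the kubebuilder Enum marker.
func isKnownStrategy(s FullNodeProbeStrategy) bool {
	switch s {
	case FullNodeProbeStrategyNone,
		FullNodeProbeStrategyReachable,
		FullNodeProbeStrategyInSync:
		return true
	default:
		return false
	}
}

func main() {
	candidates := []FullNodeProbeStrategy{"None", "Reachable", "InSync", "Sometimes"}
	for _, s := range candidates {
		fmt.Printf("%q known=%t\n", s, isKnownStrategy(s))
	}
}
```

The sketch treats the empty string as not listed. How the controller handles an unset `strategy` is not visible in this diff.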
@@ -436,6 +438,8 @@ type ChainSpec struct {
Comet CometConfig `json:"config"`

// App configuration applied to app.toml.
// Although optional, it's highly recommended you configure this field.
// +optional
App SDKAppConfig `json:"app"`

// One of trace|debug|info|warn|error|fatal|panic.
@@ -559,6 +563,14 @@ type ChainVersion struct {
// The docker image for this version in "repository:tag" format. E.g. busybox:latest.
Image string `json:"image"`

// Version overrides for initContainers of the fullnode/sentry pods.
// +optional
InitContainers map[string]string `json:"initContainers"`

// Version overrides for containers of the fullnode/sentry pods.
// +optional
Containers map[string]string `json:"containers"`

// Determines if the node should forcefully halt at the upgrade height.
// +optional
SetHaltHeight bool `json:"setHaltHeight,omitempty"`
@@ -720,6 +732,10 @@ type ServiceSpec struct {
// Overrides for the single RPC service.
// +optional
RPCTemplate ServiceOverridesSpec `json:"rpcTemplate"`

// Overrides for default cluster domain name.
// +optional
ClusterDomain *string `json:"clusterDomain"`
}

// ServiceOverridesSpec allows some overrides for the created, single RPC service.
@@ -733,6 +749,11 @@ type ServiceOverridesSpec struct {
// +optional
Type *corev1.ServiceType `json:"type"`

// Setting this to "None" makes a "headless service" (no virtual IP), which is useful when direct endpoint connections are preferred and proxying is not required.
// If not set, defaults to "".
// +optional
ClusterIP *string `json:"clusterIP"`

// Sets endpoint and routing behavior.
// See: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#caveats-and-limitations-when-preserving-source-ips
// If not set, defaults to "Cluster".
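
The `ClusterIP` override added in this hunk is an optional pointer: unset means `""`, and `"None"` requests a headless service. A minimal sketch of that resolution follows, assuming a trimmed-down stand-in type. `serviceOverrides`, `resolveClusterIP`, and `isHeadless` are hypothetical names, not the operator's API:

```go
package main

import "fmt"

// serviceOverrides is a hypothetical, trimmed-down stand-in for
// ServiceOverridesSpec; only the field discussed above is kept.
type serviceOverrides struct {
	ClusterIP *string
}

// resolveClusterIP returns the override when it is set and "" otherwise,
// matching the documented default.
func resolveClusterIP(o serviceOverrides) string {
	if o.ClusterIP == nil {
		return ""
	}
	return *o.ClusterIP
}

// isHeadless reports whether the resolved value asks Kubernetes for a
// headless service, i.e. one without a virtual IP.
func isHeadless(clusterIP string) bool {
	return clusterIP == "None"
}

func main() {
	none := "None"
	unset := serviceOverrides{}
	headless := serviceOverrides{ClusterIP: &none}

	fmt.Println(isHeadless(resolveClusterIP(unset)))
	fmt.Println(isHeadless(resolveClusterIP(headless)))
}
```

Using a pointer lets the spec distinguish "not configured" from an explicit value, which a plain `string` field cannot do.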
28 changes: 27 additions & 1 deletion api/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

7 changes: 6 additions & 1 deletion cmd/versioncheck.go
@@ -174,7 +174,12 @@ func checkVersion(
image = v.Image
}

thisPodImage := thisPod.Spec.Containers[0].Image
var thisPodImage string
for _, c := range thisPod.Spec.Containers {
if c.Name == "node" {
thisPodImage = c.Image
}
}
if thisPodImage != image {
return fmt.Errorf("image mismatch for height %d: %s != %s", height, thisPodImage, image)
}
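
The hunk above replaces the positional lookup `Containers[0]` with a lookup by container name, which keeps working when another container is placed first in the pod. A minimal sketch of the same idea as a standalone helper follows; it is not the commit's code. The `container` type stands in for `corev1.Container` so the example compiles without Kubernetes dependencies, and `imageForContainer` is a hypothetical name:

```go
package main

import "fmt"

// container is a hypothetical stand-in for corev1.Container.
type container struct {
	Name  string
	Image string
}

// imageForContainer returns the image of the first container with the given
// name and reports whether such a container was found.
func imageForContainer(containers []container, name string) (string, bool) {
	for _, c := range containers {
		if c.Name == name {
			return c.Image, true
		}
	}
	return "", false
}

func main() {
	containers := []container{
		{Name: "sidecar", Image: "example.com/sidecar:v1"},
		{Name: "node", Image: "example.com/chain:v2"},
	}

	image, ok := imageForContainer(containers, "node")
	fmt.Println(image, ok)
}
```

Returning a boolean lets the caller tell "no container named node" apart from a real image mismatch. In the loop shown in the diff, a missing container leaves `thisPodImage` empty, which then surfaces as an image mismatch error.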