Changes to add health-sync container to the task definition (#255)
* Changes to add health-sync container to the task definition

* Address comments
Ganeshrockz authored Dec 27, 2023
1 parent 20bbbb1 commit 3a5c0ac
Showing 6 changed files with 131 additions and 36 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
  ## Unreleased

+ BREAKING CHANGES
+ * Following are the changes made to the task definitions for `mesh-task` and `gateway-task` submodules to react to the changes made in [this](https://github.com/hashicorp/consul-ecs/pull/211) PR.
+ - Removes the `consul-ecs-control-plane` container from the task definition and adds a new `consul-ecs-mesh-init` container which will be responsible for setting up mesh on ECS.
+ - Adds a new container named `consul-ecs-health-sync` to the task definition which will be responsible for syncing back ECS container health checks into Consul. This container will wait for a successful exit of `consul-ecs-mesh-init` container before starting.
+
  FEATURES
  * Add support for provisioning API gateways as ECS tasks [[GH-234](https://github.com/hashicorp/terraform-aws-consul-ecs/pull/234)]
  - Add `api-gateway` as an acceptable `kind` input.
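
The start-up ordering described in the breaking-change entry can be sketched as a plain Terraform `locals` block. This is a minimal illustration rather than the module's actual definition: the local name `sidecar_ordering` is hypothetical, and only the container names and `dependsOn` conditions are taken from this commit.

```hcl
# Hypothetical local; only the container names and the dependsOn
# conditions come from this commit. Everything else is illustrative.
locals {
  sidecar_ordering = [
    {
      # Short-lived: sets up the mesh for the task, then exits.
      name      = "consul-ecs-mesh-init"
      dependsOn = []
    },
    {
      # Starts only after mesh-init has exited successfully.
      name = "consul-dataplane"
      dependsOn = [
        { containerName = "consul-ecs-mesh-init", condition = "SUCCESS" },
      ]
    },
    {
      # Also gated on a successful mesh-init exit; then runs for the task lifetime.
      name = "consul-ecs-health-sync"
      dependsOn = [
        { containerName = "consul-ecs-mesh-init", condition = "SUCCESS" },
      ]
    },
  ]
}
```

In ECS, the `SUCCESS` condition requires the dependency to exit with a zero status and cannot be placed on an essential container, which is consistent with `consul-ecs-mesh-init` being declared with `essential = false` in this change.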
15 changes: 11 additions & 4 deletions README.md
@@ -16,16 +16,20 @@ additional containers known as sidecar containers to your task definition.

  Specifically, it adds the following containers:

- * `consul-ecs-control-plane` – Runs for the full lifecycle of the task.
+ * `consul-ecs-mesh-init` – This is the first container that starts up inside an ECS task. This is short lived.
  * At startup it connects to the available Consul servers and performs a login with the configured IAM Auth method to obtain an ACL token with appropriate privileges.
  * Using the token, it registers the service and proxy entities to Consul's catalog.
  * It then bootstraps the configuration JSON required by the Consul dataplane container and writes it to a shared volume.
- * After this, the container enters into its reconciliation loop where it periodically syncs the health of ECS containers into Consul.
- * Upon receiving SIGTERM, it marks the corresponding service instance in Consul as unhealthy and waits for the dataplane container to shutdown.
- * Finally, it deregisters the service and proxy entities from Consul's catalog and performs a Consul logout.
+ * After this point the container exits.
  * `consul-dataplane` – Runs for the full lifecycle of the task. This container runs
  the [Consul dataplane](https://github.com/hashicorp/consul-dataplane) that configures and starts the Envoy proxy, which controls all the service mesh traffic. All requests to and from the application run through
  the proxy.
+ * `consul-ecs-health-sync` - Runs for the full lifecycle of the task. This container is primarily responsible for syncing back ECS container health into Consul.
+ * At startup it connects to the available Consul servers and performs a login with the configured IAM Auth method to obtain an ACL token with appropriate privileges.
+ * Using the token it fetches the Consul health checks registered by the `mesh-init` container when registering the service/proxy to Consul.
+ * After this, the container enters into its reconciliation loop where it periodically syncs the health of ECS containers into Consul.
+ * Upon receiving SIGTERM, it marks the corresponding service instance in Consul as unhealthy and waits for the dataplane container to shutdown.
+ * Finally, it deregisters the service and proxy entities from Consul's catalog and performs a Consul logout.

  The `controller` module runs a controller that automatically provisions ACL tokens
  for tasks on the mesh. It also deregisters service instances from Consul for missing/finished tasks in ECS.
@@ -47,6 +51,9 @@ See https://www.consul.io/docs/ecs.
  * [dev-server](https://github.com/hashicorp/terraform-aws-consul-ecs/blob/main/modules/dev-server) [**For Development/Testing Only**]: This module deploys a Consul server onto your ECS Cluster
  for development/testing purposes. The server does not have persistent storage and so is not suitable for production deployments.

+ * [gateway-task](https://github.com/hashicorp/terraform-aws-consul-ecs/blob/main/modules/gateway-task): This module creates an [ECS Task Definition](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html)
+ that adds required containers to deploy a Consul gateway (API/Mesh/Terminating) as a ECS workload.
+
  * [controller](https://github.com/hashicorp/terraform-aws-consul-ecs/blob/main/modules/controller): This modules deploys a controller that automatically provisions ACL tokens
  for services on the Consul service mesh. It also keeps an eye on the tasks and deregisters the service instances of those tasks that go missing or get finished.

67 changes: 56 additions & 11 deletions modules/gateway-task/main.tf
@@ -80,11 +80,11 @@ resource "aws_ecs_task_definition" "this" {
  concat(
  [
  {
- name = "consul-ecs-control-plane"
+ name = "consul-ecs-mesh-init"
  image = var.consul_ecs_image
  essential = false
  logConfiguration = var.log_configuration
- command = ["control-plane"]
+ command = ["mesh-init"]
  mountPoints = [
  local.consul_data_mount_read_write,
  {
@@ -104,12 +104,6 @@ resource "aws_ecs_task_definition" "this" {
  linuxParameters = {
  initProcessEnabled = true
  }
- healthCheck = {
- command = ["CMD-SHELL", "curl -f localhost:10000/consul-ecs/health"] # consul-ecs-control-plane exposes a listener on 10000 to indicate it's readiness
- interval = 30
- retries = 10
- timeout = 5
- }
  secrets = flatten(
  concat(
  var.tls ? [
@@ -139,7 +133,7 @@ resource "aws_ecs_task_definition" "this" {
  essential = true
  logConfiguration = var.log_configuration
  entryPoint = ["/consul/consul-ecs", "envoy-entrypoint"]
- command = ["consul-dataplane", "-config-file", "/consul/consul-dataplane.json"] # consul-ecs-control-plane dumps the dataplane's config into consul-dataplane.json
+ command = ["consul-dataplane", "-config-file", "/consul/consul-dataplane.json"] # consul-ecs-mesh-init dumps the dataplane's config into consul-dataplane.json
  portMappings = [
  {
  containerPort = local.lan_port
@@ -152,8 +146,8 @@ resource "aws_ecs_task_definition" "this" {
  ]
  dependsOn = [
  {
- containerName = "consul-ecs-control-plane"
- condition = "HEALTHY"
+ containerName = "consul-ecs-mesh-init"
+ condition = "SUCCESS"
  },
  ]
  healthCheck = {
@@ -174,6 +168,57 @@ resource "aws_ecs_task_definition" "this" {
  hardLimit = 1048576
  }]
  },
+ {
+ name = "consul-ecs-health-sync"
+ image = var.consul_ecs_image
+ essential = false
+ logConfiguration = var.log_configuration
+ command = ["health-sync"]
+ user = "5996"
+ portMappings = []
+ mountPoints = [
+ local.consul_data_mount
+ ]
+ dependsOn = [
+ {
+ containerName = "consul-ecs-mesh-init"
+ condition = "SUCCESS"
+ }
+ ]
+ cpu = 0
+ volumesFrom = []
+ environment = [
+ {
+ name = "CONSUL_ECS_CONFIG_JSON",
+ value = local.encoded_config
+ }
+ ]
+ linuxParameters = {
+ initProcessEnabled = true
+ }
+ secrets = flatten(
+ concat(
+ var.tls ? [
+ concat(
+ local.https_ca_cert_arn != "" ? [
+ {
+ name = "CONSUL_HTTPS_CACERT_PEM"
+ valueFrom = local.https_ca_cert_arn
+ },
+ ] : [],
+ local.grpc_ca_cert_arn != "" ? [
+ {
+ name = "CONSUL_GRPC_CACERT_PEM"
+ valueFrom = local.grpc_ca_cert_arn
+ }
+ ] : [],
+ []
+ )
+ ] : [],
+ []
+ )
+ )
+ },
  ],
  )
  )
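
The conditional `secrets` expression in the new `consul-ecs-health-sync` container can be hard to follow in flattened diff form. The sketch below isolates the same `flatten(concat(...))` pattern in a standalone `locals` block. The names `tls_enabled` and `ca_cert_secrets` and the ARN value are placeholders introduced for illustration; in the module the inputs are `var.tls`, `local.https_ca_cert_arn`, and `local.grpc_ca_cert_arn`.

```hcl
# Placeholder inputs standing in for var.tls and the module's CA cert locals.
locals {
  tls_enabled       = true
  https_ca_cert_arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-https-ca"
  grpc_ca_cert_arn  = ""

  # Each CA cert secret is emitted only when TLS is on and its ARN is set.
  ca_cert_secrets = flatten(
    concat(
      local.tls_enabled ? [
        concat(
          local.https_ca_cert_arn != "" ? [
            {
              name      = "CONSUL_HTTPS_CACERT_PEM"
              valueFrom = local.https_ca_cert_arn
            },
          ] : [],
          local.grpc_ca_cert_arn != "" ? [
            {
              name      = "CONSUL_GRPC_CACERT_PEM"
              valueFrom = local.grpc_ca_cert_arn
            },
          ] : [],
          []
        )
      ] : [],
      []
    )
  )
}
```

With these placeholder values the expression should reduce to a single `CONSUL_HTTPS_CACERT_PEM` entry, and to an empty list when `tls_enabled` is `false`, which is why the container definition can pass the result to `secrets` unconditionally.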
3 changes: 1 addition & 2 deletions modules/gateway-task/variables.tf
@@ -100,7 +100,7 @@ variable "consul_server_hosts" {
  }

  variable "skip_server_watch" {
- description = "Set this to true to prevent the consul-dataplane and consul-ecs-control-plane containers from watching the Consul servers for changes. This is useful for situations where Consul servers are behind a load balancer."
+ description = "Set this to true to prevent the consul-dataplane and consul-ecs-health-sync containers from watching the Consul servers for changes. This is useful for situations where Consul servers are behind a load balancer."
  type = bool
  default = false
  }
@@ -131,7 +131,6 @@ variable "envoy_readiness_port" {
  error_message = "The envoy_readiness_port must not conflict with the following ports that are reserved for Consul and Envoy: 8300, 8301, 8302, 8500, 8501, 8502, 8600, 10000, 19000."
  condition = !contains([
  8600, // consul dns
- 10000, // consul-ecs-control-plane health check port
  19000, // envoy admin port
  ], var.envoy_readiness_port)
  }
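
For context on the `skip_server_watch` input whose description changes in this file, a caller of the `gateway-task` module might set it roughly as follows. This is a sketch under assumptions: the registry source address and the `family` input are not shown anywhere in this diff, the host name is a placeholder, and a real deployment would need further inputs.

```hcl
# Sketch only. The registry source address and the `family` input are
# assumptions not shown in this diff; a real deployment needs more inputs.
module "example_gateway" {
  source = "hashicorp/consul-ecs/aws//modules/gateway-task"

  family = "example-gateway" # assumed input name
  kind   = "api-gateway"     # listed in the CHANGELOG as an accepted kind

  # Placeholder address for Consul servers that sit behind a load balancer.
  consul_server_hosts = "consul.example.internal"

  # Stops consul-dataplane and consul-ecs-health-sync from watching the servers.
  skip_server_watch = true
}
```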
73 changes: 57 additions & 16 deletions modules/mesh-task/main.tf
@@ -35,7 +35,7 @@ locals {
  namespace_tag = var.consul_namespace != "" ? { "consul.hashicorp.com/namespace" = var.consul_namespace } : {}

  // container_defs_with_depends_on is the app's container definitions with their dependsOn keys
- // modified to add in dependencies on consul-ecs-control-plane and consul-dataplane.
+ // modified to add in dependencies on consul-ecs-mesh-init and consul-dataplane.
  // We add these dependencies in so that the app containers don't start until the proxy
  // is ready to serve traffic.
  container_defs_with_depends_on = [for def in var.container_definitions :
@@ -46,10 +46,6 @@ locals {
  concat(
  lookup(def, "dependsOn", []),
  [
- {
- containerName = "consul-ecs-control-plane"
- condition = "HEALTHY"
- },
  {
  containerName = "consul-dataplane"
  condition = "HEALTHY"
@@ -175,11 +171,11 @@ resource "aws_ecs_task_definition" "this" {
  local.container_defs_with_depends_on,
  [
  {
- name = "consul-ecs-control-plane"
+ name = "consul-ecs-mesh-init"
  image = var.consul_ecs_image
  essential = false
  logConfiguration = var.log_configuration
- command = ["control-plane"]
+ command = ["mesh-init"]
  mountPoints = [
  local.consul_data_mount_read_write,
  {
@@ -199,12 +195,6 @@ resource "aws_ecs_task_definition" "this" {
  linuxParameters = {
  initProcessEnabled = true
  }
- healthCheck = {
- command = ["CMD-SHELL", "curl -f localhost:10000/consul-ecs/health"] # consul-ecs-control-plane exposes a listener on 10000 to indicate it's readiness
- interval = 30
- retries = 10
- timeout = 5
- }
  secrets = flatten(
  concat(
  var.tls ? [
@@ -234,15 +224,15 @@ resource "aws_ecs_task_definition" "this" {
  essential = false
  logConfiguration = var.log_configuration
  entryPoint = ["/consul/consul-ecs", "envoy-entrypoint"]
- command = ["consul-dataplane", "-config-file", "/consul/consul-dataplane.json"] # consul-ecs-control-plane dumps the dataplane's config into consul-dataplane.json
+ command = ["consul-dataplane", "-config-file", "/consul/consul-dataplane.json"] # consul-ecs-mesh-init dumps the dataplane's config into consul-dataplane.json
  portMappings = []
  mountPoints = [
  local.consul_data_mount
  ]
  dependsOn = [
  {
- containerName = "consul-ecs-control-plane"
- condition = "HEALTHY"
+ containerName = "consul-ecs-mesh-init"
+ condition = "SUCCESS"
  },
  ]
  healthCheck = {
@@ -263,6 +253,57 @@ resource "aws_ecs_task_definition" "this" {
  hardLimit = 1048576
  }]
  },
+ {
+ name = "consul-ecs-health-sync"
+ image = var.consul_ecs_image
+ essential = false
+ logConfiguration = var.log_configuration
+ command = ["health-sync"]
+ user = "5996"
+ portMappings = []
+ mountPoints = [
+ local.consul_data_mount
+ ]
+ dependsOn = [
+ {
+ containerName = "consul-ecs-mesh-init"
+ condition = "SUCCESS"
+ }
+ ]
+ cpu = 0
+ volumesFrom = []
+ environment = [
+ {
+ name = "CONSUL_ECS_CONFIG_JSON",
+ value = local.encoded_config
+ }
+ ]
+ linuxParameters = {
+ initProcessEnabled = true
+ }
+ secrets = flatten(
+ concat(
+ var.tls ? [
+ concat(
+ local.https_ca_cert_arn != "" ? [
+ {
+ name = "CONSUL_HTTPS_CACERT_PEM"
+ valueFrom = local.https_ca_cert_arn
+ },
+ ] : [],
+ local.grpc_ca_cert_arn != "" ? [
+ {
+ name = "CONSUL_GRPC_CACERT_PEM"
+ valueFrom = local.grpc_ca_cert_arn
+ }
+ ] : [],
+ []
+ )
+ ] : [],
+ []
+ )
+ )
+ },
  ],
  )
  )
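
The `container_defs_with_depends_on` local touched in this file appends sidecar dependencies to each application container. The diff shows only the inner `concat(lookup(def, "dependsOn", []), [...])` call, so the following is a sketch of how such an expression could be assembled. The surrounding `merge` call, the local names, and the sample container are assumptions for illustration, not the module's confirmed code.

```hcl
# Hypothetical stand-in for var.container_definitions.
locals {
  example_app_containers = [
    {
      name  = "example-app"
      image = "example/app:latest"
    },
  ]

  # Keep any dependsOn the caller supplied, then append the proxy dependency.
  # The merge() wrapper is an assumption; only the concat/lookup call is in the diff.
  example_with_depends_on = [for def in local.example_app_containers :
    merge(
      def,
      {
        dependsOn = concat(
          lookup(def, "dependsOn", []),
          [
            {
              containerName = "consul-dataplane"
              condition     = "HEALTHY"
            },
          ]
        )
      }
    )
  ]
}
```

After this commit the application containers depend directly only on `consul-dataplane` being `HEALTHY`. Because `consul-dataplane` itself waits for `consul-ecs-mesh-init` to reach `SUCCESS`, the application still starts after mesh initialization has completed.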
4 changes: 1 addition & 3 deletions modules/mesh-task/variables.tf
@@ -125,7 +125,7 @@ variable "additional_execution_role_policies" {
  }

  variable "skip_server_watch" {
- description = "Set this to true to prevent the consul-dataplane and consul-ecs-control-plane from watching the Consul servers for changes. This is useful for situations where Consul servers are behind a load balancer."
+ description = "Set this to true to prevent the consul-dataplane and consul-ecs-health-sync from watching the Consul servers for changes. This is useful for situations where Consul servers are behind a load balancer."
  type = bool
  default = false
  }
@@ -168,7 +168,6 @@ variable "envoy_public_listener_port" {
  error_message = "The envoy_public_listener_port must not conflict with the following ports that are reserved for Consul and Envoy: 8300, 8301, 8302, 8500, 8501, 8502, 8600, 10000, 19000."
  condition = !contains([
  8600, // consul dns
- 10000, // consul-ecs-control-plane health check port
  19000, // envoy admin port
  ], var.envoy_public_listener_port)
  }
@@ -188,7 +187,6 @@ variable "envoy_readiness_port" {
  error_message = "The envoy_readiness_port must not conflict with the following ports that are reserved for Consul and Envoy: 8300, 8301, 8302, 8500, 8501, 8502, 8600, 10000, 19000."
  condition = !contains([
  8600, // consul dns
- 10000, // consul-ecs-control-plane health check port
  19000, // envoy admin port
  ], var.envoy_readiness_port)
  }
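
The port validations edited in this file follow Terraform's custom validation pattern. A standalone sketch of that pattern is shown below. The variable name `example_port` and its default are hypothetical, and the reserved-port list is inferred from the error message text, since the diff shows only part of the real `contains` list.

```hcl
# Hypothetical variable. The reserved ports are inferred from the error
# message in the diff, minus 10000, which this commit drops from the check.
variable "example_port" {
  type    = number
  default = 20000

  validation {
    error_message = "The example_port must not conflict with ports reserved for Consul and Envoy."
    condition = !contains([
      8300, 8301, 8302, 8500, 8501, 8502,
      8600,  // consul dns
      19000, // envoy admin port
    ], var.example_port)
  }
}
```

Note that in the diff the unchanged `error_message` strings still mention port 10000 even though the `condition` lists no longer reject it, so the message and the check appear to be out of step after this commit.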
