Module - Multi runner

This module replaces the top-level module and makes it easy to create multiple types of runners with a single deployment.

This module creates many runners with a single GitHub app. The module utilizes the internal modules and deploys parts of the stack for each runner defined.

The module takes a configuration as input that contains a matcher for the labels. The webhook lambda uses the configuration to delegate events based on the labels in the workflow job and sends them to a dedicated queue for the matching configuration. Events on each queue are processed by a dedicated lambda per configuration to scale runners.

For each configuration:

When enabled, the distribution syncer is deployed for each unique combination of OS and architecture.
A queue is created and the runner module is deployed.

Matching

Matching of the configuration is done based on the labels specified in the labelMatchers configuration. The webhook processes the workflow_job event and matches the job's labels against the labels specified in labelMatchers. Configurations with exactMatch set to true are evaluated first, followed by all configurations with exactMatch set to false.
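As an illustration of that ordering, here is a minimal sketch with two configurations whose labelMatchers overlap. The configuration keys (`linux-x64-large`, `linux-x64-any`) and the `large` label are hypothetical, and the `runner_config` blocks are left out for brevity, so this fragment is not a complete configuration.

```hcl
multi_runner_config = {
  # Hypothetical key. Checked first because exactMatch is true.
  "linux-x64-large" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64", "large"]]
      exactMatch    = true
    }
    # runner_config omitted for brevity
  }

  # Hypothetical key. Checked after all configurations with exactMatch = true.
  "linux-x64-any" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64"]]
      exactMatch    = false
    }
    # runner_config omitted for brevity
  }
}
```

With this setup, a job requesting `self-hosted, linux, x64, large` would be checked against `linux-x64-large` before `linux-x64-any`.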

The catch

This module does not control which runner picks up which job; that decision is made entirely by GitHub. When several different runners are able to run the same job, nothing can be done to guarantee that a particular runner takes it.

For example, suppose you have two runners, one with the labels self-hosted, linux, x64, large and one with the labels self-hosted, linux, x64, small. If a workflow specifies only a subset of those labels, for example self-hosted, linux, x64, both runners can potentially take the job. You can configure the module to scale one of the runners for the event, but there is still no guarantee that the scaled runner takes the job. A workflow with the subset of labels (self-hosted, linux, x64) can take a runner with the more specific labels (self-hosted, linux, x64, large), leaving a workflow that requires the labels (self-hosted, linux, x64, large) without a runner. The only mitigation currently available is to use a small pool of runners. Pool instances can also be short-lived and created only periodically, based on a cron expression.
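A small pool can be sketched with the `pool_config` and `pool_runner_owner` fields of `runner_config`, both of which appear in the input type of `multi_runner_config`. The organization name `my-org`, the pool size and the schedule are hypothetical, and the example assumes that `schedule_expression` accepts an Amazon EventBridge style cron expression.

```hcl
runner_config = {
  # Other runner settings omitted for brevity.

  # Hypothetical organization name; the pool only supports org-level runners.
  pool_runner_owner = "my-org"

  pool_config = [
    {
      # Assumed EventBridge-style cron: top up the pool at 08:00 UTC on weekdays.
      schedule_expression = "cron(0 8 ? * MON-FRI *)"
      # Hypothetical pool size; keep it small to limit idle cost.
      size = 2
    }
  ]
}
```

A pool like this keeps a few runners available for jobs whose intended runner was taken by another workflow, at the cost of some idle capacity.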

Usages

A complete example is available in the examples; see the multi-runner example for an actual implementation.

module "multi-runner" {
  prefix = "multi-runner"

  github_app = {
    # app details
  }

  multi_runner_config = {
    "linux-arm" = {
      matcherConfig = {
        labelMatchers = [["self-hosted", "linux", "arm64", "arm"]]
        exactMatch    = true
      }
      runner_config = {
        runner_os                      = "linux"
        runner_architecture            = "arm64"
        runner_extra_labels            = "arm"
        enable_ssm_on_runners          = true
        instance_types                 = ["t4g.large", "c6g.large"]
        ...
      }
      ...
    },
    "linux-x64" = {
      matcherConfig = {
        labelMatchers = [["self-hosted", "linux", "x64"]]
        exactMatch    = false
      }
      runner_config = {
        runner_os                       = "linux"
        runner_architecture             = "x64"
        instance_types                  = ["m5ad.large", "m5a.large"]
        enable_ephemeral_runners        = true
        ...
      }
      delay_webhook_event = 0
      ...
    }
  }

}
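Each entry in `multi_runner_config` also accepts queue-related settings next to `matcherConfig` and `runner_config`. The fragment below is a minimal sketch based on the input type documented under Inputs. The key `linux-x64` is reused from the example above, and the retry count of 3 is an arbitrary illustrative value.

```hcl
"linux-x64" = {
  # matcherConfig and runner_config omitted for brevity

  # Keep webhook events in the order received; suggested for repo-level runners.
  fifo = true

  # Attach a dead letter queue to the build queue, with up to 3 receive attempts.
  redrive_build_queue = {
    enabled         = true
    maxReceiveCount = 3
  }
}
```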

Requirements

Name	Version
terraform	>= 1.3
aws	~> 4.0
random	~> 3.0

Providers

Name	Version
aws	~> 4.0
random	~> 3.0

Modules

Name	Source	Version
runner_binaries	../runner-binaries-syncer	n/a
runners	../runners	n/a
ssm	../ssm	n/a
webhook	../webhook	n/a

Resources

Name	Type
aws_sqs_queue.queued_builds	resource
aws_sqs_queue.queued_builds_dlq	resource
aws_sqs_queue.webhook_events_workflow_job_queue	resource
aws_sqs_queue_policy.build_queue_dlq_policy	resource
aws_sqs_queue_policy.build_queue_policy	resource
random_string.random	resource
aws_iam_policy_document.deny_unsecure_transport	data source

Inputs

Name	Description	Type	Default	Required
aws_partition	(optional) partition in the arn namespace to use if not 'aws'	`string`	`"aws"`	no
aws_region	AWS region.	`string`	n/a	yes
cloudwatch_config	(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details.	`string`	`null`	no
enable_managed_runner_security_group	Enabling the default managed security group creation. Unmanaged security groups can be specified via `runner_additional_security_group_ids`.	`bool`	`true`	no
enable_workflow_job_events_queue	Enabling this experimental feature will create a secondary SQS queue to which a copy of the workflow_job event will be delivered.	`bool`	`false`	no
ghes_ssl_verify	GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure).	`bool`	`true`	no
ghes_url	GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB	`string`	`null`	no
github_app	GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`).	object({ key_base64 = string id = string webhook_secret = string })	n/a	yes
instance_profile_path	The path that will be added to the instance_profile, if not set the environment name will be used.	`string`	`null`	no
key_name	Key pair name	`string`	`null`	no
kms_key_arn	Optional CMK Key ARN to be used for Parameter Store.	`string`	`null`	no
lambda_architecture	AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86_64' functions.	`string`	`"arm64"`	no
lambda_principals	(Optional) add extra principals to the role created for execution of the lambda, e.g. for local testing.	list(object({ type = string identifiers = list(string) }))	`[]`	no
lambda_runtime	AWS Lambda runtime.	`string`	`"nodejs18.x"`	no
lambda_s3_bucket	S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.	`string`	`null`	no
lambda_security_group_ids	List of security group IDs associated with the Lambda function.	`list(string)`	`[]`	no
lambda_subnet_ids	List of subnets in which the action runners will be launched, the subnets need to be subnets in the `vpc_id`.	`list(string)`	`[]`	no
log_level	Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.	`string`	`"info"`	no
log_type	Logging format for lambda logging. Valid values are 'json', 'pretty', 'hidden'.	`string`	`"pretty"`	no
logging_kms_key_id	Specifies the kms key id to encrypt the logs with	`string`	`null`	no
logging_retention_in_days	Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.	`number`	`7`	no
multi_runner_config	multi_runner_config = { runner_config: { runner_os: "The EC2 Operating System type to use for action runner instances (linux,windows)." runner_architecture: "The platform architecture of the runner instance_type." runner_metadata_options: "(Optional) Metadata options for the ec2 runner instances." ami_filter: "(Optional) List of maps used to create the AMI filter for the action runner AMI. By default amazon linux 2 is used." ami_owners: "(Optional) The list of owners used to select the AMI of action runner instances." create_service_linked_role_spot: "(Optional) Create the service-linked role for spot instances that is required by the scale-up lambda." delay_webhook_event: "The number of seconds the event accepted by the webhook is invisible on the queue before the scale up lambda will receive the event." disable_runner_autoupdate: "Disable the auto update of the github runner agent. Be aware there is a grace period of 30 days, see also the GitHub article" enable_ephemeral_runners: "Enable ephemeral runners, runners will only be used once." enable_job_queued_check: "Only scale if the job event received by the scale up lambda is in the state queued. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior." enable_organization_runners: "Register runners to organization, instead of repo level" enable_runner_binaries_syncer: "Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-built AMI." enable_ssm_on_runners: "Enable to allow access to the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances." enable_userdata: "Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI." instance_allocation_strategy: "The allocation strategy for spot instances. AWS recommends using `capacity-optimized`; however, the AWS default is `lowest-price`." instance_max_spot_price: "Max price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet." instance_target_capacity_type: "Default lifecycle used for runner instances, can be either `spot` or `on-demand`." instance_types: "List of instance types for the action runner. Defaults are based on runner_os (amzn2 for linux and Windows Server Core for win)." job_queue_retention_in_seconds: "The number of seconds the job is held in the queue before it is purged" minimum_running_time_in_minutes: "The time an ec2 action runner should be running at minimum before being terminated if not busy." pool_runner_owner: "The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported." runner_as_root: "Run the action runner under the root user. Variable `runner_run_as` will be ignored." runner_boot_time_in_minutes: "The minimum time for an EC2 runner to boot and register as a runner." runner_extra_labels: "Extra (custom) labels for the runners (GitHub). Separate each label by a comma. Labels checks on the webhook can be enforced by setting `enable_workflow_job_labels_check`. GitHub read-only labels should not be provided." runner_group_name: "Name of the runner group." runner_run_as: "Run the GitHub actions agent as user." runners_maximum_count: "The maximum number of runners that will be created." scale_down_schedule_expression: "Scheduler expression to check every x for scale down." scale_up_reserved_concurrent_executions: "Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations." userdata_template: "Alternative user-data template, replacing the default template. By providing your own user_data you have to take care of installing all required software, including the action runner. Variables userdata_pre/post_install are ignored." enable_runner_detailed_monitoring: "Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details." enable_cloudwatch_agent: "Enabling the cloudwatch agent on the ec2 runner instances, the runner contains default config. Configuration can be overridden via `cloudwatch_config`." userdata_pre_install: "Script to be run before the GitHub Actions runner is installed on the EC2 instances" userdata_post_install: "Script to be run after the GitHub Actions runner is installed on the EC2 instances" runner_ec2_tags: "Map of tags that will be added to the launch template instance tag specifications." runner_iam_role_managed_policy_arns: "Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role" idle_config: "List of time periods that can be defined as cron expressions to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle." runner_log_files: "(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details." block_device_mappings: "The EC2 instance block device configuration. Takes the following keys: `device_name`, `delete_on_termination`, `volume_type`, `volume_size`, `encrypted`, `iops`, `throughput`, `kms_key_id`, `snapshot_id`." pool_config: "The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for week days to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1." } matcherConfig: { labelMatchers: "The list of list of labels supported by the runner configuration. `[[self-hosted, linux, x64, example]]`" exactMatch: "If set to true, all labels in the workflow job must match the GitHub labels (os, architecture and `self-hosted`). When false, if any workflow label matches it will trigger the webhook." } fifo: "Enable a FIFO queue to retain the order of events received by the webhook. Suggested to set to true for repo level runners." redrive_build_queue: "Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting `enabled` to false. 2. Enable by setting `enabled` to `true`, `maxReceiveCount` to a number of max retries." }	map(object({ runner_config = object({ runner_os = string runner_architecture = string runner_metadata_options = optional(map(any), { instance_metadata_tags = "enabled" http_endpoint = "enabled" http_tokens = "optional" http_put_response_hop_limit = 1 }) ami_filter = optional(map(list(string)), null) ami_owners = optional(list(string), ["amazon"]) create_service_linked_role_spot = optional(bool, false) delay_webhook_event = optional(number, 30) disable_runner_autoupdate = optional(bool, false) enable_ephemeral_runners = optional(bool, false) enable_job_queued_check = optional(bool, null) enable_organization_runners = optional(bool, false) enable_runner_binaries_syncer = optional(bool, true) enable_ssm_on_runners = optional(bool, false) enable_userdata = optional(bool, true) instance_allocation_strategy = optional(string, "lowest-price") instance_max_spot_price = optional(string, null) instance_target_capacity_type = optional(string, "spot") instance_types = list(string) job_queue_retention_in_seconds = optional(number, 86400) minimum_running_time_in_minutes = optional(number, null) pool_runner_owner = optional(string, null) runner_as_root = optional(bool, false) runner_boot_time_in_minutes = optional(number, 5) runner_extra_labels = string runner_group_name = optional(string, "Default") runner_run_as = optional(string, "ec2-user") runners_maximum_count = number scale_down_schedule_expression = optional(string, "cron(*/5 * * * ? *)") scale_up_reserved_concurrent_executions = optional(number, 1) userdata_template = optional(string, null) enable_runner_detailed_monitoring = optional(bool, false) enable_cloudwatch_agent = optional(bool, true) userdata_pre_install = optional(string, "") userdata_post_install = optional(string, "") runner_ec2_tags = optional(map(string), {}) runner_iam_role_managed_policy_arns = optional(list(string), []) idle_config = optional(list(object({ cron = string timeZone = string idleCount = number })), []) runner_log_files = optional(list(object({ log_group_name = string prefix_log_group = bool file_path = string log_stream_name = string })), null) block_device_mappings = optional(list(object({ delete_on_termination = bool device_name = string encrypted = bool iops = number kms_key_id = string snapshot_id = string throughput = number volume_size = number volume_type = string })), [{ delete_on_termination = true device_name = "/dev/xvda" encrypted = true iops = null kms_key_id = null snapshot_id = null throughput = null volume_size = 30 volume_type = "gp3" }]) pool_config = optional(list(object({ schedule_expression = string size = number })), []) }) matcherConfig = object({ labelMatchers = list(list(string)) exactMatch = optional(bool, false) }) fifo = optional(bool, false) redrive_build_queue = optional(object({ enabled = bool maxReceiveCount = number }), { enabled = false maxReceiveCount = null }) }))	n/a	yes
pool_lambda_reserved_concurrent_executions	Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations.	`number`	`1`	no
pool_lambda_timeout	Time out for the pool lambda in seconds.	`number`	`60`	no
prefix	The prefix used for naming resources	`string`	`"github-actions"`	no
queue_encryption	Configure how data on queues managed by the module is encrypted at rest. Options are encryption via SSE, no encryption, and encryption via KMS. By default, encryption via SSE is enabled. For more details, see the Terraform `aws_sqs_queue` resource https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue.	object({ kms_data_key_reuse_period_seconds = number kms_master_key_id = string sqs_managed_sse_enabled = bool })	{ "kms_data_key_reuse_period_seconds": null, "kms_master_key_id": null, "sqs_managed_sse_enabled": true }	no
repository_white_list	List of repositories allowed to use the github app	`list(string)`	`[]`	no
role_path	The path that will be added to the role; if not set, the environment name will be used.	`string`	`null`	no
role_permissions_boundary	Permissions boundary that will be added to the created role for the lambda.	`string`	`null`	no
runner_additional_security_group_ids	(optional) List of additional security groups IDs to apply to the runner	`list(string)`	`[]`	no
runner_binaries_s3_sse_configuration	Map containing server-side encryption configuration for runner-binaries S3 bucket.	`any`	`{}`	no
runner_binaries_syncer_lambda_timeout	Time out of the binaries sync lambda in seconds.	`number`	`300`	no
runner_binaries_syncer_lambda_zip	File location of the binaries sync lambda zip file.	`string`	`null`	no
runner_egress_rules	List of egress rules for the GitHub runner instances.	list(object({ cidr_blocks = list(string) ipv6_cidr_blocks = list(string) prefix_list_ids = list(string) from_port = number protocol = string security_groups = list(string) self = bool to_port = number description = string }))	[ { "cidr_blocks": [ "0.0.0.0/0" ], "description": null, "from_port": 0, "ipv6_cidr_blocks": [ "::/0" ], "prefix_list_ids": null, "protocol": "-1", "security_groups": null, "self": null, "to_port": 0 } ]	no
runners_lambda_s3_key	S3 key for runners lambda function. Required if using S3 bucket to specify lambdas.	`string`	`null`	no
runners_lambda_s3_object_version	S3 object version for runners lambda function. Useful if S3 versioning is enabled on source bucket.	`string`	`null`	no
runners_lambda_zip	File location of the lambda zip file for scaling runners.	`string`	`null`	no
runners_scale_down_lambda_timeout	Time out for the scale down lambda in seconds.	`number`	`60`	no
runners_scale_up_lambda_timeout	Time out for the scale up lambda in seconds.	`number`	`30`	no
ssm_paths	The root path used in SSM to store configuration and secrets.	object({ root = optional(string, "github-action-runners") app = optional(string, "app") runners = optional(string, "runners") })	`{}`	no
subnet_ids	List of subnets in which the action runners will be launched, the subnets need to be subnets in the `vpc_id`.	`list(string)`	n/a	yes
syncer_lambda_s3_key	S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.	`string`	`null`	no
syncer_lambda_s3_object_version	S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.	`string`	`null`	no
tags	Map of tags that will be added to created resources. By default resources will be tagged with name and environment.	`map(string)`	`{}`	no
vpc_id	The VPC for security groups of the action runners.	`string`	n/a	yes
webhook_lambda_apigateway_access_log_settings	Access log settings for webhook API gateway.	object({ destination_arn = string format = string })	`null`	no
webhook_lambda_s3_key	S3 key for webhook lambda function. Required if using S3 bucket to specify lambdas.	`string`	`null`	no
webhook_lambda_s3_object_version	S3 object version for webhook lambda function. Useful if S3 versioning is enabled on source bucket.	`string`	`null`	no
webhook_lambda_timeout	Time out of the lambda in seconds.	`number`	`10`	no
webhook_lambda_zip	File location of the webhook lambda zip file.	`string`	`null`	no
workflow_job_queue_configuration	Configuration options for workflow job queue which is only applicable if the flag enable_workflow_job_events_queue is set to true.	object({ delay_seconds = number visibility_timeout_seconds = number message_retention_seconds = number })	{ "delay_seconds": null, "message_retention_seconds": null, "visibility_timeout_seconds": null }	no
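To show how some of the optional top-level inputs fit together, here is a minimal sketch of a fragment of the module block that enables the experimental workflow job queue and switches queue encryption from SSE to KMS. The field names come from the input types listed above. All values are illustrative, and the key alias `alias/my-queue-key` is hypothetical.

```hcl
# Fragment of the module block; required inputs are omitted.

# Experimental: deliver a copy of each workflow_job event to a secondary queue.
enable_workflow_job_events_queue = true
workflow_job_queue_configuration = {
  delay_seconds              = 0
  visibility_timeout_seconds = 30
  message_retention_seconds  = 86400
}

# Encrypt queues with a customer-managed KMS key instead of SQS-managed SSE.
queue_encryption = {
  kms_data_key_reuse_period_seconds = 300
  kms_master_key_id                 = "alias/my-queue-key" # hypothetical alias
  sqs_managed_sse_enabled           = false
}
```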

Outputs

Name	Description
binaries_syncer	n/a
runners	n/a
ssm_parameters	n/a
webhook	n/a
