Merge pull request #130 from ExpediaGroup/fix/tcp_keep_alive
Override keepAlive time to be lower than NLB idle time (350s)
Patrick Duin authored Nov 30, 2023
2 parents 74a9459 + 8216dac commit da1353f
Showing 5 changed files with 43 additions and 1 deletion.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,10 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [4.1.5] - 2023-11-30
### Fixed
- Issue where requests could hit the 10-minute connection timeout. TCP keepalive now prevents the NLB from closing idle connections. Similar to the issue explained here: https://paramount.tech/blog/2021/07/26/mitigation-of-connection-reset-in-aws.html

## [4.1.4] - 2023-11-08
### Fixed
- Added tags to ECS service and tasks.
3 changes: 3 additions & 0 deletions README.md
@@ -54,6 +54,9 @@ For more information please refer to the main [Apiary](https://github.com/Expedi
| root_vol_type | Waggle Dance EC2 root volume type. | string | `gp2` | no |
| root_vol_size | Waggle Dance EC2 root volume size. | string | `10` | no |
| enable_query_functions_across_all_metastores | This controls the thrift call for `get_all_functions`. It is generally used to initialize a client and get built-in functions and registered UDFs from a metastore. Setting this to `false` is more performant as WD then only gets the functions from the `primary` metastore. However, setting this to `true` will collate results by calling `get_all_functions` from all configured metastores. This could be slow if some of the metastores are slow to respond. If all the metastores configured are of the same version and no additional UDFs are installed, then WD gets the same functions back, so it's not very useful to call this across metastores. For backwards compatibility, this property can be set to `true`. Further reading: https://github.com/ExpediaGroup/waggle-dance#server | bool | `false` | no |
| tcp_keepalive_time | Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS. | number | `200` | no |
| tcp_keepalive_intvl | Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS. | number | `30` | no |
| tcp_keepalive_probes | Sets net.ipv4.tcp_keepalive_probes (number), currently only supported in ECS. | number | `2` | no |

## Usage

3 changes: 3 additions & 0 deletions templates.tf
@@ -183,5 +183,8 @@ data "template_file" "waggledance" {
hive_site_xml = var.alluxio_endpoints == [] ? "" : base64encode(data.template_file.hive_site_xml.rendered)
bastion_ssh_key_arn = var.bastion_ssh_key_secret_name == "" ? "" : join("", data.aws_secretsmanager_secret.bastion_ssh_key.*.arn)
docker_auth = var.docker_registry_auth_secret_name == "" ? "" : format("\"repositoryCredentials\" :{\n \"credentialsParameter\":\"%s\"\n},", join("\",\"", concat(data.aws_secretsmanager_secret.docker_registry.*.arn)))
tcp_keepalive_time = var.tcp_keepalive_time
tcp_keepalive_intvl = var.tcp_keepalive_intvl
tcp_keepalive_probes = var.tcp_keepalive_probes
}
}
16 changes: 15 additions & 1 deletion templates/waggledance.json
@@ -70,6 +70,20 @@
"softLimit": 65536,
"hardLimit": 65536
}
- ]
+ ],
"systemControls": [
{
"namespace": "net.ipv4.tcp_keepalive_time",
"value": "${tcp_keepalive_time}"
},
{
"namespace": "net.ipv4.tcp_keepalive_intvl",
"value": "${tcp_keepalive_intvl}"
},
{
"namespace": "net.ipv4.tcp_keepalive_probes",
"value": "${tcp_keepalive_probes}"
}
]
}
]
18 changes: 18 additions & 0 deletions variables.tf
@@ -359,3 +359,21 @@ variable "datadog_metrics_enabled" {
type = bool
default = false
}

variable "tcp_keepalive_time" {
description = "Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS."
type = number
default = 200
}

variable "tcp_keepalive_intvl" {
description = "Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS."
type = number
default = 30
}

variable "tcp_keepalive_probes" {
description = "Sets net.ipv4.tcp_keepalive_probes (number), currently only supported in ECS."
type = number
default = 2
}
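
For illustration only (not part of this commit), a caller could set the three new variables explicitly as in the sketch below. The module name `waggle_dance` and the `source` path are hypothetical placeholders, and the module's other required inputs are left out. The values shown are the defaults from `variables.tf`; the comments work through why they stay under the 350s NLB idle timeout named in the commit message, assuming standard Linux keepalive behaviour.

```hcl
# Hypothetical module call: the name and source path are placeholders,
# and the module's other required inputs are omitted for brevity.
module "waggle_dance" {
  source = "./path/to/this/module"

  # Idle seconds before the first keepalive probe is sent.
  # 200s is below the 350s NLB idle timeout, so an idle connection
  # sees traffic before the NLB would drop it.
  tcp_keepalive_time = 200

  # Seconds between probes when a probe goes unanswered.
  tcp_keepalive_intvl = 30

  # Unanswered probes tolerated before the kernel drops the connection.
  # With these values a dead peer is detected after roughly
  # 200 + (30 * 2) = 260 seconds, still under 350s.
  tcp_keepalive_probes = 2
}
```

If these values are changed, keeping `tcp_keepalive_time` below 350 preserves the intent of the fix.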
