Skip to content

Commit

Permalink
YARN-11681. Update the cgroup documentation with v2 support (#6834)
Browse files Browse the repository at this point in the history
Co-authored-by: Benjamin Teke <[email protected]>
Co-authored-by: K0K0V0K <[email protected]>
  • Loading branch information
3 people authored May 21, 2024
1 parent fb156e8 commit d876505
Show file tree
Hide file tree
Showing 5 changed files with 33 additions and 31 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -514,7 +514,7 @@ uid:gid pair will be used to launch the container's process.
As an example of what is meant by uid:gid pair, consider the following. By
default, in non-secure mode, YARN will launch processes as the user `nobody`
(see the table at the bottom of
[Using CGroups with YARN](./NodeManagerCgroups.html) for how the run as user is
[Using Cgroups with YARN](./NodeManagerCgroups.html) for how the run as user is
determined in non-secure mode). On CentOS based systems, the `nobody` user's uid
is `99` and the `nobody` group is `99`. As a result, YARN will call `docker run`
with `--user 99:99`. If the `nobody` user does not have the uid `99` in the
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,51 +12,53 @@
limitations under the License. See accompanying LICENSE file.
-->

Using CGroups with YARN
Using Cgroups with YARN
=======================

<!-- MACRO{toc|fromDepth=0|toDepth=3} -->

CGroups is a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour. CGroups is a Linux kernel feature and was merged into kernel version 2.6.24. From a YARN perspective, this allows containers to be limited in their resource usage. A good example of this is CPU usage. Without CGroups, it becomes hard to limit container CPU usage.
Cgroups is a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour. Cgroups (v1) is a Linux kernel feature and was merged into kernel version 2.6.24, while Control Group v2 is available since the kernel version 4.5. From a YARN perspective, this allows containers to be limited in their resource usage. A good example of this is CPU usage. Without cgroups, it becomes hard to limit container CPU usage.

CGroups Configuration
Cgroups Configuration
---------------------

This section describes the configuration variables for using CGroups.
This section describes the configuration variables for using cgroups.

The following settings are related to setting up CGroups. These need to be set in *yarn-site.xml*.
The following settings are related to setting up cgroups. These need to be set in *yarn-site.xml*.

|Configuration Name | Description |
|:---- |:---- |
| `yarn.nodemanager.container-executor.class` | This should be set to "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor". CGroups is a Linux kernel feature and is exposed via the LinuxContainerExecutor. |
| `yarn.nodemanager.linux-container-executor.resources-handler.class` | This should be set to "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler". Using the LinuxContainerExecutor doesn't force you to use CGroups. If you wish to use CGroups, the resource-handler-class must be set to CGroupsLCEResourceHandler. DefaultLCEResourcesHandler won't work. |
| `yarn.nodemanager.linux-container-executor.cgroups.hierarchy` | The cgroups hierarchy under which to place YARN proccesses(cannot contain commas). If yarn.nodemanager.linux-container-executor.cgroups.mount is false (that is, if cgroups have been pre-configured) and the YARN user has write access to the parent directory, then the directory will be created. If the directory already exists, the administrator has to give YARN write permissions to it recursively. |
| `yarn.nodemanager.linux-container-executor.cgroups.mount` | Whether the LCE should attempt to mount cgroups if not found - can be true or false. |
| `yarn.nodemanager.linux-container-executor.cgroups.mount-path` | Optional. Where CGroups are located. LCE will try to mount them here, if `yarn.nodemanager.linux-container-executor.cgroups.mount` is true. LCE will try to use CGroups from this location, if `yarn.nodemanager.linux-container-executor.cgroups.mount` is false. If specified, this path and its subdirectories (CGroup hierarchies) must exist and they should be readable and writable by YARN before the NodeManager is launched. See CGroups mount options below for details. |
| `yarn.nodemanager.linux-container-executor.group` | The Unix group of the NodeManager. It should match the setting in "container-executor.cfg". This configuration is required for validating the secure access of the container-executor binary. |
|Configuration Name | Description |
|:---- |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `yarn.nodemanager.container-executor.class` | This should be set to "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor". Cgroups is a Linux kernel feature and is exposed via the LinuxContainerExecutor. |
| `yarn.nodemanager.linux-container-executor.resources-handler.class` | This should be set to "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler". Using the LinuxContainerExecutor doesn't force you to use cgroups. If you wish to use cgroups, the resource-handler-class must be set to CGroupsLCEResourceHandler. DefaultLCEResourcesHandler won't work. |
| `yarn.nodemanager.linux-container-executor.cgroups.v2.enabled` | A property to enable cgroup v2 support. Setting this to true YARN will try to use the cgroup v2 structure and controllers. If this setting is true, but no unified (v2) hierarchy is mounted it will automatically fall back to v1. Defaults to false. |
| `yarn.nodemanager.linux-container-executor.cgroups.hierarchy` | The cgroups hierarchy under which to place YARN proccesses (cannot contain commas). If `yarn.nodemanager.linux-container-executor.cgroups.mount` is false (that is, if cgroups have been pre-configured) and the YARN user has write access to the parent directory, then the directory will be created. If the directory already exists, the administrator has to give YARN write permissions to it recursively. |
| `yarn.nodemanager.linux-container-executor.cgroups.mount` | Whether the LCE should attempt to mount cgroups if not found - can be true or false. Mounting is not supported with cgroup v2. |
| `yarn.nodemanager.linux-container-executor.cgroups.mount-path` | Optional. Where cgroup is located. LCE will try to mount them here, if `yarn.nodemanager.linux-container-executor.cgroups.mount` is true (and cgroup v1 is used). LCE will try to use cgroups from this location, if `yarn.nodemanager.linux-container-executor.cgroups.mount` is false. If specified, this path and its subdirectories (cgroup hierarchies) must exist and they should be readable and writable by YARN before the NodeManager is launched. See Cgroups mount options below for details. |
| `yarn.nodemanager.linux-container-executor.cgroups.v2.mount-path` | Optional. Where cgroup v2 is located. This property needs to be specified only if both cgroup v1 and v2 is used. For example in mixed mode cgroup v1 controllers can be mounted under /sys/fs/cgroup/ (i.e. /sys/fs/cgroup/cpu,cpuacct), while v2 can be mounted in /sys/fs/cgroup/unified folder. If specified, this path (cgroup v2 hierarchy) must exist and it should be readable and writable by YARN before the NodeManager is launched. |
| `yarn.nodemanager.linux-container-executor.group` | The Unix group of the NodeManager. It should match the setting in "container-executor.cfg". This configuration is required for validating the secure access of the container-executor binary. |

Once CGroups enabled, the following settings related to limiting resource usage of YARN containers can works:
Once cgroup is enabled, the following settings related to limiting resource usage of YARN containers can works:

|Configuration Name | Description |
|:---- |:---- |
| `yarn.nodemanager.resource.percentage-physical-cpu-limit` | This setting lets you limit the cpu usage of all YARN containers. It sets a hard upper limit on the cumulative CPU usage of the containers. For example, if set to 60, the combined CPU usage of all YARN containers will not exceed 60%. |
| `yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage` | CGroups allows cpu usage limits to be hard or soft. When this setting is true, containers cannot use more CPU usage than allocated even if spare CPU is available. This ensures that containers can only use CPU that they were allocated. When set to false, containers can use spare CPU if available. It should be noted that irrespective of whether set to true or false, at no time can the combined CPU usage of all containers exceed the value specified in "yarn.nodemanager.resource.percentage-physical-cpu-limit". |
|Configuration Name | Description |
|:---- |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `yarn.nodemanager.resource.percentage-physical-cpu-limit` | This setting lets you limit the cpu usage of all YARN containers. It sets a hard upper limit on the cumulative CPU usage of the containers. For example, if set to 60, the combined CPU usage of all YARN containers will not exceed 60%. |
| `yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage` | Cgroups allows cpu usage limits to be hard or soft. When this setting is true, containers cannot use more CPU usage than allocated even if spare CPU is available. This ensures that containers can only use CPU that they were allocated. When set to false, containers can use spare CPU if available. It should be noted that irrespective of whether set to true or false, at no time can the combined CPU usage of all containers exceed the value specified in "yarn.nodemanager.resource.percentage-physical-cpu-limit". |

CGroups mount options
Cgroups mount options
---------------------

YARN uses CGroups through a directory structure mounted into the file system by the kernel. There are three options to attach to CGroups.
YARN uses cgroups through a directory structure mounted into the file system by the kernel. There are three options to attach to cgroups.

| Option | Description |
|:---- |:---- |
| Discover CGroups mounted already | This should be used on newer systems like RHEL7 or Ubuntu16 or if the administrator mounts CGroups before YARN starts. Set `yarn.nodemanager.linux-container-executor.cgroups.mount` to false and leave other settings set to their defaults. YARN will locate the mount points in `/proc/mounts`. Common locations include `/sys/fs/cgroup` and `/cgroup`. The default location can vary depending on the Linux distribution in use.|
| CGroups mounted by YARN | IMPORTANT: This option is deprecated due to security reasons with the `container-executor.cfg` option `feature.mount-cgroup.enabled=0` by default. Please mount cgroups before launching YARN.|
| CGroups mounted already or linked but not in `/proc/mounts` | If cgroups is accessible through lxcfs or simulated by another filesystem, then point `yarn.nodemanager.linux-container-executor.cgroups.mount-path` to your CGroups root directory. Set `yarn.nodemanager.linux-container-executor.cgroups.mount` to false. YARN tries to use this path first, before any CGroup mount point discovery. The path should have a subdirectory for each CGroup hierarchy named by the comma separated CGroup subsystems supported like `<path>/cpu,cpuacct`. Valid subsystem names are `cpu, cpuacct, cpuset, memory, net_cls, blkio, freezer, devices`.|
| Option | Description |
|:------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Discover cgroups mounted already | This should be used on newer systems like RHEL7 or Ubuntu16 or if the administrator mounts cgroups before YARN starts. Set `yarn.nodemanager.linux-container-executor.cgroups.mount` to false and leave other settings set to their defaults. YARN will locate the mount points in `/proc/mounts`. Common locations include `/sys/fs/cgroup` and `/cgroup`. The default location can vary depending on the Linux distribution in use. |
| Cgroups mounted by YARN | IMPORTANT: This option is deprecated due to security reasons with the `container-executor.cfg` option `feature.mount-cgroup.enabled=0` by default. Please mount cgroups before launching YARN. |
| Cgroups mounted already or linked but not in `/proc/mounts` | If cgroups is accessible through lxcfs or simulated by another filesystem, then point `yarn.nodemanager.linux-container-executor.cgroups.mount-path` to your cgroups root directory. Set `yarn.nodemanager.linux-container-executor.cgroups.mount` to false. YARN tries to use this path first, before any cgroup mount point discovery. The path should have a subdirectory in cgroup v1 for each cgroup hierarchy named by the comma separated cgroup subsystems supported like `<path>/cpu,cpuacct`. Valid subsystem names are `cpu, cpuacct, cpuset, memory, net_cls, blkio, freezer, devices`. |

CGroups and security
Cgroups and security
--------------------

CGroups itself has no requirements related to security. However, the LinuxContainerExecutor does have some requirements. If running in non-secure mode, by default, the LCE runs all jobs as user "nobody". This user can be changed by setting "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" to the desired user. However, it can also be configured to run jobs as the user submitting the job. In that case "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" should be set to false.
Cgroups itself has no requirements related to security. However, the LinuxContainerExecutor does have some requirements. If running in non-secure mode, by default, the LCE runs all jobs as user "nobody". This user can be changed by setting "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" to the desired user. However, it can also be configured to run jobs as the user submitting the job. In that case "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" should be set to false.

| yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user | yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users | User running jobs |
|:---- |:---- |:---- |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ containers run with both YARN cgroups and Nvidia Docker runtime v2.
1. The pluggable device framework depends on LinuxContainerExecutor to handle
resource isolation and Docker stuff. So LCE and Docker enabled on YARN is a
must.
See [Using CGroups with YARN](./NodeManagerCgroups.html) and [Docker on YARN](./DockerContainers.html)
See [Using Cgroups with YARN](./NodeManagerCgroups.html) and [Docker on YARN](./DockerContainers.html)

2. The sample plugin `NvidiaGPUPluginForRuntimeV2` requires Nvidia GPU drivers
and Nvidia Docker runtime v2 installed in the nodes. See Nvidia official
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -602,7 +602,7 @@ uid:gid pair will be used to launch the container's process.
As an example of what is meant by uid:gid pair, consider the following. By
default, in non-secure mode, YARN will launch processes as the user `nobody`
(see the table at the bottom of
[Using CGroups with YARN](./NodeManagerCgroups.html) for how the run as user is
[Using Cgroups with YARN](./NodeManagerCgroups.html) for how the run as user is
determined in non-secure mode). On CentOS based systems, the `nobody` user's uid
is `99` and the `nobody` group is `99`. As a result, YARN will invoke runC
with uid `99` and gid `99`. If the `nobody` user does not have the uid `99` in the
Expand Down
Loading

0 comments on commit d876505

Please sign in to comment.