Resource management support for CSDs
Cloudera Manager provides several different means of managing a cluster's resources. These methods are described in more detail in the resource management documentation.
CSDs can participate in static service pools. The rest of this page describes how to add support for static service pools to a CSD.
Static service pools are built on Linux control groups (cgroups). Cgroups partition a system's resources amongst different processes, guaranteeing each a certain proportion. The Cloudera Manager static service pool configuration page exposes all of this information in one central place. Moreover, the page includes a wizard that automatically configures cgroups cluster-wide based on per-service resource percentages provided by the user. More details can be found in the Cloudera Manager resource management documentation.
A sample descriptor is shown below:
{
  "cgroup" : {
    "cpu" : {
      "autoConfigured" : true
    },
    "memory" : {
      "autoConfigured" : true,
      "autoConfiguredMin" : 1073741824
    },
    "blkio" : {
      "autoConfigured" : true
    }
  }
}
Within the cgroup descriptor, each subdescriptor describes a different cgroup subsystem. If autoConfigured is true, the wizard will automatically configure that subsystem's resource control based on the resource percentage corresponding to this role's service. Here are the supported resource controls:
- cpu: cpu.shares
- memory: memory.limit_in_bytes
- blkio: blkio.weight
Additionally, the memory cgroup subsystem exposes autoConfiguredMin which, if set, specifies the absolute minimum memory value that the wizard may use for this role.
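As a minimal sketch, assuming a descriptor may omit the subsystems it does not use, and with an illustrative 2 GiB floor, a role that only wants the wizard to manage its memory cgroup might declare:
{
  "cgroup" : {
    "memory" : {
      "autoConfigured" : true,
      "autoConfiguredMin" : 2147483648
    }
  }
}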
Cgroup-based memory limiting can apply to any process whatsoever, but it has a major disadvantage: when the limit is reached, the Linux kernel reclaims memory by swapping or killing the process. In lieu of cgroup memory limits, a process can implement its own memory policing system, the details of which are particular to the process in question. For example, the JVM forces users to specify a maximum heap size. Provided it doesn't use JNI or direct memory allocation, the maximum heap size serves as a proxy for a true process memory limit. In this section, we will refer to such parameters as "cooperative" memory limits.
In CSDs, cooperative memory limits are represented by the 'memory' parameter type. To add a cooperative memory limit to a CSD, simply add a memory parameter to the role's parameters section. For example, here is the Spark Executor cooperative memory limit:
{
  "parameters" : [
    {
      "name" : "executor_total_max_heapsize",
      "label" : "Total Java Heap Sizes of Worker's Executors in Bytes",
      "description" : "Memory available to the Worker's Executors. This is the maximum sum total of all the Executors' Java heap sizes on this Worker node. Passed to Java -Xmx. Measured in bytes.",
      "required" : "true",
      "type" : "memory",
      "unit" : "bytes",
      "min" : 67108864,
      "default" : 8589934592,
      "scaleFactor" : 1.3,
      "autoConfigShare" : 100
    }
  ]
}
Memory parameters are like long parameters with several additional fields:
- scaleFactor: This field should be set when there is a non-trivial difference between the configured memory value and the actual memory consumed. For example, the JVM heap isn't fully representative of the amount of memory a JVM will actually consume; the JVM adds some overhead of its own. We typically represent this with a scaleFactor of 1.3, that is, the JVM itself adds 30% overhead.
- autoConfigShare: If this field is set, the aforementioned wizard in the static service pool configuration page will automatically configure the cooperative memory limit. The field's value represents the share of the role's overall memory allotment that should be used for this particular limit. A role may expose multiple memory limits; the wizard will distribute the role's memory allotment amongst the various limits. For example, the Solr server role is a Java process that uses a standard JVM heap as well as direct memory allocation. As such, it exposes two memory limits: one for the heap (limited by -Xmx) and one for direct memory (limited by -XX:MaxDirectMemorySize). Each limit has an autoConfigShare of 50, so the Solr server memory allotment is split in half, one half for each limit. If multiple limits are exposed, the sum of all their autoConfigShare values must be 100. An illustrative two-limit parameters section is sketched after this list.
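As an illustrative sketch of a two-limit role modeled on the Solr example above (the parameter names, defaults, and scale factors here are hypothetical, not copied from any shipping CSD), the parameters section might look like:
{
  "parameters" : [
    {
      "name" : "example_java_heapsize",
      "label" : "Java Heap Size in Bytes",
      "description" : "Maximum Java heap size. Passed to Java -Xmx. Measured in bytes.",
      "required" : "true",
      "type" : "memory",
      "unit" : "bytes",
      "min" : 268435456,
      "default" : 1073741824,
      "scaleFactor" : 1.3,
      "autoConfigShare" : 50
    },
    {
      "name" : "example_java_direct_memory_size",
      "label" : "Java Direct Memory Size in Bytes",
      "description" : "Maximum direct memory. Passed to Java -XX:MaxDirectMemorySize. Measured in bytes.",
      "required" : "true",
      "type" : "memory",
      "unit" : "bytes",
      "min" : 268435456,
      "default" : 1073741824,
      "scaleFactor" : 1.0,
      "autoConfigShare" : 50
    }
  ]
}
Because the two autoConfigShare values sum to 100, the wizard would split the role's memory allotment evenly between the heap and direct memory limits.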
Various Cloudera Manager wizards include memory configuration workflows; that is, workflows where each host's RAM is distributed amongst per-role reservations. In these workflows, each {role, cooperative memory limit} pair is assigned a "minimum" and an "ideal" RAM value. On each host, all minimums and all ideals are summed, then compared to the actual host RAM. The ratio between what's available and what's requested dictates where on the minimum-to-ideal spectrum the actual memory limit's value is set. For example, on a host where sum(ideals) does not exceed the host RAM, the configured memory limit values will simply be the ideal values. Conversely, if sum(minimums) exceeds host RAM, all values will be set to the minimums.
If a role is not directly involved in a workflow, its existing memory consumption is still accounted for by summing the values of all of its cooperative memory limits, factoring each against its scaleFactor value.
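For example, a role left out of a workflow whose single cooperative heap limit is currently set to 8 GiB, with a scaleFactor of 1.3, would be counted as roughly 10.4 GiB of existing consumption.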
To participate in any memory configuration workflow, the 'default' field of a cooperative memory limit must be set. Additionally, for the memory workflow in the static service pool configuration wizard, the autoConfigShare field must be set (as described earlier).
Outside of the static service pool wizard, the {role, limit} pair's "minimum" is the limit's min field if set, or 0 otherwise. The "ideal" is the limit's default field. In the static service pool wizard, the "minimum" is the limit's default field, and the "ideal" is computed using the following formula:
min(limit.softMax (if exists) or limit.max (if exists) or Long.MAX_VALUE,
    host_ram * 0.8 / limit.scaleFactor * service_percentage * limit.autoConfigShare)
host_ram is equal to the host's RAM, while service_percentage is equal to the percentage (up to 100) assigned to the overall service by the user at the start of the wizard.
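As a rough worked example (treating both percentages as fractions for the arithmetic): on a 128 GiB host where the user assigns the service 25% and a single limit has a scaleFactor of 1.3 and an autoConfigShare of 100, the ideal works out to about 128 GiB * 0.8 / 1.3 * 0.25 ≈ 19.7 GiB, unless a softMax or max caps it lower.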