[fix](cgroup memory) Correct cgroup mem info cache (apache#36966)
## Proposed changes
After upgrading to Doris 2.1.3, we noticed that the "sys available
memory" in be.INFO continuously decreases until it falls below the
warning watermark, triggering persistent garbage collection (GC) even
though actual memory usage is very low. The cache value in the cgroup
mem info is also always 0. On investigation, I found an error in how
available memory is calculated from the cgroup memory info:

1. The memory information for a cgroup is stored in the file
"memory.stat", not "memory.meminfo" (in fact, the "memory.meminfo" file
does not exist). The files under the cgroup path are shown in screenshot 1
below.
2. The content of "memory.stat" is shown in screenshot 2 below. Each line
is a key followed by a value, with no colon after the key, and the page
cache is reported under the key "cache" rather than "Cached".

<img width="1720" alt="image"
src="https://github.com/apache/doris/assets/38196564/e654322e-9bf4-4f5e-951f-99e101ebbf47">
<img width="1364" alt="image"
src="https://github.com/apache/doris/assets/38196564/02cf8899-7618-4d5f-bf59-68fa0c90ebf2">
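As a sketch of the format shown in the screenshots, assuming cgroup v1 `memory.stat` (one `key value` pair per line, values mostly in bytes), the hypothetical `parse_memory_stat` below splits each line on whitespace and keeps the key whole. It is an illustration only, not the Doris parser, which does its own field splitting and uses `StringParser`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parses cgroup v1 memory.stat text, where each line is
// "<key> <value>" with no ':' after the key (unlike /proc/meminfo).
std::map<std::string, int64_t> parse_memory_stat(const std::string& content) {
    std::map<std::string, int64_t> stats;
    std::istringstream in(content);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        int64_t value = 0;
        // Keep the key whole; skip lines that lack a numeric value.
        if (fields >> key >> value) {
            stats[key] = value;
        }
    }
    return stats;
}
```

For example, `parse_memory_stat("cache 4096\nrss 8192\n")` yields a map whose `"cache"` entry is 4096 and which has no `"Cached"` entry.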


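The change has two parts:
1. Changed the cgroup mem info file name from "memory.meminfo" to "memory.stat".
2. Changed how the cache is extracted from the cgroup mem info: "memory.stat" keys have no trailing colon, so the key is no longer truncated by one character, and the cache is read from the "cache" key instead of "Cached".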
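To illustrate why the cache always read 0, the sketch below replays the old and new key handling on a few `memory.stat` fields. `cache_after_parse` is a hypothetical helper, not Doris code, and it uses a `std::map` as a stand-in for `_s_cgroup_mem_info_bytes`. The old branch drops the last character of each key (meant to strip the `:` from `/proc/meminfo` lines such as `Cached: 1024 kB`), turning `cache` into `cach`, and then looks up `Cached`, which `memory.stat` never contains.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper replaying the key handling of the cgroup mem info refresh.
// stat_fields holds already-split (key, value) pairs from memory.stat.
int64_t cache_after_parse(const std::vector<std::pair<std::string, int64_t>>& stat_fields,
                          bool old_behavior) {
    std::map<std::string, int64_t> info;
    for (const auto& [field, value] : stat_fields) {
        // Old: drop the last character, intended for "Cached:" in /proc/meminfo,
        // so "cache" becomes "cach". New: keep the key whole.
        std::string key = old_behavior ? field.substr(0, field.size() - 1) : field;
        info[key] = value;
    }
    // operator[] inserts and returns 0 for a missing key, so the old lookup of
    // "Cached" (absent from memory.stat) silently yields 0.
    return old_behavior ? info["Cached"] : info["cache"];
}
```

With the fields `{"cache", 4096}` and `{"rss", 8192}`, the old behavior returns 0 and the new behavior returns 4096.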

Co-authored-by: Xinyi Zou <[email protected]>
INNOCENT-BOY and xinyiZzz committed Jul 8, 2024
1 parent 95dad14 commit e6e06ff
Showing 2 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion be/src/util/cgroup_util.cpp
@@ -184,7 +184,7 @@ Status CGroupUtil::find_cgroup_mem_info(std::string* file_path) {
}
string cgroup_path;
RETURN_IF_ERROR(find_abs_cgroup_path("memory", &cgroup_path));
- *file_path = cgroup_path + "/memory.meminfo";
+ *file_path = cgroup_path + "/memory.stat";
return Status::OK();
}

8 changes: 4 additions & 4 deletions be/src/util/mem_info.cpp
@@ -423,7 +423,7 @@ void MemInfo::refresh_proc_meminfo() {
if (fields.size() < 2) {
continue;
}
- std::string key = fields[0].substr(0, fields[0].size() - 1);
+ std::string key = fields[0].substr(0, fields[0].size());

StringParser::ParseResult result;
auto mem_value = StringParser::string_to_int<int64_t>(fields[1].data(),
@@ -449,19 +449,19 @@ void MemInfo::refresh_proc_meminfo() {
// https://serverfault.com/questions/902009/the-memory-usage-reported-in-cgroup-differs-from-the-free-command
// memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)
// so, memory.usage_in_bytes - memory.meminfo["Cached"]
- _s_cgroup_mem_usage = cgroup_mem_usage - _s_cgroup_mem_info_bytes["Cached"];
+ _s_cgroup_mem_usage = cgroup_mem_usage - _s_cgroup_mem_info_bytes["cache"];
// wait 10s, 100 * 100ms, avoid too frequently.
_s_cgroup_mem_refresh_wait_times = -100;
LOG(INFO) << "Refresh cgroup memory win, refresh again after 10s, cgroup mem limit: "
<< _s_cgroup_mem_limit << ", cgroup mem usage: " << _s_cgroup_mem_usage
- << ", cgroup mem info cached: " << _s_cgroup_mem_info_bytes["Cached"];
+ << ", cgroup mem info cached: " << _s_cgroup_mem_info_bytes["cache"];
} else {
// find cgroup failed, wait 300s, 1000 * 100ms.
_s_cgroup_mem_refresh_wait_times = -3000;
LOG(INFO)
<< "Refresh cgroup memory failed, refresh again after 300s, cgroup mem limit: "
<< _s_cgroup_mem_limit << ", cgroup mem usage: " << _s_cgroup_mem_usage
- << ", cgroup mem info cached: " << _s_cgroup_mem_info_bytes["Cached"];
+ << ", cgroup mem info cached: " << _s_cgroup_mem_info_bytes["cache"];
}
} else {
if (config::enable_use_cgroup_memory_info) {

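The `_s_cgroup_mem_usage` line in the `mem_info.cpp` hunk subtracts the page cache from `memory.usage_in_bytes`. A minimal worked sketch with a hypothetical `effective_cgroup_usage` helper and made-up sizes shows the effect of the fix: while the cache lookup returned 0, the whole page cache was counted as used memory, which is consistent with the shrinking "sys available memory" described in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Doris function) mirroring the patched line
//   _s_cgroup_mem_usage = cgroup_mem_usage - _s_cgroup_mem_info_bytes["cache"];
// memory.usage_in_bytes counts page cache, so subtracting memory.stat's
// "cache" entry leaves the memory the cgroup is actually using.
int64_t effective_cgroup_usage(int64_t usage_in_bytes, int64_t cache_bytes) {
    return usage_in_bytes - cache_bytes;
}
```

With an illustrative 6 GiB of `memory.usage_in_bytes`, of which 4 GiB is cache, the effective usage is 2 GiB after the fix, versus the full 6 GiB when the cache was read as 0.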