Incremental GC Tuning (#4300)
# Incremental GC Tuning

Tuning the incremental GC for more conservative collection of larger heaps:

1. GC memory reserve: Keeping a reserve of 640 MB for the incremental GC itself, so that it is always able to allocate the mark bitmap (128 MB) and has some free space (512 MB) to perform evacuations. This reserve is on top of the 256 MB general reserve for upgrades and query calls. (A sketch of the reserve arithmetic appears below, after this list.)

2. Scheduling the GC more frequently for medium-sized heaps. Previously, there were only two thresholds (critical and non-critical); the new tiers are listed below (see also the scheduling sketch after this list).
    
    Threshold | Heap Size        | GC-Triggering Growth
    ----------|------------------|------------------
    Low       | 0 .. 1 GB        | 65% 
    Medium    | 1 .. 2.25 GB     | 35%
    Critical  | 2.25 .. 3.125 GB | 1%

3. Increasing the GC increment limit, in particular the allocation increment, considering that the DTS (deterministic time-slicing) limits have been increased by a factor of 10 since the GC was designed. Increments now have a limit of about 900 million instructions (while the IC message instruction limit is 20 billion instructions). The allocation increment is increased by a factor of 5 to reduce GC reclamation latency. Focusing on the allocation increment also yields better performance in the GC benchmark than increasing the base GC increment limit. (The increment-limit formula is included in the scheduling sketch after this list.)

4. Scheduling the GC also on heartbeats and timers, since DTS is now supported for those as well.

5. Introducing a system-level GC trigger function that can be called by the canister controller or owner to explicitly run a GC increment. This is helpful if the canister memory is full and the GC has received too little time to complete. This can happen, e.g., when a large amount of memory is allocated in a single message or when objects become unreachable shortly before memory is exhausted.

    ```
    dfx canister call CANISTER_ID __motoko_gc_trigger "()" 
    ```

    This can be called multiple times in succession to complete a GC run.
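
A minimal Rust sketch of the new scheduling tiers (point 2) and the enlarged increment budget (point 3), using the constants that appear in the `rts/motoko-rts/src/gc/incremental.rs` diff below. The exact growth comparison and the unit of the increment budget are not visible in this excerpt, so those parts are stated as assumptions:

```
const MB: usize = 1024 * 1024;
const GB: usize = 1024 * MB;

/// Growth threshold per heap-size tier (constants taken from the diff below).
fn growth_threshold(heap_size: usize) -> f64 {
    const CRITICAL_HEAP_LIMIT: usize = 2 * GB + 256 * MB; // 2.25 GB
    const MEDIUM_HEAP_LIMIT: usize = GB; // 1 GB
    if heap_size > CRITICAL_HEAP_LIMIT {
        0.01 // critical: run the GC on 1% heap growth
    } else if heap_size > MEDIUM_HEAP_LIMIT {
        0.35 // medium: run the GC on 35% heap growth
    } else {
        0.65 // low: run the GC on 65% heap growth
    }
}

/// Assumed scheduling condition: start a GC run once the bytes allocated since
/// the last run exceed the threshold fraction of the current heap size.
fn should_start(heap_size: usize, allocated_since_last_gc: usize) -> bool {
    allocated_since_last_gc as f64 > growth_threshold(heap_size) * heap_size as f64
}

/// Increment budget: a fixed base plus a term proportional to the number of
/// allocations performed while a GC run is active (values from the diff below).
/// The budget is an internal work measure, not a direct instruction count.
fn increment_limit(concurrent_allocations: usize) -> usize {
    const INCREMENT_BASE_LIMIT: usize = 5_000_000;
    const INCREMENT_ALLOCATION_FACTOR: usize = 50;
    INCREMENT_BASE_LIMIT + INCREMENT_ALLOCATION_FACTOR * concurrent_allocations
}
```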

Points 4 and 5 also apply to the other GCs.
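
For point 1, the additional 640 MB GC reserve matters only while no GC increment is running: the mutator must leave it free so that the GC can later allocate the mark bitmap and evacuate partitions, whereas a running increment may use the reserved space itself. A rough sketch of the arithmetic, mirroring `memory_reserve()` in the `rts/motoko-rts/src/gc/incremental.rs` diff below:

```
const MB: usize = 1024 * 1024;

// General reserve for upgrades and query calls (unchanged).
const GENERAL_MEMORY_RESERVE: usize = 256 * MB;
// Additional reserve for the incremental GC:
// 128 MB for the mark bitmap plus 512 MB of free space for evacuations.
const GC_MEMORY_RESERVE: usize = (128 + 512) * MB; // 640 MB

/// Reserve enforced when growing Wasm memory: mutator allocations must leave
/// both reserves free; during a GC increment only the general reserve applies.
fn memory_reserve(running_increment: bool) -> usize {
    let additional = if running_increment { 0 } else { GC_MEMORY_RESERVE };
    GENERAL_MEMORY_RESERVE + additional
}
```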

## GC Measurements

According to measurements with the GC benchmark, using `dfx 0.15.1` with the new metering, the changes have only a small effect on average performance:


Metric                | Difference
----------------------|-----------
Total instructions    | +1.8%
Allocated memory size | -4.5%
luc-blaeser authored Dec 7, 2023
1 parent 0e72f43 commit 5a8cd29
Showing 20 changed files with 287 additions and 56 deletions.
5 changes: 5 additions & 0 deletions rts/motoko-rts/src/constants.rs
@@ -12,3 +12,8 @@ pub const WASM_HEAP_SIZE: Words<u32> = Words(1024 * 1024 * 1024);

/// Wasm memory size (4 GiB) in bytes. Note: Represented as `u64` in order not to overflow.
pub const WASM_MEMORY_BYTE_SIZE: Bytes<u64> = Bytes(4 * 1024 * 1024 * 1024);

/// Byte constants
pub const KB: usize = 1024;
pub const MB: usize = 1024 * KB;
pub const GB: usize = 1024 * MB;
3 changes: 2 additions & 1 deletion rts/motoko-rts/src/gc/generational.rs
@@ -117,7 +117,8 @@ static mut OLD_GENERATION_THRESHOLD: usize = 32 * 1024 * 1024;
static mut PASSED_CRITICAL_LIMIT: bool = false;

#[cfg(feature = "ic")]
const CRITICAL_MEMORY_LIMIT: usize = (4096 - 512) * 1024 * 1024 - crate::memory::MEMORY_RESERVE;
const CRITICAL_MEMORY_LIMIT: usize =
(4096 - 512) * 1024 * 1024 - crate::memory::GENERAL_MEMORY_RESERVE;

#[cfg(feature = "ic")]
unsafe fn decide_strategy(limits: &Limits) -> Option<Strategy> {
47 changes: 39 additions & 8 deletions rts/motoko-rts/src/gc/incremental.rs
@@ -70,18 +70,22 @@ static mut LAST_ALLOCATIONS: Bytes<u64> = Bytes(0u64);
#[cfg(feature = "ic")]
unsafe fn should_start() -> bool {
use self::partitioned_heap::PARTITION_SIZE;
use crate::memory::{ic::partitioned_memory, MEMORY_RESERVE};
use crate::constants::{GB, MB};
use crate::memory::ic::partitioned_memory;

const CRITICAL_HEAP_LIMIT: Bytes<u32> =
Bytes(u32::MAX - 768 * 1024 * 1024 - MEMORY_RESERVE as u32);
const CRITICAL_HEAP_LIMIT: Bytes<u32> = Bytes((2 * GB + 256 * MB) as u32);
const CRITICAL_GROWTH_THRESHOLD: f64 = 0.01;
const NORMAL_GROWTH_THRESHOLD: f64 = 0.65;
const MEDIUM_HEAP_LIMIT: Bytes<u32> = Bytes(1 * GB as u32);
const MEDIUM_GROWTH_THRESHOLD: f64 = 0.35;
const LOW_GROWTH_THRESHOLD: f64 = 0.65;

let heap_size = partitioned_memory::get_heap_size();
let growth_threshold = if heap_size > CRITICAL_HEAP_LIMIT {
CRITICAL_GROWTH_THRESHOLD
} else if heap_size > MEDIUM_HEAP_LIMIT {
MEDIUM_GROWTH_THRESHOLD
} else {
NORMAL_GROWTH_THRESHOLD
LOW_GROWTH_THRESHOLD
};

let current_allocations = partitioned_memory::get_total_allocations();
@@ -122,11 +126,11 @@ unsafe fn record_gc_stop<M: Memory>() {
/// Finally, all the evacuated and temporary partitions are freed.
/// The temporary partitions store mark bitmaps.
/// The limit on the GC increment has a fix base with a linear increase depending on the number of
/// The limit on the GC increment has a fixed base with a linear increase depending on the number of
/// allocations that were performed during a running GC. The allocation-proportional term adapts
/// to the allocation rate and helps the GC to reduce reclamation latency.
const INCREMENT_BASE_LIMIT: usize = 3_500_000; // Increment limit without concurrent allocations.
const INCREMENT_ALLOCATION_FACTOR: usize = 10; // Additional time factor per concurrent allocation.
const INCREMENT_BASE_LIMIT: usize = 5_000_000; // Increment limit without concurrent allocations.
const INCREMENT_ALLOCATION_FACTOR: usize = 50; // Additional time factor per concurrent allocation.

// Performance note: Storing the phase-specific state in the enum would be nicer but it is much slower.
#[derive(PartialEq)]
@@ -144,6 +148,7 @@ pub struct State {
allocation_count: usize, // Number of allocations during an active GC run.
mark_state: Option<MarkState>,
iterator_state: Option<PartitionedHeapIterator>,
running_increment: bool, // GC increment is active.
}

/// GC state retained over multiple GC increments.
@@ -153,6 +158,7 @@ static mut STATE: RefCell<State> = RefCell::new(State {
allocation_count: 0,
mark_state: None,
iterator_state: None,
running_increment: false,
});

/// Incremental GC.
@@ -173,6 +179,7 @@ impl<'a, M: Memory + 'a> IncrementalGC<'a, M> {
state.allocation_count = 0;
state.mark_state = None;
state.iterator_state = None;
state.running_increment = false;
}

/// Each GC schedule point can get a new GC instance that shares the common GC state.
@@ -193,6 +200,8 @@ impl<'a, M: Memory + 'a> IncrementalGC<'a, M> {
/// * The mark phase can only be started on an empty call stack.
/// * The update phase can only be completed on an empty call stack.
pub unsafe fn empty_call_stack_increment(&mut self, roots: Roots) {
debug_assert!(!self.state.running_increment);
self.state.running_increment = true;
assert!(self.state.phase != Phase::Stop);
if self.pausing() {
self.start_marking(roots);
@@ -215,6 +224,7 @@ impl<'a, M: Memory + 'a> IncrementalGC<'a, M> {
if self.updating_completed() {
self.complete_run(roots);
}
self.state.running_increment = false;
}

unsafe fn pausing(&mut self) -> bool {
@@ -407,3 +417,24 @@ pub unsafe fn get_partitioned_heap() -> &'static mut PartitionedHeap {
debug_assert!(STATE.get_mut().partitioned_heap.is_initialized());
&mut STATE.get_mut().partitioned_heap
}

#[cfg(feature = "ic")]
use crate::constants::MB;

/// Additional memory reserve in bytes for the GC.
/// * To allow mark bitmap allocation, i.e. max. 128 MB in 4 GB address space.
/// * 512 MB of free space for evacuations/compactions.
#[cfg(feature = "ic")]
const GC_MEMORY_RESERVE: usize = (128 + 512) * MB;

#[cfg(feature = "ic")]
pub unsafe fn memory_reserve() -> usize {
use crate::memory::GENERAL_MEMORY_RESERVE;

let additional_reserve = if STATE.borrow().running_increment {
0
} else {
GC_MEMORY_RESERVE
};
GENERAL_MEMORY_RESERVE + additional_reserve
}
9 changes: 6 additions & 3 deletions rts/motoko-rts/src/gc/incremental/partitioned_heap.rs
@@ -41,8 +41,11 @@
use core::{array::from_fn, ops::Range, ptr::null_mut};

use crate::{
constants::WASM_MEMORY_BYTE_SIZE, gc::incremental::mark_bitmap::BITMAP_ITERATION_END,
memory::Memory, rts_trap_with, types::*,
constants::{MB, WASM_MEMORY_BYTE_SIZE},
gc::incremental::mark_bitmap::BITMAP_ITERATION_END,
memory::Memory,
rts_trap_with,
types::*,
};

use super::{
@@ -57,7 +60,7 @@ use super::{
/// due to the increased frequency of large object handling.
/// -> Large partitions above 32 MB are a waste for small programs, since the WASM memory is
/// allocated in that granularity and GC is then triggered later.
pub const PARTITION_SIZE: usize = 32 * 1024 * 1024;
pub const PARTITION_SIZE: usize = 32 * MB;

/// Total number of partitions in the memory.
/// For simplicity, the last partition is left unused, to avoid a numeric overflow when
5 changes: 4 additions & 1 deletion rts/motoko-rts/src/memory.rs
@@ -5,12 +5,15 @@ use crate::constants::WASM_HEAP_SIZE;
use crate::rts_trap_with;
use crate::types::*;

#[cfg(feature = "ic")]
use crate::constants::MB;

use motoko_rts_macros::ic_mem_fn;

// Memory reserve in bytes ensured during update and initialization calls.
// For use by queries and upgrade calls.
#[cfg(feature = "ic")]
pub(crate) const MEMORY_RESERVE: usize = 256 * 1024 * 1024;
pub(crate) const GENERAL_MEMORY_RESERVE: usize = 256 * MB;

/// A trait for heap allocation. RTS functions allocate in heap via this trait.
///
7 changes: 4 additions & 3 deletions rts/motoko-rts/src/memory/ic.rs
@@ -7,7 +7,6 @@ pub mod partitioned_memory

use super::Memory;
use crate::constants::WASM_PAGE_SIZE;
use crate::memory::MEMORY_RESERVE;
use crate::rts_trap_with;
use crate::types::{Bytes, Value};
use core::arch::wasm32;
@@ -39,12 +38,14 @@ pub struct IcMemory;

/// Page allocation. Ensures that the memory up to, but excluding, the given pointer is allocated.
/// Ensure a memory reserve of at least one Wasm page depending on the canister state.
unsafe fn grow_memory(ptr: u64) {
/// `memory_reserve`: A memory reserve in bytes ensured during update and initialization calls.
// For use by queries and upgrade calls. The reserve may vary depending on the GC and the phase of the GC.
unsafe fn grow_memory(ptr: u64, memory_reserve: usize) {
const LAST_PAGE_LIMIT: usize = 0xFFFF_0000;
debug_assert_eq!(LAST_PAGE_LIMIT, usize::MAX - WASM_PAGE_SIZE.as_usize() + 1);
let limit = if keep_memory_reserve() {
// Spare a memory reserve during update and initialization calls for use by queries and upgrades.
usize::MAX - MEMORY_RESERVE + 1
usize::MAX - memory_reserve + 1
} else {
// Spare the last Wasm memory page on queries and upgrades to support the Rust call stack boundary checks.
LAST_PAGE_LIMIT
4 changes: 2 additions & 2 deletions rts/motoko-rts/src/memory/ic/linear_memory.rs
@@ -1,7 +1,7 @@
use core::arch::wasm32;

use super::{get_aligned_heap_base, IcMemory, Memory};
use crate::types::*;
use crate::{memory::GENERAL_MEMORY_RESERVE, types::*};

/// Amount of garbage collected so far.
pub(crate) static mut RECLAIMED: Bytes<u64> = Bytes(0);
@@ -65,6 +65,6 @@ impl Memory for IcMemory {

#[inline(never)]
unsafe fn grow_memory(&mut self, ptr: u64) {
super::grow_memory(ptr);
super::grow_memory(ptr, GENERAL_MEMORY_RESERVE);
}
}
3 changes: 2 additions & 1 deletion rts/motoko-rts/src/memory/ic/partitioned_memory.rs
@@ -24,6 +24,7 @@ impl Memory for IcMemory {

#[inline(never)]
unsafe fn grow_memory(&mut self, ptr: u64) {
super::grow_memory(ptr);
let memory_reserve = crate::gc::incremental::memory_reserve();
super::grow_memory(ptr, memory_reserve);
}
}
117 changes: 84 additions & 33 deletions src/codegen/compile.ml
@@ -627,10 +627,10 @@ module E = struct
| Flags.Generational -> "generational"
| Flags.Incremental -> "incremental"

let collect_garbage env =
let collect_garbage env force =
(* GC function name = "schedule_"? ("compacting" | "copying" | "generational" | "incremental") "_gc" *)
let name = gc_strategy_name !Flags.gc_strategy in
let gc_fn = if !Flags.force_gc then name else "schedule_" ^ name in
let gc_fn = if force || !Flags.force_gc then name else "schedule_" ^ name in
call_import env "rts" (gc_fn ^ "_gc")

(* See Note [Candid subtype checks] *)
@@ -1193,7 +1193,7 @@ module GC = struct

let collect_garbage env =
record_mutator_instructions env ^^
E.collect_garbage env ^^
E.collect_garbage env false ^^
record_collector_instructions env

end (* GC *)
@@ -1425,6 +1425,12 @@ module Stack = struct
f get_x ^^
dynamic_free_words env get_n

let dynamic_with_bytes env name f =
(* round up to nearest wordsize *)
compile_add_const (Int32.sub Heap.word_size 1l) ^^
compile_divU_const Heap.word_size ^^
dynamic_with_words env name f

(* Stack Frames *)

(* Traditional frame pointer for accessing statically allocated locals/args (all words)
@@ -4717,11 +4723,7 @@ module IC = struct
let fi = E.add_fun env "canister_heartbeat"
(Func.of_body env [] [] (fun env ->
G.i (Call (nr (E.built_in env "heartbeat_exp"))) ^^
(* TODO(3622)
Until DTS is implemented for heartbeats, don't collect garbage here,
just record mutator_instructions and leave GC scheduling to the
already scheduled async message running `system` function `heartbeat` *)
GC.record_mutator_instructions env (* future: GC.collect_garbage env *)))
GC.collect_garbage env))
in
E.add_export env (nr {
name = Lib.Utf8.decode "canister_heartbeat";
@@ -4734,11 +4736,7 @@ module IC = struct
let fi = E.add_fun env "canister_global_timer"
(Func.of_body env [] [] (fun env ->
G.i (Call (nr (E.built_in env "timer_exp"))) ^^
(* TODO(3622)
Until DTS is implemented for timers, don't collect garbage here,
just record mutator_instructions and leave GC scheduling to the
already scheduled async message running `system` function `timer` *)
GC.record_mutator_instructions env (* future: GC.collect_garbage env *)))
GC.collect_garbage env))
in
E.add_export env (nr {
name = Lib.Utf8.decode "canister_global_timer";
@@ -4922,30 +4920,47 @@ module IC = struct
E.trap_with env (Printf.sprintf "assertion failed at %s" (string_of_region at))

let async_method_name = Type.(motoko_async_helper_fld.lab)
let gc_trigger_method_name = Type.(motoko_gc_trigger_fld.lab)

let is_self_call env =
let (set_len_self, get_len_self) = new_local env "len_self" in
let (set_len_caller, get_len_caller) = new_local env "len_caller" in
system_call env "canister_self_size" ^^ set_len_self ^^
system_call env "msg_caller_size" ^^ set_len_caller ^^
get_len_self ^^ get_len_caller ^^ G.i (Compare (Wasm.Values.I32 I32Op.Eq)) ^^
G.if1 I32Type
begin
get_len_self ^^ Stack.dynamic_with_bytes env "str_self" (fun get_str_self ->
get_len_caller ^^ Stack.dynamic_with_bytes env "str_caller" (fun get_str_caller ->
get_str_caller ^^ compile_unboxed_const 0l ^^ get_len_caller ^^
system_call env "msg_caller_copy" ^^
get_str_self ^^ compile_unboxed_const 0l ^^ get_len_self ^^
system_call env "canister_self_copy" ^^
get_str_self ^^ get_str_caller ^^ get_len_self ^^ Heap.memcmp env ^^
compile_eq_const 0l))
end
begin
compile_unboxed_const 0l
end

let assert_caller_self env =
let (set_len1, get_len1) = new_local env "len1" in
let (set_len2, get_len2) = new_local env "len2" in
let (set_str1, get_str1) = new_local env "str1" in
let (set_str2, get_str2) = new_local env "str2" in
system_call env "canister_self_size" ^^ set_len1 ^^
system_call env "msg_caller_size" ^^ set_len2 ^^
get_len1 ^^ get_len2 ^^ G.i (Compare (Wasm.Values.I32 I32Op.Eq)) ^^
E.else_trap_with env "not a self-call" ^^

get_len1 ^^ Blob.dyn_alloc_scratch env ^^ set_str1 ^^
get_str1 ^^ compile_unboxed_const 0l ^^ get_len1 ^^
system_call env "canister_self_copy" ^^

get_len2 ^^ Blob.dyn_alloc_scratch env ^^ set_str2 ^^
get_str2 ^^ compile_unboxed_const 0l ^^ get_len2 ^^
system_call env "msg_caller_copy" ^^


get_str1 ^^ get_str2 ^^ get_len1 ^^ Heap.memcmp env ^^
compile_eq_const 0l ^^
is_self_call env ^^
E.else_trap_with env "not a self-call"

let is_controller_call env =
let (set_len_caller, get_len_caller) = new_local env "len_caller" in
system_call env "msg_caller_size" ^^ set_len_caller ^^
get_len_caller ^^ Stack.dynamic_with_bytes env "str_caller" (fun get_str_caller ->
get_str_caller ^^ compile_unboxed_const 0l ^^ get_len_caller ^^
system_call env "msg_caller_copy" ^^
get_str_caller ^^ get_len_caller ^^ is_controller env)

let assert_caller_self_or_controller env =
is_self_call env ^^
is_controller_call env ^^
G.i (Binary (Wasm.Values.I32 I32Op.Or)) ^^
E.else_trap_with env "not a self-call or call from controller"

(* Cycles *)

let cycle_balance env =
@@ -9062,6 +9077,41 @@ module FuncDec = struct
| _ -> ()
end

let export_gc_trigger_method env =
let name = IC.gc_trigger_method_name in
begin match E.mode env with
| Flags.ICMode | Flags.RefMode ->
Func.define_built_in env name [] [] (fun env ->
message_start env (Type.Shared Type.Write) ^^
(* Check that we are called from this or a controller, w/o allocation *)
IC.assert_caller_self_or_controller env ^^
(* To avoid more failing allocation, don't deserialize args nor serialize reply,
i.e. don't even try to do this:
Serialization.deserialize env [] ^^
Tuple.compile_unit ^^
Serialization.serialize env [] ^^
*)
(* Instead, just ignore the argument and
send a *statically* allocated, nullary reply *)
Blob.lit_ptr_len env "DIDL\x00\x00" ^^
IC.reply_with_data env ^^
(* Finally, act like
message_cleanup env (Type.Shared Type.Write)
but *force* collection *)
GC.record_mutator_instructions env ^^
E.collect_garbage env true ^^
GC.record_collector_instructions env ^^
Lifecycle.trans env Lifecycle.Idle
);

let fi = E.built_in env name in
E.add_export env (nr {
name = Lib.Utf8.decode ("canister_update " ^ name);
edesc = nr (FuncExport (nr fi))
})
| _ -> ()
end

end (* FuncDec *)


@@ -12057,6 +12107,7 @@ and conclude_module env set_serialization_globals start_fi_o =
RTS_Exports.system_exports env;

FuncDec.export_async_method env;
FuncDec.export_gc_trigger_method env;

(* See Note [Candid subtype checks] *)
Serialization.set_delayed_globals env set_serialization_globals;