diff --git a/README.en.md b/README.en.md new file mode 100644 index 00000000..ece787c3 --- /dev/null +++ b/README.en.md @@ -0,0 +1,87 @@ +**[[简体中文]](README.zh-cn.md)** + +# Babylon + +[![CI](https://github.com/baidu/babylon/actions/workflows/ci.yml/badge.svg)](https://github.com/baidu/babylon/actions/workflows/ci.yml) +[![Coverage Status](https://coveralls.io/repos/github/baidu/babylon/badge.svg)](https://coveralls.io/github/baidu/babylon) + +Babylon is a foundational library designed to support high-performance C++ server-side development. It provides a wide array of core components focusing on memory and parallelism management. This library is widely applied in scenarios with stringent performance requirements, such as search and recommendation engines, autonomous driving, etc. + +## Core Features + +- **Efficient application-level memory pool mechanism** + - Compatible with and extends the [std::pmr::memory_resource](https://en.cppreference.com/w/cpp/memory/memory_resource) mechanism. + - Integrates and enhances the [google::protobuf::Arena](https://protobuf.dev/reference/cpp/arenas) mechanism. + - Provides capacity reservation, cleanup, and reconstruction mechanisms when object pools are used in conjunction with memory pools. + +- **Modular parallel computing framework** + - A high-performance automatic parallel framework based on lock-free DAG deduction. + - Provides a dataflow management scheme derived naturally from the execution flow, ensuring safe data race management in complex computation graph scenarios. + - Micro-pipeline parallel mechanism offering enhanced parallel capabilities. + +- **Core components for parallel development** + - Wait-free concurrent-safe containers (vector/queue/hash_table/...). + - Traversable thread-local storage development framework. + - Extensible synchronization primitives supporting both threads and coroutines (future/mutex/...). 
+ +- **Foundational tools for application/framework construction** + - IOC component development framework. + - C++ object serialization framework. + - Zero-copy/zero-allocation asynchronous logging framework. + +## Build and Usage + +### Supported Platforms and Compilers + +- **OS**: Linux +- **CPU**: x86-64/aarch64 +- **COMPILER**: gcc/clang + +### Bazel + +Babylon uses [Bazel](https://bazel.build) for builds and [bzlmod](https://bazel.build/external/module) for dependency management. Because the Bazel ecosystem is still transitioning to bzlmod, Babylon remains compatible with the [workspace](https://bazel.build/rules/lib/globals/workspace) dependency management mode. + +- [Depend with Bazel using bzlmod](example/depend-use-bzlmod) +- [Depend with Bazel using workspace](example/depend-use-workspace) + +### CMake + +Babylon also supports building with [CMake](https://cmake.org), and can be consumed as a dependency through [find_package](https://cmake.org/cmake/help/latest/command/find_package.html), [add_subdirectory](https://cmake.org/cmake/help/latest/command/add_subdirectory.html), or [FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html).
+ +- [Depend with CMake using FetchContent](example/depend-use-cmake-fetch) +- [Depend with CMake using find_package](example/depend-use-cmake-find) +- [Depend with CMake using add_subdirectory](example/depend-use-cmake-subdir) + +## Module Documentation + +- [:any](docs/any.en.md) +- [:anyflow](docs/anyflow/README.en.md) +- [:application_context](docs/application_context.en.md) +- [:concurrent](docs/concurrent/README.en.md) +- [:coroutine](docs/coroutine/README.en.md) +- [:executor](docs/executor.en.md) +- [:future](docs/future.en.md) +- [:logging](docs/logging/README.en.md) + - [Use async logger](example/use-async-logger) + - [Use with glog](example/use-with-glog) +- [:reusable](docs/reusable/README.en.md) +- [:serialization](docs/serialization.en.md) +- [:time](docs/time.en.md) +- Protobuf [arenastring](docs/arenastring.en.md) patch +- Typical usage with [brpc](https://github.com/apache/brpc) + - Use [:future](docs/future.en.md) with bthread: [example/use-with-bthread](example/use-with-bthread) + - Use [:reusable_memory_resource](docs/reusable/memory_resource.en.md) for an RPC server: [example/use-arena-with-brpc](example/use-arena-with-brpc) + - Use [:concurrent_counter](docs/concurrent/counter.en.md) to implement bvar: [example/use-counter-with-bvar](example/use-counter-with-bvar) + +## Design Philosophy (Chinese version only) + +- [Extreme optimizations by Baidu C++ engineers (Memory)](https://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247489076&idx=1&sn=748bf716d94d5ed2739ea8a9385cd4a6&chksm=c03d2648f74aaf5e11298cf450c3453a273eb6d2161bc90e411b6d62fa0c1b96a45e411af805&scene=178&cur_album_id=1693053794688761860#rd) +- [Extreme optimizations by Baidu C++ engineers (Concurrency)](https://mp.weixin.qq.com/s/0Ofo8ak7-UXuuOoD0KIHwA) + +## How to Contribute + +If you encounter any issues or need new features, feel free to create an issue. + +If you can solve an issue, you're welcome to submit a PR.
+ +Before sending a PR, please ensure corresponding test cases are included. diff --git a/README.md b/README.md deleted file mode 100644 index 07c4ccce..00000000 --- a/README.md +++ /dev/null @@ -1,82 +0,0 @@ -# Babylon - -[![CI](https://github.com/baidu/babylon/actions/workflows/ci.yml/badge.svg)](https://github.com/baidu/babylon/actions/workflows/ci.yml) -[![Coverage Status](https://coveralls.io/repos/github/baidu/babylon/badge.svg)](https://coveralls.io/github/baidu/babylon) - -Babylon是一个用于支持C++高性能服务端开发的基础库,从内存和并行管理角度提供了大量的基础组件。广泛应用在对性能有严苛要求的场景,典型例如搜索推荐引擎,自动驾驶车载计算等场景 - -## 核心功能 - -- 高效的应用级内存池机制 - - 兼容并扩展了[std::pmr::memory_resource](https://en.cppreference.com/w/cpp/memory/memory_resource)机制 - - 整合并增强了[google::protobuf::Arena](https://protobuf.dev/reference/cpp/arenas)机制 - - 在对象池结合内存池使用的情况下提供了保留容量清理和重建的机制 -- 组件式并行计算框架 - - 基于无锁DAG推导的高性能自动组件并行框架 - - 依照执行流天然生成数据流管理方案,复杂计算图场景下提供安全的数据竞争管理 - - 微流水线并行机制,提供上限更好的并行化能力 -- 并行开发基础组件 - - wait-free级别的并发安全容器(vector/queue/hash_table/...) - - 可遍历的线程缓存开发框架 - - 可扩展支持线程/协程的同步原语(future/mutex/...) 
-- 应用/框架搭建基础工具 - - IOC组件开发框架 - - C++对象序列化框架 - - 零拷贝/零分配异步日志框架 - -## 编译并使用 - -### 支持平台和编译器 - -- OS: Linux -- CPU: x86-64/aarch64 -- COMPILER: gcc/clang - -### Bazel - -Babylon使用[Bazel](https://bazel.build)进行构建并使用[bzlmod](https://bazel.build/external/module)进行依赖管理,考虑到目前Bazel生态整体处于bzlmod的转换周期,Babylon也依然兼容[workspace](https://bazel.build/rules/lib/globals/workspace)依赖管理模式 - -- [Depend with bazel use bzlmod](example/depend-use-bzlmod) -- [Depend with bazel use workspace](example/depend-use-workspace) - -### CMake - -Babylon也支持使用[CMake](https://cmake.org)进行构建,并支持通过[find_package](https://cmake.org/cmake/help/latest/command/find_package.html)、[add_subdirectory](https://cmake.org/cmake/help/latest/command/add_subdirectory.html)或[FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html)进行依赖引入 - -- [Depend with cmake use FetchContent](example/depend-use-cmake-fetch) -- [Depend with cmake use find_package](example/depend-use-cmake-find) -- [Depend with cmake use add_subdirectory](example/depend-use-cmake-subdir) - -## 模块功能文档 - -- [:any](docs/any.md) -- [:anyflow](docs/anyflow/index.md) -- [:application_context](docs/application_context.md) -- [:concurrent](docs/concurrent/index.md) -- [:coroutine](docs/coroutine) -- [:executor](docs/executor.md) -- [:future](docs/future.md) -- [:logging](docs/logging/index.md) - - [Use async logger](example/use-async-logger) - - [Use with glog](example/use-with-glog) -- [:reusable](docs/reusable/index.md) -- [:serialization](docs/serialization.md) -- [:time](docs/time.md) -- Protobuf [arenastring](docs/arenastring.md) patch -- Typical usage with [brpc](https://github.com/apache/brpc) - - use [:future](docs/future.md) with bthread: [example/use-with-bthread](example/use-with-bthread) - - use [:reusable_memory_resource](docs/reusable/memory_resource.md) for rpc server: [example/use-arena-with-brpc](example/use-arena-with-brpc) - - use [:concurrent_counter](docs/concurrent/counter.md) implement bvar: 
[example/use-counter-with-bvar](example/use-counter-with-bvar) - -## 整体设计思路 - -- [百度C++工程师的那些极限优化(内存篇)](https://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247489076&idx=1&sn=748bf716d94d5ed2739ea8a9385cd4a6&chksm=c03d2648f74aaf5e11298cf450c3453a273eb6d2161bc90e411b6d62fa0c1b96a45e411af805&scene=178&cur_album_id=1693053794688761860#rd) -- [百度C++工程师的那些极限优化(并发篇)](https://mp.weixin.qq.com/s/0Ofo8ak7-UXuuOoD0KIHwA) - -## 如何贡献 - -如果你遇到问题或需要新功能,欢迎创建issue。 - -如果你可以解决某个issue, 欢迎发送PR。 - -发送PR前请确认有对应的单测代码。 diff --git a/README.md b/README.md new file mode 120000 index 00000000..b636b478 --- /dev/null +++ b/README.md @@ -0,0 +1 @@ +README.zh-cn.md \ No newline at end of file diff --git a/README.zh-cn.md b/README.zh-cn.md new file mode 100644 index 00000000..0dd0524f --- /dev/null +++ b/README.zh-cn.md @@ -0,0 +1,84 @@ +**[[English]](README.en.md)** + +# Babylon + +[![CI](https://github.com/baidu/babylon/actions/workflows/ci.yml/badge.svg)](https://github.com/baidu/babylon/actions/workflows/ci.yml) +[![Coverage Status](https://coveralls.io/repos/github/baidu/babylon/badge.svg)](https://coveralls.io/github/baidu/babylon) + +Babylon是一个用于支持C++高性能服务端开发的基础库,从内存和并行管理角度提供了大量的基础组件。广泛应用在对性能有严苛要求的场景,典型例如搜索推荐引擎,自动驾驶车载计算等场景 + +## 核心功能 + +- 高效的应用级内存池机制 + - 兼容并扩展了[std::pmr::memory_resource](https://en.cppreference.com/w/cpp/memory/memory_resource)机制 + - 整合并增强了[google::protobuf::Arena](https://protobuf.dev/reference/cpp/arenas)机制 + - 在对象池结合内存池使用的情况下提供了保留容量清理和重建的机制 +- 组件式并行计算框架 + - 基于无锁DAG推导的高性能自动组件并行框架 + - 依照执行流天然生成数据流管理方案,复杂计算图场景下提供安全的数据竞争管理 + - 微流水线并行机制,提供上限更好的并行化能力 +- 并行开发基础组件 + - wait-free级别的并发安全容器(vector/queue/hash_table/...) + - 可遍历的线程缓存开发框架 + - 可扩展支持线程/协程的同步原语(future/mutex/...) 
+- 应用/框架搭建基础工具 + - IOC组件开发框架 + - C++对象序列化框架 + - 零拷贝/零分配异步日志框架 + +## 编译并使用 + +### 支持平台和编译器 + +- OS: Linux +- CPU: x86-64/aarch64 +- COMPILER: gcc/clang + +### Bazel + +Babylon使用[Bazel](https://bazel.build)进行构建并使用[bzlmod](https://bazel.build/external/module)进行依赖管理,考虑到目前Bazel生态整体处于bzlmod的转换周期,Babylon也依然兼容[workspace](https://bazel.build/rules/lib/globals/workspace)依赖管理模式 + +- [Depend with bazel use bzlmod](example/depend-use-bzlmod) +- [Depend with bazel use workspace](example/depend-use-workspace) + +### CMake + +Babylon也支持使用[CMake](https://cmake.org)进行构建,并支持通过[find_package](https://cmake.org/cmake/help/latest/command/find_package.html)、[add_subdirectory](https://cmake.org/cmake/help/latest/command/add_subdirectory.html)或[FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html)进行依赖引入 + +- [Depend with cmake use FetchContent](example/depend-use-cmake-fetch) +- [Depend with cmake use find_package](example/depend-use-cmake-find) +- [Depend with cmake use add_subdirectory](example/depend-use-cmake-subdir) + +## 模块功能文档 + +- [:any](docs/any.zh-cn.md) +- [:anyflow](docs/anyflow/README.zh-cn.md) +- [:application_context](docs/application_context.zh-cn.md) +- [:concurrent](docs/concurrent/README.zh-cn.md) +- [:coroutine](docs/coroutine/README.zh-cn.md) +- [:executor](docs/executor.zh-cn.md) +- [:future](docs/future.zh-cn.md) +- [:logging](docs/logging/README.zh-cn.md) + - [Use async logger](example/use-async-logger) + - [Use with glog](example/use-with-glog) +- [:reusable](docs/reusable/README.zh-cn.md) +- [:serialization](docs/serialization.zh-cn.md) +- [:time](docs/time.zh-cn.md) +- Protobuf [arenastring](docs/arenastring.zh-cn.md) patch +- Typical usage with [brpc](https://github.com/apache/brpc) + - use [:future](docs/future.zh-cn.md) with bthread: [example/use-with-bthread](example/use-with-bthread) + - use [:reusable_memory_resource](docs/reusable/memory_resource.zh-cn.md) for rpc server: [example/use-arena-with-brpc](example/use-arena-with-brpc) + - 
use [:concurrent_counter](docs/concurrent/counter.zh-cn.md) implement bvar: [example/use-counter-with-bvar](example/use-counter-with-bvar) + +## 整体设计思路 + +- [百度C++工程师的那些极限优化(内存篇)](https://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247489076&idx=1&sn=748bf716d94d5ed2739ea8a9385cd4a6&chksm=c03d2648f74aaf5e11298cf450c3453a273eb6d2161bc90e411b6d62fa0c1b96a45e411af805&scene=178&cur_album_id=1693053794688761860#rd) +- [百度C++工程师的那些极限优化(并发篇)](https://mp.weixin.qq.com/s/0Ofo8ak7-UXuuOoD0KIHwA) + +## 如何贡献 + +如果你遇到问题或需要新功能,欢迎创建issue。 + +如果你可以解决某个issue, 欢迎发送PR。 + +发送PR前请确认有对应的单测代码。 diff --git a/WORKSPACE b/WORKSPACE index 76e65b88..950b98d5 100644 --- a/WORKSPACE +++ b/WORKSPACE @@ -36,17 +36,6 @@ http_archive( sha256 = '7b42b4d6ed48810c5362c265a17faebe90dc2373c885e5216439d37927f02926', ) -http_archive( - name = 'rules_foreign_cc', - urls = ['https://github.com/bazelbuild/rules_foreign_cc/releases/download/0.12.0/rules_foreign_cc-0.12.0.tar.gz'], - strip_prefix = 'rules_foreign_cc-0.12.0', - sha256 = 'a2e6fb56e649c1ee79703e99aa0c9d13c6cc53c8d7a0cbb8797ab2888bbc99a3', -) -load('@rules_foreign_cc//foreign_cc:repositories.bzl', 'rules_foreign_cc_dependencies') -rules_foreign_cc_dependencies() -load("@bazel_features//:deps.bzl", "bazel_features_deps") -bazel_features_deps() - http_archive( name = 'rules_cuda', urls = ['https://github.com/bazel-contrib/rules_cuda/releases/download/v0.2.3/rules_cuda-v0.2.3.tar.gz'], diff --git a/docs/README.en.md b/docs/README.en.md new file mode 100644 index 00000000..a524d970 --- /dev/null +++ b/docs/README.en.md @@ -0,0 +1,22 @@ +**[[简体中文]](README.zh-cn.md)** + +## Module Documentation + +- [:any](any.en.md) +- [:anyflow](anyflow/README.en.md) +- [:application_context](application_context.en.md) +- [:concurrent](concurrent/README.en.md) +- [:coroutine](coroutine/README.en.md) +- [:executor](executor.en.md) +- [:future](future.en.md) +- [:logging](logging/README.en.md) + - [Use async logger](../example/use-async-logger) + - [Use with 
glog](../example/use-with-glog) +- [:reusable](reusable/README.en.md) +- [:serialization](serialization.en.md) +- [:time](time.en.md) +- Protobuf [arenastring](arenastring.en.md) patch +- Typical usage with [brpc](https://github.com/apache/brpc) + - use [:future](future.en.md) with bthread: [example/use-with-bthread](../example/use-with-bthread) + - use [:reusable_memory_resource](reusable/memory_resource.en.md) for rpc server: [example/use-arena-with-brpc](../example/use-arena-with-brpc) + - use [:concurrent_counter](concurrent/counter.en.md) implement bvar: [example/use-counter-with-bvar](../example/use-counter-with-bvar) diff --git a/docs/README.md b/docs/README.md new file mode 120000 index 00000000..b636b478 --- /dev/null +++ b/docs/README.md @@ -0,0 +1 @@ +README.zh-cn.md \ No newline at end of file diff --git a/docs/README.zh-cn.md b/docs/README.zh-cn.md new file mode 100644 index 00000000..5d9d979f --- /dev/null +++ b/docs/README.zh-cn.md @@ -0,0 +1,22 @@ +**[[English]](README.en.md)** + +# 模块功能文档 + +- [:any](any.zh-cn.md) +- [:anyflow](anyflow/README.zh-cn.md) +- [:application_context](application_context.zh-cn.md) +- [:concurrent](concurrent/README.zh-cn.md) +- [:coroutine](coroutine/README.zh-cn.md) +- [:executor](executor.zh-cn.md) +- [:future](future.zh-cn.md) +- [:logging](logging/README.zh-cn.md) + - [Use async logger](../example/use-async-logger) + - [Use with glog](../example/use-with-glog) +- [:reusable](reusable/README.zh-cn.md) +- [:serialization](serialization.zh-cn.md) +- [:time](time.zh-cn.md) +- Protobuf [arenastring](arenastring.zh-cn.md) patch +- Typical usage with [brpc](https://github.com/apache/brpc) + - use [:future](future.zh-cn.md) with bthread: [example/use-with-bthread](../example/use-with-bthread) + - use [:reusable_memory_resource](reusable/memory_resource.zh-cn.md) for rpc server: [example/use-arena-with-brpc](../example/use-arena-with-brpc) + - use [:concurrent_counter](concurrent/counter.zh-cn.md) implement bvar: 
[example/use-counter-with-bvar](../example/use-counter-with-bvar) diff --git a/docs/any.en.md b/docs/any.en.md new file mode 100644 index 00000000..b01a51b9 --- /dev/null +++ b/docs/any.en.md @@ -0,0 +1,87 @@ +**[[简体中文]](any.zh-cn.md)** + +# any + +## Overview + +`babylon::Any` is a general-purpose type-erased container, similar to `std::any`, with several additional capabilities. + +1. **Reference Capability**: The container can record a reference (pointer) to an instance instead of holding the instance itself. `std::any` can only emulate a reference by holding a `T*`, so a consumer calling `std::any_cast` must treat a `std::any` that holds an object differently from one that holds a pointer. `babylon::Any` supports references natively, so consumers see no difference between holding an object and holding a reference. +2. **Pointer Transfer**: The container can take over an externally constructed instance and keep its pointer directly. `std::any` can only create a new instance (by move, copy, or in-place construction). Pointer transfer in `babylon::Any` also covers instances that are non-movable, non-copyable, or created by factory functions. +3. **Cascading Type Erasure**: Besides the typed interfaces for transfer and reference, the container also accepts a type-erased form (`const Descriptor*`, `void*`), enabling interoperability with other type-erased containers.
+ +## Usage + +```c++ +#include "babylon/any.h" + +using ::babylon::Any; + +// Copy assignment +{ + Object obj; + Any any; // Create an empty container + any = obj; // Copy-construct a new Object into the container +} +// Move assignment +{ + Object obj; + Any any; // Create an empty container + any = ::std::move(obj); // Move-construct a new Object into the container +} +// Explicit pointer transfer via unique_ptr; supports objects that cannot be changed to support move/copy, such as those from legacy components +{ + Object* obj = create(); // Assume Object comes from a factory function and cannot be modified, copied, or moved + Any any; // Create an empty container + any = ::std::unique_ptr<Object>(obj); // Wrap obj in a unique_ptr and move it in; the container takes ownership of the instance +} +// Reference external objects; the container neither holds the object nor manages its lifetime, which gives some frameworks a unified view +{ + Object obj; + Any any; // Create an empty container + any.ref(obj); // Store a reference to obj; the user remains responsible for its lifetime + any.cref(obj); // Like ref, but marks the object immutable so non-const pointers cannot be retrieved +} +// Retrieving values: all assignment methods look identical to the consumer +{ + Object* pobj = any.get<Object>(); // Returns a pointer to the stored object only if the type matches exactly (no base/derived conversion) + // For a cref-stored object no mutable pointer is available, and nullptr is returned instead + const Object* pobj = ((const Any&)any).get<Object>(); // Const access, which also works for cref-stored objects + const Object* pobj = any.cget<Object>(); // Shortcut for const access (cget follows the STL cbegin-style naming convention)
+ if (pobj != nullptr) { // nullptr means nothing is stored, the type does not match, or cref blocks mutable access + pobj->func_in_obj(); + } +} +// Primitive type specialization +{ + Any any; // Create an empty container + any = (int32_t) 123; // Explicitly store an int32_t value + int64_t* pv = any.get<int64_t>(); // Returns nullptr, as the types are not strictly identical + int64_t v = any.as<int64_t>(); // Converts and returns 123 + // Automatic conversion is supported among bool, int8-64, uint8-64, float, and double +} + +// Type erasure +{ + auto desc = Any::descriptor<Object>(); // Retrieve the Descriptor for Object + void* ptr = get_from_some_type_erased_container(); // Get an untyped pointer to an Object instance + Any any; + any.ref(desc, ptr); // Reference ptr as an Object; the user must ensure that ptr actually points to an Object + // Typically, ptr comes from another type-erasure mechanism, and desc is stored alongside it for later retrieval + any.assign(desc, ptr); // Like ref, but transfers ownership, analogous to assigning a ::std::unique_ptr<Object> +} +``` + +## Performance Comparison + +`babylon::Any` has the same size as `std::any`: a type accessor plus an instance pointer. + +![](images/any.1.png) + +For both primitive and class types, construction and destruction performance is on par with `std::any`. + +![](images/any.2.png) + +Access is faster than with `std::any`, thanks to lightweight type checks and less reliance on virtual function calls.
+ +![](images/any.3.png) diff --git a/docs/any.md b/docs/any.zh-cn.md similarity index 97% rename from docs/any.md rename to docs/any.zh-cn.md index 202df025..f15058db 100644 --- a/docs/any.md +++ b/docs/any.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](any.en.md)** + # any ## 原理 @@ -11,7 +13,7 @@ ## 使用方法 ```c++ -#include +#include "babylon/any.h" using ::babylon::Any; @@ -69,10 +71,6 @@ using ::babylon::Any; // 在需要呈现为any时,和实例指针成对使用即可对接到babylon::Any any.assign(desc, ptr) // 除了引用,也可以支持生命周期转移,相当于明确operator=到Object类型std::unique_ptr } - -// 更说明见互联注释 -// 单测 test/test_any.cpp -// 性能对比 bench/bench_any.cpp ``` ## 性能对比 diff --git a/docs/anyflow/README.en.md b/docs/anyflow/README.en.md new file mode 100644 index 00000000..274558d3 --- /dev/null +++ b/docs/anyflow/README.en.md @@ -0,0 +1,19 @@ +**[[简体中文]](README.zh-cn.md)** + +# anyflow + +`anyflow` is a component-based parallel computing framework that achieves parallelism by breaking down a computing task into a series of sub-tasks, which are then organized in a Directed Acyclic Graph (DAG). + +Unlike traditional DAG-based parallel frameworks, in `anyflow`, the sub-tasks are not directly connected. Instead, the framework introduces the concept of **data nodes** to explicitly represent the data flow between sub-tasks. This explicit data flow representation eliminates implicit data dependencies between sub-tasks, reducing the coupling between them. Moreover, through the intermediary data nodes, `anyflow` enables advanced features such as partial execution, conditional execution, and micro-pipeline interaction. 
+ +![anyflow logic](images/anyflow_logic.png) + +## Documentation + +- [Design](design.pdf)(chinese version only) +- [Overview](overview.en.md) +- [Quick Start Guide](quick_start.en.md) +- [Builder](builder.en.md) +- [Graph](graph.en.md) +- [Processor](processor.en.md) +- [Expression](expression.en.md) diff --git a/docs/anyflow/README.md b/docs/anyflow/README.md new file mode 120000 index 00000000..b636b478 --- /dev/null +++ b/docs/anyflow/README.md @@ -0,0 +1 @@ +README.zh-cn.md \ No newline at end of file diff --git a/docs/anyflow/index.md b/docs/anyflow/README.zh-cn.md similarity index 71% rename from docs/anyflow/index.md rename to docs/anyflow/README.zh-cn.md index 1ba83ab3..12cbd843 100644 --- a/docs/anyflow/index.md +++ b/docs/anyflow/README.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](README.en.md)** + # anyflow anyflow是一个组件化并行计算框架,通过将一段计算逻辑转化成一系列子计算逻辑,并通过DAG组织起来,实现计算的并行化 @@ -8,10 +10,10 @@ anyflow是一个组件化并行计算框架,通过将一段计算逻辑转化 ## 功能文档 -- [design](design.pdf) -- [overview](overview.md) -- [fast begin](fast_begin.md) -- [builder](builder.md) -- [graph](graph.md) -- [processor](processor.md) -- [expression](expression.md) +- [Design](design.pdf) +- [Overview](overview.zh-cn.md) +- [Quick Start Guide](quick_start.zh-cn.md) +- [Builder](builder.zh-cn.md) +- [Graph](graph.zh-cn.md) +- [Processor](processor.zh-cn.md) +- [Expression](expression.zh-cn.md) diff --git a/docs/anyflow/builder.en.md b/docs/anyflow/builder.en.md new file mode 100644 index 00000000..21fab7c5 --- /dev/null +++ b/docs/anyflow/builder.en.md @@ -0,0 +1,135 @@ +**[[简体中文]](builder.zh-cn.md)** + +# Builder + +## Hierarchical Structure + +![](images/builder_hierarchie.png) + +Builder is the composition component of the graph engine, consisting of a group of objects. `GraphBuilder` serves as the main entry point. + +- Using `add_vertex` on `GraphBuilder` returns a `GraphVertexBuilder`. 
+- On `GraphVertexBuilder`, you can call `(named_depend | anonymous_depend)` or `(named_emit | anonymous_emit)` to obtain `GraphDependencyBuilder` and `GraphEmitBuilder`. +- These builders allow detailed configuration of the graph. The builder functions typically return a reference to the object itself to enable method chaining. + +## GraphBuilder + +```c++ +#include "babylon/anyflow/builder.h" + +using babylon::anyflow::Graph; +using babylon::anyflow::GraphBuilder; +using babylon::anyflow::GraphExecutor; +using babylon::anyflow::GraphVertexBuilder; + +// GraphBuilder can be initialized using the default constructor +GraphBuilder builder; + +// Set a name, mainly used for logging and identification +builder.set_name("NameOfThisGraph"); + +// When the final graph is executed, ready nodes will be submitted for execution via the GraphExecutor +// The default is to use the InplaceGraphExecutor, which executes all nodes sequentially in the current thread +GraphExecutor& executor = get_some_useful_executor(); +builder.set_executor(executor); + +// The only graph construction operation on GraphBuilder is adding a new vertex (node), +// which returns a reference to the corresponding `GraphVertexBuilder` for that node. +// A factory function for the `GraphProcessor` is required to implement the actual functionality of the node. +// Further node settings can be done via the returned `GraphVertexBuilder`. +// The returned `GraphVertexBuilder` is maintained within the `GraphBuilder`. +GraphVertexBuilder& vertex_builder = builder.add_vertex([] { + return std::make_unique(); +}); + +... // Continue constructing the graph until completion + +// Finish the graph construction. This will analyze and validate the existing information. +// It returns 0 on success, or a non-zero value in case of errors such as conflicting outputs or type mismatches. +// After calling this method, no further non-const operations are allowed on the graph. 
+// Typically, the only subsequent operation is calling `build` repeatedly to obtain executable graph instances. +int return_code = builder.finish(); + +// Returns a newly constructed `Graph` instance. +// A `Graph` instance can only be run by one caller at a time, but it can be reused. +// Concurrency is typically achieved with thread-local instances or object pools. +// The `Graph` references members of the builder, so the builder must outlive every graph created by `build`. +// The builder itself can be managed as a singleton or similar. +::std::unique_ptr<Graph> graph = builder.build(); +``` + +## GraphVertexBuilder + +```c++ +#include "babylon/anyflow/builder.h" + +using babylon::anyflow::GraphDependencyBuilder; +using babylon::anyflow::GraphEmitBuilder; +using babylon::anyflow::GraphVertexBuilder; + +// `GraphVertexBuilder` is obtained via `GraphBuilder::add_vertex` +GraphVertexBuilder& vertex_builder = builder.add_vertex(processor_creator); + +// Add a new named dependency +// `local_name` corresponds to the input member name defined in the `GraphProcessor` via `ANYFLOW_INTERFACE`. +// Calling `named_depend` with the same `local_name` returns the same `GraphDependencyBuilder` instance. +// Further settings can be applied through the returned `GraphDependencyBuilder`. +// The `GraphDependencyBuilder` is maintained within the `GraphVertexBuilder`. +GraphDependencyBuilder& named_dependency_builder = vertex_builder.named_depend("local_name"); + +// Add a new anonymous dependency, supporting advanced variadic input features. +// In the `GraphProcessor`, the data can be accessed by index through `vertex().anonymous_dependency(index)`. +// Similarly, a `GraphDependencyBuilder` is returned for further configuration. +GraphDependencyBuilder& anonymous_dependency_builder = vertex_builder.anonymous_depend(); + +// Add a new named emit (output). +// `local_name` corresponds to the output member name defined in the `GraphProcessor` via `ANYFLOW_INTERFACE`.
+// Calling `named_emit` with the same `local_name` returns the same `GraphEmitBuilder` instance. +// Further settings can be applied through the returned `GraphEmitBuilder`. +// The `GraphEmitBuilder` is maintained within the `GraphVertexBuilder`. +GraphEmitBuilder& named_emit_builder = vertex_builder.named_emit("local_name"); + +// Add a new anonymous emit (output), supporting advanced variadic output features. +// In the `GraphProcessor`, the data can be accessed by index through `vertex().anonymous_emit(index)`. +// Similarly, a `GraphEmitBuilder` is returned for further configuration. +GraphEmitBuilder& anonymous_emit_builder = vertex_builder.anonymous_emit(); + +// Set configuration data for the vertex. The data is moved or copied into the `vertex_builder`. +// In the `GraphProcessor`, the option can be further processed in the `config` function, and the result is obtained via the `option` function. +// This is mainly used to customize the behavior of the `GraphProcessor`. +// The option can be of any type, though it typically holds configuration in a format such as JSON or YAML. +AnyTypeUseAsOption option; +vertex_builder.option(::std::move(option)); +``` + +## GraphDependencyBuilder + +```c++ +#include "babylon/anyflow/builder.h" + +using babylon::anyflow::GraphDependencyBuilder; + +// `GraphDependencyBuilder` is obtained via `GraphVertexBuilder::named_depend` and `GraphVertexBuilder::anonymous_depend` +GraphDependencyBuilder& dependency_builder = vertex_builder.named_depend("local_name"); +GraphDependencyBuilder& dependency_builder = vertex_builder.anonymous_depend(); + +// Set the dependency target to the globally named data `target_name`. +// Any `GraphEmitBuilder` that targets the same name is connected to this dependency through that data.
+dependency_builder.to("target_name"); +``` + +## GraphEmitBuilder + +```c++ +#include "babylon/anyflow/builder.h" + +using babylon::anyflow::GraphEmitBuilder; + +// `GraphEmitBuilder` is obtained via `GraphVertexBuilder::named_emit` and `GraphVertexBuilder::anonymous_emit` +GraphEmitBuilder& emit_builder = vertex_builder.named_emit("local_name"); +GraphEmitBuilder& emit_builder = vertex_builder.anonymous_emit(); + +// Set the output target to the globally named data `target_name`. +// Any `GraphDependencyBuilder` that targets the same name is connected to this output through that data. +emit_builder.to("target_name"); +``` diff --git a/docs/anyflow/builder.md b/docs/anyflow/builder.zh-cn.md similarity index 99% rename from docs/anyflow/builder.md rename to docs/anyflow/builder.zh-cn.md index 97c558b7..f3db55ae 100644 --- a/docs/anyflow/builder.md +++ b/docs/anyflow/builder.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](builder.en.md)** + # builder ## 层次关系 diff --git a/docs/anyflow/expression.en.md b/docs/anyflow/expression.en.md new file mode 100644 index 00000000..8937a467 --- /dev/null +++ b/docs/anyflow/expression.en.md @@ -0,0 +1,46 @@ +**[[简体中文]](expression.zh-cn.md)** + +# expression + +## ExpressionProcessor + +```c++ +#include "babylon/anyflow/builtin/expression.h" + +using babylon::anyflow::builtin::ExpressionProcessor; +using babylon::anyflow::GraphBuilder; + +GraphBuilder builder; +// ... +// Construct the graph as usual + +// Arithmetic expression +// B and C are the names of existing GraphData in the graph; applying this generates a new GraphData named A. +// A depends on B and C, and its value is determined by the expression B + C * 5. +// The operands B and C must be of primitive types. +ExpressionProcessor::apply(builder, "A", "B + C * 5"); + +// Conditional expression +// B and C are the names of existing GraphData in the graph; applying this generates a new GraphData named A.
+// A first depends on B, and if B is true, it outputs the value of C; otherwise, it outputs 5 (in this case, C is not activated, and no evaluation of C will be performed). +// A typical conditional expression can be used to selectively activate execution paths. +ExpressionProcessor::apply(builder, "A", "B ? C : 5"); + +// Mixed Nesting +ExpressionProcessor::apply(builder, "A", "B > C ? D + 4 : E * 5"); + +// String Support +ExpressionProcessor::apply(builder, "A", "B != \"6\" ? C : D"); + +// Auto-completion +// It will traverse all GraphData present in the builder. If there is some GraphData: +// 1. There is no GraphVertex declaration in the graph using it as an output. +// 2. The name is an expression, not just a single variable name. +// Then it will attempt to use ExpressionProcessor::apply(builder, data_name, data_name); +// to create a GraphVertex that can produce the result of the expression. +ExpressionProcessor::apply(builder); + +// Expressions must be applied before calling finish, after which the graph is used normally. +builder.finish(); +auto graph = builder.build(); +``` diff --git a/docs/anyflow/expression.md b/docs/anyflow/expression.zh-cn.md similarity index 97% rename from docs/anyflow/expression.md rename to docs/anyflow/expression.zh-cn.md index fea536ee..de799876 100644 --- a/docs/anyflow/expression.md +++ b/docs/anyflow/expression.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](expression.en.md)** + # expression ## ExpressionProcessor diff --git a/docs/anyflow/graph.en.md b/docs/anyflow/graph.en.md new file mode 100644 index 00000000..feab585e --- /dev/null +++ b/docs/anyflow/graph.en.md @@ -0,0 +1,153 @@ +**[[简体中文]](graph.zh-cn.md)** + +# Graph + +## Graph + +```c++ +#include "babylon/anyflow/graph.h" + +using babylon::anyflow::Graph; +using babylon::anyflow::GraphData; +using babylon::anyflow::Closure; + +// The Graph instance is obtained via GraphBuilder::build and must be used exclusively. 
+// To run concurrently, keep one instance per thread (e.g. thread_local) or take instances from a pool.
+std::unique_ptr<Graph> graph = builder.build();
+
+// Find the data named "name"; returns nullptr if it does not exist.
+// The returned GraphData can then be read or assigned.
+// The available names are all those passed to
+//   named_depend(...).to("name")
+//   named_emit(...).to("name")
+// during graph construction.
+GraphData* data = graph->find_data("name");
+
+... // Assign initial values
+
+// Run the Graph instance to solve for the target data.
+// Typically, initial values are assigned to some input nodes first, then the graph is run to evaluate the target nodes.
+// Execution is asynchronous; its progress is tracked through the returned Closure.
+Closure closure = graph->run(data);
+
+... // Wait for completion and retrieve the result
+
+// Reset the execution state so that the same Graph instance can be reused.
+// Before resetting, the previous execution must be completely finished: wait for its closure via closure.wait() or by destroying it.
+graph->reset();
+```
+
+## GraphData
+
+```c++
+#include "babylon/anyflow/data.h"
+
+using babylon::anyflow::GraphData;
+using babylon::anyflow::Committer;
+using babylon::Any;
+
+// GraphData is obtained via Graph::find_data.
+GraphData* data = graph->find_data("name");
+
+// Get a read-only value of type T from data; returns nullptr if data is unassigned or the type does not match.
+// T can be ::babylon::Any, which returns the underlying value container; this is useful in advanced cases (e.g. objects that cannot be default-constructed).
+const T* value = data->value<T>();
+const Any* value = data->value<Any>();
+
+// Get a committer of type T for data; if data has already been assigned, the returned committer is invalid.
+// T can be ::babylon::Any, in which case the committer operates directly on the underlying value container; this is useful in advanced cases (e.g. objects that cannot be default-constructed).
+Committer<T> committer = data->emit<T>();
+Committer<Any> committer = data->emit<Any>();
+if (committer) {
+  // Valid committer
+} else {
+  // Invalid committer, e.g. because data was already emitted
+}
+
+// Convert the data to type T and return it by value; equivalent to calling as<T>() on the underlying Any.
+// Conversions among primitive types such as intxx_t/uintxx_t/bool are automatic.
+T value = data->as<T>();
+
+// [Advanced] Before the graph runs, preset a reference to some_exist_instance in data.
+// A later call to data->emit<T>() then uses some_exist_instance as the underlying instance.
+// This mainly optimizes handing data to external systems (e.g. a communication framework layer): the graph operates on the external instance directly, avoiding unnecessary copies.
+T some_exist_instance;
+data->preset(some_exist_instance);
+Committer<T> committer = data->emit<T>();
+// committer.get() == &some_exist_instance
+
+// [Advanced] Get the type declared for data.
+// The type is declared with the ANYFLOW_INTERFACE macro by the GraphProcessor associated with data.
+const babylon::Id& type_id = data->declared_type();
+```
+
+## Committer
+
+```c++
+#include "babylon/anyflow/data.h"
+
+using babylon::anyflow::Committer;
+
+// A committer is obtained from the GraphData::emit template function.
+Committer<T> committer = data->emit<T>();
+
+// Check whether the committer is valid.
+bool valid = committer.valid();
+bool valid = (bool)committer;
+
+// Use the committer as a pointer-like object to operate on the data to be emitted.
+T* value = committer.get();   // Get the raw pointer
+committer->some_func();        // Call a member function of T directly
+*committer = value;            // Assign via T's assignment operator
+committer.ref(value);          // Emit a reference to value; its lifetime must be managed externally
+committer.cref(value);         // Emit a reference to an immutable value; its lifetime must be managed externally
+
+// Logically clear the object to be emitted, as if emitting a nullptr.
+committer.clear();
+
+// Cancel the emission: on destruction nothing is emitted, as if data->emit had never been called.
+committer.cancel();
+
+// Commit the emission now; afterwards, the object can no longer be accessed through the committer.
+committer.release();
+~committer; // Destruction also commits the emission
+```
+
+## Closure
+
+```c++
+#include "babylon/anyflow/graph.h" // Graph::run returns a Closure
+
+using babylon::anyflow::Closure;
+
+// Closure is obtained via Graph::run and tracks the execution state.
+Closure closure = graph->run(data);
+
+// Check whether execution has finished, i.e. the target data has been solved or execution has failed.
+// Note that even when finished, some nodes triggered by this execution may still be running.
+bool finished = closure.finished();
+
+// Get the return code: 0 means success, non-zero means failure.
+// Only meaningful once finished is true.
+int error_code = closure.error_code();
+
+// Block until finished and return the error code.
+int error_code = closure.get();
+
+// Wait until every node triggered by this execution has completed; the destructor calls wait automatically.
+// Only after closure.wait() returns is the execution fully over, so the graph can be destroyed or reset for reuse.
+closure.wait();
+~closure; // The destructor automatically calls wait
+
+// Wait asynchronously: the callback is invoked once execution has finished.
+// Once on_finish is registered, the closure object becomes invalid and does not wait on destruction.
+closure.on_finish([graph = std::move(graph)] (Closure&& finished_closure) mutable {
+  // finished_closure is the same closure on which on_finish was called,
+  // and it can still be used to track the execution.
+  // For example, the graph may only be destroyed after all triggered nodes have finished, so this example waits before destroying the captured graph.
+  finished_closure.wait();
+  graph.reset(nullptr);
+  // In practice, this is unnecessary: after the callback returns, the framework destroys finished_closure before the lambda.
+  // Since the graph is captured in the lambda, the destruction order is already correct.
+});
+```
diff --git a/docs/anyflow/graph.md b/docs/anyflow/graph.zh-cn.md
similarity index 99%
rename from docs/anyflow/graph.md
rename to docs/anyflow/graph.zh-cn.md
index a1c788d5..20c758bb 100644
--- a/docs/anyflow/graph.md
+++ b/docs/anyflow/graph.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](graph.en.md)**
+
 # graph
 
 ## Graph
diff --git a/docs/anyflow/overview.en.md b/docs/anyflow/overview.en.md
new file mode 100644
index 00000000..1a31810a
--- /dev/null
+++ b/docs/anyflow/overview.en.md
@@ -0,0 +1,59 @@
+**[[简体中文]](overview.zh-cn.md)**
+
+# Overview
+
+## Basic Concepts
+
+![anyflow logic](images/anyflow_logic.png)
+
+Conceptually, an execution graph consists of two kinds of entities, **data nodes** and **compute nodes**, connected by two kinds of relationships, **dependencies** and **outputs**.
+
+- **Data Nodes**: Hold data, such as a string, an integer, or an object. Each data node can be the **output** of at most one compute node. Data nodes that no compute node outputs to are typically the inputs of the whole graph and are supplied from outside.
+
+- **Compute Nodes**: Hold a processing function. A compute node **depends** on several data nodes as inputs and, after computing, **outputs** its results to several data nodes.
+
+## Graph Construction
+
+![Graph Builder](images/builder.png)
+
+Graph construction occurs in two stages: **build time** and **run time**.
+
+- **Build Time**: The graph structure is mutable; nodes can be added and configured through the graph-building APIs.
+
+- **Run Time**: Once the graph is finalized, its structure is immutable. From then on, the graph can be used repeatedly to build executable instances.
+
+- **Execution Lifecycle**: A graph instance holds its own execution state and must be used by only one execution at a time. After an execution completes, the instance can be reset and reused. To support concurrent execution, graph instances are typically pooled.
+
+## Execution Process
+
+![Running Flow](images/running.png)
+
+The execution process consists of two main actions: **activation** and **execution**.
+
+- **Activation**: Execution starts by activating certain data nodes. Activating a data node also activates the compute node that **outputs** it, which in turn activates the data nodes it **depends** on, recursively, until the whole dependency chain is activated. Activated compute nodes whose dependencies are all ready then enter the **execution** phase, and they can run in parallel.
+
+- **Execution**: A compute node starts **executing** once all the data nodes it depends on are ready. Its result takes effect by emitting outputs to downstream data nodes.
+
+## Data Structure
+
+![Structure](images/structure.png)
+
+- **GraphBuilder** & **Graph**: The `GraphBuilder` holds the static structure of the graph, and each build creates a runtime `Graph` instance. The runtime graph references static information in the `GraphBuilder`, so the builder must outlive every graph built from it. The runtime graph holds the dynamic activation and execution states, as well as the actual data nodes. Each runtime instance must be used by only one execution at a time and is typically managed through a pool.
+
+- **GraphVertex**, **GraphDependency**, and **GraphData**: These represent compute nodes, dependency relationships, and data nodes respectively. They store the activation and execution states, as well as the actual data.
+
+- **GraphProcessor**: Each `GraphVertex` owns its own `GraphProcessor`, so the processor can safely keep intermediate data in member variables during computation.
+
+## Lock-Free DAG Derivation Algorithm
+
+![Concurrent DAG](images/concurrent_dag.png)
+
+Typical DAG executors perform state changes inside a critical section guarded by a global lock. The `GraphEngine` is designed to avoid this critical section, which in principle improves concurrent DAG execution efficiency and allows finer-grained, more highly parallel DAG designs.
+
+- **Dependency Counting**: Each dependency relationship maintains a state counter. Activation and execution are expressed as atomic changes to this counter, and the value resulting from each change determines the state transition.
+
+- **Unconditional Dependencies**: Activation increments the counter by 1, and data readiness decrements it by 1. Whichever operation brings the value to 0 marks the dependency as ready.
+
+- **Conditional Dependencies**: These are more complex because, once ready, the condition evaluates to either true or false. Activation increments the counter by 2, and readiness decrements it by 1. If the condition is true, readiness decrements the counter by another 1; if false, it decrements it by 2. The final state after readiness will always be 0, while the final state after activation will be either -1 or 0, and the transition only triggers once.
+
+This design guarantees that every activation and execution is derived exactly once, with nothing duplicated or missed. Since the only synchronization is an atomic update on an individual dependency's counter, it scales well under high concurrency.
diff --git a/docs/anyflow/overview.md b/docs/anyflow/overview.zh-cn.md
similarity index 99%
rename from docs/anyflow/overview.md
rename to docs/anyflow/overview.zh-cn.md
index 1b8638fd..e6315f28 100644
--- a/docs/anyflow/overview.md
+++ b/docs/anyflow/overview.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](overview.en.md)**
+
 # 概览
 
 ## 基础概念
diff --git a/docs/anyflow/processor.en.md b/docs/anyflow/processor.en.md
new file mode 100644
index 00000000..1e150129
--- /dev/null
+++ b/docs/anyflow/processor.en.md
@@ -0,0 +1,354 @@
+**[[简体中文]](processor.zh-cn.md)**
+
+# processor
+
+## config
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+using babylon::Any;
+
+class DemoProcessor : public GraphProcessor {
+  // Configuration pre-processing, called during GraphBuilder::finish.
+  // It is called only once per node, however many Graph instances are built. It is const because it only pre-processes configuration and does not set up the state of any particular GraphProcessor instance.
+  // Default: forwards origin_option unchanged.
+  virtual int config(const Any& origin_option, Any& option) const noexcept override {
+    // Read the configuration as its actual type
+    const T* conf = origin_option.get<T>();
+
+    ... // Pre-processing, e.g. loading dictionaries or models
+
+    // Finally, set the pre-processed result as the effective configuration; it can be shared by different GraphProcessor instances.
+    option = some_shared_static_data;
+    // Return 0 on success; any other value makes GraphBuilder::finish fail.
+    return 0;
+  }
+};
+```
+
+## ANYFLOW_INTERFACE
+
+### DEPEND & EMIT
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  ANYFLOW_INTERFACE(
+    // Declare a member variable of type const T* a.
+    // Bound to the input configured by GraphVertexBuilder::named_depend("a").
+    // While GraphProcessor::process runs, a is guaranteed to point to valid data.
+    ANYFLOW_DEPEND_DATA(T, a)
+    // Declare a member variable ::babylon::anyflow::OutputData<T> x.
+    // Bound to the output configured by GraphVertexBuilder::named_emit("x").
+    // In GraphProcessor::process, x.emit() returns the committer used to publish the output.
+    ANYFLOW_EMIT_DATA(T, x)
+  );
+};
+```
+
+### MUTABLE
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  ANYFLOW_INTERFACE(
+    // Declare a member variable T* a.
+    // Bound to the input configured by GraphVertexBuilder::named_depend("a").
+    // While GraphProcessor::process runs, a is guaranteed to point to valid data.
+    //
+    // Unlike ANYFLOW_DEPEND_DATA, it checks whether any other GraphProcessor depends on the same data.
+    // If so, concurrent modification would be possible, and the framework refuses to run.
+    ANYFLOW_DEPEND_MUTABLE_DATA(T, a)
+  );
+};
+```
+
+### CHANNEL
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  ANYFLOW_INTERFACE(
+    // Declare a member variable ChannelConsumer<T> a.
+    // Bound to the input configured by GraphVertexBuilder::named_depend("a").
+    // In GraphProcessor::process, read-only elements (const T*) are consumed incrementally via a.consume.
+    // The same data can be consumed by multiple GraphProcessors; each one independently processes the full stream.
+    ANYFLOW_DEPEND_CHANNEL(T, a)
+    // Declare a member variable MutableChannelConsumer<T> b.
+    // Bound to the input configured by GraphVertexBuilder::named_depend("b").
+    // In GraphProcessor::process, mutable elements (T*) are consumed incrementally via b.consume.
+    //
+    // Unlike ANYFLOW_DEPEND_CHANNEL, it checks that this is the only consumer of the mutable data.
+    // If the source is shared, concurrent modification would be possible, and the framework refuses to run.
+    ANYFLOW_DEPEND_MUTABLE_CHANNEL(T, b)
+    // Declare a member variable OutputChannel<T> x.
+    // Bound to the output configured by GraphVertexBuilder::named_emit("x").
+    // In GraphProcessor::process, open the channel with x.open() and publish data incrementally.
+    ANYFLOW_EMIT_CHANNEL(T, x)
+  );
+};
+```
+
+## setup
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  // Initialization function.
+  // Called once on each GraphProcessor instance of the node to set up its own state.
+  // Default: no-op.
+  virtual int setup() noexcept override {
+    // Read the configuration as its actual type.
+    // If config is overridden, this is the result produced by config.
+    const T* conf = option<T>();
+
+    ... // Initialization, e.g. creating working buffers.
+    ... // Members set up here can be reused across calls to process.
+
+    // Output data x is reset after each execution.
+    // If T::clear or T::reset exists, it is used for the reset;
+    // otherwise, the value is destroyed and reconstructed.
+    // If special reset-and-reuse behavior is needed, a custom reset function can be registered.
+    x.set_on_reset([] (T* value) {
+      ... // Custom logic to reset and clear the value.
+    });
+
+    // Return 0 on success; any other value makes GraphBuilder::build fail.
+    return 0;
+  }
+
+  ANYFLOW_INTERFACE(
+    ANYFLOW_EMIT_DATA(T, x)
+  );
+};
+```
+
+## reset
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  // Called during Graph::reset to clear the GraphProcessor's state so that it can be reused.
+  // Default: no-op.
+  virtual void reset() noexcept override {
+    // Clean up the workspace so the processor is ready for the next run.
+    string.clear();
+  }
+
+  ::std::vector<T> string;
+};
+```
+
+## process
+
+### basic
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  // The processing function: reads the inputs and computes the outputs.
+  virtual int process() noexcept override {
+    // The configuration can also be read during processing.
+    const T* conf = option<T>();
+
+    // All inputs are guaranteed to be ready before process is called.
+    // Here, const T* a points to valid, fully populated data.
+    const T& value = *a;
+
+    // Dependencies declared with ANYFLOW_DEPEND_MUTABLE_DATA are modifiable.
+    // This is safe because the framework requires them to have a single consumer.
+    T& mutable_value = *b;
+
+    // Implement the logic, possibly using other custom member variables.
+    ...
+
+    // The simplest output, by copy or move; after this line, x is published.
+    *x.emit() = result;
+    *x.emit() = std::move(result);
+
+    // For finer control, hold the committer to manage how the output is built and when it is published.
+    {
+      auto committer = x.emit();
+      // The committer can be used like a T* to call T::func_of_data.
+      committer->func_of_data();
+
+      ... // Continue building the output data.
+    } // x is published automatically when the committer is destroyed.
+
+    // Alternatively, control the publishing point explicitly.
+    auto committer = x.emit();
+    ... // Build the output data.
+    // Publish manually, before the committer is destroyed.
+    committer.release();
+
+    return 0;
+  }
+
+  ANYFLOW_INTERFACE(
+    ANYFLOW_DEPEND_DATA(T, a)
+    ANYFLOW_DEPEND_MUTABLE_DATA(T, b)
+    ANYFLOW_EMIT_DATA(T, x)
+  );
+};
+```
+
+### memory pool
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  virtual int process() noexcept override {
+    // Create a T with new T(args...) and register it to be destroyed at Graph::reset.
+    T* instance = create_object<T>(args...);
+
+    // If the type is a protobuf message (M here), it is constructed on an Arena instead.
+    // The Arena memory is released at Graph::reset.
+    M* message = create_object<M>();
+
+    // If the type is a std::pmr container, it is constructed with a std::pmr::polymorphic_allocator.
+    // The underlying memory resource is released at Graph::reset.
+    auto* pmr_vec = create_object<std::pmr::vector<T>>();
+
+    // Before C++17, where std::pmr is unavailable,
+    // babylon::SwissAllocator can be used to the same effect as pmr;
+    // it also avoids pmr's virtual-call overhead and may be slightly faster.
+    auto* swiss_vec = create_object<std::vector<T, babylon::SwissAllocator<T>>>();
+
+    return 0;
+  }
+};
+```
+
+### reference output
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+
+class DemoProcessor : public GraphProcessor {
+  virtual int process() noexcept override {
+    ...
+
+    // Input data can be forwarded without copying; after publishing, x and a refer to the same underlying data.
+    // A reference emission records the data's constness: if const data is forwarded, downstream processors cannot declare it MUTABLE, even as its sole consumer.
+    x.emit().ref(*a);
+
+    // Non-const data can be published as a reference, allowing downstream processors to declare it MUTABLE and obtain a mutable pointer.
+    x.emit().ref(*b);
+
+    // Beyond inputs, any data that outlives the current execution can be emitted by reference,
+    // e.g. static constants or GraphProcessor member variables.
+    x.emit().ref(local_value);
+
+    return 0;
+  }
+
+  ANYFLOW_INTERFACE(
+    ANYFLOW_DEPEND_DATA(T, a)
+    ANYFLOW_DEPEND_MUTABLE_DATA(T, b)
+    ANYFLOW_EMIT_DATA(T, x)
+  );
+};
+```
+
+### channel
+
+```c++
+#include "babylon/anyflow/vertex.h"
+
+using babylon::anyflow::GraphProcessor;
+using babylon::anyflow::ChannelPublisher;
+
+class DemoPublishProcessor : public GraphProcessor {
+  using Iterator = ChannelPublisher<T>::Iterator;
+
+  virtual int process() noexcept override {
+    ...
+
+    // To publish, first open the channel to obtain a publisher.
+    // Opening the channel marks the data as ready, which triggers downstream processors to start.
+    // Upstream publishing and downstream consumption therefore overlap.
+    auto publisher = x.open();
+
+    // Publish a single element by copy or by move.
+    publisher.publish(value);
+    publisher.publish(std::move(value));
+
+    // Publish a batch of elements.
+    publisher.publish_n(4, [] (Iterator iter, Iterator) {
+      // Fill the 4 reserved slots.
+      *iter++ = v1;
+      *iter++ = v2;
+      *iter++ = v3;
+      *iter++ = v4;
+    });
+
+    // Call close to end publishing; downstream processors are then notified and can finish their own processing.
+    // The publisher's destructor also calls close, so the publisher's lifetime can bound the publishing.
+    publisher.close();
+
+    return 0;
+  }
+
+  ANYFLOW_INTERFACE(
+    ANYFLOW_EMIT_CHANNEL(T, x)
+  );
+};
+
+class DemoConsumeProcessor : public GraphProcessor {
+  virtual int process() noexcept override {
+    ...
+
+    // By the time this runs, upstream has already opened the channel, so consumption can start right away.
+
+    // Consume one element, blocking until new data is available.
+    // After upstream calls ChannelPublisher::close, consume returns nullptr.
+    const T* item = a.consume();
+    T* mutable_item = b.consume();
+
+    // Consume a batch, blocking until a full batch is available or upstream stops publishing.
+    auto range = a.consume(4);
+    auto range = b.consume(4);
+    // range.size() gives the actual number of elements, which are accessed by index.
+    for (size_t i = 0; i < range.size(); ++i) {
+      // Only a MUTABLE dependency yields mutable elements.
+      const T& value = range[i];
+      T& value = range[i];
+    }
+    if (range.size() < 4) {
+      // A short batch means upstream has finished, so processing can stop.
+    }
+
+    return 0;
+  }
+
+  ANYFLOW_INTERFACE(
+    ANYFLOW_DEPEND_CHANNEL(T, a)
+    ANYFLOW_DEPEND_MUTABLE_CHANNEL(T, b)
+  );
+};
+```
diff --git a/docs/anyflow/processor.md b/docs/anyflow/processor.zh-cn.md
similarity index 99%
rename from docs/anyflow/processor.md
rename to docs/anyflow/processor.zh-cn.md
index 603bc283..9b23209f 100644
--- a/docs/anyflow/processor.md
+++ b/docs/anyflow/processor.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](processor.en.md)**
+
 # processor
 
 ## config
diff --git a/docs/anyflow/quick_start.en.md b/docs/anyflow/quick_start.en.md
new file mode 100644
index 00000000..6c5d0d85
--- /dev/null
+++ b/docs/anyflow/quick_start.en.md
@@ -0,0 +1,75 @@
+**[[简体中文]](quick_start.zh-cn.md)**
+
+# Quick Start
+
+## First AnyFlow Program
+
+Here is a simple AnyFlow program that adds two integers using the framework:
+
+```c++
+#include "babylon/anyflow/builder.h"
+
+using babylon::anyflow::GraphBuilder;
+using babylon::anyflow::GraphProcessor;
+
+// A simple addition processor, implemented by extending the GraphProcessor base class
+class PlusProcessor : public GraphProcessor {
+  // Implement the core computation
+  int process() noexcept override {
+    *z.emit() = *x + *y; // Add inputs x and y, then emit the result to z
+    return 0;
+  }
+
+  // Define the function interface
+  // Automatically generates:
+  // - const int* x: input for the first operand
+  // - const int* y: input for the second operand
+  // - OutputData<int> z: output that holds the result of x + y
+  // The interface provides reflection, allowing "x", "y", and "z" to be referenced by name during graph construction.
+  ANYFLOW_INTERFACE(
+    ANYFLOW_DEPEND_DATA(int, x)
+    ANYFLOW_DEPEND_DATA(int, y)
+    ANYFLOW_EMIT_DATA(int, z)
+  )
+};
+
+int main(int, char**) {
+  // Initialize the graph builder
+  GraphBuilder builder;
+
+  // Create a graph node
+  {
+    // Specify the processor factory for the node (PlusProcessor)
+    auto& v = builder.add_vertex([] {
+      return std::make_unique<PlusProcessor>();
+    });
+
+    // Bind the inputs "x" and "y" to data nodes "A" and "B" respectively
+    // Bind the output "z" to data node "C"
+    v.named_depend("x").to("A");
+    v.named_depend("y").to("B");
+    v.named_emit("z").to("C");
+  }
+
+  // Finish building the graph structure
+  builder.finish();
+
+  // Create a runtime instance of the graph
+  auto graph = builder.build();
+
+  // Look up the data nodes so they can be accessed from outside the graph
+  auto* a = graph->find_data("A");
+  auto* b = graph->find_data("B");
+  auto* c = graph->find_data("C");
+
+  // Set initial values for inputs A and B
+  *(a->emit<int>()) = 1;
+  *(b->emit<int>()) = 2;
+
+  // Run the graph to solve C from A and B; the returned temporary Closure waits for completion when destroyed
+  graph->run(c);
+
+  // The result is 3, returned here as the exit code
+  return *c->value<int>();
+}
+```
diff --git a/docs/anyflow/fast_begin.md b/docs/anyflow/quick_start.zh-cn.md
similarity index 97%
rename from docs/anyflow/fast_begin.md
rename to docs/anyflow/quick_start.zh-cn.md
index b99d3678..61b1257a 100644
--- a/docs/anyflow/fast_begin.md
+++ b/docs/anyflow/quick_start.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](quick_start.en.md)**
+
 # 快速开始
 
 ## 第一个anyflow程序
diff --git a/docs/application_context.en.md b/docs/application_context.en.md
new file mode 100644
index 00000000..288e6f74
--- /dev/null
+++ b/docs/application_context.en.md
@@ -0,0 +1,118 @@
+**[[简体中文]](application_context.zh-cn.md)**
+
+# application_context
+
+## Principle
+
+![](images/application_context_1.png)
+
+ApplicationContext: The core IOC container, typically used as a global singleton; multiple containers can be created when isolated component spaces are needed. It exposes two APIs: **register component** and **retrieve component**.
+
+Register component: Registers a component of a concrete type into the container under a given name. A component can also be registered under several base types at once, so users can retrieve it through any of them.
+
+Retrieve component: Retrieves a component from the container by name and type. Components can be shared as one instance (**singleton pattern**) or created as independent instances (**factory pattern**). How a component is created (**component creation**) is defined by the component itself, which keeps the IOC model transparent to users.
+
+![](images/application_context_2.png)
+
+Component creation: While being created, a component can retrieve its own dependencies from the ApplicationContext, which triggers further retrievals recursively until the entire dependency tree is resolved. Component creation and dependency resolution are implemented with **static reflection**.
+
+Static reflection: The component type does not need to inherit from a framework base class. Instead, customization works by detecting whether the type provides certain **protocol functions**. Two customization points are supported: **initialization** and **autowiring**.
+
+Initialization: The initialization process is customized through the protocol function `initialize`, which supports four signatures. The ApplicationContext argument can be used to look up further dependencies, and the Any argument carries the configuration for initialization. Returning `0` indicates success.
+```c++ +int initialize(ApplicationContext&, const Any&); +int initialize(ApplicationContext&); +int initialize(const Any&); +int initialize(); +``` + +Autowiring: For dependencies that do not require dynamic configuration, autowiring can be simplified using the macro `BABYLON_AUTOWIRE` for declarative implementation. +```c++ +class T { + ... + BABYLON_AUTOWIRE( + BABYLON_MEMBER(DependType, _member1, "component name1") + BABYLON_MEMBER(DependType, _member2, "component name2") + ) +}; +``` + +## Usage + +### Implementing a component + +```c++ +#include "babylon/application_context.h" + +using ::babylon::ApplicationContext; + +// Implementing a component +class SomeComponent : public SomeBase, public SomeOtherBase { // No need to inherit any framework-specific base class + ... // Arbitrary code + + // [Optional] Define an initialization function + // This can be used to initialize the component with configuration + // and programmatically assemble dependencies + int initialize(ApplicationContext& context, const Any& option) { + // Retrieve other components by name and type + auto other_component_pointer = context.get("OtherComponentName"); + // Retrieve configuration (e.g., from YAML in production) + auto config_pointer = option.get(); + ... // Additional initialization steps + } + + // [Optional] Declare dependency autowiring + // If both autowiring and an initialization function are defined, autowiring runs first + BABYLON_AUTOWIRE( + // Define a member + // ApplicationContext::ScopedComponent _member_name; + // And assemble as follows + // _member_name = context.get_or_create("OtherComponentName"); + BABYLON_MEMBER(OtherComponentType, _member_name, "OtherComponentName") + ... 
// More autowiring + ) +}; + +// Register the component with the singleton ApplicationContext +BABYLON_REGISTER_COMPONENT(SomeComponent); // Register by type +BABYLON_REGISTER_COMPONENT(SomeComponent, "SomeComponentName"); // Register with a name +BABYLON_REGISTER_COMPONENT(SomeComponent, "SomeComponentName", SomeBase, SomeOtherBase, ...); // Register with a set of base types + +// Register with factory mode, disabling singleton use +BABYLON_REGISTER_FACTORY_COMPONENT(SomeComponent); // Register by type +BABYLON_REGISTER_FACTORY_COMPONENT(SomeComponent, "SomeComponentName"); // Register with a name +BABYLON_REGISTER_FACTORY_COMPONENT(SomeComponent, "SomeComponentName", SomeBase, SomeOtherBase, ...); // Register with a set of base types + +// Dynamic registration during program startup is also possible +ApplicationContext::instance().register_component( + ApplicationContext::DefaultComponentHolder::create(), + "SomeComponentName"); +``` + +### Retrieving a component + +```c++ +#include "babylon/application_context.h" + +using ::babylon::ApplicationContext; + +// Singleton mode, returns SomeComponent* +// If it does not exist or fails to initialize, returns nullptr +instance = ApplicationContext::instance().get(); // Can be used if only one component of this type exists in the ApplicationContext +instance = ApplicationContext::instance().get("SomeComponentName"); // If multiple components of the same type exist, use the name to distinguish + +// Factory mode, returns ApplicationContext::ScopedComponent +// This is essentially a std::unique_ptr with a specialized deleter +// If it does not exist or fails to initialize, returns null +instance = ApplicationContext::instance().create(); +instance = ApplicationContext::instance().create("SomeComponentName"); + +// Compatibility mode, returns ApplicationContext::ScopedComponent +// Attempts singleton mode first; if singleton mode is disabled, falls back to factory mode +// If it does not exist or fails to initialize, 
returns null +instance = ApplicationContext::instance().get_or_create(); +instance = ApplicationContext::instance().get_or_create("SomeComponentName"); + +// For more details, see the comments +// Unit test: test/test_application_context.cpp +``` diff --git a/docs/application_context.md b/docs/application_context.zh-cn.md similarity index 99% rename from docs/application_context.md rename to docs/application_context.zh-cn.md index 3e87df1c..d71fea46 100644 --- a/docs/application_context.md +++ b/docs/application_context.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](application_context.en.md)** + # application_context ## 原理 diff --git a/docs/arenastring.en.md b/docs/arenastring.en.md new file mode 100644 index 00000000..1d4dd10e --- /dev/null +++ b/docs/arenastring.en.md @@ -0,0 +1,43 @@ +**[[简体中文]](arenastring.zh-cn.md)** + +# Arena String + +## Principle + +In version 3.x and beyond, Google's Protocol Buffer serialization library introduced Arena allocation, which allows the dynamic memory of members in the generated `Message` subclasses to be collectively allocated within a single Arena. For complex structures, this reduces the frequency of dynamic memory allocations, minimizes global contention, and significantly reduces or even eliminates destructor overhead. + +However, the `string` type fields, expressed using `std::string`, cannot take full advantage of Arena-based memory management. This limitation prevents the Arena mechanism from being fully optimized in certain scenarios. See [protobuf/issues/4327](https://github.com/protocolbuffers/protobuf/issues/4327). Internally, Google implemented a hack to specialize `std::string` for this purpose, although this was not released in the open-source version. From the undocumented Donated String mechanism, we can infer their approach. 
See the related interfaces and comments in [google/protobuf/inlined_string_field.h](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/inlined_string_field.h) and [google/protobuf/arenastring.h](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/arenastring.h). + +Using these exposed internal interfaces, Babylon implements the same hack as a patch. The implementation emulates `std::string` and supports the two major standard libraries, GNU `libstdc++` and LLVM `libc++`. The patch is not compatible with other `std::string` implementations, but these two cover most mainstream production environments. + +## Specific Changes + +1. Building on the `ArenaStringPtr/InlinedStringField` DonatedString mechanism, string/bytes fields are now allocated on the Arena when an Arena is in use. +2. Note: under the DonatedString mechanism, whenever a `std::string*` is returned for user modification, the string is first copied back into a regular `std::string` so that the returned instance can be operated on correctly. +3. Introduced the `-DPROTOBUF_MUTABLE_DONATED_STRING` flag; when enabled, the `Add/Mutable` interfaces of `RepeatedPtrField/ExtensionSet` return `MaybeArenaStringAccessor` instead. +4. Added the `cc_mutable_donated_string = true` option; when enabled, the `add/mutable` accessors for `string/bytes` fields in generated Messages return a `MaybeArenaStringAccessor`. +5. `MaybeArenaStringAccessor` emulates the access interface of `std::string*` and ensures that reallocation still uses the Arena, avoiding dynamic allocation and copying. +6. DonatedString allocation supports: + - `string/bytes` fields + - Repeated `string/bytes` fields + - `oneof string/bytes` fields + - Extension fields +7. DonatedString allocation does not support: + - Map fields + - Unknown fields + These continue to use the default `std::string` allocation mechanism.
+ +## Application Method + +The patch is versioned and maintained in the built-in repository and can be directly used with the [bzlmod](https://bazel.build/external/module) mechanism: + +- Add the repository registry: +``` +# in .bazelrc +common --registry=https://baidu.github.io/babylon/registry +``` +- Apply the patch dependency: +``` +# in MODULE.bazel +bazel_dep(name = 'protobuf', version = '27.5.arenastring') +``` diff --git a/docs/arenastring.md b/docs/arenastring.zh-cn.md similarity index 96% rename from docs/arenastring.md rename to docs/arenastring.zh-cn.md index 2c0275b0..9c64ffbc 100644 --- a/docs/arenastring.md +++ b/docs/arenastring.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](arenastring.en.md)** + # arenastring ## 原理 @@ -29,5 +31,5 @@ common --registry=https://baidu.github.io/babylon/registry - 应用补丁依赖项 ``` # in MODULE.bazel -bazel_dep(name = 'protobuf', version = '27.3.arenastring') +bazel_dep(name = 'protobuf', version = '27.5.arenastring') ``` diff --git a/docs/concurrent/README.en.md b/docs/concurrent/README.en.md new file mode 100644 index 00000000..4acdfea0 --- /dev/null +++ b/docs/concurrent/README.en.md @@ -0,0 +1,18 @@ +**[[简体中文]](README.zh-cn.md)** + +# concurrent + +Thread-safe containers that support multi-threaded parallel operations. 
+ +- [bounded_queue](bounded_queue.en.md) +- [counter](counter.en.md) +- [deposit_box](deposit_box.en.md) +- [epoch](epoch.en.md) +- [execution_queue](execution_queue.en.md) +- [garbage_collector](garbage_collector.en.md) +- [id_allocator](id_allocator.en.md) +- [object_pool](object_pool.en.md) +- [thread_local](thread_local.en.md) +- [transient_hash_table](transient_hash_table.en.md) +- [transient_topic](transient_topic.en.md) +- [vector](vector.en.md) diff --git a/docs/concurrent/README.md b/docs/concurrent/README.md new file mode 120000 index 00000000..b636b478 --- /dev/null +++ b/docs/concurrent/README.md @@ -0,0 +1 @@ +README.zh-cn.md \ No newline at end of file diff --git a/docs/concurrent/README.zh-cn.md b/docs/concurrent/README.zh-cn.md new file mode 100644 index 00000000..3e6ada86 --- /dev/null +++ b/docs/concurrent/README.zh-cn.md @@ -0,0 +1,18 @@ +**[[English]](README.en.md)** + +# concurrent + +支持多线程并行操作的线程安全容器 + +- [bounded_queue](bounded_queue.zh-cn.md) +- [counter](counter.zh-cn.md) +- [deposit_box](deposit_box.zh-cn.md) +- [epoch](epoch.zh-cn.md) +- [execution_queue](execution_queue.zh-cn.md) +- [garbage_collector](garbage_collector.zh-cn.md) +- [id_allocator](id_allocator.zh-cn.md) +- [object_pool](object_pool.zh-cn.md) +- [thread_local](thread_local.zh-cn.md) +- [transient_hash_table](transient_hash_table.zh-cn.md) +- [transient_topic](transient_topic.zh-cn.md) +- [vector](vector.zh-cn.md) diff --git a/docs/concurrent/bounded_queue.en.md b/docs/concurrent/bounded_queue.en.md new file mode 100644 index 00000000..d972bc13 --- /dev/null +++ b/docs/concurrent/bounded_queue.en.md @@ -0,0 +1,63 @@ +**[[简体中文]](bounded_queue.zh-cn.md)** + +# Bounded Queue + +## Principle + +An MPMC queue implemented on a circular array, based on the principles outlined in GlobalBalancer, featuring the following characteristics: + +1. The publish operation is wait-free when the queue is not full. +2. The consume operation is wait-free when the queue is not empty. +3. 
For blocking publish and consume operations, the remaining steps are wait-free once the blocking wait ends. + +Here are some comparative evaluations of the actual performance. + +## Usage Example + +```c++ +#include + +using ::babylon::ConcurrentBoundedQueue; +// Explicitly define a queue +using Queue = ConcurrentBoundedQueue<::std::string>; +Queue queue; + +// Set the queue capacity; push blocks while the queue is full +queue.reserve_and_clear(1024); + +// Single element push +queue.push("10086"); +// The callback version invokes the user function to fill in the data after obtaining publish rights +// Avoid time-consuming work in the callback: the underlying slot is not released until the callback returns +queue.push([] (::std::string& target) { + target.assign("10086"); +}); + +// Batch push +queue.push_n(vec.begin(), vec.end()); +// The callback version invokes the user function to fill in the data after obtaining publish rights +// The callback may be invoked multiple times; do not assume that a single invocation covers the whole range +queue.push_n([] (Queue::Iterator iter, Queue::Iterator end) { + while (iter < end) { + *iter++ = "10086"; + } +}, push_num); + +// Single element pop +queue.pop(&str); +// The callback version invokes the user function to process the data after obtaining consume rights +// Avoid time-consuming work in the callback: the underlying slot is not released until the callback returns +queue.pop([] (::std::string& source) { + work_on_source(source); +}); + +// Batch pop +queue.pop_n(vec.begin(), vec.end()); +// The callback version invokes the user function to process the data after obtaining consume rights +// The callback may be invoked multiple times; do not assume that a single invocation covers the whole range +queue.pop_n([] (Queue::Iterator iter, Queue::Iterator end) {
+ while (iter < end) { + work_on_source(*iter++); + } +}, pop_num); +``` diff --git a/docs/concurrent/bounded_queue.md b/docs/concurrent/bounded_queue.zh-cn.md similarity index 95% rename from docs/concurrent/bounded_queue.md rename to docs/concurrent/bounded_queue.zh-cn.md index e982819f..af5a642e 100644 --- a/docs/concurrent/bounded_queue.md +++ b/docs/concurrent/bounded_queue.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](bounded_queue.en.md)** + + # bounded_queue ## 原理 @@ -58,7 +60,4 @@ queue.push_n([] (Queue::Iterator iter, Queue::Iterator end) { work_on_source(*iter); } }, pop_num); - -// 单测test/test_concurrent_bounded_queue.cpp -// 压测bench/bench_concurrent_queue.cpp ``` diff --git a/docs/concurrent/counter.en.md b/docs/concurrent/counter.en.md new file mode 100644 index 00000000..66ec7491 --- /dev/null +++ b/docs/concurrent/counter.en.md @@ -0,0 +1,56 @@ +**[[简体中文]](counter.zh-cn.md)** + +# Counter + +## Principle + +A high-concurrency counter built on `thread_local` and specialized for write-heavy, read-light workloads. Each thread applies its updates independently to its own thread-local (TLS) copy, and the final result is aggregated only on read. This TLS isolation relieves the contention that would otherwise arise when multiple threads update the same memory location.
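The write-heavy, read-light idea can be illustrated with a minimal sketch, assuming a fixed number of cache-line-padded slots. This is not Babylon's `ConcurrentAdder`: the class name `ShardedAdder`, the `kMaxThreads` limit, and the round-robin slot assignment are hypothetical simplifications. Writes touch only the calling thread's slot, and `value()` aggregates all slots on read.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not Babylon's ConcurrentAdder: writes go to a
// per-thread slot, reads sum every slot.
class ShardedAdder {
 public:
  static constexpr std::size_t kMaxThreads = 64;

  // Write path: only the calling thread's slot is touched, so writers on
  // different threads do not contend on the same cache line.
  void add(std::int64_t delta) {
    slots_[slot_index()].value.fetch_add(delta, std::memory_order_relaxed);
  }

  // Read path: the only operation that visits every slot.
  std::int64_t value() const {
    std::int64_t sum = 0;
    for (const Slot& slot : slots_) {
      sum += slot.value.load(std::memory_order_relaxed);
    }
    return sum;
  }

 private:
  // Each slot occupies its own 64-byte cache line to avoid false sharing.
  struct alignas(64) Slot {
    std::atomic<std::int64_t> value{0};
  };

  // Each thread picks a slot index once (round-robin). If more than
  // kMaxThreads threads exist, slots are shared, which stays correct
  // because updates are atomic; it only reintroduces some contention.
  static std::size_t slot_index() {
    static std::atomic<std::size_t> next{0};
    thread_local std::size_t index =
        next.fetch_add(1, std::memory_order_relaxed) % kMaxThreads;
    return index;
  }

  std::array<Slot, kMaxThreads> slots_;
};
```

The trade-off shows up in the two paths: `add()` is a single uncontended atomic on a thread-private cache line, while `value()` costs one load per slot, which is acceptable when reads are rare.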
+ +## Usage Example + +```c++ +#include + +using babylon::ConcurrentAdder; +using babylon::ConcurrentMaxer; +using babylon::ConcurrentSummer; + +// Construct an adder; the accumulated result is a signed 64-bit number, initialized to 0 +ConcurrentAdder var; +// Multi-threaded addition and subtraction +thread1: + var << 100; +thread2: + var << -20; +// Get the current accumulated result +var.value(); // == 80 + +// Construct a maximum value recorder; the result is a signed 64-bit number +ConcurrentMaxer var; +// Get the recorded maximum value; returns 0 if no records exist during the period +ssize_t v = var.value(); +// Get the recorded maximum value; returns false if no records exist during the period +var.value(v); +// Multi-threaded recording +thread1: + var << 100; +thread2: + var << -20; +// Get the current recorded maximum value +var.value(); // == 100 +// Clear recorded results and start a new recording period +var.reset(); +var << 50; +var.value(); // == 50 + +// Construct a summer, which provides basic accumulation functionality and can also be used to calculate averages +// The result is a signed 64-bit number, defaulting to 0 +ConcurrentSummer var; +// Multi-threaded summation +thread1: + var << 100; +thread2: + var << -20; +// Get the current sum +var.value(); // {sum: 80, num: 2} +``` diff --git a/docs/concurrent/counter.md b/docs/concurrent/counter.zh-cn.md similarity index 97% rename from docs/concurrent/counter.md rename to docs/concurrent/counter.zh-cn.md index c54cb297..68a26ce2 100644 --- a/docs/concurrent/counter.md +++ b/docs/concurrent/counter.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](counter.en.md)** + # counter ## 原理 diff --git a/docs/concurrent/deposit_box.en.md b/docs/concurrent/deposit_box.en.md new file mode 100644 index 00000000..0ad1f721 --- /dev/null +++ b/docs/concurrent/deposit_box.en.md @@ -0,0 +1,59 @@ +**[[简体中文]](deposit_box.zh-cn.md)** + +# Deposit Box + +## Principle + +Sometimes we need a design pattern where multiple callers 
dynamically compete to complete the same piece of logic. Typical examples are timeout handling (of two competing outcomes, one takes effect and the other is discarded) and backup requests (two valid requests, first come first served). In principle, this pattern needs a mechanism very similar to [std::call_once](https://en.cppreference.com/w/cpp/thread/call_once), with a few key differences: +- Latecomers do not need to wait for the executor to finish; they simply abandon their own execution. +- Since latecomers need neither to perform any action nor to see the result, the executor can release resources for reuse early. In return, latecomers must ensure that they never modify resources that may already have been reused. + +![](images/deposit_box.png) + +The actual data is organized with [IdAllocator](id_allocator.en.md) and [ConcurrentVector](vector.en.md), which provides compact storage and fast index-based access. In each round, take operations compete for ownership through a CAS increment of the version number; latecomers touch only the version number through that CAS and never the data. The version number itself is also stored in the [ConcurrentVector](vector.en.md) slot, so a latecomer's CAS always operates on valid memory, and because the version number increases monotonically, latecomers are immune to the ABA problem.
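The ownership competition can be sketched as follows, assuming a fixed-size slot array in place of `IdAllocator` and `ConcurrentVector`, and a caller that picks free slot indices itself. `VersionedBox`, `Id`, and their members are hypothetical names, not the DepositBox API. Only the caller whose CAS moves the version from the id's value to the next one wins; every other caller touches nothing but the version word.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch of version-number ownership competition; not
// Babylon's DepositBox. A fixed vector of slots replaces IdAllocator +
// ConcurrentVector, and the caller picks a free slot index itself.
template <typename T>
class VersionedBox {
 public:
  struct Id {
    std::uint32_t index;    // which slot
    std::uint32_t version;  // slot version observed at deposit time
  };

  explicit VersionedBox(std::size_t capacity) : slots_(capacity) {}

  // Store an item in a free slot and hand out an id tagged with the
  // slot's current version.
  Id emplace(std::uint32_t index, T item) {
    Slot& slot = slots_[index];
    slot.item = std::move(item);
    return {index, slot.version.load(std::memory_order_acquire)};
  }

  // Competing take: only the caller whose CAS advances the version from
  // id.version to id.version + 1 wins. Losers touch only the version word
  // and never read or write the item.
  std::optional<T> take(Id id) {
    Slot& slot = slots_[id.index];
    std::uint32_t expected = id.version;
    if (!slot.version.compare_exchange_strong(expected, id.version + 1,
                                              std::memory_order_acq_rel)) {
      return std::nullopt;  // another caller already won this round
    }
    // The winner moves the item out; the slot may now be reused, and any
    // stale id still carries the old version, so its CAS will fail.
    return std::move(slot.item);
  }

 private:
  struct Slot {
    std::atomic<std::uint32_t> version{0};
    T item{};
  };
  std::vector<Slot> slots_;
};
```

Because the version only ever increases, an id from an earlier round cannot match a reused slot (barring 32-bit wraparound), which is how this style of design sidesteps ABA.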
+ +## Usage Example + +### DepositBox + +```c++ +#include + +using ::babylon::DepositBox; + +// Only usable through the global singleton +auto& box = DepositBox::instance(); + +// Allocate a slot, construct an element in it with Item(...), and return an id through which callers later compete to take the element +// Concurrent emplace calls are thread-safe +auto id = box.emplace(...); + +{ + // Multiple callers may compete to take the same id; take is thread-safe + auto accessor = box.take(id); + // A non-null accessor means this caller is the first to acquire ownership + // and can be used to access the element + if (accessor) { + accessor->item_member_function(...); + some_function_use_item(*accessor); + } +} // The accessor is destroyed and the slot released; the element pointer is no longer valid + +///////////////////////////// Advanced Usage ///////////////////////////////////// + +// Non-RAII mode: returns the element pointer Item* directly +auto item = box.take_released(id); +if (item) { + item->item_member_function(...); + some_function_use_item(*item); +} +// Ownership must be released explicitly when no longer needed +box.finish_released(id); + +// unsafe_get neither checks nor updates the version number; it returns the element pointer of the id's slot directly, so it must not run concurrently with take +// It is useful when an element cannot be fully prepared by its constructor and needs further incremental setup after the slot is allocated +// The user must ensure the id has not yet been handed to any competing caller that may take it +auto item = box.unsafe_get(id); +item->item_member_function(...); +some_function_use_item(*item); +``` diff --git a/docs/concurrent/deposit_box.md b/docs/concurrent/deposit_box.zh-cn.md similarity index 79% rename from docs/concurrent/deposit_box.md rename to
docs/concurrent/deposit_box.zh-cn.md index fa565cbc..bc2437ef 100644 --- a/docs/concurrent/deposit_box.md +++ b/docs/concurrent/deposit_box.zh-cn.md @@ -1,14 +1,16 @@ +**[[English]](deposit_box.en.md)** + # deposit_box ## 原理 -有时我们需要一种多个调用者动态竞争完成相同一段逻辑的设计模式,典型类似实现timeout动作(一个有效,一个无效)或者backup request动作(两个有效,先到先得);从原理上,这种模式需要的机制和[std::call_once](https://en.cppreference.com/w/cpp/thread/call_once)非常类似,但是会有以下几个不同点: +有时我们需要一种多个调用者动态竞争完成相同一段逻辑的设计模式,典型类似实现timeout动作(一个有效,一个无效)或者backup request动作(两个有效,先到先得);从原理上,这种模式需要的机制和[std::call_once](https://zh.cppreference.com/w/cpp/thread/call_once)非常类似,但是会有以下几个不同点: - 后来者并不需要等待执行者完成,单纯放弃自己的运行即可; - 由于后来者原理上不需要任何执行动作和结果,执行者可以更早释放资源进行复用,相应地后来者需要确保不会修改可能已经被复用的资源; ![](images/deposit_box.png) -采用[IdAllocator](id_allocator.md)和[ConcurrentVector](vector.md)组织实际的数据,实现聚集存储和基于序号的快速访问;每一个轮次中的take动作通过对版本号的CAS自增实现归属权竞争,后来者除CAS动作外不碰触数据部分;版本号自身同样存储在[ConcurrentVector](vector.md)的槽位内部,确保后来者的CAS动作本身合法,而版本号本身的单调递增特性排除了后来者的ABA问题; +采用[IdAllocator](id_allocator.zh-cn.md)和[ConcurrentVector](vector.zh-cn.md)组织实际的数据,实现聚集存储和基于序号的快速访问;每一个轮次中的take动作通过对版本号的CAS自增实现归属权竞争,后来者除CAS动作外不碰触数据部分;版本号自身同样存储在[ConcurrentVector](vector.zh-cn.md)的槽位内部,确保后来者的CAS动作本身合法,而版本号本身的单调递增特性排除了后来者的ABA问题; ## 用法示例 diff --git a/docs/concurrent/epoch.en.md b/docs/concurrent/epoch.en.md new file mode 100644 index 00000000..77401723 --- /dev/null +++ b/docs/concurrent/epoch.en.md @@ -0,0 +1,60 @@ +**[[简体中文]](epoch.zh-cn.md)** + +# Epoch + +## Principle + +In typical lock-free structure implementations, there are generally two aspects that need to be addressed: +- Reducing the critical section of each concurrent access action to converge into a single CAS or FAA action, ultimately utilizing hardware atomic instructions for lock-free cooperative access. How this is achieved is often directly related to the target data structure and application scenario; most lock-free algorithms focus on this aspect of implementation. 
+- Because there is no programmable multi-instruction critical section, a "removal" can guarantee that concurrent accesses correctly and completely observe the elements either "before" or "after" it, but it cannot track whether removed elements are "still held," and therefore cannot determine "when they can be released." Most descriptions of lock-free algorithms assume that the external environment automatically identifies and releases elements that are no longer used, which is essentially garbage collection. + +Garbage collection for lock-free algorithms depends little on the specific data structure or application scenario and is usually treated as a separate topic. For example, the paper [Performance of memory reclamation for lockless synchronization](https://sysweb.cs.toronto.edu/publication_files/0000/0159/jpdc07.pdf) surveys typical lock-free reclamation methods, including several classic schemes: QSBR (Quiescent-state-based reclamation), EBR (Epoch-based reclamation), and HPBR (Hazard-pointer-based reclamation). + +Epoch is a modified EBR implementation that reduces the overhead of the mechanism itself by separating access, elimination, and reclamation. The main differences are: +- The classic three-epoch cycle is replaced by a monotonically increasing epoch, which enables asynchronous reclamation. +- Accesses that do not eliminate anything no longer advance the epoch, avoiding the cost of a memory barrier. +- Access, elimination, and reclamation can each be performed in batches, further amortizing the cost of memory barriers. + +![](images/epoch.png) + +## Usage Example + +```c++ +#include "babylon/concurrent/epoch.h" + +using ::babylon::Epoch; + +// Define an Epoch instance +Epoch epoch; + +// Use Epoch to open a thread-local critical section before accessing the shared structure +{ + std::lock_guard lock {epoch}; + ...
// Elements of the shared structure can be accessed within the critical section +} // After the critical section ends, element pointers obtained inside it can no longer be used + +// Besides the thread-local mode, an Accessor can be created explicitly to track the critical section +auto accessor = epoch.create_accessor(); +{ + std::lock_guard lock {accessor}; + ... // Elements of the shared structure can be accessed within the critical section +} // After the critical section ends, element pointers obtained inside it can no longer be used + +{ + accessor.lock(); + Task task {::std::move(accessor), ...}; // The critical section can be transferred by moving the accessor + ... // e.g. to an asynchronous thread; the critical section ends when accessor.unlock() is called +} // If the accessor was transferred, the critical section does not end here + +// Elimination operation +... // Operate on the shared structure, removing some elements +// Advance the epoch and return the lowest epoch at which the removed elements can be reclaimed +auto lowest_epoch = epoch.tick(); + +// Keep lowest_epoch with the removed elements; their reclamation can then be checked synchronously or asynchronously +auto low_water_mark = epoch.low_water_mark(); +// Once low_water_mark reaches lowest_epoch, the elements can be safely reclaimed +if (lowest_epoch <= low_water_mark) { + ...
// Elements corresponding to lowest_epoch can be reclaimed +} +``` diff --git a/docs/concurrent/epoch.md b/docs/concurrent/epoch.zh-cn.md similarity index 99% rename from docs/concurrent/epoch.md rename to docs/concurrent/epoch.zh-cn.md index bc0a728e..b0bcb973 100644 --- a/docs/concurrent/epoch.md +++ b/docs/concurrent/epoch.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](epoch.en.md)** + + # epoch ## 原理 diff --git a/docs/concurrent/execution_queue.en.md b/docs/concurrent/execution_queue.en.md new file mode 100644 index 00000000..f0690e49 --- /dev/null +++ b/docs/concurrent/execution_queue.en.md @@ -0,0 +1,49 @@ +**[[简体中文]](execution_queue.zh-cn.md)** + +# Execution Queue + +## Principle + +This wraps [ConcurrentBoundedQueue](bounded_queue.en.md) to implement an MPSC (Multi-Producer Single-Consumer) queue whose consumer is activated on demand: +- Each time data is produced, the pending counter is atomically incremented. If its value before the increment was 0, this rising edge starts the consumer thread. +- The consumer thread keeps consuming data; when no data is left, it exchanges the counter value with 0. +- If the exchanged value is unchanged from the value recorded before the last round of consumption, the consumer thread exits. +- Otherwise, it enters the next round of the consumption loop. + +It mainly targets scenarios with a large number of low-activity queues, where it avoids keeping an idle consumer thread listening on each inactive queue. Its functionality is similar to [bthread::ExecutionQueue](https://github.com/apache/brpc/blob/master/docs/en/execution_queue.md), but: +- The consumer is submitted through the [Executor](../executor.en.md) interface, so custom thread/coroutine mechanisms can be used. +- Producer submission remains wait-free, and contention does not block the consumer.
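The activation protocol can be sketched with standard C++ as below. This is a simplified illustration, not Babylon's `ConcurrentExecutionQueue`: a mutex-protected `std::deque` stands in for `ConcurrentBoundedQueue`, and `OnDemandQueue`, `Executor`, and `Consumer` are hypothetical names. One assumption deserves calling out: the sketch resets the counter with a compare-and-swap against the value recorded before the round, so the counter returns to 0 only if no producer arrived in the meantime; Babylon's exact reset step may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch of an on-demand consumer; not Babylon's
// ConcurrentExecutionQueue. A mutex-protected deque replaces
// ConcurrentBoundedQueue, and Executor is any callable that runs a task
// (inline, on a thread pool, on a coroutine scheduler, ...).
template <typename T>
class OnDemandQueue {
 public:
  using Executor = std::function<void(std::function<void()>)>;
  using Consumer = std::function<void(T&)>;

  OnDemandQueue(Executor executor, Consumer consumer)
      : executor_(std::move(executor)), consumer_(std::move(consumer)) {}

  // Producer: publish the item, then bump the pending counter. Only the
  // producer that sees the counter go from 0 to 1 (the rising edge)
  // starts a consumer.
  void execute(T value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push_back(std::move(value));
    }
    if (pending_.fetch_add(1, std::memory_order_acq_rel) == 0) {
      executor_([this] { consume_loop(); });
    }
  }

 private:
  void consume_loop() {
    // Counter value recorded before this round of consumption.
    std::size_t seen = pending_.load(std::memory_order_acquire);
    while (true) {
      drain();
      // Reset to 0 only if no producer arrived since `seen` was recorded.
      // On failure, `seen` receives the newer value and we run another round.
      if (pending_.compare_exchange_strong(seen, 0,
                                           std::memory_order_acq_rel)) {
        return;  // consumer exits; the next producer will restart it
      }
    }
  }

  // Consume everything currently in the queue.
  void drain() {
    while (true) {
      T item;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return;
        item = std::move(items_.front());
        items_.pop_front();
      }
      consumer_(item);
    }
  }

  Executor executor_;
  Consumer consumer_;
  std::mutex mutex_;
  std::deque<T> items_;
  std::atomic<std::size_t> pending_{0};
};
```

With a thread-pool executor, the queue must outlive any consumer task it has started; this sketch has no equivalent of `join()`.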
+ +![](images/bthread_execution_queue.png) + +## Usage Example + +```c++ +#include "babylon/concurrent/execution_queue.h" + +using ::babylon::ConcurrentExecutionQueue; + +// Explicitly define a queue +using Queue = ConcurrentExecutionQueue; +Queue queue; + +// Set the queue capacity to N +// The consumer uses some_executor for execution +// Register a lambda function for consumption +queue.initialize(N, some_executor, [] (Queue::Iterator iter, Queue::Iterator end) { + // Consume the data in the range + while (iter != end) { + T& item = *iter; + do_sth_with(item); + ++iter; + } +}); + +// Produce a piece of data and start background consumption as needed +queue.execute("10086"); +... + +// Wait for all currently published data to be consumed +// Note that this does not include stop semantics; you can repeatedly execute & join +queue.join(); +``` diff --git a/docs/concurrent/execution_queue.md b/docs/concurrent/execution_queue.zh-cn.md similarity index 85% rename from docs/concurrent/execution_queue.md rename to docs/concurrent/execution_queue.zh-cn.md index 8dd1a27f..cc224f5c 100644 --- a/docs/concurrent/execution_queue.md +++ b/docs/concurrent/execution_queue.zh-cn.md @@ -1,15 +1,17 @@ +**[[English]](execution_queue.en.md)** + # execution_queue ## 原理 -包装[ConcurrentBoundedQueue](bounded_queue.md),实现按需激活MPSC消费者的模式 +包装[ConcurrentBoundedQueue](bounded_queue.zh-cn.md),实现按需激活MPSC消费者的模式 - 每次生产数据时对待处理计数器原子自增,如果自增前为0,边沿触发,启动消费者线程 - 消费者线程持续消费数据,当无数据可消费时,将计数器值和0交换 - 如果交换得到的值和最近一次消费前记录的值无变化,退出消费者线程 - 否则,进入下一轮消费循环 主要用于支持大量低活队列的情况,节省不活跃的监听消费线程,功能和[bthread::ExecutionQueue](https://github.com/apache/brpc/blob/master/docs/cn/execution_queue.md)类似,但是 -- 消费者通过[Executor](../executor.md)接口提交,支持使用自定义线程/协程机制 +- 消费者通过[Executor](../executor.zh-cn.md)接口提交,支持使用自定义线程/协程机制 - 保持生产者wait-free提交的同时,不会在竞争时引起消费者阻塞 ![](images/bthread_execution_queue.png) diff --git a/docs/concurrent/garbage_collector.en.md b/docs/concurrent/garbage_collector.en.md new file mode 100644 index 00000000..d59e6744 --- 
/dev/null +++ b/docs/concurrent/garbage_collector.en.md @@ -0,0 +1,36 @@ +**[[简体中文]](garbage_collector.zh-cn.md)** + +# Garbage Collector + +## Principle + +A classic synchronous reclamation scheme can be built on the [Epoch](epoch.en.md) mechanism, but the changes Epoch makes are aimed mainly at asynchronous reclamation, and GarbageCollector implements that asynchronous scheme. Its core is the retire interface, which packages a reclamation action into a task, binds it to the corresponding epoch, and places it into a [ConcurrentBoundedQueue](bounded_queue.en.md) to await asynchronous reclamation. An independent reclamation thread continuously checks the current epoch's low water mark, takes tasks from the queue, validates them, and finally executes them. + +## Usage Example + +```c++ +#include "babylon/concurrent/garbage_collector.h" + +using ::babylon::Epoch; +using ::babylon::GarbageCollector; + +// Define a GarbageCollector instance; the template parameter is the type of the reclamation action +// Requires std::invocable(Reclaimer) +GarbageCollector garbage_collector; + +// Adjust the asynchronous queue length; the default is 1, which usually needs to be increased to suit the workload +// When the asynchronous queue is full, retire blocks until asynchronous reclamation completes +garbage_collector.set_queue_capacity(...); + +// Start the asynchronous reclamation thread +garbage_collector.start(); + +// Execute retire, which internally performs an Epoch::tick and enqueues the reclaimer bound to the resulting epoch +// For bulk reclamation, you can perform an Epoch::tick externally and pass the returned value as lowest_epoch +// Retire does not need to run inside an epoch critical section; running it outside is in fact recommended, to avoid deadlock
caused by a too-short queue during blocking +garbage_collector.retire(::std::move(reclaimer)); +garbage_collector.retire(::std::move(reclaimer), lowest_epoch); + +// End the asynchronous reclamation thread, waiting for all queued retire tasks to complete reclamation +garbage_collector.stop(); +``` diff --git a/docs/concurrent/garbage_collector.md b/docs/concurrent/garbage_collector.zh-cn.md similarity index 68% rename from docs/concurrent/garbage_collector.md rename to docs/concurrent/garbage_collector.zh-cn.md index 4910073b..0dc51eef 100644 --- a/docs/concurrent/garbage_collector.md +++ b/docs/concurrent/garbage_collector.zh-cn.md @@ -1,8 +1,10 @@ +**[[English]](garbage_collector.en.md)** + # garbage_collector ## 原理 -基于[Epoch](epoch.md)机制,可以实现经典的同步回收机制,不过Epoch的修改更着重于实现异步回收,GarbageCollector是这个异步回收方案的实现;主要设计了retire操作接口,将回收动作打包成任务并绑定对应的epoch之后放入[ConcurrentBoundedQueue](bounded_queue.md)等待异步回收;独立的异步回收线程持续进行当前epoch最低水位的检测,并从队列获取、校验并最终执行回收任务; +基于[Epoch](epoch.zh-cn.md)机制,可以实现经典的同步回收机制,不过Epoch的修改更着重于实现异步回收,GarbageCollector是这个异步回收方案的实现;主要设计了retire操作接口,将回收动作打包成任务并绑定对应的epoch之后放入[ConcurrentBoundedQueue](bounded_queue.zh-cn.md)等待异步回收;独立的异步回收线程持续进行当前epoch最低水位的检测,并从队列获取、校验并最终执行回收任务; ## 用法示例 diff --git a/docs/concurrent/id_allocator.en.md b/docs/concurrent/id_allocator.en.md new file mode 100644 index 00000000..e305bb15 --- /dev/null +++ b/docs/concurrent/id_allocator.en.md @@ -0,0 +1,71 @@ +**[[简体中文]](id_allocator.zh-cn.md)** + +# Id Allocator + +## Principle + +The Id Allocator is used to assign a unique numerical identifier to each instance of a resource, with values concentrated in the range [0, total number of instances). This concept is similar to Unix's File Descriptor. To address the ABA problem that arises from resource reuse during creation and destruction, a versioning mechanism is added. It is primarily used to accelerate certain resource management operations. 
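A minimal sketch of dense, versioned id allocation, assuming a mutex-protected free list instead of Babylon's lock-free structure. `SimpleIdAllocator` and `VersionedId` are hypothetical names. Freed values are handed out again before the range grows, which keeps ids in `[0, end())`, and each reuse bumps the version so a stale `(value, version)` pair can be detected.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical sketch of dense, versioned id allocation; not Babylon's
// IdAllocator. A mutex-protected free list replaces the lock-free design.
struct VersionedId {
  std::uint32_t value;    // dense id in [0, end())
  std::uint32_t version;  // incremented every time `value` is reused
};

class SimpleIdAllocator {
 public:
  VersionedId allocate() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!free_.empty()) {
      // Reuse a freed value with a newer version, so holders of the old
      // (value, version) pair can tell the id has been recycled.
      std::uint32_t value = free_.back();
      free_.pop_back();
      return {value, ++versions_[value]};
    }
    // No freed value available: grow the range by one to stay contiguous.
    std::uint32_t value = static_cast<std::uint32_t>(versions_.size());
    versions_.push_back(0);
    return {value, 0};
  }

  void deallocate(VersionedId id) {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(id.value < versions_.size());
    free_.push_back(id.value);
  }

  // Every value handed out so far lies in [0, end()).
  std::uint32_t end() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return static_cast<std::uint32_t>(versions_.size());
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::uint32_t> versions_;  // current version of each value
  std::vector<std::uint32_t> free_;      // freed values awaiting reuse
};
```

Because values stay dense, per-resource metadata can live in a plain `std::vector` indexed by `value` rather than in a hash table.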
+ +For example, when extra information must be recorded per resource (such as attaching connection state to sockets), the dense identifiers let that information be addressed through an array instead of a hash table, improving both time and space efficiency. + +The implementation keeps free identifiers in a lock-free stack, combined with an atomic variable. + +![](images/id_allocator.png) + +## Usage Example + +### IdAllocator + +```c++ +#include + +using ::babylon::IdAllocator; + +// Define an allocator; 32-bit and 16-bit versions are supported. The 16-bit version is mainly used to implement thread ids (a reasonably designed program is unlikely to run more than 65536 threads at once) +IdAllocator allocator; +IdAllocator allocator; + +// Allocate an id: the value is unique among currently allocated (not yet freed) ids, and the version + value pair is unique within the range visible to competing accessors in the program +auto versioned_value = allocator.allocate(); +// versioned_value.value = allocated id value +// versioned_value.version = allocated version + +// Deallocate an id +allocator.deallocate(versioned_value); +// Passing only the value also works, since it is unique within the allocated set; deallocation does not use the version part +allocator.deallocate(VersionedValue{value}); + +// Get the upper bound of allocated ids; [0, end_value) covers every id value allocated so far +// Useful for traversal +auto end_value = allocator.end(); + +// Traverse the currently allocated but unfreed ids +allocator.for_each([] (uint32_t begin, uint32_t end) { + // [begin, end) is a range of active ids + // The callback is invoked once per non-empty range, so a single for_each call may invoke it multiple times +}); +``` + +### ThreadId + +Use `babylon::IdAllocator` to assign a unique id to each thread, taking advantage of
the characteristics of `babylon::IdAllocator` to keep the ids as small and contiguous as possible. + +```c++ +#include + +using ::babylon::ThreadId; + +// Obtain the current thread's id; when a thread exits, its id is reused by subsequently created threads +VersionedValue thread_id = ThreadId::current_thread_id(); + +// Get the upper bound of thread ids that have ever been active; [0, end_id) covers all of them +// "Ever active" means the thread has called ThreadId::current_thread_id +// Those threads may still be running or may have already exited +uint16_t end_id = ThreadId::end(); + +// Traverse the currently active thread ids, i.e. threads that have called ThreadId::current_thread_id and have not exited +ThreadId::for_each([] (uint16_t begin, uint16_t end) { + // [begin, end) is a range of active thread ids + // The callback is invoked once per non-empty range, so a single for_each call may invoke it multiple times +}); +``` diff --git a/docs/concurrent/id_allocator.md b/docs/concurrent/id_allocator.zh-cn.md similarity index 98% rename from docs/concurrent/id_allocator.md rename to docs/concurrent/id_allocator.zh-cn.md index 7987655a..f86727cf 100644 --- a/docs/concurrent/id_allocator.md +++ b/docs/concurrent/id_allocator.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](id_allocator.en.md)** + + # id_allocator ## 原理 diff --git a/docs/concurrent/index.md b/docs/concurrent/index.md deleted file mode 100644 index 530a9492..00000000 --- a/docs/concurrent/index.md +++ /dev/null @@ -1,16 +0,0 @@ -# concurrent - -支持多线程并行操作的线程安全容器 - -- [bounded_queue](bounded_queue.md) -- [counter](counter.md) -- [deposit_box](deposit_box.md) -- [epoch](epoch.md) -- [execution_queue](execution_queue.md) -- [garbage_collector](garbage_collector.md) -- [id_allocator](id_allocator.md) -- [object_pool](object_pool.md) -- [thread_local](thread_local.md) -- [transient_hash_table](transient_hash_table.md) --
[transient_topic](transient_topic.md) -- [vector](vector.md) diff --git a/docs/concurrent/object_pool.en.md b/docs/concurrent/object_pool.en.md new file mode 100644 index 00000000..cb844d7e --- /dev/null +++ b/docs/concurrent/object_pool.en.md @@ -0,0 +1,62 @@ +**[[简体中文]](object_pool.zh-cn.md)** + +# Object Pool + +## Principle + +The object pool targets a typical class of application scenarios in which: + +1. A set of parallel computations requires a specific type of resource. +2. Each resource instance must be used exclusively while it is in use. +3. Creating and maintaining such resources is relatively "expensive." + +Typical examples include sockets used for communication and storage structures used in model inference. + +The underlying implementation is built on a bounded queue, which provides wait-free acquisition and return of resources. On top of it, two modes are provided for typical scenarios: + +1. **Strict Limited Mode**: A fixed number (N) of resource instances is pre-injected. If more than N applicants compete, the extra applicants must wait until another one returns an instance. +2. **Automatic Creation Mode**: When the pool is empty, applicants receive newly created instances. When returning an instance would leave more than N idle instances in the pool, the returned instance is destroyed instead.
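The two modes can be illustrated with a deliberately simplified sketch. It is lock-based (a `std::mutex` plus a `std::condition_variable`) rather than wait-free, it returns instances through an explicit `give_back` call instead of a smart pointer with a custom deleter, and the class name `SimplePool` is hypothetical; it is not the `babylon::ObjectPool` API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical, lock-based illustration of the strict and automatic pool modes.
template <typename R>
class SimplePool {
 public:
  explicit SimplePool(size_t capacity) : capacity_(capacity) {}

  // Registering a creator switches the pool to automatic creation mode.
  void set_creator(std::function<std::unique_ptr<R>()> creator) {
    std::lock_guard<std::mutex> lock(mutex_);
    creator_ = std::move(creator);
  }

  // Strict mode: pre-inject instances; this is the same path as returning one.
  void push(std::unique_ptr<R> instance) { give_back(std::move(instance)); }

  std::unique_ptr<R> pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (free_.empty() && creator_) {
      auto creator = creator_;  // copy, then create outside the lock
      lock.unlock();
      return creator();  // automatic mode: hand out a newly created instance
    }
    // Strict mode: block until another user returns an instance.
    cond_.wait(lock, [this] { return !free_.empty(); });
    std::unique_ptr<R> instance = std::move(free_.front());
    free_.pop_front();
    return instance;
  }

  // Returns an instance; once `capacity` instances are idle, extras are destroyed.
  void give_back(std::unique_ptr<R> instance) {
    assert(instance && "only non-null instances can enter the pool");
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (free_.size() >= capacity_) {
        return;  // overflow: `instance` is not stored, so it gets destroyed
      }
      free_.push_back(std::move(instance));
    }
    cond_.notify_one();  // wake one waiter blocked in strict-mode pop()
  }

  size_t idle() {
    std::lock_guard<std::mutex> lock(mutex_);
    return free_.size();
  }

 private:
  size_t capacity_;
  std::mutex mutex_;
  std::condition_variable cond_;
  std::deque<std::unique_ptr<R>> free_;
  std::function<std::unique_ptr<R>()> creator_;
};
```

In this sketch the capacity bounds only the number of idle instances kept in the pool; in automatic mode the number of instances handed out can temporarily exceed it, matching the behavior described for the automatic creation mode.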
+ +## Usage Example + +```c++ +#include "babylon/concurrent/object_pool.h" + +// A default-constructed object pool has a capacity of 0 and must be configured before use +::babylon::ObjectPool<R> pool; + +// Set the maximum capacity +pool.reserve_and_clear(N); + +// Set the recycler; once set, it is called automatically on every instance that enters the pool +// Note: in automatic creation mode, instances that are destroyed because the pool overflows are also passed through the recycler before destruction +pool.set_recycler([] (R& resource) { + // Clean up the resource instance +}); + +///////////////// +// Strict Limited Mode +for (...) { + // Pre-inject a fixed number of instances + pool.push(std::make_unique<R>(...)); +} +parallel loop: + // Use pop to get an instance; if the pool is empty, it blocks until an instance is returned + auto ptr = pool.pop(); + ptr->...; // The return value is a smart pointer with a custom deleter + // Instances are automatically returned upon destruction +///////////////// + +///////////////// +// Automatic Creation Mode +// Enable automatic creation mode by setting a creator callback +pool.set_creator([] { + return new R(...); +}); +parallel loop: + // Use pop to get an instance; if the pool is empty, the creator callback is called automatically + auto ptr = pool.pop(); + ptr->...; // The return value is a smart pointer with a custom deleter + // Instances are automatically returned upon destruction; once the pool already holds N idle instances, further returned instances are destroyed directly +///////////////// +``` diff --git a/docs/concurrent/object_pool.md b/docs/concurrent/object_pool.zh-cn.md similarity index 97% rename from docs/concurrent/object_pool.md rename to docs/concurrent/object_pool.zh-cn.md index ed86f199..c1b86d51 100644 --- a/docs/concurrent/object_pool.md +++ b/docs/concurrent/object_pool.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](object_pool.en.md)** + # object_pool
## 原理 @@ -57,6 +59,4 @@ parallel loop: ptr->...; // 返回值为定制Deleter的智能指针 // 析构时实例自动归还,池内实例超出容量N后,超出部分会直接销毁 ///////////////// - -// 单测test/test_concurrent_object_pool.cpp ``` diff --git a/docs/concurrent/thread_local.en.md b/docs/concurrent/thread_local.en.md new file mode 100644 index 00000000..a6f85c54 --- /dev/null +++ b/docs/concurrent/thread_local.en.md @@ -0,0 +1,101 @@ +**[[简体中文]](thread_local.zh-cn.md)** + +# Thread Local Storage + +## Principle + +Thread Local Storage (TLS) mechanisms are often used as caches to accelerate high-concurrency operations. POSIX and compilers provide corresponding support, but each has its limitations: + +1. The `pthread_key_create` mechanism is flexible but has a strict maximum limit, which can be restrictive in scenarios requiring a large number of TLS instances for acceleration. +2. Although the total number of `thread_local` instances is not limited, it must be determinable at compile time, which is not suitable for scenarios needing a dynamic number of TLS instances. +3. Both mechanisms lack traversal functionality, making them unsuitable for certain application scenarios. + +To address these issues, `EnumerableThreadLocal` implements a TLS mechanism at the application level using `vector` and `id_allocator`. It mainly provides the ability to create an arbitrary number of instances dynamically and allows for efficient traversal. + +### EnumerableThreadLocal + +![](images/thread_local.png) + +### CompactEnumerableThreadLocal + +![](images/compact_thread_local.png) + +To efficiently implement TLS, it is often necessary to isolate data between threads by Cache Line. For a large number of small TLS instances (e.g., a `size_t` counter), this can lead to significant waste. `CompactEnumerableThreadLocal` reduces this waste by packing multiple small TLS instances together to share a single Cache Line. 
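A minimal sketch of the idea behind an enumerable TLS, assuming a fixed upper bound on the number of threads: a process-wide counter hands each thread a small dense id, and each TLS instance uses that id to index a cache-line-aligned slot array, so traversal is a plain loop over `[0, end)`. `SketchThreadId`, `SketchEnumerableTLS` and `sum_of_two_threads` are hypothetical names; unlike the babylon implementation, ids are never reused and the storage does not grow.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>

// Hypothetical dense thread id: a process-wide counter, assigned on first use.
// Unlike babylon::ThreadId, ids of exited threads are never reused here.
struct SketchThreadId {
  static std::atomic<uint16_t>& counter() {
    static std::atomic<uint16_t> next{0};
    return next;
  }
  static uint16_t current() {
    thread_local uint16_t id = counter().fetch_add(1, std::memory_order_relaxed);
    return id;
  }
  // [0, end()) covers every thread that has obtained an id so far.
  static uint16_t end() { return counter().load(std::memory_order_acquire); }
};

// Hypothetical enumerable TLS backed by a fixed slot array (no growth).
template <typename T, size_t kMaxThreads = 256>
class SketchEnumerableTLS {
 public:
  // Each thread touches only its own slot, so no synchronization is needed here.
  T& local() {
    uint16_t id = SketchThreadId::current();
    assert(id < kMaxThreads && "this sketch has a fixed thread limit");
    return slots_[id].value;
  }

  // Visits the slot of every thread that has obtained an id so far.
  // Reading while owners still write is a data race unless T is atomic,
  // so this sketch expects writers to be quiescent (e.g. joined).
  template <typename F>
  void for_each(F&& visit) {
    uint16_t end = SketchThreadId::end();
    for (uint16_t i = 0; i < end && i < kMaxThreads; ++i) visit(slots_[i].value);
  }

 private:
  struct alignas(64) Slot {  // one cache line per thread avoids false sharing
    T value{};
  };
  std::unique_ptr<Slot[]> slots_{new Slot[kMaxThreads]};
};

// Mirrors the usage example: two threads write 3 and 4, then the values are summed.
inline size_t sum_of_two_threads() {
  SketchEnumerableTLS<size_t> storage;
  std::thread t1([&] { storage.local() = 3; });
  std::thread t2([&] { storage.local() = 4; });
  t1.join();
  t2.join();
  size_t sum = 0;
  storage.for_each([&](size_t& value) { sum += value; });
  return sum;
}
```

The one-slot-per-cache-line layout is exactly the waste that the compact variant described above is meant to reduce for small values such as a `size_t` counter.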
+ +## Usage Example + +### EnumerableThreadLocal + +```c++ +#include + +using ::babylon::EnumerableThreadLocal; + +// Define a type aggregator +EnumerableThreadLocal storage; +// By default, there is no cache line isolation; you can implement it manually if needed +EnumerableThreadLocal> storage; + +// Get local data +// Thread 1 +size_t& local_value = storage.local(); +local_value = 3; + +// Thread 2 +size_t& local_value = storage.local(); +local_value = 4; + +// Aggregate all local data +size_t sum = 0; +storage.for_each([&] (size_t* begin, size_t* end) { + while (begin != end) { + sum += *begin++; + } +}); +// sum = 7 + +// When Thread 2 exits +// Aggregate all local data again +size_t sum_all = 0; +size_t sum_alive = 0; +storage.for_each([&] (size_t* begin, size_t* end) { + while (begin != end) { + sum_all += *begin++; + } +}); +storage.for_each_alive([&] (size_t* begin, size_t* end) { + while (begin != end) { + sum_alive += *begin++; + } +}); +// sum_all = 7, sum_alive = 3 +``` + +### CompactEnumerableThreadLocal + +```c++ +#include + +using ::babylon::CompactEnumerableThreadLocal; + +// Define a type aggregator +// The template parameter specifies how many cache lines a block contains; in scenarios where many instances need to be created, +// More cache lines will make memory more compact and speed up dense traversal +CompactEnumerableThreadLocal storage; + +// Get local data +// Thread 1 +size_t& local_value = storage.local(); +local_value = 3; + +// Thread 2 +size_t& local_value = storage.local(); +local_value = 4; + +// Aggregate all local data +size_t sum = 0; +storage.for_each([&] (size_t& value) { + sum += value; +}); +// sum = 7 +``` diff --git a/docs/concurrent/thread_local.md b/docs/concurrent/thread_local.zh-cn.md similarity index 98% rename from docs/concurrent/thread_local.md rename to docs/concurrent/thread_local.zh-cn.md index 3ff823e7..27d96839 100644 --- a/docs/concurrent/thread_local.md +++ b/docs/concurrent/thread_local.zh-cn.md @@ -1,3 +1,5 
@@ +**[[English]](thread_local.en.md)** + # thread_local ## 原理 diff --git a/docs/concurrent/transient_hash_table.en.md b/docs/concurrent/transient_hash_table.en.md new file mode 100644 index 00000000..6825e67b --- /dev/null +++ b/docs/concurrent/transient_hash_table.en.md @@ -0,0 +1,100 @@ +**[[简体中文]](transient_hash_table.zh-cn.md)** + +# Transient Hash Table + +## Principle + +`ConcurrentFixedSwissTable` is based on Google's SwissTable, utilizing control bytes to implement fine-grained spinlocks at the slot level, specifically designed for high-concurrency lookup and insertion operations. However, it lacks support for deletion and automatic rehashing. + +`ConcurrentTransientHashSet` and `ConcurrentTransientHashMap` build upon `ConcurrentFixedSwissTable`, using a simple lock-and-copy mechanism to provide automatic rehashing capabilities, enhancing their practicality. As indicated by their names, these structures do not support deletion and are intended for short-lived concurrent constructions and lookups, making it easy to clear them when no longer needed. + +![](images/transient_hash_table.png) + +## Usage Example + +```c++ +#include + +using ::babylon::ConcurrentTransientHashSet; +using ::babylon::ConcurrentTransientHashMap; + +// The default initial capacity is very small (16 or 32). For single-use, specify an appropriate size based on the scenario. +// For repeated use, `clear` will retain the previous capacity. 
+ConcurrentTransientHashSet<::std::string> set; +ConcurrentTransientHashSet<::std::string> set(1024); +ConcurrentTransientHashMap<::std::string, ::std::string> map; +ConcurrentTransientHashMap<::std::string, ::std::string> map(1024); + +// Lookup and insertion are thread-safe +set.emplace("10086"); +map.emplace("10086", "10010"); +set.find("10086"); // != set.end(); +map.find("10086"); // != map.end(); + +// Traversal is supported +for (auto& value : set) { + // value == "10086" +} +for (auto& pair : map) { + // pair.first == "10086" + // pair.second == "10010" +} + +// Clear for the next reuse +set.clear(); +map.clear(); +``` + +## Performance Evaluation + +The benchmark code is in `bench/bench_concurrent_hash_table.cpp`. It generates 1 million random `uint64_t` keys with a specified duplication (hit) ratio, splits them evenly across threads, and inserts them concurrently. It reports total throughput (QPS), with per-thread throughput in parentheses, and CPU usage per 1,000 operations. Several typical open-source concurrent hash tables are compared in a pure concurrent-insertion workload, including TBB, Folly, and popular independent libraries such as `parallel-hashmap` and `libcuckoo`.
+ +### Performance Table (Hit Ratio = 0.01) + +| Threads | 1 | 4 | 16 | 1 | 4 | | 16 | | +|-------------------------------------|--------------------|--------------|---------------|--------------------|--------------|-------|---------------|-------| +| | QPS | QPS | QPS | QPS | QPS | CPU | QPS | CPU | +| `tbb::concurrent_unordered_set` | 2136 | 2923 (730) | 4065 (254) | 1760 | 2380 (595) | 1.096 | 4651 (290) | 1.33 | +| `tbb::concurrent_hash_set` | 5154 | 4255 (1063) | 4807 (300) | 5181 | 3546 (221) | 0.828 | 4566 (285) | 2.03 | +| `phmap::parallel_flat_hash_set` | 27855 | 14224 (3556) | 14814 (925) | 18148 | 12870 (3217) | 0.294 | 15455 (965) | 0.961 | +| `folly::ConcurrentHashMap` | 4310 | 10952 (2738) | 34129 (2133) | 4132 | 11723 (2930) | 0.124 | 36231 (2264) | 0.049 | +| `libcuckoo::cuckoohash_map` | 4424 | 8928 (2232) | 31746 (1984) | 5263 | 14792 (3698) | 0.223 | 55248 (3453) | 0.166 | +| `folly::AtomicUnorderedInsertMap` | 5847 | 9090 (2272) | 29673 (1854) | 10460 | 29761 (7440) | 0.095 | 62111 (3881) | 0.116 | +| `folly::AtomicHashMap` | 7936 | 19455 (4863) | 61349 (3834) | 6896 | 20366 (5091) | 0.145 | 65789 (4111) | 0.110 | +| `babylon::ConcurrentTransientHashSet` | 24509 | 39062 (9765) | 129870 (8116) | 17825 | 36231 (9057) | 0.096 | 125944 (7871) | 0.110 | + +### Performance Table (Hit Ratio = 0.5) + +| Threads | 1 | 4 | 16 | 1 | 4 | | 16 | | +|-------------------------------------|--------------------|---------------|----------------|--------------------|---------------|-------|----------------|-------| +| | QPS | QPS | QPS | QPS | QPS | CPU | QPS | CPU | +| `tbb::concurrent_unordered_set` | 2962 | 3921 (980) | 6711 (419) | 2518 | 4671 (1168) | 0.584 | 5988 (374) | 1.42 | +| `tbb::concurrent_hash_set` | 5882 | 6211 (1552) | 8333 (820) | 5847 | 6172 (1543) | 0.506 | 8333 (520) | 1.21 | +| `phmap::parallel_flat_hash_set` | 32467 | 21929 (5482) | 22172 (1385) | 30769 | 22222 (5555) | 0.172 | 23364 (1460) | 0.643 | +| `folly::ConcurrentHashMap` | 4405 | 
10526 (2631) | 33003 (2062) | 4149 | 10752 (2688) | 0.178 | 34482 (2155) | 0.073 | +| `libcuckoo::cuckoohash_map` | 5952 | 12804 (3201) | 46511 (2906) | 6849 | 16949 (4237) | 0.203 | 61349 (3834) | 0.174 | +| `folly::AtomicUnorderedInsertMap` | 12690 | 27700 (6925) | 54644 (3415) | 17857 | 45248 (11312) | 0.068 | 72992 (4562) | 0.134 | +| `folly::AtomicHashMap` | 11025 | 27397 (6849) | 88495 (5530) | 11198 | 30769 (7692) | 0.102 | 99009 (6188) | 0.081 | +| `babylon::ConcurrentTransientHashSet` | 29239 | 49504 (12376) | 166666 (10416) | 33222 | 52910 (13227) | 0.069 | 172413 (10775) | 0.081 | + +### Performance Table (Hit Ratio = 0.94) + +| Threads | 1 | 4 | 16 | 1 | 4 | | 16 | | +|-------------------------------------|--------------------|----------------|----------------|--------------------|----------------|-------|----------------|-------| +| | QPS | QPS | QPS | QPS | QPS | CPU | QPS | CPU | +| `tbb::concurrent_unordered_set` | 7633 | 16666 (4166) | 30211 (1888) | 7751 | 15847 (3961) | 0.224 | 25125 (1570) | 0.498 | +| `tbb::concurrent_hash_set` | 12722 | 18903 (4729) | 43290 (2705) | 14164 | 20120 (5030) | 0.181 | 61728 (3858) | 0.181 | +| `folly::ConcurrentHashMap` | 7812 | 16447 (4111) | 51282 (3205) | 8474 | 16806 (4201) | 0.174 | 51282 (3205) | 0.129 | +| `libcuckoo::cuckoohash_map` | 10460 | 20325 (5081) | 72992 (4562) | 9900 | 21052 (5263) | 0.171 | 75757 (4734) | 0.171 | +| `folly::AtomicUnorderedInsertMap` | 36900 | 91743 (22935) | 85470 (5341) | 48309 | 129198 (32299) | 0.026 | 90909 (5681) | 0.155 | +| `phmap::parallel_flat_hash_set` | 61349 | 80645 (20161) | 135501 (8468) | 79365 | 96153 (24038) | 0.040 | 123456 (7716) | 0.123 | +| `folly::AtomicHashMap` | 18348 | 57471 (14367) | 204081 (12755) | 23923 | 78740 (19685) | 0.046 | 264550 (16534) | 0.041 | +| `babylon::ConcurrentTransientHashSet` | 59523 | 133333 (33333) | 401606 (25100) | 80645 | 173010 (43252) | 0.022 | 502512 (31407) | 0.025 | + +## Overall Evaluation + +1. 
TBB's two concurrent structures offer relatively complete interfaces but perform poorly across the board because of their underlying design. In low-concurrency/high-hit scenarios, `phmap`, which wraps a SwissTable with simple sharded locks, performs decently but degrades significantly as concurrency increases. +2. Folly's implementations are of high quality overall. The Atomic series performs well in concurrent scenarios where the final size is known in advance. Notably, `folly::ConcurrentHashMap` uses hazard pointers to provide stable iterators and deletion support while keeping good concurrency, which makes it quite versatile. CMU's `libcuckoo` is a surprisingly solid engineering implementation whose concurrent performance exceeds that of Folly when deletion is not a concern. +3. For workloads that do not need deletion, Folly's Atomic series already performs well; by combining SwissTable-style control bytes with separately stored data, `babylon::ConcurrentTransientHashSet` supports arbitrary key types while pushing throughput further.
diff --git a/docs/concurrent/transient_hash_table.md b/docs/concurrent/transient_hash_table.zh-cn.md similarity index 99% rename from docs/concurrent/transient_hash_table.md rename to docs/concurrent/transient_hash_table.zh-cn.md index 473b0eb2..8e7bdbe0 100644 --- a/docs/concurrent/transient_hash_table.md +++ b/docs/concurrent/transient_hash_table.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](transient_hash_table.en.md)** + # transient_hash_table ## 原理 diff --git a/docs/concurrent/transient_topic.en.md b/docs/concurrent/transient_topic.en.md new file mode 100644 index 00000000..08ca7453 --- /dev/null +++ b/docs/concurrent/transient_topic.en.md @@ -0,0 +1,57 @@ +**[[简体中文]](transient_topic.zh-cn.md)** + +# Transient Topic + +## Principle + +This is an MPSC (Multiple Producer, Single Consumer) Pub-Sub topic built on contiguous storage. A single consumer must not be shared among threads, but any number of independent consumers can subscribe, and each of them consumes the full published data set once. The core difference from a typical Pub-Sub topic is its explicit support for "transience": it is expected to be used in cycles, each publishing a bounded amount of data, and all data is cleared before the next cycle begins. For example, it can support localized, "transient" Pub-Sub parallel computation within a single RPC. + +In the implementation, a vector stores the published data, and a mechanism similar to a bounded queue with segmented positions coordinates publishing and consumption.
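The publish/consume coordination can be sketched as follows, assuming a fixed capacity and that `close()` is called only after every publisher has finished. Producers claim a slot with an atomic `fetch_add`, fill it, and then set a per-slot ready flag with release semantics; each consumer walks the slots in order with its own cursor, so several independent consumers can each read the full data set. `SketchTopic` is a hypothetical name, and its consumer spins with `std::this_thread::yield()` where the real topic blocks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity MPSC topic sketch (not babylon::ConcurrentTransientTopic).
template <typename T>
class SketchTopic {
 public:
  explicit SketchTopic(size_t capacity) : items_(capacity), ready_(capacity) {}

  // Thread-safe: claim a slot, fill it, then publish it with a release store.
  void publish(T value) {
    size_t index = next_.fetch_add(1, std::memory_order_relaxed);
    assert(index < items_.size() && "this sketch does not grow");
    items_[index] = std::move(value);
    ready_[index].store(true, std::memory_order_release);
  }

  // Must be called only after all publishers have returned.
  void close() { closed_.store(true, std::memory_order_release); }

  // Each consumer owns its cursor, so every consumer sees every element once.
  class Consumer {
   public:
    explicit Consumer(const SketchTopic* topic) : topic_(topic) {}

    // Returns the next element, or nullptr once the topic is closed and drained.
    const T* consume() {
      while (true) {
        if (pos_ < topic_->ready_.size() &&
            topic_->ready_[pos_].load(std::memory_order_acquire)) {
          return &topic_->items_[pos_++];
        }
        if (topic_->closed_.load(std::memory_order_acquire) &&
            pos_ >= topic_->next_.load(std::memory_order_relaxed)) {
          return nullptr;
        }
        std::this_thread::yield();  // the real topic blocks instead of spinning
      }
    }

   private:
    const SketchTopic* topic_;
    size_t pos_ = 0;
  };

  Consumer subscribe() const { return Consumer(this); }

 private:
  std::vector<T> items_;                  // contiguous storage for published data
  std::vector<std::atomic<bool>> ready_;  // per-slot "fully written" flags
  std::atomic<size_t> next_{0};           // next slot to claim
  std::atomic<bool> closed_{false};
};
```

Because slots are claimed in order but may be filled out of order, a consumer can briefly wait on slot `i` even though slot `i + 1` is already written; the order in which consumers observe elements is the slot order, not the wall-clock publish order.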
+ +![](images/transient_topic.png) + +## Usage Example + +```c++ +#include + +using ::babylon::ConcurrentTransientTopic; + +// Define a topic +ConcurrentTransientTopic<::std::string> topic; + +// Reserve space for N elements +// A reused topic instance keeps the capacity from its previous use +// so an explicit reserve is usually unnecessary +topic.reserve(N); + +// Publish data; thread-safe +threads: + topic.publish(V); // Single publish + // Batch publish + topic.publish_n(N, [] (Iter begin, Iter end) { + ... // Fill [begin, end) with the data to publish; it is published when the callback returns + // The callback may be called multiple times, each time with a sub-range; the sub-ranges add up to N elements + }); + +// End publishing; consumers are notified and exit their consumption loops +topic.close(); + +// Create a consumer +// Calling subscribe multiple times yields independent consumers, each of which can consume the full data set once +auto consumer = topic.subscribe(); + +// Consume 1 element; returns a pointer to the element, blocking while nothing new has been published +// Returns nullptr once the topic is closed and all elements have been consumed +auto item = consumer.consume(); + +// Batch consume num elements; returns a consumable range, blocking while fewer than num elements are available +// The returned range contains fewer than num elements only if the topic has been closed +auto range = consumer.consume(num); +for (size_t i = 0; i < range.size(); ++i) { + auto& item = range[i]; // Get a reference to the i-th element in this batch +} + +// Clear the topic for reuse in the next cycle +topic.clear(); +``` diff --git a/docs/concurrent/transient_topic.md b/docs/concurrent/transient_topic.zh-cn.md similarity index 98% rename from docs/concurrent/transient_topic.md rename to docs/concurrent/transient_topic.zh-cn.md index d4b6eebc..b33ba8c5 100644 --- a/docs/concurrent/transient_topic.md +++ b/docs/concurrent/transient_topic.zh-cn.md @@ -1,3 +1,5 @@
+**[[English]](transient_topic.en.md)** + # transient_topic ## 原理 diff --git a/docs/concurrent/vector.en.md b/docs/concurrent/vector.en.md new file mode 100644 index 00000000..363be6af --- /dev/null +++ b/docs/concurrent/vector.en.md @@ -0,0 +1,107 @@ +**[[简体中文]](vector.zh-cn.md)** + +# Vector + +## Principle + +Based on a segmented array, this implementation provides random access performance close to `std::vector`, as well as thread-safe size growth capabilities, ensuring that the addresses of elements obtained will not become invalid due to subsequent growth of the container. The core differences from `std::vector` include: + +1. `size == capacity == n * block_size`, with size growth occurring in bulk increments defined by a pre-specified block size. +2. `&v[i] + 1 != &v[i + 1]`, allowing for random access, but the underlying space may not be contiguous. +3. Concurrent operations such as `v.ensure(i)` are thread-safe and guarantee that any address obtained through `&v.ensure(i)` will not become invalid due to subsequent growth. +4. Access and growth operations are wait-free (excluding the implementation of memory allocation and element construction). 
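The addressing behind properties 1 to 3 can be sketched in single-threaded form. Rounding the block size up to a power of two turns an index into a segment number (`index >> shift`) and an in-segment offset (`index & (block_size - 1)`), and because growth appends whole blocks without moving existing ones, element addresses stay valid. `SketchSegmentedVector` is a hypothetical name; the concurrency that makes `ensure` wait-free in `babylon::ConcurrentVector` is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical single-threaded sketch of segmented-array addressing.
template <typename T>
class SketchSegmentedVector {
 public:
  explicit SketchSegmentedVector(size_t block_size_hint) {
    // Round the hint up to a power of two: 100 -> 128, 1024 -> 1024.
    while (block_size_ < block_size_hint) {
      block_size_ <<= 1;
      ++shift_;
    }
  }

  size_t block_size() const { return block_size_; }

  // size == capacity == number_of_blocks * block_size
  size_t size() const { return blocks_.size() * block_size_; }

  // Grows by whole blocks until `index` is covered, then returns the element.
  // Existing blocks never move, so references obtained earlier stay valid.
  T& ensure(size_t index) {
    while (index >= size()) {
      blocks_.push_back(std::make_unique<T[]>(block_size_));
    }
    return (*this)[index];
  }

  // No growth: the caller must already know index < size() (checked only by assert).
  T& operator[](size_t index) {
    assert(index < size());
    return blocks_[index >> shift_][index & (block_size_ - 1)];
  }

 private:
  size_t block_size_ = 1;
  size_t shift_ = 0;                          // log2(block_size_)
  std::vector<std::unique_ptr<T[]>> blocks_;  // segment mapping table
};
```

Only the small table of block pointers is reallocated on growth, which is why `&v[i] + 1 != &v[i + 1]` can hold at block boundaries while each element's own address never changes.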
+ +![](images/concurrent_vector.png) + +## Usage Example + +```c++ +#include + +using ::babylon::ConcurrentVector; + +// Static block_size +ConcurrentVector vector; // Statically specify block_size as 128, must be 2^n + +// Dynamic block_size +ConcurrentVector vector; // ConcurrentVector(1024) +ConcurrentVector vector; // ConcurrentVector(1024), 0 indicates dynamic block_size, default +ConcurrentVector vector(block_size_hint); // Actual block_size will round up block_size_hint to 2^n + +// Extend capacity +vector.reserve(10010); // Ensure size grows to accommodate at least 10010 elements + +// Random access with potential capacity extension +vector.ensure(10086).assign("10086"); // If current size is insufficient to hold element 10086, it will first extend the underlying storage size, then return a reference to element 10086 + +// Random access without capacity checking, generally used when index < size is already known +vector[10086].assign("10086"); // If current size is insufficient to hold element 10086, it may cause an out-of-bounds error + +// When multiple indices are expected to be accessed in a short period, a snapshot can be obtained to avoid repeated access to the segment mapping table +auto snapshot = vector.snapshot(); +auto snapshot = vector.reserved_snapshot(30); // Ensure elements [0, 30) are accessible +for (size_t i = 0; i < 30; ++i) { + snapshot[i] = ... 
// Each access will no longer re-fetch the segment mapping table, speeding up access +} + +// Copy from contiguous space similar to vector +// Optimized for the underlying segmented contiguity, similar to std::copy_n +vector.copy_n(iter, size, offset); +// Further details can be found in the comments +// Unit tests in test/test_concurrent_vector.cpp +``` + +## Performance Evaluation + +### Sequential Write + +``` +==================== batch 100 ==================== +std::vector loop assign use 0.212214 +std::vector fill use 0.211325 +babylon::ConcurrentVector loop assign use 1.26182 +babylon::ConcurrentVector snapshot loop assign use 1.05421 +babylon::ConcurrentVector fill use 0.219594 +==================== batch 10000 ==================== +std::vector loop assign use 0.288137 +std::vector fill use 0.281818 +babylon::ConcurrentVector loop assign use 1.18824 +babylon::ConcurrentVector snapshot loop assign use 0.965977 +babylon::ConcurrentVector fill use 0.304165 +``` + +### Sequential Read + +``` +==================== batch 100 ==================== +std::vector loop read use 0.255723 +babylon::ConcurrentVector loop read use 1.36107 +babylon::ConcurrentVector snapshot loop read use 1.06447 +==================== batch 10000 ==================== +std::vector loop read use 0.27499 +babylon::ConcurrentVector loop read use 1.22806 +babylon::ConcurrentVector snapshot loop read use 0.952212 +``` + +### 12 Concurrent Reads and Writes + +``` +==================== seq_cst rw batch 2 ==================== +std::vector use 0.342871 +std::vector aligned use 0.0452792 +babylon::ConcurrentVector ensure use 0.463758 +babylon::ConcurrentVector [] use 0.357992 +babylon::ConcurrentVector snapshot [] use 0.419337 +babylon::ConcurrentVector aligned ensure use 0.045025 +babylon::ConcurrentVector aligned [] use 0.047975 +babylon::ConcurrentVector aligned snapshot [] use 0.0898667 +==================== seq_cst rw batch 20 ==================== +std::vector use 0.0754283 +std::vector aligned 
use 0.0624383 +babylon::ConcurrentVector ensure use 0.0718946 +babylon::ConcurrentVector [] use 0.0718017 +babylon::ConcurrentVector snapshot [] use 0.0634408 +babylon::ConcurrentVector aligned ensure use 0.0610958 +babylon::ConcurrentVector aligned [] use 0.0681283 +babylon::ConcurrentVector aligned snapshot [] use 0.0622529 +``` diff --git a/docs/concurrent/vector.md b/docs/concurrent/vector.zh-cn.md similarity index 99% rename from docs/concurrent/vector.md rename to docs/concurrent/vector.zh-cn.md index c944f08b..5efaa47a 100644 --- a/docs/concurrent/vector.md +++ b/docs/concurrent/vector.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](vector.en.md)** + # vector ## 原理 diff --git a/docs/coroutine/README.en.md b/docs/coroutine/README.en.md new file mode 100644 index 00000000..12568ece --- /dev/null +++ b/docs/coroutine/README.en.md @@ -0,0 +1,24 @@ +**[[简体中文]](README.zh-cn.md)** + +# Coroutine + +## Principle + +This module implements a coroutine mechanism based on the [C++20](https://en.cppreference.com/w/cpp/20) [coroutine](https://en.cppreference.com/w/cpp/language/coroutines) standard. According to the standard, a coroutine function not only contains `co_xxx` keyword expressions but also places special requirements on its return type. These requirements fall into two main categories: one supports coroutines that return a single value, whose return type is commonly called a `task`; the other supports coroutines that return multiple values, typically called a `generator`. + +![promise](images/promise.png) +![awaitable](images/awaitable.png) + +The [coroutine](https://en.cppreference.com/w/cpp/language/coroutines) standard does not directly define a coroutine mechanism. Instead, it abstracts and standardizes the API part of the mechanism and separates out the fine-grained SPI (Service Provider Interface) part, which remains invisible to the user. A coroutine mechanism therefore operates through the chain User -> API -> Compiler -> SPI -> Coroutine mechanism.
The overall operation mode is: + +- The end user expresses coroutine semantics using unified keyword operators, such as `co_await` and `co_return`. +- The end user defines the coroutine mechanism being utilized through the coroutine function’s return type. +- The compiler handles the coroutine suspension and resumption in stages, according to the operator semantics defined in the standard. +- The compiler invokes the corresponding functions of the coroutine framework at specific standard points before suspension and after resumption. + +## Submodule Documentation + +- [task](task.en.md) +- [future_awaitable](future_awaitable.en.md) +- [cancellable](cancellable.en.md) +- [futex](futex.en.md) diff --git a/docs/coroutine/README.md b/docs/coroutine/README.md deleted file mode 100644 index 3d9ba55b..00000000 --- a/docs/coroutine/README.md +++ /dev/null @@ -1,21 +0,0 @@ -# coroutine - -## 原理 - -基于[C++20](https://en.cppreference.com/w/cpp/20)的[coroutine](https://en.cppreference.com/w/cpp/language/coroutines)标准,实现的一套协程机制;按照标准,一个协程函数除了内部包含`co_xxx`关键字语句之外,返回值也有一系列特殊要求;要求整体可以分为两个大类,一类用来支持单一返回值协程,返回值一般称为task;另一类用来支持多次返回值协程,一般被称为generator; - -![](images/promise.png) -![](images/awaitable.png) - -[coroutine](https://en.cppreference.com/w/cpp/language/coroutines)标准并非直接定义了协程机制,而是更为抽象地统一了协程机制的API部分,并将细粒度的SPI部分分离到用户不可见的部分;最终通过用户 -> API -> 编译器 -> SPI -> 协程机制来进行运转;整体工作模式为 -- 最终用户通过统一关键字操作符表达语义,例如`co_await`和`co_return` -- 最终用户通过协程函数返回类型,表达实际对接的协程机制 -- 编译器按照操作符语义,按标准分阶段完成协程的中断和恢复 -- 编译器在中断和恢复前后的若干标准点位,调用实际协程提供框架的相应功能 - -## 子模块文档 - -- [task](task.md) -- [future_awaitable](future_awaitable.md) -- [cancellable](cancellable.md) -- [futex](futex.md) diff --git a/docs/coroutine/README.md b/docs/coroutine/README.md new file mode 120000 index 00000000..b636b478 --- /dev/null +++ b/docs/coroutine/README.md @@ -0,0 +1 @@ +README.zh-cn.md \ No newline at end of file diff --git a/docs/coroutine/README.zh-cn.md b/docs/coroutine/README.zh-cn.md new file mode 100644 index 00000000..7d781539 --- 
/dev/null +++ b/docs/coroutine/README.zh-cn.md @@ -0,0 +1,23 @@ +**[[English]](README.en.md)** + +# coroutine + +## 原理 + +基于[C++20](https://en.cppreference.com/w/cpp/20)的[coroutine](https://en.cppreference.com/w/cpp/language/coroutines)标准,实现的一套协程机制;按照标准,一个协程函数除了内部包含`co_xxx`关键字语句之外,返回值也有一系列特殊要求;要求整体可以分为两个大类,一类用来支持单一返回值协程,返回值一般称为task;另一类用来支持多次返回值协程,一般被称为generator; + +![](images/promise.png) +![](images/awaitable.png) + +[coroutine](https://en.cppreference.com/w/cpp/language/coroutines)标准并非直接定义了协程机制,而是更为抽象地统一了协程机制的API部分,并将细粒度的SPI部分分离到用户不可见的部分;最终通过用户 -> API -> 编译器 -> SPI -> 协程机制来进行运转;整体工作模式为 +- 最终用户通过统一关键字操作符表达语义,例如`co_await`和`co_return` +- 最终用户通过协程函数返回类型,表达实际对接的协程机制 +- 编译器按照操作符语义,按标准分阶段完成协程的中断和恢复 +- 编译器在中断和恢复前后的若干标准点位,调用实际协程提供框架的相应功能 + +## 子模块文档 + +- [task](task.zh-cn.md) +- [future_awaitable](future_awaitable.zh-cn.md) +- [cancellable](cancellable.zh-cn.md) +- [futex](futex.zh-cn.md) diff --git a/docs/coroutine/cancellable.en.md b/docs/coroutine/cancellable.en.md new file mode 100644 index 00000000..104fad1d --- /dev/null +++ b/docs/coroutine/cancellable.en.md @@ -0,0 +1,44 @@ +**[[简体中文]](cancellable.zh-cn.md)** + +# Cancellable + +## Principle + +In the standard [coroutine](https://en.cppreference.com/w/cpp/language/coroutines) mechanism, `co_await` behaves similarly to [std::future::get](https://en.cppreference.com/w/cpp/thread/future/get): it waits for the target awaitable to complete before resuming execution. However, some cases require the ability to end the wait early, such as timeouts. + +![cancellable](images/cancellable.png) + +The Cancellable implementation wraps a regular awaitable and adds cancellation support by inserting a proxy awaitable. The proxy awaitable forwards `co_await` to the wrapped awaitable and eventually propagates the resume action back to the original coroutine that initiated the `co_await`. However, the proxy awaitable does not store the coroutine handle locally.
Instead, it places the handle in a [DepositBox](../concurrent/deposit_box.en.md) and passes it to the cancellation trigger source, such as a registered timer. When the wrapped awaitable triggers resumption and the cancellation source triggers cancellation, the two compete for the coroutine handle stored in the [DepositBox](../concurrent/deposit_box.en.md); whichever wins resumes the coroutine. + +## Usage Example + +```c++ +#include "babylon/coroutine/task.h" +#include "babylon/coroutine/cancellable.h" + +using ::babylon::coroutine::Task; +using ::babylon::coroutine::Cancellable; + +using Cancellation = typename Cancellable::Cancellation; + +Task<...> some_coroutine(...) { + ... + // Wrap the original awaitable as Cancellable + auto optional_value = co_await Cancellable(::std::move(a)).on_suspend( + // Use a callback function to receive the corresponding cancellation handle after the coroutine is suspended + [&](Cancellation cancel) { + // Typically, the cancel handle is registered to a timer, which calls cancel() after a specified time to initiate cancellation + // From the moment the callback is executed, cancel is usable. You can even invoke cancel() directly within the callback, though it's usually unnecessary. + on_timer(cancel, 100ms); + } + ); + // If 'a' completes first, a non-empty value is returned for further operations + if (optional_value) { + optional_value->item_member_function(...); + some_function_use_item(*optional_value); + } else { + // An empty value indicates that cancellation was triggered first + } + ...
+} +``` diff --git a/docs/coroutine/cancellable.md b/docs/coroutine/cancellable.zh-cn.md similarity index 98% rename from docs/coroutine/cancellable.md rename to docs/coroutine/cancellable.zh-cn.md index 924b43af..33cda435 100644 --- a/docs/coroutine/cancellable.md +++ b/docs/coroutine/cancellable.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](cancellable.en.md)** + # cancellable ## 原理 diff --git a/docs/coroutine/futex.en.md b/docs/coroutine/futex.en.md new file mode 100644 index 00000000..dc3feb17 --- /dev/null +++ b/docs/coroutine/futex.en.md @@ -0,0 +1,51 @@ +**[[简体中文]](futex.zh-cn.md)** + +# Futex + +## Principle + +The standard [coroutine](https://en.cppreference.com/w/cpp/language/coroutines) mechanism's `co_await` is essentially aligned with the [std::future](https://en.cppreference.com/w/cpp/thread/future) parallel synchronization model. However, more complex synchronization models, such as [std::mutex](https://en.cppreference.com/w/cpp/thread/mutex) or [std::condition_variable](https://en.cppreference.com/w/cpp/thread/condition_variable), can be unified through a mechanism similar to [futex(2)](https://man7.org/linux/man-pages/man2/futex.2.html). + +![futex](images/futex.png) + +The implementation uses a `std::mutex` for each futex instance to manage value checks and to chain together waiting callbacks atomically. The [DepositBox](../concurrent/deposit_box.en.md) is used to ensure the uniqueness of cancellation and wake-up actions. + +## Usage Example + +```c++ +#include "babylon/coroutine/task.h" +#include "babylon/coroutine/futex.h" + +using ::babylon::coroutine::Task; +using ::babylon::coroutine::Futex; + +using Cancellation = Futex::Cancellation; + +// Futex is initialized with an internal value of 0 +Futex futex; + +// Read and write futex internal value +futex.value() = ...; +// Atomically read and write futex internal value +futex.atomic_value().xxxx(...); + +Task<...> some_coroutine(...) { + ... 
+  // Atomically check if the internal value is equal to expected_value.
+  // If true, suspend the coroutine, otherwise continue execution.
+  co_await futex.wait(expected_value).on_suspend(
+      // Use a callback function to receive the cancellation handle after the coroutine is suspended.
+      // Note that if the coroutine is not suspended, the callback won't be called.
+      [&](Cancellation cancel) {
+        // Typically, the cancel handle is registered to a timer mechanism, and after a specified time, cancel() is invoked to trigger cancellation.
+        // From the moment the callback is executed, cancel is usable. You can even invoke cancel() directly within the callback, though it's generally unnecessary.
+        on_timer(cancel, 100ms);
+      }
+  );
+  // Several scenarios could lead to execution reaching this point:
+  // 1. The internal value did not equal expected_value, so the coroutine was not suspended.
+  // 2. After suspension, futex.wake_one or futex.wake_all was called.
+  // 3. After suspension, cancel() was invoked.
+  ...
+}
+```
diff --git a/docs/coroutine/futex.md b/docs/coroutine/futex.zh-cn.md
similarity index 81%
rename from docs/coroutine/futex.md
rename to docs/coroutine/futex.zh-cn.md
index de5f50a7..ecfb7e94 100644
--- a/docs/coroutine/futex.md
+++ b/docs/coroutine/futex.zh-cn.md
@@ -1,12 +1,14 @@
+**[[English]](futex.en.md)**
+
 # futex
 
 ## 原理
 
-标准[coroutine](https://en.cppreference.com/w/cpp/language/coroutines)机制中的`co_await`基本对标了[std::future](https://en.cppreference.com/w/cpp/thread/future)并行同步模式;但更多复杂的同步模式例如[std::mutex](https://en.cppreference.com/w/cpp/thread/mutex)或者[std::condition_variable](https://en.cppreference.com/w/cpp/thread/condition_variable)的支持可以统一通过类似[futex(2)](https://man7.org/linux/man-pages/man2/futex.2.html)的机制统一支持;
+标准[coroutine](https://zh.cppreference.com/w/cpp/language/coroutines)机制中的`co_await`基本对标了[std::future](https://zh.cppreference.com/w/cpp/thread/future)并行同步模式;但更多复杂的同步模式例如[std::mutex](https://zh.cppreference.com/w/cpp/thread/mutex)或者[std::condition_variable](https://zh.cppreference.com/w/cpp/thread/condition_variable)的支持可以统一通过类似[futex(2)](https://man7.org/linux/man-pages/man2/futex.2.html)的机制统一支持;
 
 ![](images/futex.png)
 
-实现上通过每个futex实例伴随一个`std::mutex`来实现值检测和等待回调链串联原子性;通过[DepositBox](../concurrent/deposit_box.md)实现取消和唤醒的唯一性;
+实现上通过每个futex实例伴随一个`std::mutex`来实现值检测和等待回调链串联原子性;通过[DepositBox](../concurrent/deposit_box.zh-cn.md)实现取消和唤醒的唯一性;
 
 ## 用法示例
 
diff --git a/docs/coroutine/future_awaitable.en.md b/docs/coroutine/future_awaitable.en.md
new file mode 100644
index 00000000..c79df0d7
--- /dev/null
+++ b/docs/coroutine/future_awaitable.en.md
@@ -0,0 +1,41 @@
+**[[简体中文]](future_awaitable.zh-cn.md)**
+
+# FutureAwaitable
+
+## Principle
+
+The callback mechanism of [Future](../future.en.md) makes it well-suited to be an awaitable in a coroutine, largely because the internal structure of coroutines also follows the future/promise design pattern. Two versions of future awaitables are provided: `FutureAwaitable` in exclusive mode, where one future corresponds to a single awaiter (which suffices for most use cases), and `SharedFutureAwaitable` in shared mode, where multiple awaiters can await the same future and resume execution once the future is fulfilled.
+
+## Usage Example
+
+```c++
+#include "babylon/coroutine/task.h"
+#include "babylon/future.h"
+
+using ::babylon::coroutine::Task;
+using ::babylon::Future;
+
+::babylon::Future future = ...
+
+// Exclusive mode: The value is moved from the future.
+Task<...> some_coroutine(...) {
+  ...
+  // This suspends the coroutine, allowing other queued tasks to execute.
+  T value = co_await ::std::move(future);
+  // When promise.set_value(...) is called, the coroutine resumes execution.
+  // The coroutine will return to the executor it was originally bound to.
+  // Using co_await with Future&& results in a T&&, so T must be used to receive the result.
+  ...
+}
+
+// Shared mode: The value is exposed by reference from the future and must be used with caution regarding modification safety.
+Task<...> some_coroutine(...) {
+  ...
+  // This suspends the coroutine, allowing other queued tasks to execute.
+  T& value = co_await future;
+  // When promise.set_value(...) is called, the coroutine resumes execution.
+  // The coroutine will return to the executor it was originally bound to.
+  // Using co_await with Future& results in a T& reference, and T's lifecycle is controlled by the future.
+  ...
+}
+```
diff --git a/docs/coroutine/future_awaitable.md b/docs/coroutine/future_awaitable.zh-cn.md
similarity index 67%
rename from docs/coroutine/future_awaitable.md
rename to docs/coroutine/future_awaitable.zh-cn.md
index 3bd26d71..a660a1cb 100644
--- a/docs/coroutine/future_awaitable.md
+++ b/docs/coroutine/future_awaitable.zh-cn.md
@@ -1,8 +1,10 @@
+**[[English]](future_awaitable.en.md)**
+
 # future_awaitable
 
 ## 原理
 
-[Future](../future.md)本身的回调机制使其非常适合成为一个协程的awaitable,一定程度上也是因为协程内部本身也采用future/promise设计模式的原因;对于[Future](../future.md)进行awaitable的包装提供了两个不同版本,FutureAwaitable为独占模式,即一个Future对应单一Awaiter,一般情况下独占模式可以满足大多数需求;另一个版本是SharedFutureAwaitable共享模式,同一个future可以对应多个Awaiter,完成时同时恢复多个协程;
+[Future](../future.zh-cn.md)本身的回调机制使其非常适合成为一个协程的awaitable,一定程度上也是因为协程内部本身也采用future/promise设计模式的原因;对于[Future](../future.zh-cn.md)进行awaitable的包装提供了两个不同版本,FutureAwaitable为独占模式,即一个Future对应单一Awaiter,一般情况下独占模式可以满足大多数需求;另一个版本是SharedFutureAwaitable共享模式,同一个future可以对应多个Awaiter,完成时同时恢复多个协程;
 
 ## 用法示例
 
diff --git a/docs/coroutine/task.en.md b/docs/coroutine/task.en.md
new file mode 100644
index 00000000..8b09a5d2
--- /dev/null
+++ b/docs/coroutine/task.en.md
@@ -0,0 +1,118 @@
+**[[简体中文]](task.zh-cn.md)**
+
+# Task
+
+## Principle
+
+The execution of coroutine tasks is designed to rely on an [Executor](../executor.en.md). A root coroutine must be submitted to an [Executor](../executor.en.md) for execution, after which the coroutine will be bound to that specific executor. During coroutine execution, sub-coroutines can be started by `co_await`ing another task; by default they are bound to the same executor, but they can also be bound to a different one. When a task is suspended and later resumed, it always returns to the executor to which it was originally bound.
+
+A coroutine task can `co_await` another task or other built-in Babylon objects, such as [Future](../future.en.md). Additionally, template specialization enables custom types to be integrated into the framework as awaitables.
+
+## Usage Example
+
+### Run Task
+
+```c++
+#include "babylon/coroutine/task.h"
+#include "babylon/executor.h"
+
+using ::babylon::coroutine::Task;
+using ::babylon::Executor;
+
+// Any Executor implementation can be used to submit coroutines
+Executor& executor = ...;
+
+// Supports plain (static) functions
+struct S {
+  static Task<...> coroutine_plain_function(...) {
+    ...
+  }
+};
+// When submitting a coroutine to an Executor, it returns a Future<...> wrapping the co_return result, instead of Task<...>.
+// The returned future can be retrieved with get or wait_for, similar to submitting regular functions.
+auto future = executor.execute(S::coroutine_plain_function, ...);
+// You can also submit without waiting for completion.
+auto success = executor.submit(S::coroutine_plain_function, ...);
+
+// Supports member functions
+struct S {
+  Task<...> coroutine_member_function(...) {
+    ...
+  }
+} s;
+auto future = executor.execute(&S::coroutine_member_function, &s, ...);
+
+// Supports operator()
+struct S {
+  Task<...> operator()(...) {
+    ...
+  }
+} s;
+auto future = executor.execute(s, ...);
+
+// Supports lambda functions
+auto future = executor.execute([&](...) -> Task<...> {
+  ...
+});
+
+// Launch sub-coroutines
+Task<...> some_coroutine(...) {
+  ...
+  // Suspends some_coroutine and switches to coroutine_member_function
+  ... = co_await s.coroutine_member_function(...);
+  // After coroutine_member_function completes, it switches back
+  ...
+}
+
+// Launch sub-coroutines on a different Executor
+Task<...> some_coroutine(...) {
+  ...
+  // Suspends some_coroutine and coroutine_member_function will be executed on other_executor
+  ... = co_await s.coroutine_member_function(...).set_executor(other_executor);
+  // After coroutine_member_function completes, it resumes some_coroutine
+  // some_coroutine still returns to its originally bound executor
+  ...
+}
+
+//////////////////////////////// Advanced Usage //////////////////////////////////
+
+// By default, coroutine execution state is destroyed along with the task, but it can be explicitly released and a handle can be obtained.
+std::coroutine_handle<...> handle = some_coroutine(...).release();
+// The handle can be transferred to another thread and resumed, which is what the Executor internally does.
+handle.resume();
+// No need to call destroy; when not co_awaited as a sub-coroutine, it will automatically be destroyed after completion.
+// handle.destroy();
+```
+
+### Task co_await Custom Type
+
+```c++
+#include "babylon/coroutine/task.h"
+
+using ::babylon::coroutine::Task;
+
+// For types that already meet the awaitable standards, no specialization is required.
+// For example, tasks from other coroutine mechanisms can be directly co_awaited.
+Task<...> some_coroutine(...) {
+  ...
+  ... = co_await some_awaitable;
+  ...
+}
+
+// For types that do not meet the awaitable standard, a custom wrapper can be used to provide support.
+template <>
+class BasicCoroutinePromise::Transformer {
+ public:
+  // The first parameter is the promise of the coroutine initiating the co_await. It can be ignored if not needed.
+  // In Task's own await_transform, the promise is used to inherit the Executor.
+  static SomeCustomAwaitable await_transform(BasicCoroutinePromise&,
+                                             SomeCustomType&& some_custom_type) {
+    return to_custom_awaitable(::std::move(some_custom_type));
+  }
+};
+Task<...> some_coroutine(...) {
+  ...
+  ... = co_await some_custom_type;
+  ...
+}
+```
diff --git a/docs/coroutine/task.md b/docs/coroutine/task.zh-cn.md
similarity index 81%
rename from docs/coroutine/task.md
rename to docs/coroutine/task.zh-cn.md
index 61f47644..e47cc5d7 100644
--- a/docs/coroutine/task.md
+++ b/docs/coroutine/task.zh-cn.md
@@ -1,10 +1,12 @@
+**[[English]](task.en.md)**
+
 # task
 
 ## 原理
 
-协程Task的执行方式设计为依托[Executor](../executor.md)完成;一个根协程需要通过提交到一个[Executor](../executor.md)来得到执行,提交后协程将被设定为绑定到对应[Executor](../executor.md)上;协程执行中,可以通过`co_await`另一个task的方式启动子协程,子协程默认也绑定到同一个[Executor](../executor.md)上,但也支持绑定到其他[Executor](../executor.md);当一个Task中断并恢复后,会确保回到绑定的[Executor](../executor.md)内;
+协程Task的执行方式设计为依托[Executor](../executor.zh-cn.md)完成;一个根协程需要通过提交到一个[Executor](../executor.zh-cn.md)来得到执行,提交后协程将被设定为绑定到对应[Executor](../executor.zh-cn.md)上;协程执行中,可以通过`co_await`另一个task的方式启动子协程,子协程默认也绑定到同一个[Executor](../executor.zh-cn.md)上,但也支持绑定到其他[Executor](../executor.zh-cn.md);当一个Task中断并恢复后,会确保回到绑定的[Executor](../executor.zh-cn.md)内;
 
-协程Task默认支持`co_await`另一个Task以及一些其他babylon内置对象,例如[Future](../future.md);此外,也提供了基于模板特化的定制能力,用来支持用户将其他自定义类型接入框架变成awaitable;
+协程Task默认支持`co_await`另一个Task以及一些其他babylon内置对象,例如[Future](../future.zh-cn.md);此外,也提供了基于模板特化的定制能力,用来支持用户将其他自定义类型接入框架变成awaitable;
 
 ## 用法示例
 
diff --git a/docs/executor.en.md b/docs/executor.en.md
new file mode 100644
index 00000000..b6aa2e16
--- /dev/null
+++ b/docs/executor.en.md
@@ -0,0 +1,79 @@
+**[[简体中文]](executor.zh-cn.md)**
+
+# Executor
+
+## Principle
+
+`std::async` introduced an asynchronous execution framework based on the future/promise design pattern. However, its default mechanism creates a new thread for each asynchronous execution, and the built-in policy design lacks user extensibility, making it impractical for production environments.
+
+To provide a practical solution for asynchronous execution in a future/promise design pattern, and to simplify the interface of asynchronous frameworks, the Executor implements a user-extensible asynchronous execution framework. It includes common executors such as a serial executor and a thread pool executor, making asynchronous programming easier. The built-in thread pool executor uses a `ConcurrentBoundedQueue` to achieve better concurrent performance.
+
+## Usage
+
+### Basic Usage
+
+```c++
+#include 
+
+using ::babylon::Executor;
+using ::babylon::InplaceExecutor;
+using ::babylon::ThreadPoolExecutor;
+
+// Submit a task for execution and get a future
+Executor* executor = ...
+auto future = executor->execute([] {
+  return 1 + 1;
+});
+// The submitted function will be executed by the executor
+// The execution may be asynchronous or synchronous, depending on the executor implementation
+// But you can interact with and retrieve the result using the returned future
+future.get(); // == 2
+
+// Submit a task without tracking its future
+Executor* executor = ...
+int value = 0;
+/* 0 == */ executor->submit([&] {
+  value = 1 + 1;
+});
+// The submitted function will be executed by the executor, but no future is returned to track it
+// The execution may be asynchronous or synchronous, depending on the executor implementation
+// Mainly used to reduce the overhead of constructing a future when tracking is unnecessary
+value; // == 0
+usleep(100000); // Demonstrating asynchronous execution effect; in practice, callback chains or other patterns are used
+value; // == 2
+
+// Inplace executor that directly executes the function on the current thread and returns after completion
+// Mainly used in unit testing or debugging scenarios
+InplaceExecutor executor;
+
+// A drawback of the inplace executor is that submitting tasks within a task may cause recursion, leading to stack overflow
+// If an inplace executor is used in production, enable breadth-first expansion to avoid this recursion
+InplaceExecutor executor {true};
+
+// A practical thread pool executor
+ThreadPoolExecutor executor;
+// The thread pool must be initialized with the number of threads and queue capacity before use
+// Submitting new tasks after the queue is full will block until space is freed
+executor.initialize(thread_num, queue_capacity);
+... // Use the executor
+// Wait for all submitted tasks to complete and shut down the execution threads
+executor.stop();
+```
+
+### Extend a New Executor
+
+```c++
+#include 
+
+using ::babylon::Executor;
+using ::babylon::MoveOnlyFunction;
+
+class MyExecutor : public Executor {
+  int invoke(MoveOnlyFunction&& function) noexcept override {
+    function(); // Can execute the function in-place
+    s.saved_function = std::move(function); // Or move it to an asynchronous environment
+    ...
+    return 0; // Return 0 to confirm successful submission
+  }
+};
+```
diff --git a/docs/executor.md b/docs/executor.zh-cn.md
similarity index 98%
rename from docs/executor.md
rename to docs/executor.zh-cn.md
index ff89a6ab..f623af2e 100644
--- a/docs/executor.md
+++ b/docs/executor.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](executor.en.md)**
+
 # executor
 
 ## 原理
diff --git a/docs/future.en.md b/docs/future.en.md
new file mode 100644
index 00000000..289b605a
--- /dev/null
+++ b/docs/future.en.md
@@ -0,0 +1,95 @@
+**[[简体中文]](future.zh-cn.md)**
+
+# Future
+
+## Principle
+
+This `Future` implementation is modeled after `std::future` with additional features, including:
+
+- Support for custom `SchedInterface` through template parameters, enabling usage in coroutine environments such as `bthread`. A practical example of combining this with `bthread` can be found in [example/use-with-bthread](https://github.com/baidu/babylon/tree/main/example/use-with-bthread).
+- Added `on_finish`/`then` functionalities to enable asynchronous chaining of tasks.
+
+## Usage
+
+### Future
+
+```c++
+#include 
+
+using ::babylon::Future;
+using ::babylon::Promise;
+
+{
+  Promise promise;
+  auto future = promise.get_future();
+  ::std::thread thread([&] () {
+    // Perform some asynchronous operations
+    ...
+    // Set the final value
+    promise.set_value(10086);
+  });
+  future.get(); // Waits for set_value, result == 10086
+}
+
+{
+  // Example using an XSchedInterface coroutine mechanism for synchronization
+  Promise promise;
+  auto future = promise.get_future();
+  XThread thread([&] () {
+    // Perform some asynchronous operations
+    ...
+    // Set the final value
+    promise.set_value(10086);
+  });
+  future.get(); // Waits for set_value (using XSchedInterface for coroutine synchronization without occupying pthread workers), result == 10086
+}
+
+{
+  Promise promise;
+  auto future = promise.get_future();
+  // Move-capture promise to avoid destruction out of scope
+  ::std::thread thread([promise = ::std::move(promise)] () mutable {
+    // Perform some asynchronous operations
+    ...
+    // Set the final value
+    promise.set_value(10086);
+  });
+  Promise promise2;
+  auto future2 = promise2.get_future();
+  future.on_finish([promise2 = ::std::move(promise2)] (int&& value) {
+    // Called by set_value, with value == 10086
+    promise2.set_value(value + 10010);
+  });
+  // After on_finish, future is consumed: calling ready() or get() on it is no longer allowed
+  future2.get(); // Waits for set_value, result == 20096
+}
+
+// For further details, see comments and test cases
+// Unit tests: test/test_future.cpp
+```
+
+### CountDownLatch
+
+```c++
+#include 
+
+using ::babylon::CountDownLatch;
+
+{
+  // Expecting to join 10 asynchronous results
+  CountDownLatch<> latch(10);
+  auto future = latch.get_future();
+  ::std::vector<::std::thread> threads;
+  for (size_t i = 0; i < 10; ++i) {
+    threads.emplace_back([&] () {
+      // Each asynchronous result reports completion with count_down
+      latch.count_down();
+    });
+  }
+  future.get(); // Waits until the count decrements to 0
+
+  future.on_finish([] (int) {
+    // It's also a future, so you can chain further asynchronous tasks
+  });
+}
+```
diff --git a/docs/future.md b/docs/future.zh-cn.md
similarity index 97%
rename from docs/future.md
rename to docs/future.zh-cn.md
index fbaf3967..018e3d60 100644
--- a/docs/future.md
+++ b/docs/future.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](future.en.md)**
+
 # future
 
 ## 原理
@@ -91,7 +93,4 @@ using ::babylon::CountDownLatch;
     // 也是个future,所以也可以进行异步串联
   });
 }
-
-// 更说明见注释
-// 单测test/test_future.cpp
 ```
diff --git a/docs/logging/README.en.md b/docs/logging/README.en.md
new file mode 100644
index 00000000..82302235
--- /dev/null
+++ b/docs/logging/README.en.md
@@ -0,0 +1,39 @@
+**[[简体中文]](README.zh-cn.md)**
+
+# Logging
+
+## Background and Principles
+
+In server-side applications, logging typically involves decoupling the process of generating log entries from the actual writing to disk due to the unpredictable time required to complete write operations, which are influenced by various kernel and device factors. Most independent logging frameworks, such as [spdlog](https://github.com/gabime/spdlog) and [boost.log](https://github.com/boostorg/log), include built-in asynchronous mechanisms. Another widely used logging framework, [glog](https://github.com/google/glog), does not have a built-in asynchronous solution, but it offers extension points. In practice, frameworks like [Apollo](https://github.com/ApolloAuto/apollo/blob/master/cyber/logger/async_logger.h) and [brpc](https://github.com/apache/brpc/blob/master/src/butil/logging.cc) often include built-in asynchronous plugins.
+
+However, common implementations tend to have a few typical performance bottlenecks:
+- The synchronization mechanism used to decouple log assembly from log writing often relies on lock-based synchronization, which can degrade performance under high contention.
+- Logs are often stored in memory blocks of variable lengths. This design usually involves dynamic memory allocation and deallocation, with cross-thread transfers that can bypass thread-local memory caches in allocators.
+- Some implementations overlook the global lock issue associated with `localtime` calculations, which can also lead to multithreading contention.
+
+![](images/logging-classic.png)
+
+A noteworthy logging framework, [NanoLog](https://github.com/PlatformLab/NanoLog), avoids the above-mentioned memory issues by using a thread-local caching mechanism. It also uses a unique static format spec to reduce the amount of information written to disk, which results in improved performance. However, this optimization is restricted to `printf`-like scenarios, and it is less compatible with streaming serialization systems (e.g., `operator<<`), limiting its applicability. Nevertheless, in scenarios where these restrictions are acceptable, NanoLog’s approach provides excellent performance by effectively addressing typical performance bottlenecks caused by contention.
+
+![](images/logging-nano.png)
+
+Thread-local caching is a valuable optimization technique to reduce lock contention, but when combined with production environment challenges such as thread scaling and sporadic device delays, it requires significant increases in thread cache space to adapt. To address this, a solution combining a unified lock-free queue and a fixed-size lock-free memory pool is proposed: `AsyncFileAppender`. On the front end, a custom `streambuf` implemented on fixed-size paged memory captures log output into a paged-managed `LogEntry`. This `LogEntry` is then pushed into a central lock-free queue for asynchronous decoupling. The central `Appender` backend consumes these entries and completes the log writing, eventually releasing the pages back to the fixed-size memory pool for reuse in future log entries.
+
+![](images/logging-async.png)
+
+On top of `AsyncFileAppender`, a separate `Logger` layer was designed with the following goals:
+- The `Logger` layer adopts a hierarchical tree-based concept similar to [log4j](https://github.com/apache/logging-log4j2), offering complex management capabilities that are relatively rare in the C++ ecosystem. It aims to provide a similar framework but with memory management principles better aligned with C++.
+- This decouples the `AsyncFileAppender` mechanism from the actual logging interface, offering a clean interface that can be integrated into existing logging frameworks in production environments. Even within Baidu, the most common usage of `AsyncFileAppender` is as a low-level asynchronous solution integrated into existing internal logging systems, rather than as a standalone logging solution.
+- The Babylon project also requires logging functionality, and a lightweight `Logger` layer allows users to integrate with their existing logging systems without requiring a complete switch to the `AsyncFileAppender` mechanism. For mature systems, this provides a more user-friendly integration method by offering flexibility in choosing the logging solution.
+
+![](images/logging-logger.png)
+
+## Documentation
+
+- [Logger](logger.en.md)
+- [AsyncFileAppender](async_file_appender.en.md)
+
+## Examples
+
+- [Use async logger](../../example/use-async-logger)
+- [Use with glog](../../example/use-with-glog)
diff --git a/docs/logging/README.md b/docs/logging/README.md
new file mode 120000
index 00000000..b636b478
--- /dev/null
+++ b/docs/logging/README.md
@@ -0,0 +1 @@
+README.zh-cn.md
\ No newline at end of file
diff --git a/docs/logging/index.md b/docs/logging/README.zh-cn.md
similarity index 97%
rename from docs/logging/index.md
rename to docs/logging/README.zh-cn.md
index e00e9c80..a0b98899 100644
--- a/docs/logging/index.md
+++ b/docs/logging/README.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](README.en.md)**
+
 # logging
 
 ## 背景和原理
@@ -28,8 +30,8 @@
 
 ## 功能文档
 
-- [logger](logger.md)
-- [async_file_appender](async_file_appender.md)
+- [logger](logger.zh-cn.md)
+- [async_file_appender](async_file_appender.zh-cn.md)
 
 ## 典型用例
 
diff --git a/docs/logging/async_file_appender.en.md b/docs/logging/async_file_appender.en.md
new file mode 100644
index 00000000..f140928d
--- /dev/null
+++ b/docs/logging/async_file_appender.en.md
@@ -0,0 +1,127 @@
+**[[简体中文]](async_file_appender.zh-cn.md)**
+
+# async_file_appender
+
+## LogEntry & LogStreamBuffer
+
+`LogStreamBuffer` is an implementation of `std::stringbuf`, with actual memory management handled in fixed-size paged memory. `LogEntry` serves as the maintenance structure for this fixed-size paged memory.
+
+### Usage Example
+
+```c++
+#include "babylon/logging/log_entry.h"
+
+using babylon::LogStreamBuffer;
+using babylon::LogEntry;
+
+// Before using LogStreamBuffer, a PageAllocator must be set
+PageAllocator& page_allocator = ...
+LogStreamBuffer buffer;
+buffer.set_page_allocator(page_allocator);
+
+// The buffer can be reused multiple times
+loop:
+  buffer.begin(); // begin() must be called before each use to prepare the buffer
+  buffer.sputn(...); // Write data; usually not called directly, but serves as the underlying mechanism of LogStream
+  LogEntry& entry = buffer.end(); // Finish writing and return the final assembled result
+  ... // LogEntry itself is only the size of one cache line, so it is cheap to copy and pass around
+
+consumer:
+  ::std::vector iov;
+  // Generally, LogEntry is transferred to the consumer via an asynchronous queue
+  LogEntry& entry = ...
+  // Append into an iovec structure for easy integration with writev
+  entry.append_to_iovec(page_allocator.page_size(), iov);
+```
+
+## FileObject
+
+`FileObject` abstracts a log write target and exposes a usable file descriptor (fd). When rotation is required, it handles the rotation and the disposal of old files internally.
+ +```c++ +#include "babylon/logging/file_object.h" + +using babylon::FileObject; + +class CustomFileObject : public FileObject { + // Core functionality function, must be called by the upper layer before each write to obtain the file descriptor + // This function performs file rotation checks and other operations internally, returning the final prepared descriptor + // Since file rotation may occur, the return value is a tuple of new and old descriptors (fd, old_fd) + // fd: + // >=0: Current file descriptor, subsequent writes by the caller are initiated using this descriptor + // < 0: An exception occurred, and the file cannot be opened + // old_fd: + // >=0: File switching has occurred, returning the previous file descriptor + // Usually caused by file rotation; the caller needs to perform a close action + // Before closing, the caller can perform final write operations, etc. + // < 0: No file switching has occurred + virtual ::std::tuple check_and_get_file_descriptor() noexcept override { + ... + } +}; +``` + +## RollingFileObject + +Implements a `FileObject` for rolling files, supporting rotation based on time intervals and providing capabilities for quantitative retention and cleanup. 
+ +```c++ +#include "babylon/logging/rolling_file_object.h" + +using babylon::RollingFileObject; + +RollingFileObject object; +object.set_directory("dir"); // Directory for log files +object.set_file_pattern("name.%Y-%m-%d"); // Log file name template, supports strftime syntax + // When the time-driven file name changes, file rotation occurs +object.set_max_file_number(7); // Maximum number of files to retain + +// Actual files will be written with names like this +// dir/name.2024-07-18 +// dir/name.2024-07-19 + +// Calling this interface during startup scans the directory and records existing files matching the pattern +// It adds them to the tracking list to support continued proper file retention during restart scenarios +object.scan_and_tracking_existing_files(); + +loop: + // Check if the tracked list has exceeded the retention limit; if so, perform cleanup + object.delete_expire_files(); + // In some scenarios, processes may simultaneously output many log files + // Actively calling this allows for background thread implementation of all log expiration deletions + ... + sleep(1); +``` + +## AsyncFileAppender & AsyncLogStream + +`AsyncFileAppender` implements queued transmission of `LogEntry` and performs asynchronous writing to `FileObject`. `AsyncLogStream` wraps `AsyncFileAppender`, `FileObject`, and `LogStreamBuffer`, connecting to the Logger mechanism. + +```c++ +#include "babylon/logging/async_log_stream.h" + +using babylon::AsyncFileAppender; +using babylon::AsyncLogStream; +using babylon::FileObject; +using babylon::LoggerBuilder; +using babylon::PageAllocator; + +// Prepare a PageAllocator and a FileObject& +PageAllocator& page_allocator = ... +FileObject& file_object = ... 
+
+AsyncFileAppender appender;
+appender.set_page_allocator(page_allocator);
+// Set the queue length
+appender.set_queue_capacity(65536);
+appender.initialize();
+
+// Combine AsyncFileAppender and FileObject to create an AsyncLogStream capable of generating a Logger
+LoggerBuilder builder;
+builder.set_log_stream_creator(AsyncLogStream::creator(appender, file_object));
+LoggerManager::instance().set_root_builder(::std::move(builder));
+LoggerManager::instance().apply();
+
+// Logging macros will start taking effect afterward
+BABYLON_LOG(INFO) << ...
+```
diff --git a/docs/logging/async_file_appender.md b/docs/logging/async_file_appender.zh-cn.md
similarity index 99%
rename from docs/logging/async_file_appender.md
rename to docs/logging/async_file_appender.zh-cn.md
index 83bc2d68..b09728a1 100644
--- a/docs/logging/async_file_appender.md
+++ b/docs/logging/async_file_appender.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](async_file_appender.en.md)**
+
 # async_file_appender
 
 ## LogEntry&LogStreamBuffer
diff --git a/docs/logging/logger.en.md b/docs/logging/logger.en.md
new file mode 100644
index 00000000..ad217a03
--- /dev/null
+++ b/docs/logging/logger.en.md
@@ -0,0 +1,148 @@
+**[[简体中文]](logger.zh-cn.md)**
+
+# logger
+
+## LogStream
+
+`LogStream` is designed as a baseline interface for implementing streaming log macros, providing two extension points through inheritance:
+- It uses `std::streambuf` as the underlying buffer provider, facilitating integration with other streaming log ecosystems.
+- It introduces `begin/end` plugin points, allowing non-streaming log ecosystems to format data in a thread-local buffer before exporting it completely. Native implementations can also leverage the `begin/end` plugin points to create custom layouts.
+
+### Usage Example
+
+```c++
+#include "babylon/logging/log_stream.h"
+
+using babylon::LogStream;
+
+// Implement a meaningful LogStream through inheritance
+class SomeLogStream : public LogStream {
+  // The base class constructor must receive a usable subclass of std::streambuf
+  // All write operations ultimately affect this buffer
+  SomeLogStream() : LogStream(stringbuf) {}
+
+  // Additional actions taken at the beginning and end of a log transaction
+  virtual void do_begin() noexcept override {
+    *this << ... // Typical usage is to implement the prefix output for the log header
+  }
+  virtual void do_end() noexcept override {
+    write('\n'); // Generally, a newline is needed for text logs
+                 // It is not added by default so that non-text logs can also be expressed
+    ... // Typically, the finished entry is then submitted to the actual logging backend
+  }
+
+  ...
+
+  // Here, std::stringbuf is used as an example; it can actually be any custom stream buffer
+  std::stringbuf stringbuf;
+};
+
+// Typically, LogStream is not used directly, but rather through logging macros provided by Logger
+// The macro will automatically manage the calls to begin/end
+LogStream& ls = ...
+ls.begin();
+ls << ...
+ls.end();
+
+// In addition to stream operators, printf-like formatting is also supported
+// The actual formatting functionality is provided by absl::Format
+ls.format("some spec %d", value);
+```
+
+## Logger & LoggerBuilder
+
+User-facing logging goes through `Logger` rather than using `LogStream` directly. `Logger` mainly provides two capabilities:
+- `LogStream` is generally not thread-safe; `Logger` uses `ThreadLocal` to keep threads from contending on it.
+- It allows setting log levels and configuring separate `LogStream`s for each level, primarily supporting scenarios where two different level streams may need to be written simultaneously in no-flush mode (e.g., logging a warning during the assembly of an info log).
+
+`Logger` only exposes the logging interface; instances are constructed through `LoggerBuilder`.
+
+### Usage Example
+
+```c++
+#include "babylon/logging/logger.h"
+
+using babylon::Logger;
+using babylon::LoggerBuilder;
+using babylon::LogSeverity;
+
+LoggerBuilder builder;
+// Set a unified LogStream for all log levels
+// Since an instance will be created once for each thread,
+// a function to create the instance is passed in instead of a pre-created instance
+builder.set_log_stream_creator([] {
+  auto ls = ...
+  return std::unique_ptr(ls);
+});
+// A separate LogStream can also be set for a specific level
+builder.set_log_stream_creator(LogSeverity::INFO, [] {
+  auto ls = ...
+  return std::unique_ptr(ls);
+});
+// Set the minimum log level; logs below this level get an empty LogStream regardless of other settings
+// The log macro also recognizes the minimum level and skips the entire stream operation for lower levels
+// LogSeverity includes four levels: {DEBUG, INFO, WARNING, FATAL}
+builder.set_min_severity(LogSeverity min_severity);
+
+// Construct a usable Logger according to the settings
+Logger logger = builder.build();
+
+// Use the logger to print logs at the specified level
+// *** can append other business common headers, adhering to the no-flush rule to output only once
+// ... is the normal log stream input
+BABYLON_LOG_STREAM(logger, INFO, ***) << ...
+```
+
+## LoggerManager
+
+`LoggerManager` maintains a hierarchical Logger tree. It is used for configuration during initialization and for obtaining Logger instances from various levels at runtime. The transition between configuration and runtime is completed through explicit initialization actions.
+- An unconfigured Logger tree will cause any obtained Logger to exhibit default behavior, outputting to standard error.
+- Once the Logger tree is initialized, all previously obtained Loggers that exhibited default behavior will switch to the actual initialized Logger in a thread-safe manner.
+- New nodes can be added to the Logger tree, and their log levels changed, at runtime, also in a thread-safe manner. + +### Usage Example + +```c++ +#include "babylon/logging/logger.h" + +using babylon::LoggerBuilder; +using babylon::LoggerManager; + +// LoggerManager is used via a global singleton +auto& manager = LoggerManager::instance(); + +// Obtain a logger by name +// The name is used to look up the configuration hierarchically; for example, for name = "a.b.c", +// it tries "a.b.c" -> "a.b" -> "a" -> root in sequence. +// The hierarchy also supports "a::b::c" -> "a::b" -> "a" -> root. +auto& logger = manager.get_logger("..."); +// Directly obtain the root logger +auto& root_logger = manager.get_root_logger(); + +// Any Logger obtained before setup is in the default state, outputting to standard error + +// Construct the builder that will take effect +LoggerBuilder&& builder = ... +// Set the builder for the root logger +manager.set_root_builder(builder); +// Set the builder for a specific node in the hierarchy +manager.set_builder("a.b.c", builder); +// None of the settings take effect until apply is called +manager.apply(); + +// After apply, any Logger obtained "before" switches from the default state to the configured state +// The transition is thread-safe, and Loggers in use are handled correctly +// Any Logger obtained "after" is already in the configured state + +// The default macros use the root logger +BABYLON_LOG(INFO) << ... +// Use the specified logger +BABYLON_LOG_STREAM(logger, INFO) << ...
+ +// Generally, the end of a statement is treated as the end of the log submission +// Supports using noflush for gradual assembly +BABYLON_LOG(INFO) << "some " << ::babylon::noflush; +BABYLON_LOG(INFO) << "thing"; // Outputs [header] some thing + +// Supports printf-like formatting functionality, with support provided by the abseil-cpp library +BABYLON_LOG(INFO).format("hello %s", world).format(" +%d", 10086); +``` diff --git a/docs/logging/logger.md b/docs/logging/logger.zh-cn.md similarity index 99% rename from docs/logging/logger.md rename to docs/logging/logger.zh-cn.md index e0fb8e38..459f54cc 100644 --- a/docs/logging/logger.md +++ b/docs/logging/logger.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](logger.en.md)** + # logger ## LogStream diff --git a/docs/reusable/README.en.md b/docs/reusable/README.en.md new file mode 100644 index 00000000..88d7693f --- /dev/null +++ b/docs/reusable/README.en.md @@ -0,0 +1,12 @@ +**[[简体中文]](README.zh-cn.md)** + +# reusable + +Memory Pool and Perfect Rebuild Mechanism + +- [allocator](allocator.en.md) +- [manager](manager.en.md) +- [memory_resource](memory_resource.en.md) +- [page_allocator](page_allocator.en.md) +- [traits](traits.en.md) +- [vector](vector.en.md) diff --git a/docs/reusable/README.md b/docs/reusable/README.md new file mode 120000 index 00000000..b636b478 --- /dev/null +++ b/docs/reusable/README.md @@ -0,0 +1 @@ +README.zh-cn.md \ No newline at end of file diff --git a/docs/reusable/README.zh-cn.md b/docs/reusable/README.zh-cn.md new file mode 100644 index 00000000..b99a6d96 --- /dev/null +++ b/docs/reusable/README.zh-cn.md @@ -0,0 +1,12 @@ +**[[English]](README.en.md)** + +# reusable + +内存池和完美重建机制 + +- [allocator](allocator.zh-cn.md) +- [manager](manager.zh-cn.md) +- [memory_resource](memory_resource.zh-cn.md) +- [page_allocator](page_allocator.zh-cn.md) +- [traits](traits.zh-cn.md) +- [vector](vector.zh-cn.md) diff --git a/docs/reusable/allocator.en.md b/docs/reusable/allocator.en.md new file mode 100644 index 
00000000..cf117775 --- /dev/null +++ b/docs/reusable/allocator.en.md @@ -0,0 +1,71 @@ +**[[简体中文]](allocator.zh-cn.md)** + +# allocator + +## MonotonicAllocator + +A memory allocator based on `memory_resource`, supporting the construction of STL containers, similar to `std::pmr::polymorphic_allocator`. The main differences are: + +- Added `create_object` interface, which allows constructing instances with managed lifetimes, similar to `google::protobuf::Arena`'s `CreateMessage` functionality. +- Compatible with a non-polymorphic interface mode, which can save some virtual function overhead. +- Specially wraps `babylon::SwissMemoryResource` to provide one-stop support for constructing protobuf message types. + +## Usage + +```c++ +#include + +using ::babylon::ExclusiveMonotonicMemoryResource; +using ::babylon::MonotonicAllocator; +using ::babylon::MonotonicMemoryResource; +using ::babylon::SwissAllocator; +using ::babylon::SwissMemoryResource; + +// An allocator for type T +// Uses M as the underlying memory resource, with M defaulting to MonotonicMemoryResource +// The constructor can take a subclass, and the actual allocation is polymorphic +MonotonicAllocator allocator(memory_resource); + +// You can also specify an actual subclass, allowing allocation to bypass virtual functions +MonotonicAllocator allocator(memory_resource); + +// Supports basic allocator functionality +auto* ptr = allocator.allocate(1); +allocator.construct(ptr); +... +allocator.destroy(ptr); + +// Supports allocation and construction in one step +auto* ptr = allocator.new_object(args...); +... 
+allocator.destroy_object(ptr); + +// Supports lifetime management +// Actual destruction occurs when memory_resource.release() is called +auto* ptr = allocator.create_object(args...); + +// Memory allocated remains valid until memory_resource is released +// Managed instances are also destructed at this point +memory_resource.release(); + +// Supports "uses allocator" construction protocol +struct S { + using allocator_type = MonotonicMemoryResource<>; + S(const std::string_view sv, allocator_type allocator) : + allocator(allocator) { + buffer = allocator.allocate(sv.size() + 1); + memcpy(buffer, sv.data(), sv.size()); + buffer[sv.size()] = '\0'; + } + allocator_type allocator; + char* buffer; +}; +// Construct an instance using the allocator +auto* s = allocator.create_object("12345"); +s->buffer // "12345" and allocated on memory_resource + +// Supports protobuf messages +SwissAllocator<> swiss_allocator(swiss_memory_resource); +auto* m = swiss_allocator.create_object(); +m->GetArena(); // Constructed in the Arena built into SwissMemoryResource +``` diff --git a/docs/reusable/allocator.md b/docs/reusable/allocator.zh-cn.md similarity index 97% rename from docs/reusable/allocator.md rename to docs/reusable/allocator.zh-cn.md index 9b7de688..cac85f55 100644 --- a/docs/reusable/allocator.md +++ b/docs/reusable/allocator.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](allocator.en.md)** + # allocator ## MonotonicAllocator @@ -66,7 +68,4 @@ s->buffer // "12345"且分配在memory_resource上 SwissAllocator<> swiss_allocator(swiss_memory_resource); auto* m = swiss_allocator.create_object(); m->GetArena(); // 构造在SwissMemoryResource内置的Arena上 - -// 更说明见注释 -// 单测test/test_reusable_allocator.cpp ``` diff --git a/docs/reusable/index.md b/docs/reusable/index.md deleted file mode 100644 index 43518a52..00000000 --- a/docs/reusable/index.md +++ /dev/null @@ -1,10 +0,0 @@ -# reusable - -内存池和完美重建机制 - -- [allocator](allocator.md) -- [manager](manager.md) -- [memory_resource](memory_resource.md) -- 
[page_allocator](page_allocator.md) -- [traits](traits.md) -- [vector](vector.md) diff --git a/docs/reusable/manager.en.md b/docs/reusable/manager.en.md new file mode 100644 index 00000000..96ee2d2c --- /dev/null +++ b/docs/reusable/manager.en.md @@ -0,0 +1,43 @@ +**[[简体中文]](manager.zh-cn.md)** + +# manager + +## ReusableManager + +A "perfect reuse" object manager that internally contains a `MonotonicMemoryResource` memory pool, allowing instances to be created based on the pool. The result returned is an accessor to the corresponding instance, not the instance pointer itself. This is because the manager periodically reconstructs instances according to the "perfect reuse" protocol, meaning the instance pointers are not fixed during this process. + +- The instances are required to comply with the "perfect reuse" protocol, i.e., `ReusableTraits::REUSABLE`. +- The instances must support `MonotonicMemoryResource`. + +In practical applications, a `ReusableManager` instance is set up according to each business lifecycle. For example, in an RPC server, a `ReusableManager` can be placed in the context of each request, and the types used during the request are managed by the `ReusableManager`. After the request is processed, `ReusableManager` is cleared, potentially triggering reconstruction. In the long run, as `ReusableManager` is continuously reused for processing requests, it can achieve nearly zero dynamic memory allocation. 
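Why `create` hands back an accessor rather than a raw pointer can be shown with a small standalone model. Everything here is a hypothetical simplification, not Babylon's implementation: `ToyManager`, `ToyAccessor`, the rebuild-on-every-second-clear policy, and `std::string` as the managed type are all assumptions. A logical clear keeps the same object and its capacity, while a periodic rebuild replaces the instance with a fresh one that reserves the recorded capacity, so only the accessor remains valid across rebuilds.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ToyManager;

// Hypothetical accessor: (manager, slot index), re-resolved on every use.
class ToyAccessor {
 public:
  ToyAccessor(ToyManager* manager, std::size_t slot) : manager_(manager), slot_(slot) {}
  std::string* get() const;
  std::string* operator->() const { return get(); }

 private:
  ToyManager* manager_;
  std::size_t slot_;
};

class ToyManager {
 public:
  static constexpr std::size_t kRebuildInterval = 2;  // hypothetical policy

  ToyAccessor create() {
    slots_.push_back(std::make_unique<std::string>());
    return ToyAccessor(this, slots_.size() - 1);
  }

  std::string* at(std::size_t slot) const { return slots_[slot].get(); }

  void clear() {
    ++clear_count_;
    const bool rebuild = clear_count_ % kRebuildInterval == 0;
    for (auto& slot : slots_) {
      if (rebuild) {
        // Record the capacity, then build a fresh instance that reserves it.
        const std::size_t capacity = slot->capacity();
        auto fresh = std::make_unique<std::string>();
        fresh->reserve(capacity);
        slot = std::move(fresh);  // old instance destroyed; raw pointers now dangle
      } else {
        slot->clear();  // logical clear: same object, capacity kept
      }
    }
  }

 private:
  std::vector<std::unique_ptr<std::string>> slots_;
  std::size_t clear_count_ = 0;
};

std::string* ToyAccessor::get() const { return manager_->at(slot_); }
```

The design choice mirrors the paragraph above: because the manager owns the right to swap instances, callers must go through the accessor after every `clear`.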
+ +## Example Usage + +```c++ +#include "babylon/reusable/manager.h" +#include "babylon/reusable/string.h" + +using ::babylon::SwissManager; +using ::babylon::SwissString; + +// Create a manager +SwissManager manager; + +// Use the manager to create an instance +auto pstring = manager.create<SwissString>(); + +// Use it like a regular instance pointer +pstring->resize(10); + +// When finished, clear the manager +// Ensure that no instances returned by create are being used concurrently when calling clear +manager.clear(); + +// At this point, the instance has reverted to its initial state and may have been reconstructed +// Do not use any raw pointers obtained before the clear +// The returned instance accessor can continue to be used as normal +pstring->resize(100); + +// Then clear again +manager.clear(); +``` diff --git a/docs/reusable/manager.md b/docs/reusable/manager.zh-cn.md similarity index 96% rename from docs/reusable/manager.md rename to docs/reusable/manager.zh-cn.md index 4284eb35..3ba5dd07 100644 --- a/docs/reusable/manager.md +++ b/docs/reusable/manager.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](manager.en.md)** + # manager ## ReusableManager @@ -37,7 +39,4 @@ pstring->resize(100); // 然后再清理 manager.clear(); - -// 更说明见注释 -// 单测test/test_reusable_manager.cpp ``` diff --git a/docs/reusable/memory_resource.en.md b/docs/reusable/memory_resource.en.md new file mode 100644 index 00000000..4371b008 --- /dev/null +++ b/docs/reusable/memory_resource.en.md @@ -0,0 +1,80 @@ +**[[简体中文]](memory_resource.zh-cn.md)** + +# memory_resource + +## Principle + +Monotonic memory resources, functionally equivalent to `std::pmr::monotonic_buffer_resource`, are derived from `std::pmr::memory_resource` to support the `std::pmr::polymorphic_allocator` mechanism. They adopt a unified container type and support interfacing with different allocator implementations.
The main differences are: + +- Similar to `google::protobuf::Arena`, it provides the ability to register destructor functions, allowing it to manage the lifecycle of instances in addition to memory allocation. +- Offers both thread-safe and non-thread-safe implementations, with the thread-safe version using `thread_local` caching to reduce contention. +- It uses a fixed-size page allocator (`page_allocator`) for underlying allocation, unlike the variable-size allocation used by `std::pmr::monotonic_buffer_resource` and `google::protobuf::Arena`, reducing the pressure on `malloc`. + +### ExclusiveMonotonicBufferResource + +![](images/exclusive.png) + +An exclusive monotonic memory resource, the basic implementation of monotonic memory resources. It is not thread-safe, and it batches multiple small allocation requests into full-page allocations from the underlying system, optimizing for scenarios with frequent small allocations. + +### SharedMonotonicBufferResource + +![](images/shared.png) + +Composed of a series of thread-local exclusive memory resources. Each thread uses its corresponding exclusive resource for allocations. + +### SwissMemoryResource + +A lightweight extension of `SharedMonotonicBufferResource`, which also supports being used as a `google::protobuf::Arena`. This is achieved by patching protobuf’s internal implementation, with the patch automatically applied when possible. If the patch is not applicable due to version issues or link order, it falls back to using a real `google::protobuf::Arena`, still ensuring functionality, though memory allocation is not uniformly managed through the `PageAllocator`. 
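The register-destructor-then-release lifecycle described above can be approximated with the standard `std::pmr::monotonic_buffer_resource` that these resources are compared to. `ToyArena` is a hypothetical sketch, not Babylon's API; it omits page allocators, thread-local caching, and the protobuf integration, and it only shows the ordering: registered destructors run first (newest first), then all memory is freed at once.

```cpp
#include <cassert>
#include <functional>
#include <memory_resource>
#include <new>
#include <utility>
#include <vector>

// Hypothetical sketch: a monotonic std::pmr resource plus a list of
// registered destructors, mimicking create/register/release semantics.
class ToyArena {
 public:
  // Allocate from the monotonic buffer, construct in place, and register
  // the destructor (memory itself is only reclaimed by release()).
  template <typename T, typename... Args>
  T* create(Args&&... args) {
    void* p = resource_.allocate(sizeof(T), alignof(T));
    T* obj = new (p) T(std::forward<Args>(args)...);
    destructors_.push_back([obj] { obj->~T(); });
    return obj;
  }

  std::pmr::memory_resource* resource() { return &resource_; }

  // Run destructors in reverse creation order, then free all memory at once.
  void release() {
    for (auto it = destructors_.rbegin(); it != destructors_.rend(); ++it) (*it)();
    destructors_.clear();
    resource_.release();
  }

  ~ToyArena() { release(); }

 private:
  std::pmr::monotonic_buffer_resource resource_;
  std::vector<std::function<void()>> destructors_;
};
```

A `std::pmr` container created through `create` can also take `resource()` as its allocator source, so both the container object and its elements live in the same monotonic buffer.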
+ +## Usage + +```c++ +#include <babylon/reusable/memory_resource.h> + +using ::babylon::PageAllocator; +using ::babylon::ExclusiveMonotonicBufferResource; +using ::babylon::SharedMonotonicBufferResource; +using ::babylon::SwissMemoryResource; + +// By default, memory is obtained from the system in whole pages via new/delete +// Any of the following resource types can be used (shown as alternatives) +ExclusiveMonotonicBufferResource resource; +SharedMonotonicBufferResource resource; +SwissMemoryResource resource; + +// Specify the page allocator used to aggregate small allocations into full pages, replacing the default SystemPageAllocator +PageAllocator& page_allocator = get_some_allocator(); +resource.set_page_allocator(page_allocator); + +// Specify an upstream resource for large allocations, replacing the default std::pmr::new_delete_resource() +std::pmr::memory_resource& memory_resource = get_some_resource(); +resource.set_upstream(memory_resource); + +// Can be used as a memory_resource to back std::pmr containers directly +::std::pmr::vector<::std::pmr::string> pmr_vector(&resource); + +// Can also be used directly for memory allocation +resource.allocate(bytes, alignment); + +// Passing the alignment as a template parameter lets it be resolved at compile time, which is faster +resource.allocate<alignment>(bytes); + +// Destructor functions can be registered with the memory resource; they are called when release is invoked +// instance->~T() is called for destruction; note that this only runs the destructor and does not attempt to free the instance's memory +// This is typically used to implement interfaces that mimic the Create semantics of google::protobuf::Arena +T* instance; +resource.register_destructor(instance); + +// A type-erased version of destructor registration, supporting custom destruction functions +void destruct(void* instance) { + reinterpret_cast<T*>(instance)->~T(); +} +resource.register_destructor(instance, destruct); + +// On release, all registered destructors are invoked first, and then all allocated memory is freed at once +// Ensure that
any std::pmr containers using this memory resource are destroyed before release is called +resource.release(); + +// Unique to SwissMemoryResource: it can be implicitly converted for use as an arena +google::protobuf::Arena& arena = swiss_memory_resource; +T* message_on_arena = google::protobuf::Arena::CreateMessage<T>(&arena); +``` diff --git a/docs/reusable/memory_resource.md b/docs/reusable/memory_resource.zh-cn.md similarity index 98% rename from docs/reusable/memory_resource.md rename to docs/reusable/memory_resource.zh-cn.md index 35a17346..76c4965f 100644 --- a/docs/reusable/memory_resource.md +++ b/docs/reusable/memory_resource.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](memory_resource.en.md)** + # memory_resource ## 原理 @@ -75,7 +77,4 @@ resource.release(); // SwissMemoryResource独有功能,可以隐式转换为arena使用 google::protobuf::Arena& arena = swiss_memory_resource; T* message_on_arena = google::protobuf::Arena::CreateMessage(&arena); - -// 更说明见注释 -// 单测test/test_reusable_memory_resource.cpp ``` diff --git a/docs/reusable/page_allocator.en.md b/docs/reusable/page_allocator.en.md new file mode 100644 index 00000000..4d34bd80 --- /dev/null +++ b/docs/reusable/page_allocator.en.md @@ -0,0 +1,55 @@ +**[[简体中文]](page_allocator.zh-cn.md)** + +# page_allocator + +## Principle + +![](images/page_allocator.png) + +A manager for allocating and freeing fixed-size memory blocks. Unlike general-purpose `malloc`, its fixed-size nature avoids complex implementations such as buddy algorithms, making it lighter and faster. In practical use, scattered small memory allocations are managed in aggregate through higher-level mechanisms like `memory_resource`. + +### SystemPageAllocator + +An allocator that allocates and frees memory in units of the system page size. It is backed directly by `operator new` and `operator delete`. + +### CachedPageAllocator + +Caches freed memory blocks internally in a `babylon::ConcurrentBoundedQueue` so they can be reused.
When the cache is full, freed blocks are returned to a lower-level allocator such as `SystemPageAllocator`; when it is empty, new blocks are requested from that allocator. + +## Usage + +```c++ +#include <babylon/reusable/page_allocator.h> + +using ::babylon::PageAllocator; +using ::babylon::SystemPageAllocator; +using ::babylon::CachedPageAllocator; + +// The system page allocator is accessed as a singleton +auto& system_page_allocator = SystemPageAllocator::instance(); + +// The cached page allocator requires explicit construction +CachedPageAllocator cached_page_allocator; +// Set the upstream allocator from which memory blocks are obtained; defaults to SystemPageAllocator::instance() +// page_allocator here and below stands for any PageAllocator, e.g. system_page_allocator +cached_page_allocator.set_upstream(page_allocator); +// Set the cache capacity +cached_page_allocator.set_free_page_capacity(128); + +// Retrieve the page size, which defaults to the system page size (typically 4096) +auto size = page_allocator.page_size(); + +// Allocation/Deallocation +void* pages[100]; +page_allocator.allocate(pages, 100); +page_allocator.deallocate(pages, 100); + +// Get the current number of cached pages and the cache capacity +cached_page_allocator.free_page_num(); +cached_page_allocator.free_page_capacity(); + +// Retrieve the current cache hit statistics +auto summary = cached_page_allocator.cache_hit_summary(); +// summary.sum is the total number of cache hits +// summary.num is the total number of calls +// sum / num gives the hit rate; sampling it periodically and taking the differences between samples enables hit-rate monitoring +``` diff --git a/docs/reusable/page_allocator.md b/docs/reusable/page_allocator.zh-cn.md similarity index 96% rename from docs/reusable/page_allocator.md rename to docs/reusable/page_allocator.zh-cn.md index a6f8ce91..a3465c50 100644 --- a/docs/reusable/page_allocator.md +++ b/docs/reusable/page_allocator.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](page_allocator.en.md)** + # page_allocator ## 原理 @@ -50,7 +52,4 @@ auto summary = cached_page_allocator.cache_hit_summary(); // summary.sum为总命中量 // summary.num为总调用量 // sum /
num可以得到命中率,周期调用并记录差值可以得到命中率监控 - -// 更说明见注释 -// 单测test/test_page_allocator.cpp ``` diff --git a/docs/reusable/traits.en.md b/docs/reusable/traits.en.md new file mode 100644 index 00000000..95227ae7 --- /dev/null +++ b/docs/reusable/traits.en.md @@ -0,0 +1,108 @@ +**[[简体中文]](traits.zh-cn.md)** + +# traits + +## Principle + +This describes the traits required for achieving "perfect reuse," which combines the advantages of both memory pools and object pools. The concept is characterized by: + +- Memory allocation on a contiguous memory pool, implemented via an accompanying allocator. +- Support for logical clearing, which means not actually destroying contained elements or freeing memory, but instead resetting usage markers to return to a logically initialized state. These elements can be reused when accessed again. +- Instances in the memory pool may intermittently exceed the previous maximum capacity, leading to new element allocations and breaking memory pool continuity, resulting in memory holes and waste. This problem is solved by periodically "recursively extracting capacity" and "reserving capacity" for new instances while maintaining memory continuity. + +![](images/reuse.png) + +This image illustrates the meaning of logical clearing. Unlike typical `vector` operations, after shrinking in size via `clear` or `pop_back`, the objects inside are not actually destroyed; they are simply reset. Thus, when this slot is reused, the already constructed elements can be used directly. + +![](images/reconstruct.png) + +This image illustrates the concept of reconstruction, where an auxiliary structure records the capacity of the memory and recursively records the capacity of each element. With this structure, the capacity can be recorded and restored in subsequent reconstructions, maintaining memory continuity. 
This concept is implemented through traits extraction rather than base class inheritance, primarily to support third-party implementations that cannot be modified, such as `google::protobuf::Message`. + +## Usage + +### Application Interface + +```c++ +#include +#include + +using ::babylon::ReusableTraits; +using ::babylon::Reuse; +using ::babylon::SwissAllocator; +using ::babylon::SwissMemoryResource; + +SwissMemoryResource resource; +SwissAllocator<> allocator(resource); + +// REUSABLE is used to check if a type T implements the "perfect reuse" protocol +// Basic types, PODs, and Message types have built-in support by default +ReusableTraits<T>::REUSABLE + +// A reusable instance is constructed similarly to a regular instance +auto instance = allocator.create_object<T>(...); + +// Repeated usage and clearing +loop: + ... // Use the instance + // Logically clear an instance + Reuse::reconstruct(*instance, allocator); + +// Structure to record instance capacity +Reuse::AllocationMetadata<T> meta; + +// Extract capacity +Reuse::update_allocation_metadata(*instance, meta); + +// Then, memory resources can be fully released +resource.release(); + +// Restore instance with recorded capacity and return to a contiguous memory state +instance = Reuse::create_with_allocation_metadata(allocator, meta); + +... // Proceed to the next usage cycle + +// For more details, see the comments +// Unit test: test/test_reusable_traits.cpp +``` + +### Extended Interface + +```c++ +#include +#include + +using ::babylon::ReusableTraits; +using ::babylon::SwissAllocator; + +// A custom class supporting reuse +class SomeClass { +public: + // This definition is not required for reuse traits, but for classes with dynamic memory, + // the allocator can be used to chain memory pools. + using allocator_type = SwissAllocator<>; + + // Define a structure to store capacity metadata + struct AllocationMetadata { + ...
+ }; + + // Function to extract capacity + void update_allocation_metadata(AllocationMetadata& meta) const { + ... + } + + // Constructor to restore capacity + SomeClass(const AllocationMetadata& meta) { + ... + } + // Constructor to restore capacity with a memory pool allocator + SomeClass(const AllocationMetadata& meta, allocator_type allocator) { + ... + } + + // Logical clearing function + void clear() { + ... + } +}; +``` diff --git a/docs/reusable/traits.md b/docs/reusable/traits.zh-cn.md similarity index 99% rename from docs/reusable/traits.md rename to docs/reusable/traits.zh-cn.md index 554fbe45..8ac61be1 100644 --- a/docs/reusable/traits.md +++ b/docs/reusable/traits.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](traits.en.md)** + # traits ## 原理 diff --git a/docs/reusable/vector.en.md b/docs/reusable/vector.en.md new file mode 100644 index 00000000..f6e2073f --- /dev/null +++ b/docs/reusable/vector.en.md @@ -0,0 +1,28 @@ +**[[简体中文]](vector.zh-cn.md)** + +# vector + +A reusable `ReusableVector` that conforms to `ReusableTraits`. + +When performing operations like `clear` or `pop_back`, the existing contents are not destructed. Subsequent operations like `emplace_back` or `push_back` will attempt to reuse the structure using `ReusableTraits::reconstruct`. 
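The non-destructive `clear`/`pop_back` behavior described above can be modeled in a few lines. `ToyReusableStringVector` is a hypothetical sketch, not `babylon::ReusableVector`: it hard-codes `std::string` elements and uses `std::string::assign` as a stand-in for the `ReusableTraits`-based reconstruction, which is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy: slots beyond size_ stay constructed after clear()/pop_back()
// and are reused by the next emplace_back().
class ToyReusableStringVector {
 public:
  std::string& emplace_back(const char* value) {
    if (size_ < slots_.size()) {
      slots_[size_].assign(value);  // reuse: overwrite in place, keeping the buffer
    } else {
      slots_.emplace_back(value);   // grow: construct a new element
    }
    return slots_[size_++];
  }
  void pop_back() { --size_; }      // element stays constructed
  void clear() { size_ = 0; }       // nothing destroyed, nothing freed
  std::size_t size() const { return size_; }
  std::size_t constructed() const { return slots_.size(); }
  std::string& operator[](std::size_t i) { return slots_[i]; }

 private:
  std::vector<std::string> slots_;
  std::size_t size_ = 0;
};
```

After a `clear`, the next `emplace_back` lands in an already constructed string whose heap buffer is still there, which is the effect the real container generalizes to arbitrary reusable element types.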
+ +## Usage Example + +```c++ +#include "babylon/reusable/manager.h" +#include "babylon/reusable/vector.h" +#include "babylon/reusable/string.h" + +using ::babylon::SwissVector; +using ::babylon::SwissString; +using ::babylon::SwissManager; + +// Define a reusable manager +SwissManager manager; + +// Use in place of std::vector<std::string> +auto pvector = manager.create<SwissVector<SwissString>>(); + +// Operate on it like a std::vector +pvector->emplace_back("10086"); +``` diff --git a/docs/reusable/vector.md b/docs/reusable/vector.zh-cn.md similarity index 91% rename from docs/reusable/vector.md rename to docs/reusable/vector.zh-cn.md index d27d9982..e70b8d45 100644 --- a/docs/reusable/vector.md +++ b/docs/reusable/vector.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](vector.en.md)** + # vector 可重用的ReusableVector,满足ReusableTratis @@ -23,7 +25,4 @@ auto pvector = manger.create>(); // 等用于std::vector进行操作 pvector->emplace_back("10086"); - -// 更说明见注释 -// 单测test/test_reusable_vector.cpp ``` diff --git a/docs/serialization.en.md b/docs/serialization.en.md new file mode 100644 index 00000000..34baefc6 --- /dev/null +++ b/docs/serialization.en.md @@ -0,0 +1,298 @@ +**[[简体中文]](serialization.zh-cn.md)** + +# serialization + +## Overview + +Google Protobuf offers a tag-based serialization mechanism that supports version compatibility through IDL-defined interfaces. However, there are scenarios where the IDL-defined serialization may not be sufficient: + +- For performance reasons, you may want to bypass Protobuf's internal memory organization for fundamental structures used in your program. +- Complex data structures in your program may not be directly expressible using Protobuf's `message` and `repeated` constructs. + +In such cases, custom structures require manual serialization, often necessitating the creation of parallel Protobuf messages and manual copying of values.
To simplify the serialization of custom structures while retaining tag-based version compatibility, Babylon provides a set of declarative macros. + +## Usage + +### Serialization Interface + +```c++ +#include + +using ::babylon::Serialization; +using ::babylon::SerializeTraits; + +// Serialization and deserialization are handled by the static functions of Serialization. +// The value must implement the babylon::SerializeTraits serialization protocol. +// The `success` return value indicates whether the operation succeeded. + +// Check if a type T is serializable. +// Even if a type is not serializable, the related functions can still be called, but by default they always return failure. +if (SerializeTraits<T>::SERIALIZABLE) { +} + +// String-based serialization +std::string s; +success = Serialization::serialize_to_string(value, s); +success = Serialization::parse_from_string(s, value); + +// Coded stream-based serialization (construction of the streams on top of an underlying I/O stream is omitted) +google::protobuf::io::CodedOutputStream cos; +google::protobuf::io::CodedInputStream cis; +success = Serialization::serialize_to_coded_stream(value, cos); +success = Serialization::parse_from_coded_stream(cis, value); + +// Output debug string (string version) +std::string debug_string; +success = Serialization::print_to_string(value, debug_string); + +// Output debug string (stream version); os is any concrete ZeroCopyOutputStream +google::protobuf::io::ZeroCopyOutputStream& os = ...; +success = Serialization::print_to_stream(value, os); + +// The following types are natively supported: +// 1. Protobuf IDL-generated classes (subclasses of google::protobuf::MessageLite) +// 2. Basic types like bool, int*_t, uint*_t, float, double, enum, enum class, std::string +// 3. Basic containers: std::vector<T>, std::unique_ptr<T> +```
// Regular method definitions + +private: + int32_t a; + float b; + std::vector c; + SomeOtherType d; // SomeOtherType must support serialization through a macro declaration + std::unique_ptr e; // Wrap one layer + SomeMessage f; // Proto IDL-defined classes are natively serializable + std::unique_ptr g; + ... + + // Declaring this makes SomeType serializable. + // Only the listed members will participate in the serialization process. + BABYLON_SERIALIZABLE(a, b, c, d, e, f, g, ...); +}; + +class SomeType { +public: + ... + +private: + ... + + // Declaring this makes SomeType cross-version serializable. + // Only the listed members will participate in the serialization process. + // Each member needs a unique tag number. + // During evolution, members can be added or removed, but tag numbers should not be reused. + BABYLON_COMPATIBLE((a, 1)(b, 2)(c, 3)...); +}; +``` + +### Including Base Class in Serialization + +```c++ +#include + +// A serializable base class +class BaseType { +public: + ... + +private: + ... + + // Normally defined as serializable + BABYLON_SERIALIZABLE(...); + // Or version compatible + BABYLON_COMPATIBLE(...); +}; + +class SubType : public BaseType { +public: + ... + +private: + ... + + // Serialize base class along with subclass members. + BABYLON_SERIALIZABLE_WITH_BASE(BaseType, ...); + // For version compatibility, BaseType is treated like a member and given a unique tag. + BABYLON_COMPATIBLE_WITH_BASE((BaseType, 1), ...); +}; +``` + +### Custom Serialization Function Implementation + +```c++ +#include + +// For complex types that cannot be easily adapted using macros, +// custom serialization support can be implemented directly. + +class SomeType { +public: + // You can omit this function. + // If implemented and returns true, the serialization will be version-compatible (equivalent to BABYLON_COMPATIBLE declaration). + // If it returns false, it behaves like the default BABYLON_SERIALIZABLE declaration. 
+ static constexpr bool serialize_compatible() { + return ...; + } + void serialize(CodedOutputStream& os) const { + ... // Serialize this object and write to os + } + bool deserialize(CodedInputStream& is) { + ... // Read data from is and reconstruct this object + return ...; // Return true if successful, false if parsing fails + } + size_t calculate_serialized_size() const noexcept { + return ...; // Calculate and return the number of bytes after serialization + } + + ... // Other functions and member definitions +}; + +// If size calculation is complex, a caching mechanism can be used for optimization. +// You can declare additional functions to support this. +class SomeType { +public: + // Declare the complexity of size calculation: TRIVIAL, SIMPLE, or COMPLEX. + // TRIVIAL: Calculable at compile-time, SIMPLE: O(1), COMPLEX: O(n). + // If using a caching mechanism, this should represent the complexity of the cached calculation. + static constexpr int serialized_size_complexity() { + return SerializationHelper::SERIALIZED_SIZE_COMPLEXITY_SIMPLE; + } + + // Implement caching mechanism for size calculation. + // The framework guarantees that the calculate_serialized_size function will be called before serialize. + // In serialize, use serialized_size_cached for speedup by caching the previous calculate_serialized_size result. + size_t serialized_size_cached() const noexcept { + return ...; + } + + ... 
+}; +``` + +### Protocol Buffer Compatibility + +```c++ +//////////////////////////////////// +// test.proto: suppose the following proto definition exists +enum TestEnum { +}; + +message TestMessage { + optional bool b = 1; + optional int32 i32 = 4; + optional int64 i64 = 5; + optional uint32 u32 = 8; + optional uint64 u64 = 9; + optional float f = 16; + optional double d = 17; + optional TestEnum e = 18; + optional string s = 19; + optional bytes by = 20; + optional TestMessage m = 21; + + repeated string rs = 41; + repeated bytes rby = 42; + repeated TestMessage rm = 43; + repeated bool rpb = 44 [packed = true]; + repeated int32 rpi32 = 47 [packed = true]; + repeated int64 rpi64 = 48 [packed = true]; + repeated uint32 rpu32 = 51 [packed = true]; + repeated uint64 rpu64 = 52 [packed = true]; + repeated float rpf = 59 [packed = true]; + repeated double rpd = 60 [packed = true]; + repeated TestEnum rpe = 61 [packed = true]; +}; +//////////////////////////////////// + +/////////////////////////////////// +// test.cc: suppose the following class definitions exist +#include + +enum TestEnum { +}; + +struct TestSubObject { + bool _b; + int32_t _i32; + int64_t _i64; + uint32_t _u32; + uint64_t _u64; + float _f; + double _d; + TestEnum _e; + ::std::string _s; + ::std::string _by; + ::std::vector<::std::string> _rs; + ::std::vector<::std::string> _rby; + ::std::vector<bool> _rpb; + ::std::vector<int32_t> _rpi32; + ::std::vector<int64_t> _rpi64; + ::std::vector<uint32_t> _rpu32; + ::std::vector<uint64_t> _rpu64; + ::std::vector<float> _rpf; + ::std::vector<double> _rpd; + ::std::vector<TestEnum> _rpe; + + BABYLON_COMPATIBLE( + (_b, 1)(_i32, 4)(_i64, 5)(_u32, 8)(_u64, 9) + (_f, 16)(_d, 17)(_e, 18)(_s, 19)(_by, 20) + (_rs, 41)(_rby, 42) + (_rpb, 44)(_rpi32, 47)(_rpi64, 48)(_rpu32, 51)(_rpu64, 52) + (_rpf, 59)(_rpd, 60)(_rpe, 61) + ); +}; + +struct TestObject { + bool _b; + int32_t _i32; + int64_t _i64; + uint32_t _u32; + uint64_t _u64; + float _f; + double _d; + TestEnum _e; + ::std::string _s; + ::std::string _by; + TestSubObject _m; + ::std::vector<::std::string> _rs; + ::std::vector<::std::string> _rby; + ::std::vector<TestSubObject> _rm; +
::std::vector _rpb; + ::std::vector _rpi32; + ::std::vector _rpi64; + ::std::vector _rpu32; + ::std::vector _rpu64; + ::std::vector _rpf; + ::std::vector _rpd; + ::std::vector _rpe; + + BABYLON_COMPATIBLE( + (_b, 1)(_i32, 4)(_i64, 5)(_u32, 8)(_u64, 9) + (_f, 16)(_d, 17)(_e, 18)(_s, 19)(_by, 20) + (_m, 21)(_rs, 41)(_rby, 42)(_rm, 43) + (_rpb, 44)(_rpi32, 47)(_rpi64, 48)(_rpu32, 51)(_rpu64, 52) + (_rpf, 59)(_rpd, 60)(_rpe, 61) + ); +}; + +// struct -> message conversion for forward compatibility +TestObject s; +Serialization::serialize_to_string(s, str); +TestMessage m; +Serialization::parse_from_string(str, m); + +// message -> struct conversion for backward compatibility +TestMessage m; +Serialization::serialize_to_string(m, str); +TestObject s; +Serialization::parse_from_string(str, s); +/////////////////////////////////// +``` diff --git a/docs/serialization.md b/docs/serialization.zh-cn.md similarity index 99% rename from docs/serialization.md rename to docs/serialization.zh-cn.md index c42930ff..02a7d726 100644 --- a/docs/serialization.md +++ b/docs/serialization.zh-cn.md @@ -1,3 +1,5 @@ +**[[English]](serialization.en.md)** + # serialization ## 原理 diff --git a/docs/time.en.md b/docs/time.en.md new file mode 100644 index 00000000..e4caf26f --- /dev/null +++ b/docs/time.en.md @@ -0,0 +1,40 @@ +**[[简体中文]](time.zh-cn.md)** + +# Time + +## Principle + +When servers perform time formatting (a common scenario being log formatting), they often rely on the `localtime_r` function. However, `localtime_r` is constrained by POSIX standards, requiring `tzset` to be called each time, which checks for changes in the time zone setting files and causes a global lock. These operations make it unsuitable for high-concurrency scenarios. While setting the TZ environment variable can circumvent file check actions, the impacts of the global lock remain unavoidable. + +Typical industrial-grade logging systems usually address concurrency issues through replacement implementations. 
Known solutions include: + +- `absl::TimeZone`, which is part of google/cctz +- `apollo::cyber::common::LocalTime` + +These solutions eliminate the global lock. The implementation in `absl` is quite comprehensive, serving as a full-featured version of `localtime_r`. In contrast, the implementations in `comlog` and `apollo` simplify certain aspects, such as leap year calculations and daylight saving time support, optimizing performance relative to the full-featured version. + +Here, we propose a new optimization mechanism based on caching, maintaining full functionality while extending and further optimizing practical performance in the most common logging formatting domain. Since natural time typically progresses gradually, the relatively complex calculations for leap years and weeks are unlikely to trigger in most incremental scenarios, saving significant computational overhead. + +![](images/local_time.png) + +## Usage + +```c++ +#include "babylon/time.h" + +time_t time = ... // Obtain a timestamp + +tm local; +babylon::localtime(&time, &local); +// Functions identically to localtime_r(&time, &local), except +// - Each process will load the time zone file only once, and runtime modifications to the system time zone will not be hot-loaded. 
+```
+
+## Functionality/Performance Comparison
+
+|             | Time Zone Hot Reload | Leap Year Support | Daylight Saving Time Support | Weekday Support | Single-thread Performance | Four-thread Performance |
+|-------------|:--------------------:|:-----------------:|:---------------------------:|:---------------:|:------------------------:|:----------------------:|
+| localtime_r | ✓ | ✓ | ✓ | ✓ | 282ns | 2061ns |
+| absl | Not Supported | ✓ | ✓ | ✓ | 91ns | 92ns |
+| apollo | Not Supported | 1901 ~ 2099 | Not Supported | Not Supported | 11ns | 11ns |
+| babylon | Not Supported | ✓ | ✓ | ✓ | 7ns | 7ns |
diff --git a/docs/time.md b/docs/time.zh-cn.md
similarity index 96%
rename from docs/time.md
rename to docs/time.zh-cn.md
index 290e28da..1e9278c1 100644
--- a/docs/time.md
+++ b/docs/time.zh-cn.md
@@ -1,3 +1,5 @@
+**[[English]](time.en.md)**
+
 # time
 
 ## 原理
@@ -26,10 +28,6 @@ tm local;
 babylon::localtime(&time, &local);
 // 与localtime_r(&time, &local)效果一致,除了
 // - 每个进程只会加载一次时区文件,运行时修改系统时区不会进行热加载
-
-// 更说明见注释
-// 单测 test/test_time.cpp
-// 性能对比 bench/bench_time.cpp
 ```
 
 ## 功能/性能对比