Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add message pipe #27

Merged
merged 11 commits into from
Nov 30, 2024
Merged

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Nov 29, 2024

Description

Add task executor message through pipe.

Validation performed

  • lint:check
  • Unit test on devcontainer

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new header and source files for enhanced error handling and messaging capabilities in the spider application.
    • Added asynchronous message handling functions for improved communication.
    • Enhanced build configuration with updated library dependencies and settings for position-independent code.
  • Bug Fixes

    • Updated error handling methods to ensure consistency and robustness in function execution and response generation.
  • Tests

    • Added new test cases for message-pipe functionality and updated existing tests for the FunctionManager to improve coverage and reliability.
  • Chores

    • Updated installation scripts for libraries to ensure compatibility and improved error handling.

Bump spdlog version to solve conflict with fmtlib. Add using external
fmt to spdlog build. Add PIC in fmt build. Add process to boost build.
Copy link

coderabbitai bot commented Nov 29, 2024

Walkthrough

The pull request introduces several enhancements to the build configuration and source files of the project. Key changes include updates to the CMakeLists.txt to enable Position Independent Code (PIC) and to expand the search for Boost components. New source files are added, including FunctionManager.cpp, BoostAsio.hpp, and TaskExecutorMessage.hpp, which enhance functionality related to error handling and message processing. Additionally, new test cases are introduced to validate the new features, and various scripts are updated for improved installation processes.

Changes

File Path Change Summary
CMakeLists.txt Added set(CMAKE_POSITION_INDEPENDENT_CODE ON), modified find_package(Boost ...), added add_compile_definitions(SPDLOG_FMT_EXTERNAL).
src/spider/CMakeLists.txt Updated SPIDER_CORE_SOURCES and SPIDER_WORKER_SOURCES, modified target sources and linked libraries for spider_worker.
src/spider/core/BoostAsio.hpp New header file added for Boost.Asio functionalities.
src/spider/worker/FunctionManager.cpp New source file added with functions for error handling and response creation.
src/spider/worker/FunctionManager.hpp Updated function signatures and added new methods for error and result handling.
src/spider/worker/TaskExecutorMessage.hpp New header file added defining enums and functions for message handling.
src/spider/worker/message_pipe.cpp Implemented message transmission and reception functionalities using Boost.Asio.
src/spider/worker/message_pipe.hpp New header file defining message communication functions.
tests/CMakeLists.txt Added new test source file and updated target sources and libraries for unit tests.
tests/worker/test-FunctionManager.cpp Updated test cases to use new response handling methods and added new tests.
tests/worker/test-MessagePipe.cpp New test file added for testing message-pipe functionality.
tools/scripts/lib_install/boost.sh Modified to include process library during Boost installation.
tools/scripts/lib_install/fmtlib.sh Added -DCMAKE_POSITION_INDEPENDENT_CODE=ON flag to CMake command.
tools/scripts/lib_install/linux/install-lib.sh Updated spdlog library version from 1.14.1 to 1.15.0.
tools/scripts/lib_install/spdlog.sh Added -DSPDLOG_FMT_EXTERNAL=ON flag to CMake command for external fmt library usage.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

🧹 Outside diff range and nitpick comments (25)
src/spider/.clang-format (1)

13-14: Remove unnecessary empty alternative in regex pattern

The regex pattern includes an empty alternative | at the start which serves no purpose and could be removed.

Apply this diff to simplify the pattern:

-  - Regex: "^<(|absl|boost|catch2|fmt|mariadb|msgpack|spdlog)"
+  - Regex: "^<(absl|boost|catch2|fmt|mariadb|msgpack|spdlog)"
src/spider/worker/message_pipe.hpp (1)

1-9: LGTM! Consider adding file documentation.

The include guards and header organization are well-structured. The IWYU pragmas indicate thoughtful dependency management.

Consider adding a brief file-level documentation comment explaining the purpose and usage of this message pipe facility:

 #ifndef SPIDER_WORKER_MESSAGE_PIPE_HPP
 #define SPIDER_WORKER_MESSAGE_PIPE_HPP
 
+/**
+ * @file message_pipe.hpp
+ * @brief Provides message passing functionality through pipes for worker communication
+ *
+ * This header defines interfaces for synchronous and asynchronous message passing
+ * using Boost.Asio pipes and POSIX stream descriptors.
+ */
+
 #include <functional>
tests/CMakeLists.txt (1)

16-20: Consider using CMake's path handling functions

The current path construction using string concatenation could be fragile. Consider using cmake_path() or file(REAL_PATH) for more robust path handling.

Here's a more robust approach:

 set(SPIDER_TEST_WORKER_SOURCES)
+set(SPIDER_SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../src/spider")
 foreach(worker_source ${SPIDER_WORKER_SOURCES})
-    list(APPEND SPIDER_TEST_WORKER_SOURCES "../src/spider/${worker_source}")
+    cmake_path(APPEND SPIDER_SRC_DIR "${worker_source}" OUTPUT_VARIABLE source_path)
+    list(APPEND SPIDER_TEST_WORKER_SOURCES "${source_path}")
 endforeach()
src/spider/CMakeLists.txt (2)

38-44: Separate headers from source files

Consider creating a separate SPIDER_WORKER_HEADERS variable for .hpp files to maintain better CMake structure.

 set(SPIDER_WORKER_SOURCES
     worker/message_pipe.cpp
+    CACHE INTERNAL
+    "spider worker source files"
+)
+
+set(SPIDER_WORKER_HEADERS
     worker/message_pipe.hpp
     worker/TaskExecutorMessage.hpp
     CACHE INTERNAL
-    "spider worker source files"
+    "spider worker header files"
)

50-59: Add version constraints for spdlog dependency

While Boost dependencies are well-specified, consider adding version constraints for spdlog to ensure compatibility.

+find_package(spdlog 1.9.0 REQUIRED)
 target_link_libraries(
     spider_worker
     PRIVATE
         Boost::headers
         Boost::filesystem
         Boost::process
         Boost::program_options
         Boost::system
         spdlog::spdlog
)
tools/scripts/lib_install/spdlog.sh (1)

Line range hint 1-7: Update dependency documentation

The script now requires the fmt library due to the external spdlog configuration, but this isn't documented in the dependencies section.

Add fmt to the dependencies list:

 # Dependencies:
 # - cmake
 # - curl
 # - g++
+# - fmt
 # NOTE: Dependencies should be installed outside the script to allow the script to be largely distro-agnostic
src/spider/worker/FunctionManager.cpp (1)

13-38: Consider adding error logging in the catch block

While the error handling is comprehensive, adding logging in the catch block would help with debugging when type errors occur.

    } catch (msgpack::type_error& e) {
+       // Log the error for debugging purposes
+       spdlog::debug("Failed to unpack message: {}", e.what());
        return std::nullopt;
    }
tests/worker/test-MessagePipe.cpp (3)

36-36: Consider making the timeout duration configurable.

The 5-second timeout is hardcoded. Consider making this a named constant or configuration parameter for better maintainability.

+    static constexpr auto RESPONSE_TIMEOUT = std::chrono::seconds(5);
-    REQUIRE(future.wait_for(std::chrono::seconds(5)) == std::future_status::ready);
+    REQUIRE(future.wait_for(RESPONSE_TIMEOUT) == std::future_status::ready);

41-52: Simplify nested validation logic.

The nested if statements can be simplified since we've already verified the presence of values with REQUIRE.

-    if (response_option.has_value()) {
-        msgpack::sbuffer const& response_buffer = response_option.value();
-        REQUIRE(spider::worker::TaskExecutorResponseType::Result
-                == spider::worker::get_response_type(response_buffer));
-        std::optional<std::tuple<std::string, int>> const parse_response
-                = spider::core::response_get_result<std::tuple<std::string, int>>(response_buffer);
-        REQUIRE(parse_response.has_value());
-        if (parse_response.has_value()) {
-            std::tuple<std::string, int> result = parse_response.value();
-            REQUIRE(cSampleResult == result);
-        }
-    }
+    msgpack::sbuffer const& response_buffer = response_option.value();
+    REQUIRE(spider::worker::TaskExecutorResponseType::Result
+            == spider::worker::get_response_type(response_buffer));
+    std::optional<std::tuple<std::string, int>> const parse_response
+            = spider::core::response_get_result<std::tuple<std::string, int>>(response_buffer);
+    REQUIRE(parse_response.has_value());
+    REQUIRE(cSampleResult == parse_response.value());

16-53: Consider adding more test cases for error scenarios.

The current test case only covers the happy path. Consider adding test cases for:

  • Invalid message format
  • Pipe disconnection
  • Timeout scenarios
  • Buffer overflow conditions

Would you like me to help generate additional test cases for these scenarios?

src/spider/worker/TaskExecutorMessage.hpp (4)

6-6: Consider documenting the IWYU pragma

While the IWYU pragma is correctly used, it would be helpful to add a brief comment explaining why this pragma is necessary for MsgPack.hpp.


9-16: Add documentation for enum values

Each response type should be documented to explain its purpose and when it's used. This helps maintainers understand the message protocol.

Example:

 enum class TaskExecutorResponseType : std::uint8_t {
+    /// Indicates an invalid or unrecognized response
     Unknown = 0,
+    /// Contains the successful execution result
     Result,
+    /// Contains error information from execution
     Error,
     // ... document other states
 };

40-54: Reduce code duplication with get_response_type

Consider creating a template function to handle the common message parsing logic.

+template<typename T>
+inline auto get_message_type(msgpack::sbuffer const& buffer) -> T {
+    // NOLINTBEGIN(cppcoreguidelines-pro-type-union-access,cppcoreguidelines-pro-bounds-pointer-arithmetic)
+    msgpack::object_handle const handle = msgpack::unpack(buffer.data(), buffer.size());
+    msgpack::object const object = handle.get();
+    if (object.type != msgpack::type::ARRAY || object.via.array.size < 2) {
+        return T::Unknown;
+    }
+    msgpack::object const header = object.via.array.ptr[0];
+    try {
+        return header.as<T>();
+    } catch (msgpack::type_error const&) {
+        return T::Unknown;
+    }
+    // NOLINTEND(cppcoreguidelines-pro-type-union-access,cppcoreguidelines-pro-bounds-pointer-arithmetic)
+}
+
+inline auto get_response_type(msgpack::sbuffer const& buffer) -> TaskExecutorResponseType {
+    return get_message_type<TaskExecutorResponseType>(buffer);
+}
+
+inline auto get_request_type(msgpack::sbuffer const& buffer) -> TaskExecutorRequestType {
+    return get_message_type<TaskExecutorRequestType>(buffer);
+}

1-70: Consider adding protocol versioning and documentation

To ensure forward compatibility and maintainability:

  1. Add a version field to the message protocol
  2. Create protocol documentation explaining the message format, types, and their use cases
  3. Consider adding a protocol validation layer

This will help with future protocol evolution and make it easier for new developers to understand the messaging system.

tests/worker/test-FunctionManager.cpp (3)

Line range hint 28-62: Consider enhancing error handling validation.

While the test covers both successful and error scenarios well, consider these improvements:

  1. Validate error message content in addition to error codes
  2. Simplify optional handling using std::optional methods

Here's a suggested improvement for the error handling section:

-    if (wrong_result_option.has_value()) {
-        REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments
-                == std::get<0>(wrong_result_option.value()));
-    }
+    REQUIRE(std::get<0>(wrong_result_option.value()) == 
+            spider::core::FunctionInvokeError::WrongNumberOfArguments);
+    REQUIRE(!std::get<1>(wrong_result_option.value()).empty());

64-75: Consider adding error case tests for tuple returns.

The happy path is well tested, but consider adding tests for:

  1. Wrong argument types
  2. Wrong number of arguments
  3. Edge cases with empty strings

Would you like me to provide example test cases for these scenarios?


Additional test coverage needed for Data handling

Looking at the codebase, while there are some Data-related tests in test-DataStorage.cpp, they don't fully cover edge cases. The Data class implementation in src/spider/core/Data.hpp shows that it accepts any string values without validation, making it important to test boundary conditions.

Specific test cases to add:

  • Empty strings for both key and value
  • Very large strings to ensure no implicit size limitations
  • Strings with special characters and Unicode content
  • Edge cases around UUID generation when using the default constructor
  • Duplicate key scenarios (partially covered in DataStorage tests)
🔗 Analysis chain

Line range hint 77-93: Consider adding edge cases for Data handling.

The current test verifies basic functionality well. Consider adding tests for:

  1. Empty data values
  2. Very large data values
  3. Special characters in data

Let's check if there are any existing edge case tests elsewhere:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other Data-related tests
rg -l "TEST_CASE.*Data" tests/

Length of output: 32


Script:

#!/bin/bash
# Let's try a broader search for Data-related tests
rg "TEST_CASE.*[^a-zA-Z]Data" tests/ -A 5

# Also search for any Data usage in tests
rg "spider::core::Data" tests/ -A 2

Length of output: 3701


Script:

#!/bin/bash
# Let's examine the Data class implementation to understand what edge cases should be tested
rg -A 10 "class Data" src/

# Also check if there are any size limits or validation in Data implementation
ast-grep --pattern 'Data::Data($$$) {
  $$$
}'

Length of output: 1491

src/spider/worker/message_pipe.cpp (3)

19-19: Use std::size_t instead of size_t for consistency

Consider using std::size_t to explicitly specify the namespace and maintain consistency across the codebase.


21-46: Eliminate code duplication in send_message functions

The two send_message functions have identical implementations except for the type of the pipe parameter. Consider templating the function or using a common base class to reduce code duplication and improve maintainability.

Here's how you might refactor using a template:

template <typename Pipe>
auto send_message(Pipe& pipe, msgpack::sbuffer const& request) -> bool {
    try {
        size_t const size = request.size();
        std::string const size_str = fmt::format("{:016d}", size);
        boost::asio::write(pipe, boost::asio::buffer(size_str));
        boost::asio::write(pipe, boost::asio::buffer(request.data(), size));
        return true;
    } catch (boost::system::system_error const& e) {
        spdlog::error("Failed to send message: {}", e.what());
        return false;
    }
}

91-99: Consistent error logging for EOF condition

When checking for boost::asio::error::eof, ensure that the error logging is consistent with the synchronous version. Currently, the asynchronous function may not log all relevant errors.

Consider aligning the error handling with receive_message.

src/spider/worker/FunctionManager.hpp (5)

19-20: Include guard verification for "TaskExecutorMessage.hpp"

Ensure that "TaskExecutorMessage.hpp" has proper include guards or uses #pragma once to prevent multiple inclusions, which could lead to compilation issues.


99-103: Consolidate create_error_response and create_error_buffer to reduce duplication

Both functions serve similar purposes in creating error responses. Merging them could simplify the codebase and eliminate redundancy.


210-213: Include exception details in result parsing error messages

Adding e.what() to the error message would provide more context, aiding in debugging efforts.

Apply this diff to include exception details:

} catch (msgpack::type_error& e) {
    return create_error_response(
        FunctionInvokeError::ResultParsingError,
-       fmt::format("Cannot parse result.")
+       fmt::format("Cannot parse result: {}", e.what())
    );
}

215-218: Provide exception information in function execution errors

Including the exception message e.what() in the error response would offer valuable insights during debugging.

Apply this diff to enhance the error message:

} catch (std::exception& e) {
    return create_error_response(
        FunctionInvokeError::FunctionExecutionError,
-       "Function execution error"
+       fmt::format("Function execution error: {}", e.what())
    );
}

253-254: Avoid returning raw pointers from get_function

Returning a raw pointer may lead to safety issues. Consider returning a reference or an std::optional<const Function&> to enhance safety and express the possibility of a missing function.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b7ce91e and f7f8e3f.

📒 Files selected for processing (16)
  • CMakeLists.txt (3 hunks)
  • src/spider/.clang-format (1 hunks)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/core/BoostAsio.hpp (1 hunks)
  • src/spider/worker/FunctionManager.cpp (1 hunks)
  • src/spider/worker/FunctionManager.hpp (6 hunks)
  • src/spider/worker/TaskExecutorMessage.hpp (1 hunks)
  • src/spider/worker/message_pipe.cpp (1 hunks)
  • src/spider/worker/message_pipe.hpp (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/worker/test-FunctionManager.cpp (4 hunks)
  • tests/worker/test-MessagePipe.cpp (1 hunks)
  • tools/scripts/lib_install/boost.sh (1 hunks)
  • tools/scripts/lib_install/fmtlib.sh (1 hunks)
  • tools/scripts/lib_install/linux/install-lib.sh (1 hunks)
  • tools/scripts/lib_install/spdlog.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/spider/core/BoostAsio.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-FunctionManager.cpp

[error] 28-28: syntax error

(syntaxError)

🔇 Additional comments (26)
src/spider/.clang-format (1)

13-14: Verify completeness of library headers list

Let's ensure all required libraries for the message pipe feature are included in the pattern.

✅ Verification successful

Library headers pattern in .clang-format is complete

The current pattern "^<(|absl|boost|catch2|fmt|mariadb|msgpack|spdlog)" correctly covers all third-party library headers used in the codebase. The verification shows:

  • All major libraries are included: Abseil (absl), Boost (boost), Catch2 (catch2), fmt (fmt), MariaDB (mariadb), MessagePack (msgpack), and spdlog (spdlog)
  • The remaining includes are either:
    • Standard C++ headers (like <vector>, <string>, etc.)
    • Project-specific includes using quotes
    • Nested includes from the already-covered libraries
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any missing library includes that should be added to .clang-format

# Search for library includes in source files
echo "Searching for library includes..."
rg -g '*.{cpp,hpp,h}' '^\s*#include\s*<([^>]+)>' -o | \
  sed -E 's/.*<([^/]+).*/\1/' | \
  sort -u | \
  grep -vE '^(absl|boost|catch2|fmt|mariadb|msgpack|spdlog|spider|std.*)'

echo "Checking CMake dependencies..."
# Look for find_package calls in CMake files
fd -e cmake -e txt -x grep -h "find_package" {}

Length of output: 1117


Script:

#!/bin/bash
# Let's check for any includes from Abseil, Boost, or other libraries that might be hidden in subdirectories
echo "Checking for library includes with path components..."
rg -g '*.{cpp,hpp,h}' '^\s*#include\s*<([^>]+)>' -o | \
  grep -v '^#include <[^/]*>' | \
  sort -u

# Also check for any third-party includes that might be using quotes instead of angle brackets
echo "Checking for quoted includes..."
rg -g '*.{cpp,hpp,h}' '^\s*#include\s*"([^"]+)"' -o | \
  grep -v '\.\./' | \
  grep -v '\./' | \
  sort -u

Length of output: 11602

tools/scripts/lib_install/linux/install-lib.sh (1)

24-24: Verify spdlog version compatibility

The version update to spdlog 1.15.0 looks good, but let's verify its stability and compatibility with the external fmt library requirement.

✅ Verification successful

The previous results show that spdlog 1.15.0 is a stable release, but there's a known issue with VS2017 and potential compatibility concerns with older fmt versions. Let's verify the fmt library configuration in the codebase to ensure compatibility.


Version update to spdlog 1.15.0 is compatible and properly configured

The update is safe to proceed because:

  • The project correctly uses fmt 8.0.1, which is compatible with spdlog 1.15.0
  • SPDLOG_FMT_EXTERNAL is properly configured in both CMake and build scripts
  • While there's an open VS2017 issue, it doesn't affect this Linux-specific installation script
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify spdlog 1.15.0 compatibility and release information

# Check if spdlog 1.15.0 is tagged as stable
curl -s "https://api.github.com/repos/gabime/spdlog/releases/tags/v1.15.0" | jq -r '.prerelease'

# Check minimum required fmt version for spdlog 1.15.0
rg -A 5 "FMT_VERSION|fmt_version" "$(fd -t f -e cmake -e txt CMakeLists)"

# Verify if any known issues exist with 1.15.0
curl -s "https://api.github.com/repos/gabime/spdlog/issues?state=all" | \
  jq -r '.[] | select(.title | contains("1.15.0")) | {title: .title, state: .state, labels: [.labels[].name]}'

Length of output: 877


Script:

#!/bin/bash
# Let's check for fmt library version requirements and usage in the codebase

# Search for fmt version requirements in CMake files
rg -A 3 "find_package\(fmt" 

# Check if SPDLOG_FMT_EXTERNAL is defined
rg "SPDLOG_FMT_EXTERNAL|FMT_EXTERNAL"

# Look for fmt library usage patterns
rg -l "fmt::|#include.*fmt"

Length of output: 568

src/spider/worker/message_pipe.hpp (2)

22-24: LGTM! Proper namespace and include guard closing.

The closing comments for both namespace and include guard enhance readability in larger codebases.


12-20: Verify error handling and message protocol details.

The boolean and optional return types suggest basic error handling, but we should verify the implementation details.

✅ Verification successful

Let me gather more information about the message protocol implementation.


Error handling and message protocol implementation are well-structured and robust

The implementation shows a solid message protocol with:

  • Fixed-size header (16 bytes) containing message length
  • Proper error handling for both sync and async operations
  • Comprehensive error logging for all failure scenarios
  • Graceful EOF handling
  • Consistent error propagation through boolean/optional returns

Error handling covers:

  • System errors during read/write operations
  • Header parsing failures
  • Size mismatches
  • EOF conditions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check implementation details for error handling and message protocol

# Look for error handling patterns in the implementation
echo "Checking error handling implementation..."
rg -A 5 "catch|throw|error|fail" src/spider/worker/message_pipe.cpp

# Look for message format/protocol details
echo "Checking message protocol implementation..."
ast-grep --pattern 'msgpack::pack($_)'

Length of output: 2467


Script:

#!/bin/bash
# Check the message protocol implementation details
echo "Checking message protocol implementation in message_pipe.cpp..."
rg -A 10 "send_message|receive_message" src/spider/worker/message_pipe.cpp

# Look for message size/format constants
echo "Checking message format constants..."
rg "constexpr|#define.*size" src/spider/worker/message_pipe.cpp

Length of output: 2460

tests/CMakeLists.txt (1)

9-9: LGTM: Test file addition is well-structured

The addition of test-MessagePipe.cpp follows the established pattern for test organization and is properly integrated into the test suite.

src/spider/CMakeLists.txt (2)

2-7: LGTM: Core sources configuration is well-structured

The CACHE INTERNAL setting ensures proper variable scope accessibility, and the source files are logically organized.


15-16: Verify Boost.Asio and MessagePack dependencies

The addition of BoostAsio.hpp and MsgPack.hpp suggests new dependencies. Let's verify the configuration.

✅ Verification successful

Let me check the CMakeLists.txt content to verify the dependency configuration.


Dependencies are properly configured

The dependencies for both Boost.Asio and MessagePack are correctly set up:

  • Boost is configured with required components including system (which provides Asio)
  • MessagePack is properly configured via find_package(msgpack-cxx 7.0.0 REQUIRED)
  • Both libraries are correctly linked:
    • Boost via Boost::boost and Boost::system
    • MessagePack via msgpack-cxx
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for proper dependency configuration

# Check if Boost.Asio is properly configured
rg -l "find_package.*Boost.*asio" --type cmake

# Check if MessagePack is properly configured
rg -l "find_package.*msgpack" --type cmake

# Verify usage of these dependencies
rg -l "include.*BoostAsio\.hpp"
rg -l "include.*MsgPack\.hpp"

Length of output: 594


Script:

#!/bin/bash
# Check root CMakeLists.txt for dependency configuration
cat CMakeLists.txt

# Check spider's CMakeLists.txt
cat src/spider/CMakeLists.txt

Length of output: 6302

tools/scripts/lib_install/fmtlib.sh (1)

67-67: LGTM: Position Independent Code flag is correctly added

The addition of -DCMAKE_POSITION_INDEPENDENT_CODE=ON is appropriate for building shared libraries and aligns with the project's broader changes.

src/spider/worker/FunctionManager.cpp (4)

1-11: LGTM! Well-organized header includes

The header organization follows best practices with proper grouping (standard library, external, then internal headers).


62-67: LGTM! Clean and efficient implementation

The method follows best practices with proper const usage and efficient map lookup.


71-72: LGTM! Proper use of BOOST_DLL_ALIAS

The DLL alias is correctly implemented with appropriate NOLINT directive.


40-49: Verify message size constraints

The function should validate the message size to prevent potential memory issues with very large error messages.

tests/worker/test-MessagePipe.cpp (2)

1-14: LGTM! Well-structured includes with appropriate pragmas.

The includes are well-organized and the IWYU/NOLINT pragmas are appropriately used for test code.


1-56: Verify test integration in the test suite.

Let's ensure this test is properly integrated into the build system and test suite.

✅ Verification successful

Let me check the CMakeLists.txt content to verify how the test is integrated.


Test is properly integrated in the test suite

The test file is correctly integrated:

  • Listed in SPIDER_TEST_SOURCES variable in tests/CMakeLists.txt
  • Included in the unitTest executable target
  • Has proper library linkage with required dependencies (Boost, Catch2)
  • Is the only MessagePipe test file, which is appropriate given its comprehensive coverage of the message pipe functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify test integration in CMake configuration

# Check if the test file is included in CMakeLists.txt
echo "Checking test file inclusion in CMakeLists.txt..."
fd -t f "CMakeLists.txt" tests/ -x grep -l "test-MessagePipe.cpp"

# Check for other message pipe related tests
echo "Checking for related test files..."
fd -t f "test.*MessagePipe.*\\.cpp$" tests/

Length of output: 351


Script:

#!/bin/bash
# Check the content of tests/CMakeLists.txt
cat tests/CMakeLists.txt

Length of output: 962

src/spider/worker/TaskExecutorMessage.hpp (1)

66-68: LGTM! MessagePack registration is properly implemented

The enum registration is correctly placed in the global namespace with clear documentation.

tests/worker/test-FunctionManager.cpp (1)

24-26: LGTM! Test function registration is comprehensive.

The test functions are well-structured to cover different scenarios including POD types, tuple returns, and custom Data objects.

CMakeLists.txt (2)

58-60: LGTM! Position Independent Code is essential for modern builds.

Enabling PIC globally is a good practice, especially when building components that might be used in shared libraries or plugins.


124-124: LGTM! Consistent spdlog configuration.

The SPDLOG_FMT_EXTERNAL definition is correctly added, matching the existing configuration set above.

src/spider/worker/FunctionManager.hpp (8)

93-95: response_get_error function implementation is sound

The addition of response_get_error enhances error handling by providing a structured way to extract error information from the buffer.


96-98: create_error_response improves error message consistency

The create_error_response function effectively standardizes error responses, which aids in consistent error handling across the system.


112-114: Verify robustness of array size check in response_get_result

The condition object.via.array.size == 2 assumes the array always contains exactly two elements. Ensure this assumption holds true in all contexts to prevent potential parsing errors.


116-119: Confirm accuracy of response type comparison

Double-check that the comparison with TaskExecutorResponseType::Result correctly identifies the expected response type and that the as<>() conversion is safe and reliable.


130-136: create_result_response function is well-structured

The function correctly packs the result into a msgpack::sbuffer, ensuring that the response is properly formatted.


149-158: create_args_request efficiently packs arguments

The implementation of create_args_request effectively serializes the arguments into a request buffer, facilitating seamless communication.


208-208: Ensure exception safety when returning results

While returning results, confirm that any exceptions during the packing process are appropriately handled to prevent unexpected crashes.


249-250: register_function_invoker method is appropriately implemented

The addition of register_function_invoker allows for direct registration of function invokers, increasing flexibility.

Comment on lines +12 to +20
auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool;

auto send_message(boost::asio::posix::stream_descriptor& fd, msgpack::sbuffer const& request)
-> bool;

auto receive_message_async(std::reference_wrapper<boost::asio::readable_pipe> pipe
) -> boost::asio::awaitable<std::optional<msgpack::sbuffer>>;

auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add comprehensive function documentation and consider error handling strategy.

While the function signatures are well-designed, they would benefit from detailed documentation and explicit error handling guidance.

Add Doxygen-style documentation for each function:

+/**
+ * @brief Sends a message through a writable pipe
+ * @param pipe The pipe to write the message to
+ * @param request The message buffer to send
+ * @return true if the message was sent successfully, false otherwise
+ */
 auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool;

+/**
+ * @brief Sends a message through a POSIX stream descriptor
+ * @param fd The file descriptor to write the message to
+ * @param request The message buffer to send
+ * @return true if the message was sent successfully, false otherwise
+ */
 auto send_message(boost::asio::posix::stream_descriptor& fd, msgpack::sbuffer const& request)
         -> bool;

+/**
+ * @brief Asynchronously receives a message from a readable pipe
+ * @param pipe Reference wrapper to the pipe to read from
+ * @return An awaitable that resolves to an optional message buffer
+ */
 auto receive_message_async(std::reference_wrapper<boost::asio::readable_pipe> pipe
 ) -> boost::asio::awaitable<std::optional<msgpack::sbuffer>>;

+/**
+ * @brief Synchronously receives a message from a POSIX stream descriptor
+ * @param fd The file descriptor to read from
+ * @return An optional containing the received message buffer if successful
+ */
 auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer>;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool;
auto send_message(boost::asio::posix::stream_descriptor& fd, msgpack::sbuffer const& request)
-> bool;
auto receive_message_async(std::reference_wrapper<boost::asio::readable_pipe> pipe
) -> boost::asio::awaitable<std::optional<msgpack::sbuffer>>;
auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer>;
/**
* @brief Sends a message through a writable pipe
* @param pipe The pipe to write the message to
* @param request The message buffer to send
* @return true if the message was sent successfully, false otherwise
*/
auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool;
/**
* @brief Sends a message through a POSIX stream descriptor
* @param fd The file descriptor to write the message to
* @param request The message buffer to send
* @return true if the message was sent successfully, false otherwise
*/
auto send_message(boost::asio::posix::stream_descriptor& fd, msgpack::sbuffer const& request)
-> bool;
/**
* @brief Asynchronously receives a message from a readable pipe
* @param pipe Reference wrapper to the pipe to read from
* @return An awaitable that resolves to an optional message buffer
*/
auto receive_message_async(std::reference_wrapper<boost::asio::readable_pipe> pipe
) -> boost::asio::awaitable<std::optional<msgpack::sbuffer>>;
/**
* @brief Synchronously receives a message from a POSIX stream descriptor
* @param fd The file descriptor to read from
* @return An optional containing the received message buffer if successful
*/
auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer>;

Comment on lines 23 to 32
target_link_libraries(
unitTest
PRIVATE
Boost::headers
Boost::filesystem
Boost::program_options
Boost::process
Boost::system
spdlog::spdlog
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Remove unused Boost components from test dependencies

Based on the code analysis, some Boost components are not being used in the test suite:

  • Boost::filesystem and Boost::program_options are not used anywhere in the tests
  • Only the following Boost components are actually needed:
    • Boost::headers (for UUID functionality)
    • Boost::process and Boost::system (for asio functionality in MessagePipe tests)

Additionally, while spdlog is linked, it appears to be unused in the test code and can be removed.

🔗 Analysis chain

Verify usage of all Boost components

While the dependencies are properly configured, let's verify that all Boost components are actually needed by the test suite.


Let me gather more information about the actual usage of Boost components in the test files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of Boost components in test files

echo "Checking Boost::filesystem usage..."
rg -l "boost/filesystem" tests/

echo "Checking Boost::program_options usage..."
rg -l "boost/program_options" tests/

echo "Checking Boost::process usage..."
rg -l "boost/process" tests/

echo "Checking Boost::system usage..."
rg -l "boost/system" tests/

echo "Checking spdlog usage..."
rg -l "spdlog" tests/

Length of output: 547


Script:

#!/bin/bash
# Let's check the actual content of test files to see Boost usage
echo "Searching for Boost headers and usage..."
rg "boost" tests/ -A 3

echo -e "\nSearching for spdlog usage..."
rg "spdlog" tests/ -A 3

Length of output: 7844

Comment on lines +67 to 68
cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON ../
make -j${num_cpus}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider enhancing build verification and cleanup

The script should verify the build success and clean up on failure. Consider adding:

  1. Build status verification
  2. Cleanup trap for failures

Apply this diff to improve error handling:

 cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON ../
+if [ $? -ne 0 ]; then
+    echo "CMAKE configuration failed"
+    exit 1
+fi
 make -j${num_cpus}
+if [ $? -ne 0 ]; then
+    echo "Build failed"
+    exit 1
+fi

And at the start of the script, after set -e:

 # Exit on any error
 set -e
+
+# Cleanup on error
+cleanup() {
+    if [ $? -ne 0 ]; then
+        echo "Error occurred, cleaning up..."
+        rm -rf $temp_dir
+    fi
+}
+trap cleanup EXIT

Committable suggestion skipped: line range outside the PR's diff.

@@ -71,7 +71,7 @@ fi
cd ${extracted_dir}
mkdir -p build
cd build
cmake ../
cmake -DSPDLOG_FMT_EXTERNAL=ON ..
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Version mismatch between build script and CMake requirement

The build system has a version mismatch for the fmt library:

  • CMakeLists.txt requires fmt version 8.0.1
  • install-lib.sh installs fmt version 11.0.2

While the newer version should be backwards compatible, it's recommended to:

  • Update CMakeLists.txt to require fmt 11.0.2, or
  • Update install-lib.sh to install fmt 8.0.1

Otherwise, the -DSPDLOG_FMT_EXTERNAL=ON flag is properly supported as:

  • fmtlib.sh is installed before spdlog.sh
  • The installation scripts handle dependencies correctly
🔗 Analysis chain

Verify fmt library dependency for external usage

The addition of -DSPDLOG_FMT_EXTERNAL=ON is appropriate for the messaging pipe feature. However, we should ensure the fmt library dependency is properly handled.

Let's verify the fmt library dependency:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for fmt library installation script and dependencies

# Check for fmt library installation script
echo "Checking for fmt library installation script..."
fd -t f "fmt.sh" "tools/scripts/lib_install"

# Check if fmt is mentioned in any CMake files
echo "Checking for fmt in CMake files..."
rg -l "fmt" -g "CMakeLists.txt"

# Check for fmt package requirements
echo "Checking for fmt package requirements..."
rg "fmt" -g "package.xml|requirements.txt|dependencies"

Length of output: 593


Script:

#!/bin/bash
# Let's check the content of CMakeLists.txt files that mention fmt
echo "Checking CMakeLists.txt content for fmt dependency..."
rg -A 5 -B 5 "fmt" CMakeLists.txt src/spider/CMakeLists.txt

# Check if fmt library is installed before spdlog
echo "Checking spdlog.sh script for fmt dependency handling..."
cat tools/scripts/lib_install/spdlog.sh

# Look for any fmt installation scripts with different naming
echo "Looking for any fmt-related installation scripts..."
fd -t f ".*fmt.*" tools/scripts/lib_install/

Length of output: 4134


Script:

#!/bin/bash
# Let's check the fmtlib installation script
echo "Checking fmtlib.sh content..."
cat tools/scripts/lib_install/fmtlib.sh

# Check if fmtlib is installed before spdlog in any build scripts
echo "Checking for build order in scripts..."
rg -l "fmtlib.sh|spdlog.sh" tools/scripts/

# Verify fmt version requirement matches
echo "Checking fmt version requirements..."
rg "fmt.*8\.0\.1" -g "*.sh"

Length of output: 2407


Script:

#!/bin/bash
# Check the installation order in the Linux install script
echo "Checking library installation order..."
cat tools/scripts/lib_install/linux/install-lib.sh

# Check if there are any version specifications for fmt in CMake
echo "Checking fmt version in CMake files..."
rg "fmt.*VERSION" -A 2 -B 2 CMakeLists.txt

Length of output: 1287

Comment on lines +51 to +60
void create_error_buffer(
FunctionInvokeError error,
std::string const& message,
msgpack::sbuffer& buffer
) {
msgpack::packer packer{buffer};
packer.pack_array(2);
packer.pack(error);
packer.pack(message);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

⚠️ Potential issue

Array size difference is intentional, but buffer capacity check is recommended

The difference in array sizes between create_error_buffer (2 elements) and create_error_response (3 elements) is intentional:

  • create_error_response includes the response type as its first element (TaskExecutorResponseType::Error)
  • create_error_buffer is used when the response type is already known/written

However, the second concern about buffer capacity remains valid. Consider adding a check or using a buffer with automatic growth.

🔗 Analysis chain

Review array size inconsistency and add buffer capacity check

Two concerns:

  1. This function packs 2 elements while create_error_response packs 3 elements. Is this intentional?
  2. Consider checking buffer capacity before writing to prevent overflow.

Let me gather more information about the create_error_response function implementation to understand the array size difference.


Let me search for the implementation of create_error_response to compare with create_error_buffer.


Let me try one more search with a different approach to find the implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other uses of these functions to understand the expected array sizes
ast-grep --pattern 'create_error_buffer($$$)' -A 5
ast-grep --pattern 'create_error_response($$$)' -A 5

Length of output: 3579


Script:

#!/bin/bash
# Search for create_error_response implementation
ast-grep --pattern 'create_error_response($$$)' --lang cpp
rg "create_error_response.*\{" -A 5

Length of output: 2217


Script:

#!/bin/bash
# Search for create_error_response implementation in cpp files
rg -t cpp "create_error_response.*\{" -B 2 -A 10
# Also check the header file for its declaration
rg -t hpp "create_error_response" -B 2 -A 2

Length of output: 122


Script:

#!/bin/bash
# Search for create_error_response implementation
rg "create_error_response" -A 10 src/spider/worker/FunctionManager.cpp
# Also look for any msgpack array packing in the response creation
rg "pack_array" -B 5 -A 5 src/spider/worker/FunctionManager.cpp

Length of output: 1113

Comment on lines +82 to +86
auto receive_message_async(std::reference_wrapper<boost::asio::readable_pipe> pipe
) -> boost::asio::awaitable<std::optional<msgpack::sbuffer>> {
std::array<char, cHeaderSize> header_buffer{0};
// NOLINTNEXTLINE(clang-analyzer-core.NullDereference)
auto [header_ec, header_n] = co_await boost::asio::async_read(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Investigate suppressed null dereference warning

The comment // NOLINTNEXTLINE(clang-analyzer-core.NullDereference) suppresses a potential null dereference warning. Instead of suppressing the warning, consider addressing the underlying issue to ensure code safety.

Review whether pipe.get() could be null and add necessary checks or assertions.

Comment on lines 101 to 111
size_t response_size = 0;
try {
response_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
} catch (std::exception& e) {
spdlog::error(
"Cannot parse header: {} {}",
e.what(),
std::string{header_buffer.data(), cHeaderSize}
);
co_return std::nullopt;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle exceptions when parsing response size

Similar to the synchronous version, ensure that exceptions thrown by std::stoul when parsing response_size are properly handled. A malformed header could cause unexpected termination.

Ensure a try-catch block is present and include size checks:

try {
    response_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
    if (response_size >= max_allowed_size) {
        spdlog::error("Response size exceeds maximum allowed limit: {}", response_size);
        co_return std::nullopt;
    }
} catch (std::exception& e) {
    spdlog::error(
        "Cannot parse header: {} {}",
        e.what(),
        std::string{header_buffer.data(), cHeaderSize}
    );
    co_return std::nullopt;
}

Comment on lines 131 to 138
if (response_size != response_n) {
spdlog::error(
"Response read size not match. Expect {}. Got {}.",
response_size,
response_n
);
co_return std::nullopt;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure complete asynchronous read of the message body

After the asynchronous read, verify that the number of bytes read matches the expected response_size to handle any incomplete reads.

Modify the check:

if (response_size != response_n) {
    spdlog::error(
        "Response read size does not match. Expected {} bytes, got {} bytes.",
        response_size,
        response_n
    );
    co_return std::nullopt;
}

Comment on lines +21 to +25
// NOLINTBEGIN(cppcoreguidelines-macro-usage)
#define CONCAT_DIRECT(s1, s2) s1##s2
#define CONCAT(s1, s2) CONCAT_DIRECT(s1, s2)
#define ANONYMOUS_VARIABLE(str) CONCAT(str, __COUNTER__)
// NOLINTEND(cppcoreguidelines-macro-usage)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Replace macros with constexpr functions for better type safety

Using macros like CONCAT, CONCAT_DIRECT, and ANONYMOUS_VARIABLE can lead to less type-safe code and harder-to-debug errors. Consider replacing them with constexpr functions or inline variables to enhance type safety and maintainability.

Comment on lines +28 to +29
inline const auto ANONYMOUS_VARIABLE(var) \
= spider::core::FunctionManager::get_instance().register_function(#func, func);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid macros in REGISTER_TASK; use templates instead

The REGISTER_TASK macro could be replaced with a templated function to reduce macro usage and improve code clarity. This change would enhance type safety and make the code easier to debug.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
src/spider/worker/message_pipe.cpp (2)

120-122: Maintain consistency with synchronous version

The async version checks for zero body size while the synchronous version doesn't. This inconsistency could lead to different behaviour between the two implementations.

Either add the check to the synchronous version or document why the behaviours differ.


1-148: Add documentation for message format and size limits

The code would benefit from documentation describing:

  • The message format specification (16-byte header + variable body)
  • Maximum message size limits and their rationale
  • Error handling behaviour and recovery strategies

Consider adding a detailed comment block at the top of the file explaining these aspects.

src/spider/worker/FunctionManager.hpp (4)

93-103: Add documentation for error handling functions

These public API functions would benefit from documentation explaining:

  • Expected input format
  • Possible error conditions
  • Return value semantics

112-118: Replace magic numbers with named constants

The array size check uses magic number '2'. Consider defining a constant for better maintainability:

+    static constexpr std::size_t EXPECTED_RESPONSE_SIZE = 2;
-    if (msgpack::type::ARRAY != object.type || 2 != object.via.array.size) {
+    if (msgpack::type::ARRAY != object.type || EXPECTED_RESPONSE_SIZE != object.via.array.size) {

Line range hint 177-218: Enhance error messages with more context

Consider adding more detailed error messages that include:

  • The actual values that caused parsing failures
  • The specific type conversion that failed
  • Stack trace information when available

253-253: Document pointer ownership semantics

Please add documentation clarifying:

  • Pointer ownership (who owns the returned Function pointer)
  • Lifetime guarantees
  • Whether the pointer can be null
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6fb2f0b and 034e309.

📒 Files selected for processing (4)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/worker/FunctionManager.cpp (1 hunks)
  • src/spider/worker/FunctionManager.hpp (6 hunks)
  • src/spider/worker/message_pipe.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/spider/CMakeLists.txt
  • src/spider/worker/FunctionManager.cpp
🔇 Additional comments (5)
src/spider/worker/message_pipe.cpp (3)

1-19: LGTM! Header organization and namespace setup look good.

The includes are well-organized, and the IWYU pragmas are appropriately used for managing header dependencies.


21-32: ⚠️ Potential issue

Add size validation before formatting message header

The current implementation doesn't validate the message size before formatting it into the header string. This could lead to issues with very large messages.

Add size validation before formatting:

 auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool {
     try {
         size_t const size = request.size();
+        if (size >= 1e16) {
+            spdlog::error("Message size exceeds maximum allowed limit: {}", size);
+            return false;
+        }
         std::string const size_str = fmt::format("{:016d}", size);

90-94: ⚠️ Potential issue

Address suppressed null dereference warning

The suppressed warning suggests a potential null dereference issue that should be addressed rather than suppressed.

Add explicit null check:

-    // NOLINTNEXTLINE(clang-analyzer-core.NullDereference)
+    if (&pipe.get() == nullptr) {
+        spdlog::error("Null pipe reference");
+        co_return std::nullopt;
+    }
     auto [header_ec, header_n] = co_await boost::asio::async_read(
src/spider/worker/FunctionManager.hpp (2)

21-29: Consider modernizing macro usage with constexpr alternatives

The use of macros for concatenation and variable generation could be replaced with more type-safe alternatives.


149-158: Consider adding buffer size checks

For safety, consider adding a maximum limit to the number of arguments to prevent potential buffer overflow issues.

Comment on lines 34 to 46
auto send_message(boost::asio::posix::stream_descriptor& fd, msgpack::sbuffer const& request)
-> bool {
try {
size_t const size = request.size();
std::string const size_str = fmt::format("{:016d}", size);
boost::asio::write(fd, boost::asio::buffer(size_str));
boost::asio::write(fd, boost::asio::buffer(request.data(), size));
return true;
} catch (boost::system::system_error const& e) {
spdlog::error("Failed to send message: {}", e.what());
return false;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor duplicated send_message logic

Both send_message overloads share identical logic, differing only in the type of the output device.

Consider implementing a template function to reduce code duplication:

+template<typename T>
+auto send_message_impl(T& output, msgpack::sbuffer const& request) -> bool {
+    try {
+        size_t const size = request.size();
+        if (size >= 1e16) {
+            spdlog::error("Message size exceeds maximum allowed limit: {}", size);
+            return false;
+        }
+        std::string const size_str = fmt::format("{:016d}", size);
+        boost::asio::write(output, boost::asio::buffer(size_str));
+        boost::asio::write(output, boost::asio::buffer(request.data(), size));
+        return true;
+    } catch (boost::system::system_error const& e) {
+        spdlog::error("Failed to send message: {}", e.what());
+        return false;
+    }
+}
+
+auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool {
+    return send_message_impl(pipe, request);
+}
+
+auto send_message(boost::asio::posix::stream_descriptor& fd, msgpack::sbuffer const& request) -> bool {
+    return send_message_impl(fd, request);
+}

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +48 to +88
auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer> {
std::array<char, cHeaderSize> header_buffer{0};
try {
boost::asio::read(fd, boost::asio::buffer(header_buffer));
} catch (boost::system::system_error& e) {
if (boost::asio::error::eof != e.code()) {
spdlog::error("Fail to read header: {}", e.what());
}
return std::nullopt;
}
size_t body_size = 0;
try {
body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
} catch (std::exception& e) {
spdlog::error(
"Cannot parse header: {} {}",
e.what(),
std::string{header_buffer.data(), cHeaderSize}
);
return std::nullopt;
}

std::vector<char> body_buffer(body_size);
try {
size_t body_read_size = boost::asio::read(fd, boost::asio::buffer(body_buffer));
if (body_read_size != body_size) {
spdlog::error(
"Message body read size mismatch. Expect {}. Got {}",
body_size,
body_read_size
);
return std::nullopt;
}
} catch (boost::system::system_error& e) {
spdlog::error("Fail to read response body: {}", e.what());
return std::nullopt;
}
msgpack::sbuffer buffer;
buffer.write(body_buffer.data(), body_buffer.size());
return buffer;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add memory safety checks for message body allocation

The function allocates a buffer based on the parsed size without checking if the allocation is reasonable or possible.

Add memory safety checks:

     try {
         body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
+        constexpr size_t max_message_size = 1024 * 1024 * 1024; // 1GB or adjust as needed
+        if (body_size > max_message_size) {
+            spdlog::error("Message size {} exceeds maximum allowed size {}", body_size, max_message_size);
+            return std::nullopt;
+        }
     } catch (std::exception& e) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer> {
std::array<char, cHeaderSize> header_buffer{0};
try {
boost::asio::read(fd, boost::asio::buffer(header_buffer));
} catch (boost::system::system_error& e) {
if (boost::asio::error::eof != e.code()) {
spdlog::error("Fail to read header: {}", e.what());
}
return std::nullopt;
}
size_t body_size = 0;
try {
body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
} catch (std::exception& e) {
spdlog::error(
"Cannot parse header: {} {}",
e.what(),
std::string{header_buffer.data(), cHeaderSize}
);
return std::nullopt;
}
std::vector<char> body_buffer(body_size);
try {
size_t body_read_size = boost::asio::read(fd, boost::asio::buffer(body_buffer));
if (body_read_size != body_size) {
spdlog::error(
"Message body read size mismatch. Expect {}. Got {}",
body_size,
body_read_size
);
return std::nullopt;
}
} catch (boost::system::system_error& e) {
spdlog::error("Fail to read response body: {}", e.what());
return std::nullopt;
}
msgpack::sbuffer buffer;
buffer.write(body_buffer.data(), body_buffer.size());
return buffer;
}
auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer> {
std::array<char, cHeaderSize> header_buffer{0};
try {
boost::asio::read(fd, boost::asio::buffer(header_buffer));
} catch (boost::system::system_error& e) {
if (boost::asio::error::eof != e.code()) {
spdlog::error("Fail to read header: {}", e.what());
}
return std::nullopt;
}
size_t body_size = 0;
try {
body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
constexpr size_t max_message_size = 1024 * 1024 * 1024; // 1GB or adjust as needed
if (body_size > max_message_size) {
spdlog::error("Message size {} exceeds maximum allowed size {}", body_size, max_message_size);
return std::nullopt;
}
} catch (std::exception& e) {
spdlog::error(
"Cannot parse header: {} {}",
e.what(),
std::string{header_buffer.data(), cHeaderSize}
);
return std::nullopt;
}
std::vector<char> body_buffer(body_size);
try {
size_t body_read_size = boost::asio::read(fd, boost::asio::buffer(body_buffer));
if (body_read_size != body_size) {
spdlog::error(
"Message body read size mismatch. Expect {}. Got {}",
body_size,
body_read_size
);
return std::nullopt;
}
} catch (boost::system::system_error& e) {
spdlog::error("Fail to read response body: {}", e.what());
return std::nullopt;
}
msgpack::sbuffer buffer;
buffer.write(body_buffer.data(), body_buffer.size());
return buffer;
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
src/spider/worker/FunctionManager.hpp (4)

93-104: Add documentation for public API functions

These public API functions would benefit from documentation explaining:

  • Parameters and return values
  • Error conditions and handling
  • Usage examples

112-118: Replace magic number with named constant

The array size check uses a magic number '2'. Consider defining a named constant for better maintainability.

+ static constexpr std::size_t RESPONSE_ARRAY_SIZE = 2;
- if (msgpack::type::ARRAY != object.type || 2 != object.via.array.size) {
+ if (msgpack::type::ARRAY != object.type || RESPONSE_ARRAY_SIZE != object.via.array.size) {

130-136: Implementation looks good, consider consistent constant usage

The implementation is correct, but should use the same named constant for array size as suggested above.


Line range hint 177-218: Enhance error messages for better debugging

Consider including more context in error messages:

  • For ArgumentParsingError: Include the actual parsing error message from the exception
  • For FunctionExecutionError: Include the exception message from std::exception
- fmt::format("Cannot parse arguments.")
+ fmt::format("Cannot parse arguments: {}", e.what())
- "Function execution error"
+ fmt::format("Function execution error: {}", e.what())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7b55d7b and 56f5684.

📒 Files selected for processing (2)
  • src/spider/worker/FunctionManager.hpp (6 hunks)
  • src/spider/worker/message_pipe.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spider/worker/message_pipe.cpp
🔇 Additional comments (3)
src/spider/worker/FunctionManager.hpp (3)

21-29: Consider modernizing macro-based implementations

The macro-based approach for variable name generation could be replaced with more type-safe alternatives.


149-158: LGTM! Clean and efficient implementation

The implementation makes good use of variadic templates and parameter pack expansion.


249-255: Verify type safety of direct function registration

The register_function_invoker method allows bypassing the type checking provided by FunctionInvoker. Ensure that all direct registrations maintain proper type safety.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
tests/worker/test-FunctionManager.cpp (1)

Line range hint 28-61: Consider adding test case descriptions using SECTION

While the test coverage is comprehensive, consider using Catch2's SECTION macro to better organize and describe different test scenarios.

Here's how you could refactor this:

 TEST_CASE("Register and run function with POD inputs", "[core]") {
     spider::core::FunctionManager const& manager = spider::core::FunctionManager::get_instance();
 
+    SECTION("Unregistered function returns nullptr") {
     // Get the function that has not been registered should return nullptr
     REQUIRE(nullptr == manager.get_function("foo"));
+    }
 
+    SECTION("Successful execution with correct arguments") {
     // Get the registered function should succeed
     spider::core::Function const* function = manager.get_function("int_test");
     spider::core::ArgsBuffer const args_buffers = spider::core::create_args_buffers(2, 3);
     msgpack::sbuffer const result = (*function)(args_buffers);
     REQUIRE(5 == spider::core::response_get_result<int>(result).value_or(0));
+    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 56f5684 and 88b37a3.

📒 Files selected for processing (1)
  • tests/worker/test-FunctionManager.cpp (4 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-FunctionManager.cpp

[error] 28-28: syntax error

(syntaxError)

🔇 Additional comments (2)
tests/worker/test-FunctionManager.cpp (2)

24-26: LGTM! Good test coverage across different data types.

The registration of test functions provides comprehensive coverage across primitive types (int_test), compound types (tuple_ret_test), and custom types (data_test).


Line range hint 77-93: Enhance const correctness in data test

While the test is thorough, consider improving const correctness for better safety and clarity.

Apply this change:

-    spider::core::Data data{"test"};
+    spider::core::Data const data{"test"};

Also, verify that the Data class's get_id() and get_value() methods are marked as const:

✅ Verification successful

Both get_id() and get_value() are already const-correct, only data variable needs const

The Data class implementation in src/spider/core/Data.hpp shows that both get_id() and get_value() methods are properly marked as const. The only improvement needed is making the data variable const in the test case, as originally suggested.

-    spider::core::Data data{"test"};
+    spider::core::Data const data{"test"};
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if Data class methods are properly marked as const
ast-grep --pattern 'class Data {
  $$$
  get_id() const {
    $$$
  }
  $$$
  get_value() const {
    $$$
  }
  $$$
}'

Length of output: 120


Script:

#!/bin/bash
# Let's try to find the Data class definition first
rg -l "class Data"

# Then let's look for the method declarations
rg "get_id|get_value" -A 2

Length of output: 9341


Script:

#!/bin/bash
# Let's examine the Data class definition in core
cat src/spider/core/Data.hpp

Length of output: 1940

Comment on lines 64 to 75
TEST_CASE("Register and run function with tuple return", "[core]") {
SPIDER_WORKER_REGISTER_TASK(tuple_ret_test);

spider::core::FunctionManager& manager = spider::core::FunctionManager::get_instance();
spider::core::FunctionManager const& manager = spider::core::FunctionManager::get_instance();

spider::core::Function const* function = manager.get_function("tuple_ret_test");

spider::core::ArgsBuffer const args_buffers = spider::core::create_args_buffers("test", 3);
msgpack::sbuffer const result = (*function)(args_buffers);
REQUIRE(std::make_tuple("test", 3)
== spider::core::buffer_get<std::tuple<std::string, int>>(result).value_or(
== spider::core::response_get_result<std::tuple<std::string, int>>(result).value_or(
std::make_tuple("", 0)
));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding error cases for tuple return test

The current test only covers the happy path. Consider adding test cases for error scenarios, such as incorrect argument types or missing arguments.

Here's a suggested addition:

// Test with wrong argument type
spider::core::ArgsBuffer const wrong_args = spider::core::create_args_buffers(42, 3);  // int instead of string
msgpack::sbuffer const error_result = (*function)(wrong_args);
auto error = spider::core::response_get_error(error_result);
REQUIRE(error.has_value());
REQUIRE(spider::core::FunctionInvokeError::ArgumentParsingError == std::get<0>(error.value()));

@sitaowang1998 sitaowang1998 merged commit c0c9fbc into y-scope:main Nov 30, 2024
2 checks passed
@sitaowang1998 sitaowang1998 deleted the message_pipe branch November 30, 2024 16:45
This was referenced Nov 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant