Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add worker task executor #26

Closed
wants to merge 36 commits into from

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Nov 27, 2024

Description

Implement task executor is worker as a separate subprocess. Add function manager and dll loader that is used to run task in subprocess.

Validation performed

  • lint:check
  • Unit test in devcontainer

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a DllLoader class for dynamic library loading.
    • Added TaskExecutor class for managing task execution in an asynchronous environment.
    • Implemented message handling functions for inter-process communication.
    • Enhanced build configuration with new header files and improved library linking.
  • Bug Fixes

    • Enhanced error handling in various components for improved robustness.
  • Documentation

    • Updated test cases and documentation for new functionalities and error handling strategies.
  • Tests

    • Added new unit tests for TaskExecutor and message pipe functionalities.
    • Expanded coverage in FunctionManager tests.
  • Chores

    • Updated build scripts for improved library management and configuration.

Copy link

coderabbitai bot commented Nov 27, 2024

Walkthrough

The pull request introduces various modifications primarily to the build configuration and source files of the Spider project. Key changes include enhancements to the CMakeLists.txt for enabling Position Independent Code (PIC), refining Boost library linking, and adding new header and source files for worker functionality. Additionally, new test cases are introduced to ensure the integrity of the new features, and several scripts are updated to improve the installation process for dependencies. Overall, these changes aim to enhance the project’s build system, modularity, and testing capabilities.

Changes

File Path Change Summary
CMakeLists.txt Added set(CMAKE_POSITION_INDEPENDENT_CODE ON), updated Boost components in find_package, and added add_compile_definitions(SPDLOG_FMT_EXTERNAL).
src/spider/.clang-format Updated IncludeCategories regex to match multiple libraries.
src/spider/CMakeLists.txt Added new headers to SPIDER_CORE_HEADERS, updated SPIDER_WORKER_SOURCES, added SPIDER_TASK_EXECUTOR_SOURCES, and created spider_task_executor executable target.
src/spider/core/BoostAsio.hpp Introduced new header file for Boost.Asio functionalities.
src/spider/core/Data.hpp Reordered include statements without functional changes.
src/spider/core/Serializer.hpp Activated inclusion of Boost UUID library.
src/spider/core/Task.hpp Reordered include statements without functional changes.
src/spider/core/TaskGraph.hpp Reordered include statements without functional changes.
src/spider/storage/DataStorage.hpp Reordered include statement for Boost UUID library without functional changes.
src/spider/storage/MetadataStorage.hpp Reordered include statement for Boost UUID library without functional changes.
src/spider/storage/MysqlStorage.cpp Reordered include statements without functional changes.
src/spider/storage/MysqlStorage.hpp Reordered include statements without functional changes.
src/spider/worker/DllLoader.cpp Introduced DllLoader class with load_dll method for dynamic library loading.
src/spider/worker/DllLoader.hpp Introduced DllLoader class with singleton pattern and load_dll method.
src/spider/worker/FunctionManager.hpp Reorganized includes, modified REGISTER_TASK, renamed and updated functions for error handling and response creation.
src/spider/worker/TaskExecutor.cpp Implemented TaskExecutor class for managing task execution with asynchronous capabilities.
src/spider/worker/TaskExecutor.hpp Introduced TaskExecutor class with state management and task execution methods.
src/spider/worker/TaskExecutorMessage.hpp Introduced enumerations for task execution messages and utility functions for message handling.
src/spider/worker/message_pipe.cpp Introduced functions for sending and receiving messages over pipes using Boost.Asio.
src/spider/worker/message_pipe.hpp Defined functions for message communication through writable pipes.
src/spider/worker/task_executor.cpp Implemented command-line task executor with argument parsing and dynamic library loading.
tests/CMakeLists.txt Enhanced unit testing configuration by adding new test sources and linking additional Boost libraries.
tests/worker/test-FunctionManager.cpp Registered additional test functions and updated assertion handling.
tests/worker/test-MessagePipe.cpp Introduced test case for message pipe functionality using Catch2 framework.
tests/worker/test-TaskExecutor.cpp Introduced unit tests for TaskExecutor class with various scenarios.
tests/worker/worker-test.cpp Introduced test functions for task registration within the worker module.
tools/scripts/lib_install/boost.sh Updated Boost library bootstrap process to include process library.
tools/scripts/lib_install/fmtlib.sh Added -DCMAKE_POSITION_INDEPENDENT_CODE=ON flag to cmake command for fmtlib.
tools/scripts/lib_install/linux/install-lib.sh Updated spdlog library version from 1.14.1 to 1.15.0.
tools/scripts/lib_install/spdlog.sh Altered cmake invocation to enable external formatting support for spdlog.
src/spider/worker/FunctionManager.cpp Introduced functions for error handling and response creation, including response_get_error, create_error_response, and create_error_buffer.

Possibly related PRs

  • build: Split core into spearate library #15: Changes in the main PR regarding the CMakeLists.txt file, specifically the addition of set(CMAKE_POSITION_INDEPENDENT_CODE ON) and updates to Boost components, are related to the modifications in PR build: Split core into spearate library #15, which also involves updates to the CMakeLists.txt for setting the C++ standard and enhancing the build configuration. Both PRs focus on improving the build system and library linking.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 31

🧹 Outside diff range and nitpick comments (35)
tests/worker/worker-test.cpp (1)

6-8: Consider adding documentation for the test function.

While the function name is descriptive, adding a brief comment explaining the test's purpose and expected behaviour would improve maintainability.

+/**
+ * Test function that adds two integers.
+ * Used to verify basic task execution functionality.
+ */
 auto sum_test(int const x, int const y) -> int {
     return x + y;
 }
src/spider/worker/DllLoader.hpp (1)

20-20: Enhance the public interface for better error handling and control.

The current interface could be improved in several ways:

  1. Add error details for failure cases
  2. Add methods to check library status
  3. Consider adding unload capability
  4. Document thread safety requirements

Consider this enhanced interface:

+    /// Load a dynamic library from the specified path
+    /// @param path Path to the dynamic library
+    /// @return std::error_code indicating success or specific error
+    /// @thread_safety Thread-safe
+    auto load_dll(std::string const& path) -> std::error_code;
+
+    /// Check if a library is already loaded
+    /// @param path Path to the dynamic library
+    /// @return true if loaded, false otherwise
+    /// @thread_safety Thread-safe
+    auto is_loaded(std::string const& path) const -> bool;
src/spider/worker/message_pipe.hpp (2)

12-15: Add documentation for the send_message overloads

Consider adding Doxygen-style documentation to clarify:

  • Purpose and usage of each overload
  • Differences between pipe types
  • Error conditions and return value semantics

Example documentation:

/**
 * Sends a message through a writable pipe.
 * @param pipe The pipe to write to
 * @param request The message buffer to send
 * @return true if the message was sent successfully, false otherwise
 */

12-20: Consider enhancing error reporting mechanism

The current boolean/optional return types provide limited error context. Consider:

  1. Using a Result type pattern for richer error reporting
  2. Adding error codes or categories for specific failure modes

Example enhancement:

enum class MessageError {
    Success,
    ConnectionClosed,
    InvalidMessage,
    IOError
};

auto send_message(boost::asio::writable_pipe& pipe, 
                 msgpack::sbuffer const& request) 
    -> Result<void, MessageError>;
tests/CMakeLists.txt (1)

17-21: Consider using CMake path manipulation functions

The hardcoded relative path "../src/spider/" could break if the directory structure changes.

Consider using CMake's path manipulation functions:

-    list(APPEND SPIDER_TEST_WORKER_SOURCES "../src/spider/${worker_source}")
+    list(APPEND SPIDER_TEST_WORKER_SOURCES "${CMAKE_SOURCE_DIR}/src/spider/${worker_source}")
src/spider/worker/DllLoader.cpp (2)

10-10: Consider using project-relative include paths

The relative include path "../worker/FunctionManager.hpp" could cause issues if the file structure changes. Consider using a project-relative include path instead.

-#include "../worker/FunctionManager.hpp"
+#include "spider/worker/FunctionManager.hpp"

1-49: Consider adding DLL lifecycle management and version compatibility

The current implementation could benefit from additional architectural features:

  1. DLL unloading mechanism to free resources
  2. Version compatibility checking between DLLs and the host application
  3. Hot-reloading support for development scenarios

These features would improve the robustness and maintainability of the system. Would you like assistance in implementing any of these improvements?

src/spider/core/Serializer.hpp (2)

Line range hint 11-29: Consider adding bounds checking for UUID data copying

While the size check is present, the memcpy operation could benefit from additional safety measures when handling binary data, especially in a worker task execution context.

Consider this safer approach:

        if (object.via.bin.size != boost::uuids::uuid::static_size()) {
            throw type_error();
        }
-        std::uint8_t data[boost::uuids::uuid::static_size()];
-        std::memcpy(data, object.via.bin.ptr, boost::uuids::uuid::static_size());
+        std::array<std::uint8_t, boost::uuids::uuid::static_size()> data;
+        std::copy_n(
+            static_cast<const std::uint8_t*>(object.via.bin.ptr),
+            boost::uuids::uuid::static_size(),
+            data.begin());

Line range hint 31-43: Consider using std::span for safer binary data handling

The current implementation uses a C-style cast for binary data packing, which could be made safer using modern C++ features.

Consider this modernized approach:

-        packer.pack_bin_body((char const*)id.data(), id.size());
+        auto data_span = std::span<const std::uint8_t>(id.data(), id.size());
+        packer.pack_bin_body(reinterpret_cast<const char*>(data_span.data()),
+                            data_span.size());
tools/scripts/lib_install/fmtlib.sh (1)

Line range hint 39-46: Consider enhancing download security.

While HTTPS is used for downloads, consider adding checksum verification for the downloaded tarball to ensure integrity.

Here's a suggested improvement:

  if [ ! -e ${tar_filename} ] ; then
    curl -fsSL https://github.com/fmtlib/fmt/archive/refs/tags/${tar_filename} -o ${tar_filename}
+   # Download checksum
+   curl -fsSL https://github.com/fmtlib/fmt/releases/download/${version}/sha256sum.txt -o sha256sum.txt
+   # Verify checksum
+   sha256sum -c sha256sum.txt
  fi
tools/scripts/lib_install/spdlog.sh (1)

Line range hint 1-93: Enhance script robustness with additional safeguards

While the script is well-structured, consider these improvements for better robustness:

Add these checks:

 #!/bin/bash
+
+# Ensure cleanup on script exit
+trap 'rm -rf ${temp_dir}' EXIT
+
+# Validate version format
+if ! [[ ${version} =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+    echo "Error: Version must be in format X.Y.Z"
+    exit 1
+fi
+
 # Dependencies:

These changes will:

  • Ensure cleanup even if the script fails
  • Validate the version format before proceeding
src/spider/CMakeLists.txt (2)

60-68: Fix typo in comment

There's a typo in the cache variable comment: "filees" should be "files".

-    "spider task executor source filees"
+    "spider task executor source files"

60-80: Consider refactoring shared components

The task executor shares several source files with the worker target. Consider:

  1. Creating a shared library for common components
  2. Refactoring DllLoader and message_pipe into a separate component
+ # Add new shared library
+ add_library(spider_worker_common STATIC
+     worker/DllLoader.cpp
+     worker/message_pipe.cpp
+ )
+ 
+ # Update existing targets to use the common library
+ target_link_libraries(spider_worker PRIVATE spider_worker_common)
+ target_link_libraries(spider_task_executor PRIVATE spider_worker_common)
tests/worker/test-MessagePipe.cpp (3)

21-22: Consider using named constants for test data.

The magic values in the sample tuple could be better documented using named constants.

+    static constexpr const char* TEST_STRING = "test";
+    static constexpr int TEST_VALUE = 3;
-    constexpr std::tuple cSampleResult = std::make_tuple("test", 3);
+    constexpr std::tuple cSampleResult = std::make_tuple(TEST_STRING, TEST_VALUE);

39-50: Simplify nested conditionals using REQUIRE.

The nested if statements can be flattened using Catch2's REQUIRE, making the test more readable and failing fast.

-    if (response_option.has_value()) {
-        msgpack::sbuffer const& response_buffer = response_option.value();
-        REQUIRE(spider::worker::TaskExecutorResponseType::Result
-                == spider::worker::get_response_type(response_buffer));
-        std::optional<std::tuple<std::string, int>> const parse_response
-                = spider::core::response_get_result<std::tuple<std::string, int>>(response_buffer);
-        REQUIRE(parse_response.has_value());
-        if (parse_response.has_value()) {
-            std::tuple<std::string, int> result = parse_response.value();
-            REQUIRE(cSampleResult == result);
-        }
-    }
+    msgpack::sbuffer const& response_buffer = response_option.value();
+    REQUIRE(spider::worker::TaskExecutorResponseType::Result
+            == spider::worker::get_response_type(response_buffer));
+    auto const parse_response = spider::core::response_get_result<std::tuple<std::string, int>>(
+            response_buffer);
+    REQUIRE(parse_response.has_value());
+    REQUIRE(cSampleResult == parse_response.value());

15-51: Add negative test cases.

The test suite would benefit from additional test cases covering error scenarios:

  • Invalid message format
  • Disconnected pipe
  • Message size exceeding buffer

Would you like me to help create these additional test cases?

src/spider/worker/TaskExecutorMessage.hpp (1)

1-7: Add file documentation

Consider adding a brief documentation block at the start of the file explaining its purpose and the message protocol structure.

 #ifndef SPIDER_WORKER_TASKEXECUTORMESSAGE_HPP
 #define SPIDER_WORKER_TASKEXECUTORMESSAGE_HPP
 
+/**
+ * @file TaskExecutorMessage.hpp
+ * @brief Defines the message protocol for communication between the task executor
+ *        and its parent process, including response and request types.
+ */
+
 #include <cstdint>
tests/worker/test-TaskExecutor.cpp (3)

34-38: Add library existence verification

The function assumes libworker_test.so exists without verification. Consider adding checks to provide better error messages.

 auto get_libraries() -> std::vector<std::string> {
     boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
     boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
+    if (!boost::filesystem::exists(lib_path)) {
+        throw std::runtime_error("Required library not found: " + lib_path.string());
+    }
     return {lib_path.string()};
 }

40-56: Enhance test coverage with edge cases

While the basic success case is covered, consider adding tests for:

  • Maximum/minimum integer values
  • Zero values
  • Negative numbers

Would you like me to help generate additional test cases?

🧰 Tools
🪛 cppcheck (2.10-2)

[error] 40-40: syntax error

(syntaxError)


40-90: Consider using a test fixture to reduce duplication

The environment setup and context creation is duplicated across test cases. Consider using a Catch2 test fixture to share this setup.

Here's a suggested structure:

class TaskExecutorTest {
protected:
    TaskExecutorTest() 
        : environment_variable(get_environment_variable())
        , context()
    {}
    
    absl::flat_hash_map<boost::process::v2::environment::key,
                        boost::process::v2::environment::value> environment_variable;
    boost::asio::io_context context;
};

TEST_CASE_METHOD(TaskExecutorTest, "Task execute success", "[worker]") {
    spider::worker::TaskExecutor executor{context, "sum_test", get_libraries(), 
                                        environment_variable, 2, 3};
    // ... rest of the test
}
🧰 Tools
🪛 cppcheck (2.10-2)

[error] 40-40: syntax error

(syntaxError)

tests/worker/test-FunctionManager.cpp (4)

24-26: LGTM! Consider adding documentation for test functions.

The test function registration is well-structured and provides good coverage for different data types and return values.

Consider adding brief documentation comments above each test function to explain their purpose and expected behaviour:

+// Tests basic integer arithmetic operation
 REGISTER_TASK(int_test);
+// Tests tuple return value handling
 REGISTER_TASK(tuple_ret_test);
+// Tests custom Data type operations
 REGISTER_TASK(data_test);

Line range hint 28-61: LGTM! Consider verifying error messages.

The test case thoroughly covers both successful and error scenarios for POD inputs. Good job maintaining const correctness and proper error handling.

Consider adding assertions to verify the error messages for better diagnostic coverage:

     if (wrong_result_option.has_value()) {
         REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments
                 == std::get<0>(wrong_result_option.value()));
+        REQUIRE(std::get<1>(wrong_result_option.value()).find("Expected 2 arguments") != std::string::npos);
     }

64-75: Consider adding error cases for tuple return test.

While the happy path is well tested, consider adding error cases to ensure robust handling of tuple returns.

Add test cases for:

  • Wrong number of tuple elements
  • Incorrect tuple element types
  • Empty string handling

Example:

// Test wrong argument types
spider::core::ArgsBuffer const wrong_args = spider::core::create_args_buffers(42, "wrong_order");
msgpack::sbuffer const error_result = (*function)(wrong_args);
auto const error = spider::core::response_get_error(error_result);
REQUIRE(error.has_value());
REQUIRE(spider::core::FunctionInvokeError::ArgumentParsingError == std::get<0>(error.value()));

Line range hint 77-92: Consider adding error cases and improving value verification.

The test case handles the happy path well, but could benefit from additional error cases and more robust value verification.

Consider:

  1. Adding test cases for:

    • Empty data value
    • Invalid data ID
    • Null data
  2. Improving the value verification:

     if (result_option.has_value()) {
         spider::core::Data const& result_data = result_option.value();
         REQUIRE(data.get_id() == result_data.get_id());
-        REQUIRE("testtest" == result_data.get_value());
+        auto const expected_value = data.get_value() + data.get_value();
+        REQUIRE(expected_value == result_data.get_value());
     }
CMakeLists.txt (2)

58-60: Excellent addition of Position Independent Code support!

The global PIC setting is essential for the worker task executor implementation, particularly for dynamic loading of functions in subprocesses. This is a recommended practice for modern systems.


124-124: Good practice using external fmt library.

Using external fmt library through SPDLOG_FMT_EXTERNAL is the recommended approach. This provides better control over the fmt version and reduces potential conflicts.

src/spider/worker/TaskExecutor.hpp (1)

35-74: Remove unnecessary NOLINT directives after correcting argument forwarding

After properly forwarding the variadic arguments, the NOLINT directives suppressing cppcoreguidelines-missing-std-forward warnings are no longer needed.

Apply this diff to remove the NOLINT directives:

- // NOLINTBEGIN(cppcoreguidelines-missing-std-forward)
...
- // NOLINTEND(cppcoreguidelines-missing-std-forward)
src/spider/worker/TaskExecutor.cpp (1)

101-103: Handle 'Unknown' response type explicitly

Currently, the Unknown response type is ignored. Consider logging a warning or setting the task state to Error to aid in debugging unexpected response types.

Apply this diff to add logging:

             case TaskExecutorResponseType::Unknown:
+                // Log a warning about the unknown response type
+                // TODO: Implement appropriate logging mechanism
                 break;
src/spider/worker/task_executor.cpp (1)

121-124: Catch exceptions by const reference for consistency and efficiency

Catching exceptions by const reference is a good practice to avoid unnecessary copying and to prevent modification of the exception object.

Here's a suggested code change:

-} catch (std::exception& e) {
+} catch (const std::exception& e) {
     spdlog::error("Exception thrown: {}", e.what());
     return cOtherErr;
 } catch (...) {
     spdlog::error("Unknown exception thrown");
     return cOtherErr;
 }
src/spider/worker/FunctionManager.hpp (6)

95-115: Expand exception handling for comprehensive error capture in response_get_error

Currently, only msgpack::type_error exceptions are caught. To ensure robustness, consider catching additional exceptions such as msgpack::unpack_error or a general std::exception, which can handle other potential errors during unpacking or parsing.


145-166: Ensure consistency in error handling in response_get_result

Similar to response_get_error, broaden the exception handling to capture more types of exceptions. This can prevent unexpected crashes and provide more informative error messages to the caller in case of different parsing errors.


188-197: Optimize argument packing in create_args_request

The current implementation packs arguments individually within a lambda. Consider optimizing by packing the entire argument list directly if possible, which could improve performance and reduce code complexity.


Line range hint 216-239: Enhance error messages for argument parsing in FunctionInvoker::apply

Providing more detailed error messages, such as indicating which argument failed to parse, can greatly aid in debugging. This could involve catching exceptions within the loop and appending specific information about the argument index or type.


290-292: Avoid exposing internal implementations with register_function_invoker

Providing a method to register function invokers directly exposes internal implementation details. This can lead to tight coupling and potential misuse. Consider restricting access or providing a higher-level interface that maintains encapsulation.


309-310: Prevent external modification by avoiding returning internal maps in get_function_map

Returning a reference to the internal FunctionMap exposes the internal state of the class, which can be modified externally. To maintain encapsulation and protect internal data, consider returning a const reference or providing an interface that does not expose the internal map directly.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b7ce91e and 16b7f92.

📒 Files selected for processing (30)
  • CMakeLists.txt (3 hunks)
  • src/spider/.clang-format (1 hunks)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/core/BoostAsio.hpp (1 hunks)
  • src/spider/core/Data.hpp (1 hunks)
  • src/spider/core/Serializer.hpp (1 hunks)
  • src/spider/core/Task.hpp (1 hunks)
  • src/spider/core/TaskGraph.hpp (1 hunks)
  • src/spider/storage/DataStorage.hpp (1 hunks)
  • src/spider/storage/MetadataStorage.hpp (1 hunks)
  • src/spider/storage/MysqlStorage.cpp (1 hunks)
  • src/spider/storage/MysqlStorage.hpp (1 hunks)
  • src/spider/worker/DllLoader.cpp (1 hunks)
  • src/spider/worker/DllLoader.hpp (1 hunks)
  • src/spider/worker/FunctionManager.hpp (6 hunks)
  • src/spider/worker/TaskExecutor.cpp (1 hunks)
  • src/spider/worker/TaskExecutor.hpp (1 hunks)
  • src/spider/worker/TaskExecutorMessage.hpp (1 hunks)
  • src/spider/worker/message_pipe.cpp (1 hunks)
  • src/spider/worker/message_pipe.hpp (1 hunks)
  • src/spider/worker/task_executor.cpp (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/worker/test-FunctionManager.cpp (4 hunks)
  • tests/worker/test-MessagePipe.cpp (1 hunks)
  • tests/worker/test-TaskExecutor.cpp (1 hunks)
  • tests/worker/worker-test.cpp (1 hunks)
  • tools/scripts/lib_install/boost.sh (1 hunks)
  • tools/scripts/lib_install/fmtlib.sh (1 hunks)
  • tools/scripts/lib_install/linux/install-lib.sh (1 hunks)
  • tools/scripts/lib_install/spdlog.sh (1 hunks)
✅ Files skipped from review due to trivial changes (8)
  • src/spider/core/BoostAsio.hpp
  • src/spider/core/Data.hpp
  • src/spider/core/Task.hpp
  • src/spider/core/TaskGraph.hpp
  • src/spider/storage/DataStorage.hpp
  • src/spider/storage/MetadataStorage.hpp
  • src/spider/storage/MysqlStorage.cpp
  • src/spider/storage/MysqlStorage.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-FunctionManager.cpp

[error] 28-28: syntax error

(syntaxError)

tests/worker/test-TaskExecutor.cpp

[error] 40-40: syntax error

(syntaxError)

🔇 Additional comments (27)
tests/worker/worker-test.cpp (2)

1-4: LGTM! Headers are well-organized.

The includes are minimal and follow good practices with project headers preceding system headers.


16-19: Verify REGISTER_TASK macro implementation.

The task registration looks correct, but let's ensure the macro is properly defined and handles task registration appropriately.

✅ Verification successful

Let me gather more information about the macro definition and its implementation.


Let me check one more aspect to understand how the registration works in the FunctionManager.


REGISTER_TASK macro implementation is correct and well-designed

The macro is properly implemented in src/spider/worker/FunctionManager.hpp and correctly:

  • Uses the singleton pattern via get_instance()
  • Registers functions with their string names using register_function
  • Stores functions in a type-safe manner using function binding
  • Generates unique variable names using ANONYMOUS_VARIABLE
  • Has appropriate NOLINT directives

The usage in the test files is consistent with its design and implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify REGISTER_TASK macro definition and usage

# Search for REGISTER_TASK macro definition
echo "Searching for REGISTER_TASK macro definition:"
rg -p "REGISTER_TASK" --type cpp

# Search for other task registrations to verify consistency
echo "Searching for other task registrations:"
ast-grep --pattern 'REGISTER_TASK($$$)'

Length of output: 874


Script:

#!/bin/bash
# Get the full macro definition and surrounding context
echo "Getting REGISTER_TASK macro definition and context:"
rg -p -B 2 -A 5 "^#define REGISTER_TASK" --type cpp

# Look for FunctionManager class definition to understand registration
echo "Looking for FunctionManager class definition:"
ast-grep --pattern 'class FunctionManager {
  $$$
}'

Length of output: 662


Script:

#!/bin/bash
# Look for register_function method implementation
echo "Searching for register_function method:"
rg -p "register_function" --type cpp -B 2 -A 5

# Look for get_instance implementation
echo "Searching for get_instance implementation:"
ast-grep --pattern 'static FunctionManager& get_instance() {
  $$$
}'

Length of output: 1675

src/spider/.clang-format (2)

13-14: LGTM! The regex pattern aligns with the project's dependencies.

The updated pattern correctly includes all the necessary libraries for the worker task executor implementation, maintaining proper include categorization.


13-14: Consider verifying library header patterns.

Let's verify that all library headers in the codebase match this pattern to ensure consistent formatting.

✅ Verification successful

The library header pattern in .clang-format is appropriate

The current regex pattern in .clang-format correctly prioritizes third-party library headers (fmt, spdlog, mariadb, msgpack, absl, boost, catch2). All other includes found in the codebase are standard C++ library headers (like string, vector, optional, etc.) which don't require special prioritization in the include ordering.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any library includes that might not match the current pattern
# This helps identify if we need to add more libraries to the regex

# Search for includes that might be library headers but don't match our pattern
rg -g '*.{cpp,hpp,h}' '^#include\s*<(?!(spider|fmt|spdlog|mariadb|msgpack|absl|boost|catch2))([^.>]+)>' \
  | grep -v '\.h>' \
  | sort -u

Length of output: 428


Script:

#!/bin/bash
# Let's try a different approach using ripgrep with PCRE2 to find library includes
rg --pcre2 -g '*.{cpp,hpp,h}' '^#include\s*<[^.>]+>' | sort -u

Length of output: 5698

tools/scripts/lib_install/linux/install-lib.sh (2)

Line range hint 1-28: LGTM! The script structure and error handling are well-implemented

The script follows best practices:

  • Uses proper error handling with set -e and set -u
  • Checks for elevated privileges appropriately
  • Uses consistent version specification for all dependencies

24-24: Verify spdlog version compatibility and security implications

The update to spdlog 1.15.0 looks reasonable as it's compatible with fmtlib 11.0.2. However, let's verify there are no breaking changes or security considerations.

✅ Verification successful

Update to spdlog 1.15.0 is safe to proceed

The version upgrade from 1.14.1 to 1.15.0 is safe as:

  • No breaking changes identified in the release notes
  • No security vulnerabilities reported
  • Main changes include bug fixes, documentation improvements, and performance optimizations
  • Compatible with the current fmtlib 11.0.2 (explicitly mentioned in the changelog)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check spdlog release notes and security advisories

# Check GitHub releases for breaking changes
gh api \
  -H "Accept: application/vnd.github+json" \
  repos/gabime/spdlog/releases | jq -r '.[] | select(.tag_name=="v1.15.0" or .tag_name=="v1.14.1") | {tag_name, body}'

# Check for security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "spdlog") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 3469

src/spider/worker/message_pipe.hpp (2)

1-9: LGTM: Well-organized header with proper include management

The header organization follows best practices with:

  • Proper header guards
  • Logical grouping of includes (standard library followed by project-specific)
  • Appropriate use of IWYU pragmas to manage dependencies

17-20: Consider platform compatibility for async/sync operations

The mix of async and sync operations with different pipe types suggests platform-specific handling.

Let's verify the platform support:

✅ Verification successful

Platform compatibility is properly handled with POSIX-specific implementations

The codebase shows a well-structured approach to pipe handling:

  • The async operations use platform-agnostic boost::asio::readable_pipe/writable_pipe
  • POSIX-specific operations use boost::asio::posix::stream_descriptor explicitly
  • No Windows-specific conditionals are present, indicating this is a POSIX-targeted implementation

The mix of async and sync operations with different pipe types is intentional and properly segregated based on their use cases, rather than being a platform compatibility concern.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for platform-specific code paths
rg -l "ifdef.*WIN32|ifdef.*_WIN32" 

# Check for related pipe implementations
ast-grep --pattern 'class $_ { 
  $$$ 
  boost::asio::$_pipe$$$ 
  $$$ 
}'

Length of output: 113


Script:

#!/bin/bash
# Let's check for pipe usage patterns more broadly
rg -A 5 "boost::asio::(readable|writable)_pipe" 

# Check for platform-specific includes or definitions
rg -A 2 "#if.*defined.*WIN|#ifdef.*WIN"

# Look for pipe-related implementations
rg -A 5 "stream_descriptor|posix::stream"

# Check for any platform-specific conditionals
rg "(#ifdef|#if defined).*(_WIN32|__linux__|__unix__|__APPLE__)"

Length of output: 6049

tools/scripts/lib_install/boost.sh (2)

39-39: LGTM: Addition of Boost.Process library

The addition of the process library aligns well with the PR's objective of implementing a worker task executor in a separate subprocess.


39-39: Verify Boost.Process compatibility and dependencies

Let's ensure all required dependencies for the worker task executor are included and verify compatibility with the specified Boost version.

✅ Verification successful

Boost.Process dependencies are correctly configured

The verification shows that:

  • Boost.Process is properly included in CMake configurations with all required dependencies
  • The codebase uses Boost.Process v2 API, which is compatible with the specified Boost version (1.74+)
  • All necessary Boost components are included in the bootstrap command:
    • process for task execution
    • filesystem for path handling
    • system for core functionality
    • Other components (iostreams, program_options, regex) are also correctly included
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Boost.Process dependencies and version compatibility

# Check if CMakeLists.txt requires additional Boost components
echo "Checking CMake requirements..."
rg -A 5 "find_package\(.*Boost.*\)" 

# Check Boost.Process version requirements
echo "Checking Boost.Process version requirements..."
curl -s https://www.boost.org/doc/libs/1_${version_with_underscores}/doc/html/process.html | grep -i "introduced in version"

Length of output: 311


Script:

#!/bin/bash
# Let's check CMake files with a more flexible pattern and examine build configurations
echo "Checking CMake configurations for Boost dependencies..."
fd -e cmake -e txt -x cat {} \; | rg -A 5 "[Bb]oost"

# Check for any Boost-related build configurations or requirements
echo "Checking for Boost-related configurations..."
fd -e sh -e cmake -x cat {} \; | rg -i "boost.*process|process.*boost"

# Look for actual usage of Boost.Process in the codebase
echo "Checking usage of Boost.Process..."
rg -i "boost::process|process\.hpp"

Length of output: 3872

tests/CMakeLists.txt (2)

9-10: LGTM: Test file additions align with PR objectives

The addition of test files for MessagePipe and TaskExecutor components properly supports the new worker task executor feature.


24-33: Verify Boost component versions

The addition of multiple Boost components is appropriate for the worker functionality. However, we should ensure version compatibility.

✅ Verification successful

Boost components are properly versioned and compatible

The project explicitly requires Boost version 1.74 or higher, which is a modern and stable version that includes all the components being used in the test target (filesystem, program_options, process, system). The same components are consistently used across the project in both the main source and test directories.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for Boost version requirements in the project
# Look for any explicit Boost version requirements
rg -g 'CMakeLists.txt' -g '*.cmake' 'find_package\s*\(\s*Boost' -A 3

Length of output: 68


Script:

#!/bin/bash
# Let's try a broader search for Boost-related configurations
rg -g 'CMakeLists.txt' -g '*.cmake' 'Boost' -A 3

# Also search for minimum required CMake version which might indicate compatibility
rg -g 'CMakeLists.txt' -g '*.cmake' 'cmake_minimum_required' -A 1

Length of output: 2125

src/spider/worker/DllLoader.cpp (1)

23-23: ⚠️ Potential issue

Add DLL validation before loading

The code loads DLLs without validating their authenticity or integrity. This could be a security risk if malicious DLLs are loaded.

Consider implementing:

  1. Digital signature verification
  2. Checksum validation
  3. Allowed paths whitelist
  4. File permission checks
src/spider/core/Serializer.hpp (1)

7-8: LGTM: Boost UUID header inclusion is appropriate

The inclusion of the Boost UUID header is necessary for the serialization functionality and is correctly placed at the file level scope.

Let's verify this header is consistently included where UUID serialization is used:

✅ Verification successful

Boost UUID header inclusion is properly used across the codebase

The verification confirms that the boost/uuid/uuid.hpp header is correctly included in all files that use boost::uuids::uuid. The header inclusion in Serializer.hpp is appropriate as it provides the UUID type definition needed for the msgpack serialization adaptor implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential UUID serialization usage without proper includes
# Search for UUID serialization usage
rg -l "boost::uuids::uuid" | while read -r file; do
  if ! grep -l "boost/uuid/uuid.hpp" "$file" > /dev/null; then
    echo "Warning: $file might be missing UUID header"
  fi
done

Length of output: 897


Script:

#!/bin/bash
# Let's check the actual content of files using boost::uuids::uuid to ensure proper header inclusion
rg -p "boost::uuids::uuid" --type cpp

Length of output: 12944

tools/scripts/lib_install/spdlog.sh (1)

74-74: Verify the impact of external fmt library dependency

The addition of -DSPDLOG_FMT_EXTERNAL=ON will require the fmt library to be installed separately. This could affect the build process if fmt isn't available.

Let's verify the fmt library installation script and dependencies:

Consider adding a check for the fmt library before proceeding with the build, or document this new dependency in the project's README.

✅ Verification successful

External fmt library dependency is properly handled

The fmtlib installation script (fmtlib.sh) is present in the same directory and includes all necessary build requirements. The script:

  • Handles version-specific installation
  • Includes proper dependency checks
  • Uses CMake with position-independent code enabled
  • Provides proper error handling
  • Supports both direct installation and package creation

No additional checks are required in spdlog.sh as the fmt library can be installed separately using the provided fmtlib.sh script.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for fmt library installation script and its version requirements
fd -t f "fmt.sh" tools/scripts/lib_install/
rg -A 2 "fmt.*version" tools/scripts/lib_install/

Length of output: 313


Script:

#!/bin/bash
# Check if fmt library installation is handled in the build process
rg -A 5 "fmtlib|fmt.sh" tools/scripts/lib_install/spdlog.sh

# Check for version requirements or dependencies
cat tools/scripts/lib_install/fmtlib.sh

Length of output: 2109

src/spider/CMakeLists.txt (2)

10-11: Verify the existence and implementation of new core headers

The addition of BoostAsio.hpp and MsgPack.hpp to the core headers suggests new networking and serialization capabilities. Let's verify these files are properly implemented.

✅ Verification successful

Both core headers are properly implemented with necessary includes

The verification confirms that both headers exist and are properly structured:

  • core/BoostAsio.hpp:

    • Has proper include guards (SPIDER_CORE_BOOSTASIO_HPP)
    • Includes comprehensive Boost.Asio functionality (networking, async I/O, coroutines)
  • core/MsgPack.hpp:

    • Has proper include guards (SPIDER_MSGPACK_HPP)
    • Includes complete MessagePack serialization capabilities
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and basic structure of the new core headers

# Check if the files exist and contain expected content
for file in "core/BoostAsio.hpp" "core/MsgPack.hpp"; do
  echo "Checking $file..."
  if fd "$file" src/spider; then
    # Check for proper include guards and basic structure
    rg -A 5 "^#(pragma once|ifndef|define)" "src/spider/$file"
    # Check for relevant includes
    rg "^#include" "src/spider/$file"
  fi
done

Length of output: 2232


33-43: ⚠️ Potential issue

Review security implications of DllLoader implementation

The addition of DLL loading functionality requires careful security consideration. Please ensure:

  1. Proper validation of DLL paths
  2. Secure handling of loaded libraries
  3. Implementation of safeguards against DLL hijacking
✅ Verification successful

DLL loading implementation has adequate security measures

The DllLoader implementation shows appropriate security considerations:

  1. Uses boost::dll library which provides secure DLL loading mechanisms
  2. Validates file existence before loading
  3. Implements proper error handling with try-catch blocks
  4. Uses absolute paths through std::filesystem
  5. Maintains loaded libraries in a controlled map structure

The implementation follows secure coding practices:

  • Path validation using std::filesystem
  • Exception handling for loading failures
  • Logging of all error conditions
  • Singleton pattern preventing unauthorized loader instances
  • Uses modern C++ features that prevent buffer overflows
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for security considerations in DLL loading implementation

# Check for path validation and security measures
echo "Checking DllLoader implementation..."
rg -A 5 "LoadLibrary|dlopen" src/spider/worker/DllLoader.cpp

# Check for potential security measures
echo "Checking for security validations..."
rg "validate|verify|check|secure" src/spider/worker/DllLoader.cpp

Length of output: 296


Script:

#!/bin/bash
# Let's first check if the file exists and see its content
echo "Checking DllLoader files..."
fd DllLoader

# If found, let's examine the implementation
echo "Content of DllLoader files if they exist:"
for file in $(fd DllLoader); do
    echo "=== Content of $file ==="
    cat "$file"
done

# Let's also search for any dynamic loading related patterns
echo "Searching for dynamic loading patterns..."
rg -i "load.*dll|load.*library|dlopen" --type cpp

Length of output: 3561

tests/worker/test-MessagePipe.cpp (1)

1-13: LGTM! Headers and setup are well-organized.

The includes are appropriate for the test functionality, and the IWYU pragma is correctly used to maintain the MsgPack header dependency.

src/spider/worker/TaskExecutorMessage.hpp (2)

64-70: LGTM! Well-documented MessagePack integration

The MessagePack enum registration is properly implemented with clear documentation about namespace requirements.


1-70: Verify consistent message format usage

Let's verify that all message construction and parsing follows the [type, body] array format consistently.

✅ Verification successful

Let me analyze the message format usage patterns across the codebase to ensure consistency.


Message format is consistently implemented across the codebase

The verification confirms that the [type, body] array format is consistently used throughout the codebase:

  • All message parsing functions verify array type and size >= 2
  • Message construction follows the same pattern with type as first element
  • Error handling is consistent with Unknown type returns
  • FunctionManager properly implements the format for both requests and responses
  • TaskExecutor correctly processes all response types
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for msgpack array construction patterns
rg -l "msgpack.*pack.*array" | while read -r file; do
  echo "=== $file ==="
  rg -A 2 "msgpack.*pack.*array" "$file"
done

# Search for message parsing patterns
rg -l "msgpack.*unpack" | while read -r file; do
  echo "=== $file ==="
  rg -A 2 "msgpack.*unpack" "$file"
done

Length of output: 1884


Script:

#!/bin/bash
# Search for message construction patterns in TaskExecutor related files
rg -A 5 "TaskExecutorR(esponse|equest)Type" --type cpp

# Search for array construction patterns
rg -A 5 "msgpack::type::ARRAY|array\.ptr" --type cpp

Length of output: 14910

tests/worker/test-TaskExecutor.cpp (2)

1-14: LGTM! Headers are well-organized and necessary.

The header organization follows good practices with project headers preceding third-party ones.


17-32: Consider making environment setup more robust

The current implementation might be fragile in different environments:

  1. PATH is set only to the src directory, which might be insufficient for subprocess execution
  2. Other environment variables might be needed for proper subprocess operation

Let's verify the environment setup:

src/spider/worker/TaskExecutor.cpp (2)

19-37: State-checking methods are correctly implemented

The methods completed(), waiting(), succeed(), and error() correctly implement thread-safe state checking using std::lock_guard.


63-105: Asynchronous output handler appropriately processes responses

The process_output_handler() method correctly handles asynchronous responses, updating the task state and result buffer as expected.

src/spider/worker/task_executor.cpp (1)

105-116: Send error response to the caller when a function is not found

When the specified function is not found, it's good practice to log the error and send an error response back to the caller for proper handling.

src/spider/worker/FunctionManager.hpp (2)

301-307: Confirm C++20 compatibility due to usage of std::views::keys

The use of std::views::keys in list_functions() requires C++20 support. Verify that the project is configured to compile with C++20 and that all target environments support this standard to prevent compilation issues.

Run the following script to check the C++ standard specified in the build configuration:

✅ Verification successful

C++20 requirement is properly configured

The project is correctly configured to use C++20 with strict enforcement:

  • CMakeLists.txt sets CMAKE_CXX_STANDARD to 20
  • CMAKE_CXX_STANDARD_REQUIRED is set to ON, ensuring compilation will fail if C++20 is not available

The use of std::views::keys is safe within this configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that C++20 is set in CMakeLists.txt

rg 'set\(CMAKE_CXX_STANDARD\s+20\)' CMakeLists.txt

Length of output: 79


Script:

#!/bin/bash
# Check for C++20 features and compiler requirements in documentation and build files
rg -i "c\+\+\s*20|cpp\s*20|compiler|standard" README.md
rg -i "c\+\+\s*20|cpp\s*20|compiler|standard" BUILDING.md || true
rg -i "c\+\+\s*20|cpp\s*20|compiler" .github/workflows -A 2 -B 2 || true

Length of output: 250


Script:

#!/bin/bash
# Check for compiler flags and C++ standard requirements in CMake files
rg -i "CMAKE_CXX|COMPILER|C\+\+\s*20" CMakeLists.txt -A 2 -B 2

# Check for any compiler version checks or requirements
rg "CMAKE_CXX_COMPILER_VERSION|CMAKE_COMPILER_IS_GNUCXX|MSVC|Clang" CMakeLists.txt

Length of output: 252


29-31: ⚠️ Potential issue

Verify thread safety of function registration in REGISTER_TASK macro

The REGISTER_TASK macro registers functions using an inline variable, which could have implications in multi-threaded environments regarding the order of initialization. Ensure that this registration mechanism is thread-safe and does not introduce race conditions.

Run the following script to identify all usages of REGISTER_TASK and assess for potential thread-safety issues:

✅ Verification successful

Let me check the implementation of FunctionManager to understand the thread safety aspects better.


Let me check one more aspect about the map implementation and any potential concurrent access patterns.


Thread safety of REGISTER_TASK macro is properly handled

The function registration mechanism is thread-safe due to the following factors:

  • The get_instance() uses the C++ static initialization pattern which is guaranteed to be thread-safe since C++11
  • Function registration occurs during static initialization, before any threads are spawned
  • The inline variable ensures each translation unit gets its own instance, preventing initialization order issues
  • The registration is done at compile-time, not runtime, eliminating race conditions

The design follows the standard singleton pattern for registries, which is a common and safe approach for function registration systems.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all usages of REGISTER_TASK to review thread safety.

rg 'REGISTER_TASK\(' -A 5

Length of output: 1290


Script:

#!/bin/bash
# Check FunctionManager implementation for thread safety
ast-grep --pattern 'class FunctionManager {
  $$$
}'

# Also check if there are any mutex/locks used
rg -B2 -A5 'register_function|get_instance' src/spider/worker/FunctionManager.hpp

Length of output: 1207


Script:

#!/bin/bash
# Check the map member and any synchronization mechanisms
ast-grep --pattern 'class FunctionManager {
  private:
    $$$
}'

# Look for any concurrent access patterns
rg -l 'std::mutex|std::lock_guard|std::shared_mutex' src/spider/worker/

Length of output: 211

Comment on lines 10 to 13
auto error_test(int const x) -> int {
std::quick_exit(1);
return x;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix unreachable code and consider a less abrupt error simulation.

The current implementation has several issues:

  1. The return x statement is unreachable due to std::quick_exit
  2. Using quick_exit might be too aggressive for testing error scenarios

Consider this alternative approach:

+/**
+ * Test function that simulates an error condition.
+ * Used to verify worker error handling.
+ */
 auto error_test(int const x) -> int {
-    std::quick_exit(1);
-    return x;
+    throw std::runtime_error("Simulated error in worker");
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto error_test(int const x) -> int {
std::quick_exit(1);
return x;
}
/**
* Test function that simulates an error condition.
* Used to verify worker error handling.
*/
auto error_test(int const x) -> int {
throw std::runtime_error("Simulated error in worker");
}

Comment on lines +22 to +24
private:
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries;
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add thread safety and improve documentation for the library storage.

The private implementation needs improvements:

  1. Add synchronization for thread safety
  2. Document the purpose and lifecycle of stored libraries
  3. Consider using string_view for map lookups

Consider these improvements:

 private:
+    // Mutex for thread-safe access to m_libraries
+    mutable std::shared_mutex m_mutex;
+
+    // Maps library paths to their loaded instances
+    // The loaded libraries persist until the DllLoader is destroyed
     absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private:
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries;
};
private:
// Mutex for thread-safe access to m_libraries
mutable std::shared_mutex m_mutex;
// Maps library paths to their loaded instances
// The loaded libraries persist until the DllLoader is destroyed
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries;
};

Comment on lines +13 to +18
static auto get_instance() -> DllLoader& {
// Explicitly use new because DllLoader instance should not be destroyed
// NOLINTNEXTLINE(cppcoreguidelines-owning-memory)
static auto* instance = new DllLoader();
return *instance;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider improving the singleton implementation for better safety and clarity.

While the current implementation ensures a single instance, consider these improvements:

  1. Delete copy/move constructors and assignment operators to prevent accidental copying.
  2. Document thread safety guarantees.
  3. Consider using std::unique_ptr with a custom deleter for better ownership semantics.

Here's a suggested improvement:

 class DllLoader {
 public:
+    // Delete copy/move operations
+    DllLoader(DllLoader const&) = delete;
+    DllLoader(DllLoader&&) = delete;
+    auto operator=(DllLoader const&) -> DllLoader& = delete;
+    auto operator=(DllLoader&&) -> DllLoader& = delete;
+
+    // Thread-safe singleton access
     static auto get_instance() -> DllLoader& {
-        // Explicitly use new because DllLoader instance should not be destroyed
-        // NOLINTNEXTLINE(cppcoreguidelines-owning-memory)
-        static auto* instance = new DllLoader();
-        return *instance;
+        static DllLoader instance;
+        return instance;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static auto get_instance() -> DllLoader& {
// Explicitly use new because DllLoader instance should not be destroyed
// NOLINTNEXTLINE(cppcoreguidelines-owning-memory)
static auto* instance = new DllLoader();
return *instance;
}
class DllLoader {
public:
// Delete copy/move operations
DllLoader(DllLoader const&) = delete;
DllLoader(DllLoader&&) = delete;
auto operator=(DllLoader const&) -> DllLoader& = delete;
auto operator=(DllLoader&&) -> DllLoader& = delete;
// Thread-safe singleton access
static auto get_instance() -> DllLoader& {
static DllLoader instance;
return instance;
}

Comment on lines +1 to +28
#ifndef SPIDER_WORKER_DLLLOADER_HPP
#define SPIDER_WORKER_DLLLOADER_HPP

#include <string>

#include <absl/container/flat_hash_map.h>
#include <boost/dll/shared_library.hpp>

namespace spider::worker {

class DllLoader {
public:
static auto get_instance() -> DllLoader& {
// Explicitly use new because DllLoader instance should not be destroyed
// NOLINTNEXTLINE(cppcoreguidelines-owning-memory)
static auto* instance = new DllLoader();
return *instance;
}

auto load_dll(std::string const& path) -> bool;

private:
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries;
};

} // namespace spider::worker

#endif // SPIDER_WORKER_DLLLOADER_HPP
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

⚠️ Potential issue

Based on the review of the implementation, I can now provide the final response:

Security measures needed for DLL loading

The current implementation lacks essential security controls for DLL loading:

  • The load_dll function only checks if the file exists but doesn't validate the path or location
  • DLLs are loaded without any signature verification or integrity checks
  • No restrictions on the DLL loading paths or directory traversal prevention
  • Direct loading of functions from DLLs without any security boundary

Recommended changes:

  • Implement a whitelist of allowed DLL paths or restrict to specific directories
  • Add path sanitization to prevent directory traversal attacks
  • Consider implementing DLL signature verification
  • Add integrity checks before loading DLLs
  • Consider implementing a security boundary or sandbox for loaded DLLs
🔗 Analysis chain

Add security measures for DLL loading.

Loading dynamic libraries can be a security risk. Consider implementing:

  1. Path validation to prevent directory traversal
  2. Signature verification for loaded libraries
  3. Sandboxing or restricted loading paths

Let's check if there are any security measures in place:


Let me gather more information about the DLL loading implementation and related security measures.


Let me check for any additional security controls or configuration settings that might be related to DLL loading.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for security-related code in the implementation
rg -l "verify|signature|sandbox" --type cpp

# Check if there are any path validation functions
ast-grep --pattern 'validate_path($_)'

Length of output: 124


Script:

#!/bin/bash
# Check the DllLoader implementation
fd DllLoader.cpp --exec cat {}

# Look for any path validation or security checks in the codebase
rg -A 5 "load_dll" --type cpp

# Check for any filesystem-related security functions
rg -l "filesystem|fs::" --type cpp

# Look for any configuration or security settings
fd -e json -e yaml -e conf -e toml --exec cat {}

Length of output: 9267


Script:

#!/bin/bash
# Check for any security-related configuration or allowed paths
rg -A 5 "allowed|whitelist|secure|restrict" --type cpp

# Look for the task executor implementation to see how DLLs are loaded
fd TaskExecutor.cpp --exec cat {}

# Check for any security-related constants or configurations
rg -A 5 "SECURITY|ALLOWED_PATHS|DLL" --type hpp

Length of output: 8003

Comment on lines +35 to +37
add_library(worker_test SHARED)
target_sources(worker_test PUBLIC worker/worker-test.cpp)
target_link_libraries(worker_test PUBLIC spider_core)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding POSITION_INDEPENDENT_CODE property

Since this is a shared library that links against spider_core, ensure proper PIC settings.

Add the following after the library definition:

 add_library(worker_test SHARED)
+set_property(TARGET worker_test PROPERTY POSITION_INDEPENDENT_CODE ON)
 target_sources(worker_test PUBLIC worker/worker-test.cpp)
 target_link_libraries(worker_test PUBLIC spider_core)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
add_library(worker_test SHARED)
target_sources(worker_test PUBLIC worker/worker-test.cpp)
target_link_libraries(worker_test PUBLIC spider_core)
add_library(worker_test SHARED)
set_property(TARGET worker_test PROPERTY POSITION_INDEPENDENT_CODE ON)
target_sources(worker_test PUBLIC worker/worker-test.cpp)
target_link_libraries(worker_test PUBLIC spider_core)

Comment on lines 28 to 30
} catch (boost::system::system_error const& e) {
return false;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider logging exceptions in send_message for improved diagnostics

Currently, exceptions caught in the send_message function are not logged, which may make debugging issues difficult. Adding logging statements will aid in troubleshooting.

Apply this diff to log the exception:

 catch (boost::system::system_error const& e) {
+    spdlog::error("Failed to send message: {}", e.what());
     return false;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} catch (boost::system::system_error const& e) {
return false;
}
} catch (boost::system::system_error const& e) {
spdlog::error("Failed to send message: {}", e.what());
return false;
}

Comment on lines 122 to 125
"Cannot read header from pipe {}: {}",
response_ec.value(),
response_ec.message()
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Update error message in asynchronous read to reflect correct context

In receive_message_async, the error message implies a failure in reading the header when actually failing to read the message body. Adjusting this message will enhance log accuracy.

Apply this diff:

 spdlog::error(
-    "Cannot read header from pipe {}: {}",
+    "Cannot read message body from pipe {}: {}",
     response_ec.value(),
     response_ec.message()
 );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"Cannot read header from pipe {}: {}",
response_ec.value(),
response_ec.message()
);
"Cannot read message body from pipe {}: {}",
response_ec.value(),
response_ec.message()
);

Comment on lines 72 to 73
spdlog::error("Fail to read header: {}", e.what());
return std::nullopt;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Correct error message when failing to read the message body

The error message incorrectly states "Fail to read header" when the failure occurs while reading the message body. Updating the message will improve clarity.

Apply this diff to correct the error message:

 catch (boost::system::system_error& e) {
-    spdlog::error("Fail to read header: {}", e.what());
+    spdlog::error("Failed to read message body: {}", e.what());
     return std::nullopt;
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +42 to +44
return false;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add logging to exception handling in send_message overload

In the overloaded send_message function, exceptions are also caught without logging. Including logs here will provide consistency and assist in diagnosing issues.

Apply this diff:

 catch (boost::system::system_error const& e) {
+    spdlog::error("Failed to send message: {}", e.what());
     return false;
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +23 to +27
// NOLINTBEGIN(cppcoreguidelines-macro-usage)
#define CONCAT_DIRECT(s1, s2) s1##s2
#define CONCAT(s1, s2) CONCAT_DIRECT(s1, s2)
#define ANONYMOUS_VARIABLE(str) CONCAT(str, __COUNTER__)
// NOLINTEND(cppcoreguidelines-macro-usage)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider replacing macros with constexpr functions or inline variables

The use of macros (CONCAT_DIRECT, CONCAT, ANONYMOUS_VARIABLE) can lead to less readable and error-prone code. Modern C++ provides alternatives like constexpr functions or inline variables that are type-safe and scoped appropriately. Refactoring these macros can improve maintainability and clarity.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (15)
src/spider/worker/DllLoader.cpp (1)

15-21: Enhance path validation for security

While the basic file existence check is good, consider adding additional security checks:

  1. Verify file permissions
  2. Validate file extension
  3. Check for symbolic links that might point to unsafe locations
 if (!std::filesystem::exists(dll_path)) {
     spdlog::error("Cannot find dll file {}", dll_path.string());
     return false;
 }
+if (!dll_path.extension().string().ends_with(".dll") && 
+    !dll_path.extension().string().ends_with(".so")) {
+    spdlog::error("Invalid library extension: {}", dll_path.string());
+    return false;
+}
+std::error_code ec;
+if (std::filesystem::is_symlink(dll_path, ec)) {
+    spdlog::warn("Library path is a symlink: {}", dll_path.string());
+}
src/spider/worker/TaskExecutor.hpp (4)

26-32: Add documentation for TaskExecutorState enum

Consider adding documentation to describe:

  • The purpose of each state
  • Valid state transitions
  • When each state is set

Example:

/// Represents the current state of a task execution
enum class TaskExecutorState : std::uint8_t {
    Running,   ///< Task is currently executing
    Waiting,   ///< Task is waiting for resources or input
    Succeed,   ///< Task completed successfully
    Error,     ///< Task encountered an error
    Cancelled, ///< Task was cancelled by user request
};

89-92: Document exceptions in get_result()

The template method get_result() could throw exceptions during deserialization but doesn't document this behaviour.

Add documentation:

/**
 * Retrieves the result of the task execution.
 * @tparam T The expected result type
 * @return The task result if available
 * @throws msgpack::type_error If result cannot be deserialized to type T
 */
template <class T>
auto get_result() -> std::optional<T>

85-85: Add timeout parameter to wait()

The wait() method could potentially block indefinitely. Consider adding a timeout parameter.

/**
 * Wait for task completion with timeout.
 * @param timeout Maximum duration to wait
 * @return true if completed, false if timed out
 */
template <typename Duration = std::chrono::seconds>
bool wait(Duration timeout = Duration{30});

99-100: Ensure thread-safe state transitions

While there's a mutex for state protection, the state transitions themselves might not be atomic. Consider using std::atomic for m_state or implementing a state machine pattern.

tests/worker/test-TaskExecutor.cpp (2)

17-38: Consider enhancing environment and library path handling.

Several improvements could make these utility functions more robust:

  1. Consider preserving existing PATH value and appending to it
  2. Add existence check for libworker_test.so
  3. Extract common path resolution logic to reduce duplication

Here's a suggested improvement:

+ auto get_executable_dir() -> boost::filesystem::path {
+     return boost::dll::program_location().parent_path();
+ }

  auto get_environment_variable() -> absl::flat_hash_map<
          boost::process::v2::environment::key,
          boost::process::v2::environment::value> {
-     boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
+     boost::filesystem::path const executable_dir = get_executable_dir();
      boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider";
      
      absl::flat_hash_map<
              boost::process::v2::environment::key,
              boost::process::v2::environment::value>
              environment_variables;
-     environment_variables.emplace("PATH", src_dir);
+     // Preserve existing PATH
+     auto current_path = std::getenv("PATH");
+     std::string new_path = src_dir.string();
+     if (current_path) {
+         new_path += ":" + std::string(current_path);
+     }
+     environment_variables.emplace("PATH", new_path);
      return environment_variables;
  }

  auto get_libraries() -> std::vector<std::string> {
-     boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
+     boost::filesystem::path const executable_dir = get_executable_dir();
      boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
+     if (!boost::filesystem::exists(lib_path)) {
+         throw std::runtime_error("Required library not found: " + lib_path.string());
+     }
      return {lib_path.string()};
  }

46-90: Consider improving test robustness and type safety.

  1. Add io_context cleanup between tests to prevent state leakage
  2. Consider using a more type-safe approach for error handling instead of std::tuple

Here's a suggested improvement:

+ struct TaskError {
+     spider::core::FunctionInvokeError error_type;
+     std::string message;
+ };

  TEST_CASE("Task execute fail", "[worker]") {
      // ... setup code ...
      
      boost::asio::io_context context;
+     context.restart(); // Ensure clean state
      
      spider::worker::TaskExecutor
              executor{context, "error_test", get_libraries(), environment_variable, 2};
      context.run();
      executor.wait();
      REQUIRE(executor.error());
-     std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
-     REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error));
+     TaskError error = executor.get_error();
+     REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == error.error_type);
  }
src/spider/worker/message_pipe.cpp (3)

19-19: Consider documenting the header size format

The constant cHeaderSize is used for message size encoding, but its format requirements aren't documented. Consider adding a comment explaining that it represents a 16-digit zero-padded decimal number.


34-46: Consider reducing code duplication in send_message overloads

Both send_message overloads share identical logic. Consider extracting the common functionality into a template or helper function.

Example implementation:

template<typename Stream>
auto send_message_impl(Stream& stream, msgpack::sbuffer const& request) -> bool {
    try {
        size_t const size = request.size();
        if (size > 999999999999999) {
            spdlog::error("Message size {} exceeds maximum allowed size", size);
            return false;
        }
        std::string const size_str = fmt::format("{:016d}", size);
        boost::asio::write(stream, boost::asio::buffer(size_str));
        boost::asio::write(stream, boost::asio::buffer(request.data(), size));
        return true;
    } catch (boost::system::system_error const& e) {
        spdlog::error("Failed to send message: {}", e.what());
        return false;
    }
}

85-85: Document the reason for NOLINT directive

The NOLINT directive suppresses a clang-analyzer warning, but the reason isn't documented. Add a comment explaining why this warning can be safely ignored.

src/spider/worker/FunctionManager.hpp (4)

103-109: Replace magic numbers with named constants

The array size check uses magic number '3'. Consider defining a constant to improve maintainability and clarity.

+static constexpr std::size_t ERROR_RESPONSE_SIZE = 3;
+
 if (msgpack::type::ARRAY != object.type || 
-    3 != object.via.array.size) {
+    ERROR_RESPONSE_SIZE != object.via.array.size) {

152-154: Define constant for result response size

Similar to error handling, magic number '2' should be replaced with a named constant.

+static constexpr std::size_t RESULT_RESPONSE_SIZE = 2;
+
 if (msgpack::type::ARRAY != object.type || 
-    2 != object.via.array.size) {
+    RESULT_RESPONSE_SIZE != object.via.array.size) {

Line range hint 255-259: Enhance error message with exception details

The generic error message loses valuable debugging information. Consider including the exception message.

             return create_error_response(
                     FunctionInvokeError::FunctionExecutionError,
-                    "Function execution error"
+                    fmt::format("Function execution error: {}", e.what())
             );

302-308: Optimize list_functions implementation

The current implementation could be more efficient using ranges with a direct transform.

     [[nodiscard]] auto list_functions() const -> std::vector<std::string> {
-        std::vector<std::string> functions;
-        for (std::string const& func_name : std::views::keys(m_map)) {
-            functions.emplace_back(func_name);
-        }
-        return functions;
+        return std::vector<std::string>(m_map.keys().begin(), m_map.keys().end());
     }
src/spider/worker/TaskExecutor.cpp (1)

82-85: Refactor repeated code in process_output_handler() into a helper method

To reduce code duplication and improve maintainability, consider extracting the repeated blocks of code that acquire the mutex, update m_state, write to m_result_buffer, and co_return into a helper method.

Apply this diff to implement the refactor:

+void TaskExecutor::update_state_and_buffer(TaskExecutorState new_state, const msgpack::sbuffer& response) {
+    std::lock_guard const lock(m_state_mutex);
+    m_state = new_state;
+    m_result_buffer.write(response.data(), response.size());
+}

...

            case TaskExecutorResponseType::Error: {
-                std::lock_guard const lock(m_state_mutex);
-                m_state = TaskExecutorState::Error;
-                m_result_buffer.write(response.data(), response.size());
+                update_state_and_buffer(TaskExecutorState::Error, response);
                 co_return;
             }

...

            case TaskExecutorResponseType::Result: {
-                std::lock_guard const lock(m_state_mutex);
-                m_state = TaskExecutorState::Succeed;
-                m_result_buffer.write(response.data(), response.size());
+                update_state_and_buffer(TaskExecutorState::Succeed, response);
                 co_return;
             }

...

            case TaskExecutorResponseType::Cancel: {
-                std::lock_guard const lock(m_state_mutex);
-                m_state = TaskExecutorState::Cancelled;
-                m_result_buffer.write(response.data(), response.size());
+                update_state_and_buffer(TaskExecutorState::Cancelled, response);
                 co_return;
             }

Also applies to: 90-93, 96-99

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 16b7f92 and e505d5b.

📒 Files selected for processing (8)
  • src/spider/worker/DllLoader.cpp (1 hunks)
  • src/spider/worker/FunctionManager.hpp (7 hunks)
  • src/spider/worker/TaskExecutor.cpp (1 hunks)
  • src/spider/worker/TaskExecutor.hpp (1 hunks)
  • src/spider/worker/message_pipe.cpp (1 hunks)
  • tests/worker/test-MessagePipe.cpp (1 hunks)
  • tests/worker/test-TaskExecutor.cpp (1 hunks)
  • tests/worker/worker-test.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/worker/test-MessagePipe.cpp
  • tests/worker/worker-test.cpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-TaskExecutor.cpp

[error] 40-40: syntax error

(syntaxError)

🔇 Additional comments (14)
src/spider/worker/DllLoader.cpp (4)

1-13: LGTM! Headers and namespace organization is clean

The includes are well-organized and necessary for the implementation. Good use of the spider::worker namespace.


32-37: ⚠️ Potential issue

Add thread-safe function registration mechanism

The function registration process remains thread-unsafe. Multiple DLLs loading concurrently could lead to race conditions when accessing the global FunctionManager instance.

Consider implementing a thread-safe registration mechanism:

+std::lock_guard<std::mutex> lock(FunctionManager::get_instance().get_mutex());
 for (auto const& func_iter : function_map) {
     core::FunctionManager::get_instance().register_function_invoker(
             func_iter.first,
             func_iter.second
     );
 }

40-46: 🛠️ Refactor suggestion

Implement specific exception handling and cleanup

The current error handling could be more robust and specific.

Add specific handlers and cleanup:

-} catch (std::exception& e) {
+} catch (boost::dll::library_load_error const& e) {
+    spdlog::error("DLL loading error for {}: {}", dll_path.string(), e.what());
+    cleanup_partial_registrations();
+    return false;
+} catch (std::runtime_error const& e) {
     spdlog::error("Failed to load library {}: {}", dll_path.string(), e.what());
+    cleanup_partial_registrations();
     return false;
 } catch (...) {
     spdlog::error("Failed to load library {}", dll_path.string());
+    cleanup_partial_registrations();
     return false;
 }

39-39: Verify library uniqueness before storage

The current implementation doesn't check if the library is already loaded, which could lead to duplicate loading and resource waste.

Consider adding a uniqueness check:

+if (m_libraries.contains(path_str)) {
+    spdlog::warn("Library {} is already loaded", path_str);
+    return true;
+}
 m_libraries.emplace(path_str, library);
src/spider/worker/TaskExecutor.hpp (2)

1-24: LGTM! Well-organized header structure

The includes are properly organized, necessary headers are present, and include guards are correctly implemented.


67-67: ⚠️ Potential issue

Ensure coroutine lifetime management

The coroutine detachment could lead to undefined behaviour if TaskExecutor is destroyed while the coroutine is still running.

tests/worker/test-TaskExecutor.cpp (2)

1-15: LGTM! Well-organized header inclusions.

The headers are properly organized with project-specific headers first, followed by external dependencies. The IWYU pragma for BoostAsio is appropriate.


40-90: 🛠️ Refactor suggestion

Add more comprehensive error and edge case testing.

While the current test cases cover basic functionality, consider adding tests for:

  1. Library loading failures
  2. Task execution timeout
  3. Invalid task name
  4. Memory allocation failures
  5. Edge cases for argument values (e.g., INT_MAX, negative numbers)

Would you like me to help implement these additional test cases?

🧰 Tools
🪛 cppcheck (2.10-2)

[error] 40-40: syntax error

(syntaxError)

src/spider/worker/message_pipe.cpp (1)

21-32: 🛠️ Refactor suggestion

Verify message size limits to prevent buffer overflow

The message size is formatted into a 16-digit string, which implies a maximum size of 9,999,999,999,999,999 bytes. Consider adding a size limit check to prevent potential buffer overflow.

Apply this diff:

 auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool {
     try {
         size_t const size = request.size();
+        if (size > 999999999999999) {
+            spdlog::error("Message size {} exceeds maximum allowed size", size);
+            return false;
+        }
         std::string const size_str = fmt::format("{:016d}", size);
src/spider/worker/FunctionManager.hpp (3)

24-28: Consider modernizing macro usage with constexpr alternatives

Previous feedback about replacing these macros with modern C++ alternatives remains relevant. Consider using constexpr functions or inline variables for improved type safety and maintainability.


321-322: LGTM: DLL alias is properly configured

The BOOST_DLL_ALIAS is correctly set up to export the singleton instance getter.


295-300: Consider thread safety for function map access

The function map is accessed without synchronization. If the manager is used in a multi-threaded context, consider adding thread safety mechanisms.

Also applies to: 310-310

src/spider/worker/TaskExecutor.cpp (2)

19-22: Function completed() correctly determines task completion status

The implementation accurately checks if the task has either succeeded or encountered an error, ensuring correct reporting of the task's completion state.


43-44: Conditional logic corrected to use '&&' operator

The condition now properly checks that the task state is neither Cancelled nor Error before updating the state to Error. This fixes the previous logical error where the condition always evaluated to true.

Comment on lines +36 to +72
template <class... Args>
explicit TaskExecutor(
boost::asio::io_context& context,
std::string const& func_name,
std::vector<std::string> const& libs,
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const& environment,
Args&&... args
)
: m_read_pipe(context),
m_write_pipe(context) {
std::vector<std::string> process_args{"--func", func_name, "--libs"};
process_args.insert(process_args.end(), libs.begin(), libs.end());
boost::filesystem::path const exe = boost::process::v2::environment::find_executable(
"spider_task_executor",
environment
);
m_process = std::make_unique<boost::process::v2::process>(
context,
exe,
process_args,
boost::process::v2::process_stdio{
.in = m_write_pipe,
.out = m_read_pipe,
.err = {/*stderr to default*/}
},
boost::process::v2::process_environment{environment}
);

// Set up handler for output file
boost::asio::co_spawn(context, process_output_handler(), boost::asio::detached);

// Send args
msgpack::sbuffer args_request = core::create_args_request(std::forward<Args>(args)...);
send_message(m_write_pipe, args_request);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Constructor needs error handling and could be simplified

The constructor performs multiple operations that could fail:

  1. Executable path resolution
  2. Process creation
  3. Message sending

Consider:

  1. Adding error handling for process creation
  2. Moving process setup to a separate method
  3. Using RAII patterns for cleanup on failure
class TaskExecutor {
    void setup_process(
        boost::asio::io_context& context,
        std::string const& func_name,
        std::vector<std::string> const& libs,
        absl::flat_hash_map<
            boost::process::v2::environment::key,
            boost::process::v2::environment::value> const& environment);
public:
    template <class... Args>
    explicit TaskExecutor(
            boost::asio::io_context& context,
            std::string const& func_name,
            std::vector<std::string> const& libs,
            absl::flat_hash_map<
                    boost::process::v2::environment::key,
                    boost::process::v2::environment::value> const& environment,
            Args&&... args) {
        setup_process(context, func_name, libs, environment);
        send_args(std::forward<Args>(args)...);
    }
};

Comment on lines +82 to +142
auto receive_message_async(std::reference_wrapper<boost::asio::readable_pipe> pipe
) -> boost::asio::awaitable<std::optional<msgpack::sbuffer>> {
std::array<char, cHeaderSize> header_buffer{0};
// NOLINTNEXTLINE(clang-analyzer-core.NullDereference)
auto [header_ec, header_n] = co_await boost::asio::async_read(
pipe.get(),
boost::asio::buffer(header_buffer),
boost::asio::as_tuple(boost::asio::use_awaitable)
);
if (header_ec) {
if (boost::asio::error::eof != header_ec) {
spdlog::error(
"Cannot read header from pipe {}: {}",
header_ec.value(),
header_ec.message()
);
}
co_return std::nullopt;
}
size_t response_size = 0;
try {
response_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
} catch (std::exception& e) {
spdlog::error(
"Cannot parse header: {} {}",
e.what(),
std::string{header_buffer.data(), cHeaderSize}
);
co_return std::nullopt;
}
if (response_size == 0) {
co_return std::nullopt;
}
std::vector<char> response_buffer(response_size);
auto [response_ec, response_n] = co_await boost::asio::async_read(
pipe.get(),
boost::asio::buffer(response_buffer),
boost::asio::as_tuple(boost::asio::use_awaitable)
);
if (response_ec) {
if (boost::asio::error::eof != response_ec) {
spdlog::error(
"Cannot read response body from pipe {}: {}",
response_ec.value(),
response_ec.message()
);
}
co_return std::nullopt;
}
if (response_size != response_n) {
spdlog::error(
"Response read size not match. Expect {}. Got {}.",
response_size,
response_n
);
co_return std::nullopt;
}
msgpack::sbuffer buffer;
buffer.write(response_buffer.data(), response_buffer.size());
co_return buffer;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling consistency

The asynchronous implementation includes additional checks that aren't present in the synchronous version:

  1. Zero size check (line 112)
  2. Size mismatch check (line 131)

Consider adding these checks to the synchronous version for consistency.

Apply this diff to the synchronous version:

     }
+    if (body_size == 0) {
+        return std::nullopt;
+    }
     std::vector<char> body_buffer(body_size);
     try {
         boost::asio::read(fd, boost::asio::buffer(body_buffer));
+        if (body_size != body_buffer.size()) {
+            spdlog::error(
+                    "Response read size not match. Expect {}. Got {}.",
+                    body_size,
+                    body_buffer.size()
+            );
+            return std::nullopt;
+        }
     } catch (boost::system::system_error& e) {

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +48 to +80
auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer> {
std::array<char, cHeaderSize> header_buffer{0};
try {
boost::asio::read(fd, boost::asio::buffer(header_buffer));
} catch (boost::system::system_error& e) {
if (boost::asio::error::eof != e.code()) {
spdlog::error("Fail to read header: {}", e.what());
}
return std::nullopt;
}
size_t body_size = 0;
try {
body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
} catch (std::exception& e) {
spdlog::error(
"Cannot parse header: {} {}",
e.what(),
std::string{header_buffer.data(), cHeaderSize}
);
return std::nullopt;
}

std::vector<char> body_buffer(body_size);
try {
boost::asio::read(fd, boost::asio::buffer(body_buffer));
} catch (boost::system::system_error& e) {
spdlog::error("Fail to read response body: {}", e.what());
return std::nullopt;
}
msgpack::sbuffer buffer;
buffer.write(body_buffer.data(), body_buffer.size());
return buffer;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add size validation in receive_message

The function should validate the parsed message size before allocating the buffer to prevent potential memory exhaustion attacks.

Apply this diff:

     try {
         body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
+        if (body_size > 999999999999999) {
+            spdlog::error("Received message size {} exceeds maximum allowed size", body_size);
+            return std::nullopt;
+        }
     } catch (std::exception& e) {

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 113 to 116
return std::make_tuple(
object.via.array.ptr[1].as<FunctionInvokeError>(),
object.via.array.ptr[2].as<std::string>()
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add bounds checking for array access

Direct array access without bounds checking could lead to undefined behaviour if the array structure is malformed.

+        if (object.via.array.size < 3) {
+            return std::nullopt;
+        }
         return std::make_tuple(
                 object.via.array.ptr[1].as<FunctionInvokeError>(),
                 object.via.array.ptr[2].as<std::string>()
         );

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +189 to +198
template <class... Args>
auto create_args_request(Args&&... args) -> msgpack::sbuffer {
msgpack::sbuffer buffer;
msgpack::packer packer{buffer};
packer.pack_array(2);
packer.pack(worker::TaskExecutorRequestType::Arguments);
packer.pack_array(sizeof...(args));
([&] { packer.pack(args); }(), ...);
return buffer;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling for argument serialization

The function should handle potential serialization failures for the input arguments.

 template <class... Args>
 auto create_args_request(Args&&... args) -> msgpack::sbuffer {
     msgpack::sbuffer buffer;
     msgpack::packer packer{buffer};
+    try {
         packer.pack_array(2);
         packer.pack(worker::TaskExecutorRequestType::Arguments);
         packer.pack_array(sizeof...(args));
         ([&] { packer.pack(args); }(), ...);
         return buffer;
+    } catch (const msgpack::type_error& e) {
+        throw std::runtime_error("Failed to serialize arguments");
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
template <class... Args>
auto create_args_request(Args&&... args) -> msgpack::sbuffer {
msgpack::sbuffer buffer;
msgpack::packer packer{buffer};
packer.pack_array(2);
packer.pack(worker::TaskExecutorRequestType::Arguments);
packer.pack_array(sizeof...(args));
([&] { packer.pack(args); }(), ...);
return buffer;
}
template <class... Args>
auto create_args_request(Args&&... args) -> msgpack::sbuffer {
msgpack::sbuffer buffer;
msgpack::packer packer{buffer};
try {
packer.pack_array(2);
packer.pack(worker::TaskExecutorRequestType::Arguments);
packer.pack_array(sizeof...(args));
([&] { packer.pack(args); }(), ...);
return buffer;
} catch (const msgpack::type_error& e) {
throw std::runtime_error("Failed to serialize arguments");
}
}

Comment on lines 146 to 147
auto response_get_result(msgpack::sbuffer const& buffer) -> std::optional<T> {
// NOLINTBEGIN(cppcoreguidelines-pro-type-union-access,cppcoreguidelines-pro-bounds-pointer-arithmetic)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add constraints to template parameter

The template parameter T lacks constraints. Consider adding concepts or static_assert to ensure T is msgpack-serializable.

-template <class T>
+template <class T>
+    requires std::is_default_constructible_v<T>

Committable suggestion skipped: line range outside the PR's diff.

}

void TaskExecutor::wait() {
int const exit_code = m_process->wait();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add exception handling for process control methods

To enhance robustness, consider wrapping calls to m_process->wait() and m_process->terminate() with exception handling to catch and handle any potential exceptions thrown by these methods.

Apply this diff to add exception handling:

void TaskExecutor::wait() {
+    try {
        int const exit_code = m_process->wait();
+    } catch (const std::exception& e) {
+        std::lock_guard const lock(m_state_mutex);
+        m_state = TaskExecutorState::Error;
+        core::create_error_buffer(
+            core::FunctionInvokeError::FunctionExecutionError,
+            fmt::format("Exception during wait(): {}", e.what()),
+            m_result_buffer
+        );
+        return;
+    }
    if (exit_code != 0) {
        // existing code...
    }
}

void TaskExecutor::cancel() {
+    try {
        m_process->terminate();
+    } catch (const std::exception& e) {
+        std::lock_guard const lock(m_state_mutex);
+        m_state = TaskExecutorState::Error;
+        core::create_error_buffer(
+            core::FunctionInvokeError::FunctionExecutionError,
+            fmt::format("Exception during terminate(): {}", e.what()),
+            m_result_buffer
+        );
+        return;
+    }
    std::lock_guard const lock(m_state_mutex);
    m_state = TaskExecutorState::Cancelled;
    msgpack::packer packer{m_result_buffer};
    packer.pack("Task cancelled");
}

Also applies to: 55-55

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (8)
src/spider/CMakeLists.txt (2)

65-73: Fix typo in CACHE INTERNAL comment

"filees" should be "files" in the CACHE INTERNAL description.


65-85: Consider using object libraries for shared components

DllLoader and message_pipe are shared between spider_worker and spider_task_executor. Consider creating an object library for these shared components to:

  1. Avoid duplicate compilation
  2. Improve maintainability
  3. Make the shared nature of these components explicit
+add_library(spider_worker_common OBJECT
+    worker/DllLoader.cpp
+    worker/DllLoader.hpp
+    worker/message_pipe.cpp
+    worker/message_pipe.hpp
+)
+
+target_link_libraries(spider_worker_common
+    PRIVATE
+        spider_core
+        Boost::filesystem
+        Boost::system
+)
+
 add_executable(spider_worker)
-target_sources(spider_worker PRIVATE ${SPIDER_WORKER_SOURCES})
+target_sources(spider_worker PRIVATE worker/worker.cpp)
+target_link_libraries(spider_worker PRIVATE spider_worker_common)
 
 add_executable(spider_task_executor)
-target_sources(spider_task_executor PRIVATE ${SPIDER_TASK_EXECUTOR_SOURCES})
+target_sources(spider_task_executor PRIVATE worker/task_executor.cpp)
+target_link_libraries(spider_task_executor PRIVATE spider_worker_common)
src/spider/worker/FunctionManager.cpp (1)

14-39: Consider refactoring for improved readability and error handling.

The function logic is sound, but could benefit from these improvements:

  1. Use early returns to reduce nesting
  2. Add specific error handling for different msgpack exceptions
  3. Define a constant for the expected array size

Here's a suggested improvement:

+constexpr size_t ERROR_RESPONSE_SIZE = 3;
+
 auto response_get_error(msgpack::sbuffer const& buffer
 ) -> std::optional<std::tuple<FunctionInvokeError, std::string>> {
     try {
         msgpack::object_handle const handle = msgpack::unpack(buffer.data(), buffer.size());
         msgpack::object const object = handle.get();
 
-        if (msgpack::type::ARRAY != object.type || 3 != object.via.array.size) {
+        if (msgpack::type::ARRAY != object.type || ERROR_RESPONSE_SIZE != object.via.array.size) {
             return std::nullopt;
         }
 
-        if (worker::TaskExecutorResponseType::Error
-            != object.via.array.ptr[0].as<worker::TaskExecutorResponseType>())
-        {
+        if (worker::TaskExecutorResponseType::Error !=
+            object.via.array.ptr[0].as<worker::TaskExecutorResponseType>()) {
             return std::nullopt;
         }
 
         return std::make_tuple(
                 object.via.array.ptr[1].as<FunctionInvokeError>(),
                 object.via.array.ptr[2].as<std::string>());
-    } catch (msgpack::type_error& e) {
+    } catch (const msgpack::type_error& e) {
+        return std::nullopt;
+    } catch (const msgpack::unpack_error& e) {
         return std::nullopt;
     }
src/spider/worker/FunctionManager.hpp (5)

15-20: Consider reorganizing header includes

The header includes should be organized in groups: standard library headers, third-party library headers, and project headers. Each group should be separated by a blank line and sorted alphabetically within their group.

 #include <tuple>
 #include <type_traits>
 #include <utility>
+
 #include <absl/container/flat_hash_map.h>
 #include <fmt/format.h>
+
 #include "../core/MsgPack.hpp"  // IWYU pragma: keep
 #include "TaskExecutorMessage.hpp"

105-122: Enhance error messages in response validation

The error handling could be more informative by providing specific details about the validation failures.

         if (msgpack::type::ARRAY != object.type || 2 != object.via.array.size) {
-            return std::nullopt;
+            return std::nullopt;  // Consider adding: "Invalid response format: expected array of size 2"
         }

         if (worker::TaskExecutorResponseType::Result
             != object.via.array.ptr[0].as<worker::TaskExecutorResponseType>())
         {
-            return std::nullopt;
+            return std::nullopt;  // Consider adding: "Invalid response type"
         }

199-204: Enhance error message with exception details

Consider including the exception message in the error response to aid in debugging.

             return create_error_response(
                     FunctionInvokeError::ArgumentParsingError,
-                    fmt::format("Cannot parse arguments.")
+                    fmt::format("Cannot parse arguments: {}", e.what())
             );

214-219: Consider catching more specific exceptions

The generic std::exception catch block could be split to handle different types of exceptions more specifically.

-        } catch (std::exception& e) {
+        } catch (const std::runtime_error& e) {
             return create_error_response(
                     FunctionInvokeError::FunctionExecutionError,
-                    "Function execution error"
+                    fmt::format("Runtime error: {}", e.what())
             );
+        } catch (const std::exception& e) {
+            return create_error_response(
+                    FunctionInvokeError::FunctionExecutionError,
+                    fmt::format("Unexpected error: {}", e.what())
+            );
         }

253-253: Add documentation for get_function method

Consider adding a documentation comment explaining the method's behaviour, especially for the case when the function is not found.

+    /**
+     * Retrieves a function by name from the function map.
+     * @param name The name of the function to retrieve
+     * @return Pointer to the function if found, nullptr otherwise
+     */
     [[nodiscard]] auto get_function(std::string const& name) const -> Function const*;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e505d5b and fceb829.

📒 Files selected for processing (3)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/worker/FunctionManager.cpp (1 hunks)
  • src/spider/worker/FunctionManager.hpp (6 hunks)
🔇 Additional comments (6)
src/spider/CMakeLists.txt (3)

31-31: ⚠️ Potential issue

Switching to Boost::headers alone is insufficient

The codebase requires several non-header-only Boost components that must be explicitly linked:

  • Boost.Filesystem
  • Boost.Process
  • Boost.Program_Options
  • Boost.System

These components are actively used throughout the codebase and cannot be satisfied by header-only inclusion.


54-63: 🛠️ Refactor suggestion

Add Boost version requirement to ensure compatibility

The project specifies spdlog as a dependency but lacks a specific version requirement for Boost. While Boost components are correctly specified, adding a minimum version requirement would prevent potential compatibility issues.


2-7: Verify architectural separation of concerns

The inclusion of worker/FunctionManager in the core library sources suggests tight coupling between core and worker components. Consider whether this violates separation of concerns and if the FunctionManager should be moved to the worker library instead.

Also applies to: 15-16

✅ Verification successful

Let me gather more information about the architectural dependencies to make a more informed decision.


The architectural separation is appropriate

The current placement of FunctionManager.cpp in the core sources is justified based on the following findings:

  • No core components directly depend on FunctionManager, maintaining loose coupling
  • MsgPack.hpp is a core utility used by both core components (Data.hpp, Serializer.hpp) and worker components, following a shared infrastructure pattern
  • The worker components appropriately depend on core utilities, not the other way around, maintaining a clean dependency direction
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for dependencies between core and worker components

# Check for worker-related includes in core files
echo "Checking core files for worker dependencies..."
rg -l "include.*worker" src/spider/core/

# Check FunctionManager's dependencies
echo -e "\nChecking FunctionManager's dependencies..."
rg "^#include" src/spider/worker/FunctionManager.{hpp,cpp}

Length of output: 1705


Script:

#!/bin/bash
# Check if FunctionManager is used by core components
echo "Checking core components using FunctionManager..."
rg "FunctionManager" src/spider/core/

# Check TaskExecutorMessage dependencies
echo -e "\nChecking TaskExecutorMessage dependencies..."
rg "^#include" src/spider/worker/TaskExecutorMessage.hpp

# Check if MsgPack.hpp is used by other core components
echo -e "\nChecking MsgPack usage in core..."
rg -l "MsgPack\.hpp" src/spider/core/

Length of output: 561

src/spider/worker/FunctionManager.cpp (2)

1-12: LGTM! Well-organized includes with proper dependency management.

The includes are logically grouped and the IWYU pragma indicates thoughtful include management.


63-73: LGTM! Clean implementation of function lookup.

The get_function implementation is concise and efficient. The BOOST_DLL_ALIAS usage is appropriate for dynamic loading capabilities.

src/spider/worker/FunctionManager.hpp (1)

149-158: Add error handling for argument serialization

The function should handle potential serialization failures for the input arguments.

Comment on lines +38 to +48
set(SPIDER_WORKER_SOURCES
worker/DllLoader.hpp
worker/DllLoader.cpp
worker/TaskExecutor.hpp
worker/TaskExecutor.cpp
worker/TaskExecutorMessage.hpp
worker/message_pipe.cpp
worker/message_pipe.hpp
CACHE INTERNAL
"spider worker source files"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Security measures for DLL loading need improvement

The current implementation lacks sufficient security controls for dynamic library loading:

  • Only basic existence check is performed on the DLL path
  • No path sanitization or validation beyond filesystem checks
  • Missing library signature or integrity verification
  • No configuration for allowed library paths or names
  • Basic error handling is present but security-specific checks are absent

Required improvements:

  • Implement a whitelist of allowed library paths or names
  • Add path sanitization to prevent directory traversal
  • Implement library signature verification
  • Add configuration for security settings
  • Consider implementing library load-time integrity checks
🔗 Analysis chain

Verify security measures for dynamic library loading

The addition of DllLoader introduces potential security risks. Ensure that:

  1. Only authorized libraries can be loaded
  2. The loading path is properly sanitized
  3. Error handling is robust
  4. Library validation is implemented

Let me gather more information about the DLL loading implementation and path handling.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check security measures in DllLoader

# Check for security-related validations
echo "Checking DllLoader implementation..."
rg -A 5 "load|LoadLibrary|dlopen" src/spider/worker/DllLoader.cpp

# Check for error handling
echo -e "\nChecking error handling..."
rg "throw|error|exception" src/spider/worker/DllLoader.cpp

Length of output: 1063


Script:

#!/bin/bash
# Check path sanitization and validation
echo "Checking path handling..."
ast-grep --pattern 'path_str' src/spider/worker/DllLoader.cpp

# Check if there's any path validation or security checks
echo -e "\nChecking security validations..."
rg -A 5 "validate|check|verify|sanitize" src/spider/worker/DllLoader.cpp

# Look for library validation mechanisms
echo -e "\nChecking library validation..."
rg -A 5 "verify|signature|checksum" src/spider/worker/DllLoader.cpp

# Check for any configuration or allowed paths list
echo -e "\nChecking for configuration..."
fd -e hpp -e cpp -e h "DllLoader|Config" src/spider/worker/

Length of output: 808

Comment on lines +41 to +50
auto create_error_response(FunctionInvokeError error, std::string const& message)
-> msgpack::sbuffer {
msgpack::sbuffer buffer;
msgpack::packer packer{buffer};
packer.pack_array(3);
packer.pack(worker::TaskExecutorResponseType::Error);
packer.pack(error);
packer.pack(message);
return buffer;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling and use consistent constants.

The function should use the same constant for array size as response_get_error and include error handling for packing operations.

 auto create_error_response(FunctionInvokeError error, std::string const& message)
         -> msgpack::sbuffer {
     msgpack::sbuffer buffer;
     msgpack::packer packer{buffer};
-    packer.pack_array(3);
+    try {
+        packer.pack_array(ERROR_RESPONSE_SIZE);
+        packer.pack(worker::TaskExecutorResponseType::Error);
+        packer.pack(error);
+        packer.pack(message);
+    } catch (const std::exception& e) {
+        // Create an empty buffer to indicate packing failure
+        buffer.clear();
+    }
-    packer.pack(worker::TaskExecutorResponseType::Error);
-    packer.pack(error);
-    packer.pack(message);
     return buffer;
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +52 to +61
void create_error_buffer(
FunctionInvokeError error,
std::string const& message,
msgpack::sbuffer& buffer
) {
msgpack::packer packer{buffer};
packer.pack_array(2);
packer.pack(error);
packer.pack(message);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add documentation and error handling for buffer creation.

The function uses a different array size than create_error_response which could lead to confusion. Consider:

  1. Adding documentation to explain the different formats
  2. Including error handling for packing operations
  3. Using a constant for the array size
+constexpr size_t ERROR_BUFFER_SIZE = 2;
+
+/**
+ * Creates an error buffer with 2 elements: error code and message.
+ * Note: This differs from create_error_response which includes response type.
+ */
 void create_error_buffer(
         FunctionInvokeError error,
         std::string const& message,
         msgpack::sbuffer& buffer
 ) {
     msgpack::packer packer{buffer};
-    packer.pack_array(2);
-    packer.pack(error);
-    packer.pack(message);
+    try {
+        packer.pack_array(ERROR_BUFFER_SIZE);
+        packer.pack(error);
+        packer.pack(message);
+    } catch (const std::exception& e) {
+        buffer.clear();
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void create_error_buffer(
FunctionInvokeError error,
std::string const& message,
msgpack::sbuffer& buffer
) {
msgpack::packer packer{buffer};
packer.pack_array(2);
packer.pack(error);
packer.pack(message);
}
constexpr size_t ERROR_BUFFER_SIZE = 2;
/**
* Creates an error buffer with 2 elements: error code and message.
* Note: This differs from create_error_response which includes response type.
*/
void create_error_buffer(
FunctionInvokeError error,
std::string const& message,
msgpack::sbuffer& buffer
) {
msgpack::packer packer{buffer};
try {
packer.pack_array(ERROR_BUFFER_SIZE);
packer.pack(error);
packer.pack(message);
} catch (const std::exception& e) {
buffer.clear();
}
}

Comment on lines +249 to 251
auto register_function_invoker(std::string const& name, Function f) -> bool {
return m_map.emplace(name, f).second;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for function registration

The register_function_invoker method should validate the function object before registration.

     auto register_function_invoker(std::string const& name, Function f) -> bool {
+        if (!f) {
+            return false;
+        }
         return m_map.emplace(name, f).second;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto register_function_invoker(std::string const& name, Function f) -> bool {
return m_map.emplace(name, f).second;
}
auto register_function_invoker(std::string const& name, Function f) -> bool {
if (!f) {
return false;
}
return m_map.emplace(name, f).second;
}

@sitaowang1998 sitaowang1998 deleted the worker branch November 30, 2024 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant