-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add worker task executor #26
Conversation
WalkthroughThe pull request introduces various modifications primarily to the build configuration and source files of the Spider project. Key changes include enhancements to the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 31
🧹 Outside diff range and nitpick comments (35)
tests/worker/worker-test.cpp (1)
6-8
: Consider adding documentation for the test function.While the function name is descriptive, adding a brief comment explaining the test's purpose and expected behaviour would improve maintainability.
+/** + * Test function that adds two integers. + * Used to verify basic task execution functionality. + */ auto sum_test(int const x, int const y) -> int { return x + y; }src/spider/worker/DllLoader.hpp (1)
20-20
: Enhance the public interface for better error handling and control.The current interface could be improved in several ways:
- Add error details for failure cases
- Add methods to check library status
- Consider adding unload capability
- Document thread safety requirements
Consider this enhanced interface:
+ /// Load a dynamic library from the specified path + /// @param path Path to the dynamic library + /// @return std::error_code indicating success or specific error + /// @thread_safety Thread-safe + auto load_dll(std::string const& path) -> std::error_code; + + /// Check if a library is already loaded + /// @param path Path to the dynamic library + /// @return true if loaded, false otherwise + /// @thread_safety Thread-safe + auto is_loaded(std::string const& path) const -> bool;src/spider/worker/message_pipe.hpp (2)
12-15
: Add documentation for the send_message overloadsConsider adding Doxygen-style documentation to clarify:
- Purpose and usage of each overload
- Differences between pipe types
- Error conditions and return value semantics
Example documentation:
/** * Sends a message through a writable pipe. * @param pipe The pipe to write to * @param request The message buffer to send * @return true if the message was sent successfully, false otherwise */
12-20
: Consider enhancing error reporting mechanismThe current boolean/optional return types provide limited error context. Consider:
- Using a Result type pattern for richer error reporting
- Adding error codes or categories for specific failure modes
Example enhancement:
enum class MessageError { Success, ConnectionClosed, InvalidMessage, IOError }; auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> Result<void, MessageError>;tests/CMakeLists.txt (1)
17-21
: Consider using CMake path manipulation functionsThe hardcoded relative path "../src/spider/" could break if the directory structure changes.
Consider using CMake's path manipulation functions:
- list(APPEND SPIDER_TEST_WORKER_SOURCES "../src/spider/${worker_source}") + list(APPEND SPIDER_TEST_WORKER_SOURCES "${CMAKE_SOURCE_DIR}/src/spider/${worker_source}")src/spider/worker/DllLoader.cpp (2)
10-10
: Consider using project-relative include pathsThe relative include path "../worker/FunctionManager.hpp" could cause issues if the file structure changes. Consider using a project-relative include path instead.
-#include "../worker/FunctionManager.hpp" +#include "spider/worker/FunctionManager.hpp"
1-49
: Consider adding DLL lifecycle management and version compatibilityThe current implementation could benefit from additional architectural features:
- DLL unloading mechanism to free resources
- Version compatibility checking between DLLs and the host application
- Hot-reloading support for development scenarios
These features would improve the robustness and maintainability of the system. Would you like assistance in implementing any of these improvements?
src/spider/core/Serializer.hpp (2)
Line range hint
11-29
: Consider adding bounds checking for UUID data copyingWhile the size check is present, the
memcpy
operation could benefit from additional safety measures when handling binary data, especially in a worker task execution context.Consider this safer approach:
if (object.via.bin.size != boost::uuids::uuid::static_size()) { throw type_error(); } - std::uint8_t data[boost::uuids::uuid::static_size()]; - std::memcpy(data, object.via.bin.ptr, boost::uuids::uuid::static_size()); + std::array<std::uint8_t, boost::uuids::uuid::static_size()> data; + std::copy_n( + static_cast<const std::uint8_t*>(object.via.bin.ptr), + boost::uuids::uuid::static_size(), + data.begin());
Line range hint
31-43
: Consider using std::span for safer binary data handlingThe current implementation uses a C-style cast for binary data packing, which could be made safer using modern C++ features.
Consider this modernized approach:
- packer.pack_bin_body((char const*)id.data(), id.size()); + auto data_span = std::span<const std::uint8_t>(id.data(), id.size()); + packer.pack_bin_body(reinterpret_cast<const char*>(data_span.data()), + data_span.size());tools/scripts/lib_install/fmtlib.sh (1)
Line range hint
39-46
: Consider enhancing download security.While HTTPS is used for downloads, consider adding checksum verification for the downloaded tarball to ensure integrity.
Here's a suggested improvement:
if [ ! -e ${tar_filename} ] ; then curl -fsSL https://github.com/fmtlib/fmt/archive/refs/tags/${tar_filename} -o ${tar_filename} + # Download checksum + curl -fsSL https://github.com/fmtlib/fmt/releases/download/${version}/sha256sum.txt -o sha256sum.txt + # Verify checksum + sha256sum -c sha256sum.txt fitools/scripts/lib_install/spdlog.sh (1)
Line range hint
1-93
: Enhance script robustness with additional safeguardsWhile the script is well-structured, consider these improvements for better robustness:
Add these checks:
#!/bin/bash + +# Ensure cleanup on script exit +trap 'rm -rf ${temp_dir}' EXIT + +# Validate version format +if ! [[ ${version} =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then + echo "Error: Version must be in format X.Y.Z" + exit 1 +fi + # Dependencies:These changes will:
- Ensure cleanup even if the script fails
- Validate the version format before proceeding
src/spider/CMakeLists.txt (2)
60-68
: Fix typo in commentThere's a typo in the cache variable comment: "filees" should be "files".
- "spider task executor source filees" + "spider task executor source files"
60-80
: Consider refactoring shared componentsThe task executor shares several source files with the worker target. Consider:
- Creating a shared library for common components
- Refactoring DllLoader and message_pipe into a separate component
+ # Add new shared library + add_library(spider_worker_common STATIC + worker/DllLoader.cpp + worker/message_pipe.cpp + ) + + # Update existing targets to use the common library + target_link_libraries(spider_worker PRIVATE spider_worker_common) + target_link_libraries(spider_task_executor PRIVATE spider_worker_common)tests/worker/test-MessagePipe.cpp (3)
21-22
: Consider using named constants for test data.The magic values in the sample tuple could be better documented using named constants.
+ static constexpr const char* TEST_STRING = "test"; + static constexpr int TEST_VALUE = 3; - constexpr std::tuple cSampleResult = std::make_tuple("test", 3); + constexpr std::tuple cSampleResult = std::make_tuple(TEST_STRING, TEST_VALUE);
39-50
: Simplify nested conditionals using REQUIRE.The nested if statements can be flattened using Catch2's REQUIRE, making the test more readable and failing fast.
- if (response_option.has_value()) { - msgpack::sbuffer const& response_buffer = response_option.value(); - REQUIRE(spider::worker::TaskExecutorResponseType::Result - == spider::worker::get_response_type(response_buffer)); - std::optional<std::tuple<std::string, int>> const parse_response - = spider::core::response_get_result<std::tuple<std::string, int>>(response_buffer); - REQUIRE(parse_response.has_value()); - if (parse_response.has_value()) { - std::tuple<std::string, int> result = parse_response.value(); - REQUIRE(cSampleResult == result); - } - } + msgpack::sbuffer const& response_buffer = response_option.value(); + REQUIRE(spider::worker::TaskExecutorResponseType::Result + == spider::worker::get_response_type(response_buffer)); + auto const parse_response = spider::core::response_get_result<std::tuple<std::string, int>>( + response_buffer); + REQUIRE(parse_response.has_value()); + REQUIRE(cSampleResult == parse_response.value());
15-51
: Add negative test cases.The test suite would benefit from additional test cases covering error scenarios:
- Invalid message format
- Disconnected pipe
- Message size exceeding buffer
Would you like me to help create these additional test cases?
src/spider/worker/TaskExecutorMessage.hpp (1)
1-7
: Add file documentationConsider adding a brief documentation block at the start of the file explaining its purpose and the message protocol structure.
#ifndef SPIDER_WORKER_TASKEXECUTORMESSAGE_HPP #define SPIDER_WORKER_TASKEXECUTORMESSAGE_HPP +/** + * @file TaskExecutorMessage.hpp + * @brief Defines the message protocol for communication between the task executor + * and its parent process, including response and request types. + */ + #include <cstdint>tests/worker/test-TaskExecutor.cpp (3)
34-38
: Add library existence verificationThe function assumes libworker_test.so exists without verification. Consider adding checks to provide better error messages.
auto get_libraries() -> std::vector<std::string> { boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path(); boost::filesystem::path const lib_path = executable_dir / "libworker_test.so"; + if (!boost::filesystem::exists(lib_path)) { + throw std::runtime_error("Required library not found: " + lib_path.string()); + } return {lib_path.string()}; }
40-56
: Enhance test coverage with edge casesWhile the basic success case is covered, consider adding tests for:
- Maximum/minimum integer values
- Zero values
- Negative numbers
Would you like me to help generate additional test cases?
🧰 Tools
🪛 cppcheck (2.10-2)
[error] 40-40: syntax error
(syntaxError)
40-90
: Consider using a test fixture to reduce duplicationThe environment setup and context creation is duplicated across test cases. Consider using a Catch2 test fixture to share this setup.
Here's a suggested structure:
class TaskExecutorTest { protected: TaskExecutorTest() : environment_variable(get_environment_variable()) , context() {} absl::flat_hash_map<boost::process::v2::environment::key, boost::process::v2::environment::value> environment_variable; boost::asio::io_context context; }; TEST_CASE_METHOD(TaskExecutorTest, "Task execute success", "[worker]") { spider::worker::TaskExecutor executor{context, "sum_test", get_libraries(), environment_variable, 2, 3}; // ... rest of the test }🧰 Tools
🪛 cppcheck (2.10-2)
[error] 40-40: syntax error
(syntaxError)
tests/worker/test-FunctionManager.cpp (4)
24-26
: LGTM! Consider adding documentation for test functions.The test function registration is well-structured and provides good coverage for different data types and return values.
Consider adding brief documentation comments above each test function to explain their purpose and expected behaviour:
+// Tests basic integer arithmetic operation REGISTER_TASK(int_test); +// Tests tuple return value handling REGISTER_TASK(tuple_ret_test); +// Tests custom Data type operations REGISTER_TASK(data_test);
Line range hint
28-61
: LGTM! Consider verifying error messages.The test case thoroughly covers both successful and error scenarios for POD inputs. Good job maintaining const correctness and proper error handling.
Consider adding assertions to verify the error messages for better diagnostic coverage:
if (wrong_result_option.has_value()) { REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments == std::get<0>(wrong_result_option.value())); + REQUIRE(std::get<1>(wrong_result_option.value()).find("Expected 2 arguments") != std::string::npos); }
64-75
: Consider adding error cases for tuple return test.While the happy path is well tested, consider adding error cases to ensure robust handling of tuple returns.
Add test cases for:
- Wrong number of tuple elements
- Incorrect tuple element types
- Empty string handling
Example:
// Test wrong argument types spider::core::ArgsBuffer const wrong_args = spider::core::create_args_buffers(42, "wrong_order"); msgpack::sbuffer const error_result = (*function)(wrong_args); auto const error = spider::core::response_get_error(error_result); REQUIRE(error.has_value()); REQUIRE(spider::core::FunctionInvokeError::ArgumentParsingError == std::get<0>(error.value()));
Line range hint
77-92
: Consider adding error cases and improving value verification.The test case handles the happy path well, but could benefit from additional error cases and more robust value verification.
Consider:
Adding test cases for:
- Empty data value
- Invalid data ID
- Null data
Improving the value verification:
if (result_option.has_value()) { spider::core::Data const& result_data = result_option.value(); REQUIRE(data.get_id() == result_data.get_id()); - REQUIRE("testtest" == result_data.get_value()); + auto const expected_value = data.get_value() + data.get_value(); + REQUIRE(expected_value == result_data.get_value()); }CMakeLists.txt (2)
58-60
: Excellent addition of Position Independent Code support!The global PIC setting is essential for the worker task executor implementation, particularly for dynamic loading of functions in subprocesses. This is a recommended practice for modern systems.
124-124
: Good practice using external fmt library.Using external fmt library through SPDLOG_FMT_EXTERNAL is the recommended approach. This provides better control over the fmt version and reduces potential conflicts.
src/spider/worker/TaskExecutor.hpp (1)
35-74
: Remove unnecessaryNOLINT
directives after correcting argument forwardingAfter properly forwarding the variadic arguments, the
NOLINT
directives suppressingcppcoreguidelines-missing-std-forward
warnings are no longer needed.Apply this diff to remove the
NOLINT
directives:- // NOLINTBEGIN(cppcoreguidelines-missing-std-forward) ... - // NOLINTEND(cppcoreguidelines-missing-std-forward)src/spider/worker/TaskExecutor.cpp (1)
101-103
: Handle 'Unknown' response type explicitlyCurrently, the
Unknown
response type is ignored. Consider logging a warning or setting the task state toError
to aid in debugging unexpected response types.Apply this diff to add logging:
case TaskExecutorResponseType::Unknown: + // Log a warning about the unknown response type + // TODO: Implement appropriate logging mechanism break;src/spider/worker/task_executor.cpp (1)
121-124
: Catch exceptions by const reference for consistency and efficiencyCatching exceptions by
const
reference is a good practice to avoid unnecessary copying and to prevent modification of the exception object.Here's a suggested code change:
-} catch (std::exception& e) { +} catch (const std::exception& e) { spdlog::error("Exception thrown: {}", e.what()); return cOtherErr; } catch (...) { spdlog::error("Unknown exception thrown"); return cOtherErr; }src/spider/worker/FunctionManager.hpp (6)
95-115
: Expand exception handling for comprehensive error capture inresponse_get_error
Currently, only
msgpack::type_error
exceptions are caught. To ensure robustness, consider catching additional exceptions such asmsgpack::unpack_error
or a generalstd::exception
, which can handle other potential errors during unpacking or parsing.
145-166
: Ensure consistency in error handling inresponse_get_result
Similar to
response_get_error
, broaden the exception handling to capture more types of exceptions. This can prevent unexpected crashes and provide more informative error messages to the caller in case of different parsing errors.
188-197
: Optimize argument packing increate_args_request
The current implementation packs arguments individually within a lambda. Consider optimizing by packing the entire argument list directly if possible, which could improve performance and reduce code complexity.
Line range hint
216-239
: Enhance error messages for argument parsing inFunctionInvoker::apply
Providing more detailed error messages, such as indicating which argument failed to parse, can greatly aid in debugging. This could involve catching exceptions within the loop and appending specific information about the argument index or type.
290-292
: Avoid exposing internal implementations withregister_function_invoker
Providing a method to register function invokers directly exposes internal implementation details. This can lead to tight coupling and potential misuse. Consider restricting access or providing a higher-level interface that maintains encapsulation.
309-310
: Prevent external modification by avoiding returning internal maps inget_function_map
Returning a reference to the internal
FunctionMap
exposes the internal state of the class, which can be modified externally. To maintain encapsulation and protect internal data, consider returning a const reference or providing an interface that does not expose the internal map directly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (30)
CMakeLists.txt
(3 hunks)src/spider/.clang-format
(1 hunks)src/spider/CMakeLists.txt
(2 hunks)src/spider/core/BoostAsio.hpp
(1 hunks)src/spider/core/Data.hpp
(1 hunks)src/spider/core/Serializer.hpp
(1 hunks)src/spider/core/Task.hpp
(1 hunks)src/spider/core/TaskGraph.hpp
(1 hunks)src/spider/storage/DataStorage.hpp
(1 hunks)src/spider/storage/MetadataStorage.hpp
(1 hunks)src/spider/storage/MysqlStorage.cpp
(1 hunks)src/spider/storage/MysqlStorage.hpp
(1 hunks)src/spider/worker/DllLoader.cpp
(1 hunks)src/spider/worker/DllLoader.hpp
(1 hunks)src/spider/worker/FunctionManager.hpp
(6 hunks)src/spider/worker/TaskExecutor.cpp
(1 hunks)src/spider/worker/TaskExecutor.hpp
(1 hunks)src/spider/worker/TaskExecutorMessage.hpp
(1 hunks)src/spider/worker/message_pipe.cpp
(1 hunks)src/spider/worker/message_pipe.hpp
(1 hunks)src/spider/worker/task_executor.cpp
(1 hunks)tests/CMakeLists.txt
(1 hunks)tests/worker/test-FunctionManager.cpp
(4 hunks)tests/worker/test-MessagePipe.cpp
(1 hunks)tests/worker/test-TaskExecutor.cpp
(1 hunks)tests/worker/worker-test.cpp
(1 hunks)tools/scripts/lib_install/boost.sh
(1 hunks)tools/scripts/lib_install/fmtlib.sh
(1 hunks)tools/scripts/lib_install/linux/install-lib.sh
(1 hunks)tools/scripts/lib_install/spdlog.sh
(1 hunks)
✅ Files skipped from review due to trivial changes (8)
- src/spider/core/BoostAsio.hpp
- src/spider/core/Data.hpp
- src/spider/core/Task.hpp
- src/spider/core/TaskGraph.hpp
- src/spider/storage/DataStorage.hpp
- src/spider/storage/MetadataStorage.hpp
- src/spider/storage/MysqlStorage.cpp
- src/spider/storage/MysqlStorage.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-FunctionManager.cpp
[error] 28-28: syntax error
(syntaxError)
tests/worker/test-TaskExecutor.cpp
[error] 40-40: syntax error
(syntaxError)
🔇 Additional comments (27)
tests/worker/worker-test.cpp (2)
1-4
: LGTM! Headers are well-organized.
The includes are minimal and follow good practices with project headers preceding system headers.
16-19
: Verify REGISTER_TASK macro implementation.
The task registration looks correct, but let's ensure the macro is properly defined and handles task registration appropriately.
✅ Verification successful
Let me gather more information about the macro definition and its implementation.
Let me check one more aspect to understand how the registration works in the FunctionManager.
REGISTER_TASK macro implementation is correct and well-designed
The macro is properly implemented in src/spider/worker/FunctionManager.hpp
and correctly:
- Uses the singleton pattern via
get_instance()
- Registers functions with their string names using
register_function
- Stores functions in a type-safe manner using function binding
- Generates unique variable names using ANONYMOUS_VARIABLE
- Has appropriate NOLINT directives
The usage in the test files is consistent with its design and implementation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify REGISTER_TASK macro definition and usage
# Search for REGISTER_TASK macro definition
echo "Searching for REGISTER_TASK macro definition:"
rg -p "REGISTER_TASK" --type cpp
# Search for other task registrations to verify consistency
echo "Searching for other task registrations:"
ast-grep --pattern 'REGISTER_TASK($$$)'
Length of output: 874
Script:
#!/bin/bash
# Get the full macro definition and surrounding context
echo "Getting REGISTER_TASK macro definition and context:"
rg -p -B 2 -A 5 "^#define REGISTER_TASK" --type cpp
# Look for FunctionManager class definition to understand registration
echo "Looking for FunctionManager class definition:"
ast-grep --pattern 'class FunctionManager {
$$$
}'
Length of output: 662
Script:
#!/bin/bash
# Look for register_function method implementation
echo "Searching for register_function method:"
rg -p "register_function" --type cpp -B 2 -A 5
# Look for get_instance implementation
echo "Searching for get_instance implementation:"
ast-grep --pattern 'static FunctionManager& get_instance() {
$$$
}'
Length of output: 1675
src/spider/.clang-format (2)
13-14
: LGTM! The regex pattern aligns with the project's dependencies.
The updated pattern correctly includes all the necessary libraries for the worker task executor implementation, maintaining proper include categorization.
13-14
: Consider verifying library header patterns.
Let's verify that all library headers in the codebase match this pattern to ensure consistent formatting.
✅ Verification successful
The library header pattern in .clang-format is appropriate
The current regex pattern in .clang-format correctly prioritizes third-party library headers (fmt, spdlog, mariadb, msgpack, absl, boost, catch2). All other includes found in the codebase are standard C++ library headers (like string, vector, optional, etc.) which don't require special prioritization in the include ordering.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any library includes that might not match the current pattern
# This helps identify if we need to add more libraries to the regex
# Search for includes that might be library headers but don't match our pattern
rg -g '*.{cpp,hpp,h}' '^#include\s*<(?!(spider|fmt|spdlog|mariadb|msgpack|absl|boost|catch2))([^.>]+)>' \
| grep -v '\.h>' \
| sort -u
Length of output: 428
Script:
#!/bin/bash
# Let's try a different approach using ripgrep with PCRE2 to find library includes
rg --pcre2 -g '*.{cpp,hpp,h}' '^#include\s*<[^.>]+>' | sort -u
Length of output: 5698
tools/scripts/lib_install/linux/install-lib.sh (2)
Line range hint 1-28
: LGTM! The script structure and error handling are well-implemented
The script follows best practices:
- Uses proper error handling with
set -e
andset -u
- Checks for elevated privileges appropriately
- Uses consistent version specification for all dependencies
24-24
: Verify spdlog version compatibility and security implications
The update to spdlog 1.15.0 looks reasonable as it's compatible with fmtlib 11.0.2. However, let's verify there are no breaking changes or security considerations.
✅ Verification successful
Update to spdlog 1.15.0 is safe to proceed
The version upgrade from 1.14.1 to 1.15.0 is safe as:
- No breaking changes identified in the release notes
- No security vulnerabilities reported
- Main changes include bug fixes, documentation improvements, and performance optimizations
- Compatible with the current fmtlib 11.0.2 (explicitly mentioned in the changelog)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check spdlog release notes and security advisories
# Check GitHub releases for breaking changes
gh api \
-H "Accept: application/vnd.github+json" \
repos/gabime/spdlog/releases | jq -r '.[] | select(.tag_name=="v1.15.0" or .tag_name=="v1.14.1") | {tag_name, body}'
# Check for security advisories
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "spdlog") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Length of output: 3469
src/spider/worker/message_pipe.hpp (2)
1-9
: LGTM: Well-organized header with proper include management
The header organization follows best practices with:
- Proper header guards
- Logical grouping of includes (standard library followed by project-specific)
- Appropriate use of IWYU pragmas to manage dependencies
17-20
: Consider platform compatibility for async/sync operations
The mix of async and sync operations with different pipe types suggests platform-specific handling.
Let's verify the platform support:
✅ Verification successful
Platform compatibility is properly handled with POSIX-specific implementations
The codebase shows a well-structured approach to pipe handling:
- The async operations use platform-agnostic
boost::asio::readable_pipe/writable_pipe
- POSIX-specific operations use
boost::asio::posix::stream_descriptor
explicitly - No Windows-specific conditionals are present, indicating this is a POSIX-targeted implementation
The mix of async and sync operations with different pipe types is intentional and properly segregated based on their use cases, rather than being a platform compatibility concern.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for platform-specific code paths
rg -l "ifdef.*WIN32|ifdef.*_WIN32"
# Check for related pipe implementations
ast-grep --pattern 'class $_ {
$$$
boost::asio::$_pipe$$$
$$$
}'
Length of output: 113
Script:
#!/bin/bash
# Let's check for pipe usage patterns more broadly
rg -A 5 "boost::asio::(readable|writable)_pipe"
# Check for platform-specific includes or definitions
rg -A 2 "#if.*defined.*WIN|#ifdef.*WIN"
# Look for pipe-related implementations
rg -A 5 "stream_descriptor|posix::stream"
# Check for any platform-specific conditionals
rg "(#ifdef|#if defined).*(_WIN32|__linux__|__unix__|__APPLE__)"
Length of output: 6049
tools/scripts/lib_install/boost.sh (2)
39-39
: LGTM: Addition of Boost.Process library
The addition of the process
library aligns well with the PR's objective of implementing a worker task executor in a separate subprocess.
39-39
: Verify Boost.Process compatibility and dependencies
Let's ensure all required dependencies for the worker task executor are included and verify compatibility with the specified Boost version.
✅ Verification successful
Boost.Process dependencies are correctly configured
The verification shows that:
- Boost.Process is properly included in CMake configurations with all required dependencies
- The codebase uses Boost.Process v2 API, which is compatible with the specified Boost version (1.74+)
- All necessary Boost components are included in the bootstrap command:
process
for task executionfilesystem
for path handlingsystem
for core functionality- Other components (iostreams, program_options, regex) are also correctly included
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check Boost.Process dependencies and version compatibility
# Check if CMakeLists.txt requires additional Boost components
echo "Checking CMake requirements..."
rg -A 5 "find_package\(.*Boost.*\)"
# Check Boost.Process version requirements
echo "Checking Boost.Process version requirements..."
curl -s https://www.boost.org/doc/libs/1_${version_with_underscores}/doc/html/process.html | grep -i "introduced in version"
Length of output: 311
Script:
#!/bin/bash
# Let's check CMake files with a more flexible pattern and examine build configurations
echo "Checking CMake configurations for Boost dependencies..."
fd -e cmake -e txt -x cat {} \; | rg -A 5 "[Bb]oost"
# Check for any Boost-related build configurations or requirements
echo "Checking for Boost-related configurations..."
fd -e sh -e cmake -x cat {} \; | rg -i "boost.*process|process.*boost"
# Look for actual usage of Boost.Process in the codebase
echo "Checking usage of Boost.Process..."
rg -i "boost::process|process\.hpp"
Length of output: 3872
tests/CMakeLists.txt (2)
9-10
: LGTM: Test file additions align with PR objectives
The addition of test files for MessagePipe and TaskExecutor components properly supports the new worker task executor feature.
24-33
: Verify Boost component versions
The addition of multiple Boost components is appropriate for the worker functionality. However, we should ensure version compatibility.
✅ Verification successful
Boost components are properly versioned and compatible
The project explicitly requires Boost version 1.74 or higher, which is a modern and stable version that includes all the components being used in the test target (filesystem, program_options, process, system). The same components are consistently used across the project in both the main source and test directories.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for Boost version requirements in the project
# Look for any explicit Boost version requirements
rg -g 'CMakeLists.txt' -g '*.cmake' 'find_package\s*\(\s*Boost' -A 3
Length of output: 68
Script:
#!/bin/bash
# Let's try a broader search for Boost-related configurations
rg -g 'CMakeLists.txt' -g '*.cmake' 'Boost' -A 3
# Also search for minimum required CMake version which might indicate compatibility
rg -g 'CMakeLists.txt' -g '*.cmake' 'cmake_minimum_required' -A 1
Length of output: 2125
src/spider/worker/DllLoader.cpp (1)
23-23
:
Add DLL validation before loading
The code loads DLLs without validating their authenticity or integrity. This could be a security risk if malicious DLLs are loaded.
Consider implementing:
- Digital signature verification
- Checksum validation
- Allowed paths whitelist
- File permission checks
src/spider/core/Serializer.hpp (1)
7-8
: LGTM: Boost UUID header inclusion is appropriate
The inclusion of the Boost UUID header is necessary for the serialization functionality and is correctly placed at the file level scope.
Let's verify this header is consistently included where UUID serialization is used:
✅ Verification successful
Boost UUID header inclusion is properly used across the codebase
The verification confirms that the boost/uuid/uuid.hpp
header is correctly included in all files that use boost::uuids::uuid
. The header inclusion in Serializer.hpp is appropriate as it provides the UUID type definition needed for the msgpack serialization adaptor implementation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential UUID serialization usage without proper includes
# Search for UUID serialization usage
rg -l "boost::uuids::uuid" | while read -r file; do
if ! grep -l "boost/uuid/uuid.hpp" "$file" > /dev/null; then
echo "Warning: $file might be missing UUID header"
fi
done
Length of output: 897
Script:
#!/bin/bash
# Let's check the actual content of files using boost::uuids::uuid to ensure proper header inclusion
rg -p "boost::uuids::uuid" --type cpp
Length of output: 12944
tools/scripts/lib_install/spdlog.sh (1)
74-74
: Verify the impact of external fmt library dependency
The addition of -DSPDLOG_FMT_EXTERNAL=ON
will require the fmt library to be installed separately. This could affect the build process if fmt isn't available.
Let's verify the fmt library installation script and dependencies:
Consider adding a check for the fmt library before proceeding with the build, or document this new dependency in the project's README.
✅ Verification successful
External fmt library dependency is properly handled
The fmtlib installation script (fmtlib.sh) is present in the same directory and includes all necessary build requirements. The script:
- Handles version-specific installation
- Includes proper dependency checks
- Uses CMake with position-independent code enabled
- Provides proper error handling
- Supports both direct installation and package creation
No additional checks are required in spdlog.sh as the fmt library can be installed separately using the provided fmtlib.sh script.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for fmt library installation script and its version requirements
fd -t f "fmt.sh" tools/scripts/lib_install/
rg -A 2 "fmt.*version" tools/scripts/lib_install/
Length of output: 313
Script:
#!/bin/bash
# Check if fmt library installation is handled in the build process
rg -A 5 "fmtlib|fmt.sh" tools/scripts/lib_install/spdlog.sh
# Check for version requirements or dependencies
cat tools/scripts/lib_install/fmtlib.sh
Length of output: 2109
src/spider/CMakeLists.txt (2)
10-11
: Verify the existence and implementation of new core headers
The addition of BoostAsio.hpp and MsgPack.hpp to the core headers suggests new networking and serialization capabilities. Let's verify these files are properly implemented.
✅ Verification successful
Both core headers are properly implemented with necessary includes
The verification confirms that both headers exist and are properly structured:
-
core/BoostAsio.hpp
:- Has proper include guards (
SPIDER_CORE_BOOSTASIO_HPP
) - Includes comprehensive Boost.Asio functionality (networking, async I/O, coroutines)
- Has proper include guards (
-
core/MsgPack.hpp
:- Has proper include guards (
SPIDER_MSGPACK_HPP
) - Includes complete MessagePack serialization capabilities
- Has proper include guards (
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and basic structure of the new core headers
# Check if the files exist and contain expected content
for file in "core/BoostAsio.hpp" "core/MsgPack.hpp"; do
echo "Checking $file..."
if fd "$file" src/spider; then
# Check for proper include guards and basic structure
rg -A 5 "^#(pragma once|ifndef|define)" "src/spider/$file"
# Check for relevant includes
rg "^#include" "src/spider/$file"
fi
done
Length of output: 2232
33-43
:
Review security implications of DllLoader implementation
The addition of DLL loading functionality requires careful security consideration. Please ensure:
- Proper validation of DLL paths
- Secure handling of loaded libraries
- Implementation of safeguards against DLL hijacking
✅ Verification successful
DLL loading implementation has adequate security measures
The DllLoader implementation shows appropriate security considerations:
- Uses
boost::dll
library which provides secure DLL loading mechanisms - Validates file existence before loading
- Implements proper error handling with try-catch blocks
- Uses absolute paths through std::filesystem
- Maintains loaded libraries in a controlled map structure
The implementation follows secure coding practices:
- Path validation using std::filesystem
- Exception handling for loading failures
- Logging of all error conditions
- Singleton pattern preventing unauthorized loader instances
- Uses modern C++ features that prevent buffer overflows
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for security considerations in DLL loading implementation
# Check for path validation and security measures
echo "Checking DllLoader implementation..."
rg -A 5 "LoadLibrary|dlopen" src/spider/worker/DllLoader.cpp
# Check for potential security measures
echo "Checking for security validations..."
rg "validate|verify|check|secure" src/spider/worker/DllLoader.cpp
Length of output: 296
Script:
#!/bin/bash
# Let's first check if the file exists and see its content
echo "Checking DllLoader files..."
fd DllLoader
# If found, let's examine the implementation
echo "Content of DllLoader files if they exist:"
for file in $(fd DllLoader); do
echo "=== Content of $file ==="
cat "$file"
done
# Let's also search for any dynamic loading related patterns
echo "Searching for dynamic loading patterns..."
rg -i "load.*dll|load.*library|dlopen" --type cpp
Length of output: 3561
tests/worker/test-MessagePipe.cpp (1)
1-13
: LGTM! Headers and setup are well-organized.
The includes are appropriate for the test functionality, and the IWYU pragma is correctly used to maintain the MsgPack header dependency.
src/spider/worker/TaskExecutorMessage.hpp (2)
64-70
: LGTM! Well-documented MessagePack integration
The MessagePack enum registration is properly implemented with clear documentation about namespace requirements.
1-70
: Verify consistent message format usage
Let's verify that all message construction and parsing follows the [type, body] array format consistently.
✅ Verification successful
Let me analyze the message format usage patterns across the codebase to ensure consistency.
Message format is consistently implemented across the codebase
The verification confirms that the [type, body] array format is consistently used throughout the codebase:
- All message parsing functions verify array type and size >= 2
- Message construction follows the same pattern with type as first element
- Error handling is consistent with Unknown type returns
- FunctionManager properly implements the format for both requests and responses
- TaskExecutor correctly processes all response types
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for msgpack array construction patterns
rg -l "msgpack.*pack.*array" | while read -r file; do
echo "=== $file ==="
rg -A 2 "msgpack.*pack.*array" "$file"
done
# Search for message parsing patterns
rg -l "msgpack.*unpack" | while read -r file; do
echo "=== $file ==="
rg -A 2 "msgpack.*unpack" "$file"
done
Length of output: 1884
Script:
#!/bin/bash
# Search for message construction patterns in TaskExecutor related files
rg -A 5 "TaskExecutorR(esponse|equest)Type" --type cpp
# Search for array construction patterns
rg -A 5 "msgpack::type::ARRAY|array\.ptr" --type cpp
Length of output: 14910
tests/worker/test-TaskExecutor.cpp (2)
1-14
: LGTM! Headers are well-organized and necessary.
The header organization follows good practices with project headers preceding third-party ones.
17-32
: Consider making environment setup more robust
The current implementation might be fragile in different environments:
- PATH is set only to the src directory, which might be insufficient for subprocess execution
- Other environment variables might be needed for proper subprocess operation
Let's verify the environment setup:
src/spider/worker/TaskExecutor.cpp (2)
19-37
: State-checking methods are correctly implemented
The methods completed()
, waiting()
, succeed()
, and error()
correctly implement thread-safe state checking using std::lock_guard
.
63-105
: Asynchronous output handler appropriately processes responses
The process_output_handler()
method correctly handles asynchronous responses, updating the task state and result buffer as expected.
src/spider/worker/task_executor.cpp (1)
105-116
: Send error response to the caller when a function is not found
When the specified function is not found, it's good practice to log the error and send an error response back to the caller for proper handling.
src/spider/worker/FunctionManager.hpp (2)
301-307
: Confirm C++20 compatibility due to usage of std::views::keys
The use of std::views::keys
in list_functions()
requires C++20 support. Verify that the project is configured to compile with C++20 and that all target environments support this standard to prevent compilation issues.
Run the following script to check the C++ standard specified in the build configuration:
✅ Verification successful
C++20 requirement is properly configured
The project is correctly configured to use C++20 with strict enforcement:
- CMakeLists.txt sets
CMAKE_CXX_STANDARD
to 20 CMAKE_CXX_STANDARD_REQUIRED
is set to ON, ensuring compilation will fail if C++20 is not available
The use of std::views::keys
is safe within this configuration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that C++20 is set in CMakeLists.txt
rg 'set\(CMAKE_CXX_STANDARD\s+20\)' CMakeLists.txt
Length of output: 79
Script:
#!/bin/bash
# Check for C++20 features and compiler requirements in documentation and build files
rg -i "c\+\+\s*20|cpp\s*20|compiler|standard" README.md
rg -i "c\+\+\s*20|cpp\s*20|compiler|standard" BUILDING.md || true
rg -i "c\+\+\s*20|cpp\s*20|compiler" .github/workflows -A 2 -B 2 || true
Length of output: 250
Script:
#!/bin/bash
# Check for compiler flags and C++ standard requirements in CMake files
rg -i "CMAKE_CXX|COMPILER|C\+\+\s*20" CMakeLists.txt -A 2 -B 2
# Check for any compiler version checks or requirements
rg "CMAKE_CXX_COMPILER_VERSION|CMAKE_COMPILER_IS_GNUCXX|MSVC|Clang" CMakeLists.txt
Length of output: 252
29-31
:
Verify thread safety of function registration in REGISTER_TASK
macro
The REGISTER_TASK
macro registers functions using an inline variable, which could have implications in multi-threaded environments regarding the order of initialization. Ensure that this registration mechanism is thread-safe and does not introduce race conditions.
Run the following script to identify all usages of REGISTER_TASK
and assess for potential thread-safety issues:
✅ Verification successful
Let me check the implementation of FunctionManager
to understand the thread safety aspects better.
Let me check one more aspect about the map implementation and any potential concurrent access patterns.
Thread safety of REGISTER_TASK
macro is properly handled
The function registration mechanism is thread-safe due to the following factors:
- The
get_instance()
uses the C++ static initialization pattern which is guaranteed to be thread-safe since C++11 - Function registration occurs during static initialization, before any threads are spawned
- The
inline
variable ensures each translation unit gets its own instance, preventing initialization order issues - The registration is done at compile-time, not runtime, eliminating race conditions
The design follows the standard singleton pattern for registries, which is a common and safe approach for function registration systems.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all usages of REGISTER_TASK to review thread safety.
rg 'REGISTER_TASK\(' -A 5
Length of output: 1290
Script:
#!/bin/bash
# Check FunctionManager implementation for thread safety
ast-grep --pattern 'class FunctionManager {
$$$
}'
# Also check if there are any mutex/locks used
rg -B2 -A5 'register_function|get_instance' src/spider/worker/FunctionManager.hpp
Length of output: 1207
Script:
#!/bin/bash
# Check the map member and any synchronization mechanisms
ast-grep --pattern 'class FunctionManager {
private:
$$$
}'
# Look for any concurrent access patterns
rg -l 'std::mutex|std::lock_guard|std::shared_mutex' src/spider/worker/
Length of output: 211
tests/worker/worker-test.cpp
Outdated
auto error_test(int const x) -> int { | ||
std::quick_exit(1); | ||
return x; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix unreachable code and consider a less abrupt error simulation.
The current implementation has several issues:
- The
return x
statement is unreachable due tostd::quick_exit
- Using
quick_exit
might be too aggressive for testing error scenarios
Consider this alternative approach:
+/**
+ * Test function that simulates an error condition.
+ * Used to verify worker error handling.
+ */
auto error_test(int const x) -> int {
- std::quick_exit(1);
- return x;
+ throw std::runtime_error("Simulated error in worker");
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
auto error_test(int const x) -> int { | |
std::quick_exit(1); | |
return x; | |
} | |
/** | |
* Test function that simulates an error condition. | |
* Used to verify worker error handling. | |
*/ | |
auto error_test(int const x) -> int { | |
throw std::runtime_error("Simulated error in worker"); | |
} |
private: | ||
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add thread safety and improve documentation for the library storage.
The private implementation needs improvements:
- Add synchronization for thread safety
- Document the purpose and lifecycle of stored libraries
- Consider using string_view for map lookups
Consider these improvements:
private:
+ // Mutex for thread-safe access to m_libraries
+ mutable std::shared_mutex m_mutex;
+
+ // Maps library paths to their loaded instances
+ // The loaded libraries persist until the DllLoader is destroyed
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
private: | |
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries; | |
}; | |
private: | |
// Mutex for thread-safe access to m_libraries | |
mutable std::shared_mutex m_mutex; | |
// Maps library paths to their loaded instances | |
// The loaded libraries persist until the DllLoader is destroyed | |
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries; | |
}; |
static auto get_instance() -> DllLoader& { | ||
// Explicitly use new because DllLoader instance should not be destroyed | ||
// NOLINTNEXTLINE(cppcoreguidelines-owning-memory) | ||
static auto* instance = new DllLoader(); | ||
return *instance; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider improving the singleton implementation for better safety and clarity.
While the current implementation ensures a single instance, consider these improvements:
- Delete copy/move constructors and assignment operators to prevent accidental copying.
- Document thread safety guarantees.
- Consider using
std::unique_ptr
with a custom deleter for better ownership semantics.
Here's a suggested improvement:
class DllLoader {
public:
+ // Delete copy/move operations
+ DllLoader(DllLoader const&) = delete;
+ DllLoader(DllLoader&&) = delete;
+ auto operator=(DllLoader const&) -> DllLoader& = delete;
+ auto operator=(DllLoader&&) -> DllLoader& = delete;
+
+ // Thread-safe singleton access
static auto get_instance() -> DllLoader& {
- // Explicitly use new because DllLoader instance should not be destroyed
- // NOLINTNEXTLINE(cppcoreguidelines-owning-memory)
- static auto* instance = new DllLoader();
- return *instance;
+ static DllLoader instance;
+ return instance;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
static auto get_instance() -> DllLoader& { | |
// Explicitly use new because DllLoader instance should not be destroyed | |
// NOLINTNEXTLINE(cppcoreguidelines-owning-memory) | |
static auto* instance = new DllLoader(); | |
return *instance; | |
} | |
class DllLoader { | |
public: | |
// Delete copy/move operations | |
DllLoader(DllLoader const&) = delete; | |
DllLoader(DllLoader&&) = delete; | |
auto operator=(DllLoader const&) -> DllLoader& = delete; | |
auto operator=(DllLoader&&) -> DllLoader& = delete; | |
// Thread-safe singleton access | |
static auto get_instance() -> DllLoader& { | |
static DllLoader instance; | |
return instance; | |
} |
#ifndef SPIDER_WORKER_DLLLOADER_HPP | ||
#define SPIDER_WORKER_DLLLOADER_HPP | ||
|
||
#include <string> | ||
|
||
#include <absl/container/flat_hash_map.h> | ||
#include <boost/dll/shared_library.hpp> | ||
|
||
namespace spider::worker { | ||
|
||
class DllLoader { | ||
public: | ||
static auto get_instance() -> DllLoader& { | ||
// Explicitly use new because DllLoader instance should not be destroyed | ||
// NOLINTNEXTLINE(cppcoreguidelines-owning-memory) | ||
static auto* instance = new DllLoader(); | ||
return *instance; | ||
} | ||
|
||
auto load_dll(std::string const& path) -> bool; | ||
|
||
private: | ||
absl::flat_hash_map<std::string, boost::dll::shared_library> m_libraries; | ||
}; | ||
|
||
} // namespace spider::worker | ||
|
||
#endif // SPIDER_WORKER_DLLLOADER_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Based on the review of the implementation, I can now provide the final response:
Security measures needed for DLL loading
The current implementation lacks essential security controls for DLL loading:
- The
load_dll
function only checks if the file exists but doesn't validate the path or location - DLLs are loaded without any signature verification or integrity checks
- No restrictions on the DLL loading paths or directory traversal prevention
- Direct loading of functions from DLLs without any security boundary
Recommended changes:
- Implement a whitelist of allowed DLL paths or restrict to specific directories
- Add path sanitization to prevent directory traversal attacks
- Consider implementing DLL signature verification
- Add integrity checks before loading DLLs
- Consider implementing a security boundary or sandbox for loaded DLLs
🔗 Analysis chain
Add security measures for DLL loading.
Loading dynamic libraries can be a security risk. Consider implementing:
- Path validation to prevent directory traversal
- Signature verification for loaded libraries
- Sandboxing or restricted loading paths
Let's check if there are any security measures in place:
Let me gather more information about the DLL loading implementation and related security measures.
Let me check for any additional security controls or configuration settings that might be related to DLL loading.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for security-related code in the implementation
rg -l "verify|signature|sandbox" --type cpp
# Check if there are any path validation functions
ast-grep --pattern 'validate_path($_)'
Length of output: 124
Script:
#!/bin/bash
# Check the DllLoader implementation
fd DllLoader.cpp --exec cat {}
# Look for any path validation or security checks in the codebase
rg -A 5 "load_dll" --type cpp
# Check for any filesystem-related security functions
rg -l "filesystem|fs::" --type cpp
# Look for any configuration or security settings
fd -e json -e yaml -e conf -e toml --exec cat {}
Length of output: 9267
Script:
#!/bin/bash
# Check for any security-related configuration or allowed paths
rg -A 5 "allowed|whitelist|secure|restrict" --type cpp
# Look for the task executor implementation to see how DLLs are loaded
fd TaskExecutor.cpp --exec cat {}
# Check for any security-related constants or configurations
rg -A 5 "SECURITY|ALLOWED_PATHS|DLL" --type hpp
Length of output: 8003
add_library(worker_test SHARED) | ||
target_sources(worker_test PUBLIC worker/worker-test.cpp) | ||
target_link_libraries(worker_test PUBLIC spider_core) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding POSITION_INDEPENDENT_CODE property
Since this is a shared library that links against spider_core, ensure proper PIC settings.
Add the following after the library definition:
add_library(worker_test SHARED)
+set_property(TARGET worker_test PROPERTY POSITION_INDEPENDENT_CODE ON)
target_sources(worker_test PUBLIC worker/worker-test.cpp)
target_link_libraries(worker_test PUBLIC spider_core)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
add_library(worker_test SHARED) | |
target_sources(worker_test PUBLIC worker/worker-test.cpp) | |
target_link_libraries(worker_test PUBLIC spider_core) | |
add_library(worker_test SHARED) | |
set_property(TARGET worker_test PROPERTY POSITION_INDEPENDENT_CODE ON) | |
target_sources(worker_test PUBLIC worker/worker-test.cpp) | |
target_link_libraries(worker_test PUBLIC spider_core) |
src/spider/worker/message_pipe.cpp
Outdated
} catch (boost::system::system_error const& e) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider logging exceptions in send_message
for improved diagnostics
Currently, exceptions caught in the send_message
function are not logged, which may make debugging issues difficult. Adding logging statements will aid in troubleshooting.
Apply this diff to log the exception:
catch (boost::system::system_error const& e) {
+ spdlog::error("Failed to send message: {}", e.what());
return false;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
} catch (boost::system::system_error const& e) { | |
return false; | |
} | |
} catch (boost::system::system_error const& e) { | |
spdlog::error("Failed to send message: {}", e.what()); | |
return false; | |
} |
src/spider/worker/message_pipe.cpp
Outdated
"Cannot read header from pipe {}: {}", | ||
response_ec.value(), | ||
response_ec.message() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update error message in asynchronous read to reflect correct context
In receive_message_async
, the error message implies a failure in reading the header when actually failing to read the message body. Adjusting this message will enhance log accuracy.
Apply this diff:
spdlog::error(
- "Cannot read header from pipe {}: {}",
+ "Cannot read message body from pipe {}: {}",
response_ec.value(),
response_ec.message()
);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"Cannot read header from pipe {}: {}", | |
response_ec.value(), | |
response_ec.message() | |
); | |
"Cannot read message body from pipe {}: {}", | |
response_ec.value(), | |
response_ec.message() | |
); |
src/spider/worker/message_pipe.cpp
Outdated
spdlog::error("Fail to read header: {}", e.what()); | ||
return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct error message when failing to read the message body
The error message incorrectly states "Fail to read header" when the failure occurs while reading the message body. Updating the message will improve clarity.
Apply this diff to correct the error message:
catch (boost::system::system_error& e) {
- spdlog::error("Fail to read header: {}", e.what());
+ spdlog::error("Failed to read message body: {}", e.what());
return std::nullopt;
}
Committable suggestion skipped: line range outside the PR's diff.
return false; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add logging to exception handling in send_message
overload
In the overloaded send_message
function, exceptions are also caught without logging. Including logs here will provide consistency and assist in diagnosing issues.
Apply this diff:
catch (boost::system::system_error const& e) {
+ spdlog::error("Failed to send message: {}", e.what());
return false;
}
Committable suggestion skipped: line range outside the PR's diff.
// NOLINTBEGIN(cppcoreguidelines-macro-usage) | ||
#define CONCAT_DIRECT(s1, s2) s1##s2 | ||
#define CONCAT(s1, s2) CONCAT_DIRECT(s1, s2) | ||
#define ANONYMOUS_VARIABLE(str) CONCAT(str, __COUNTER__) | ||
// NOLINTEND(cppcoreguidelines-macro-usage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider replacing macros with constexpr functions or inline variables
The use of macros (CONCAT_DIRECT
, CONCAT
, ANONYMOUS_VARIABLE
) can lead to less readable and error-prone code. Modern C++ provides alternatives like constexpr
functions or inline variables that are type-safe and scoped appropriately. Refactoring these macros can improve maintainability and clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (15)
src/spider/worker/DllLoader.cpp (1)
15-21
: Enhance path validation for securityWhile the basic file existence check is good, consider adding additional security checks:
- Verify file permissions
- Validate file extension
- Check for symbolic links that might point to unsafe locations
if (!std::filesystem::exists(dll_path)) { spdlog::error("Cannot find dll file {}", dll_path.string()); return false; } +if (!dll_path.extension().string().ends_with(".dll") && + !dll_path.extension().string().ends_with(".so")) { + spdlog::error("Invalid library extension: {}", dll_path.string()); + return false; +} +std::error_code ec; +if (std::filesystem::is_symlink(dll_path, ec)) { + spdlog::warn("Library path is a symlink: {}", dll_path.string()); +}src/spider/worker/TaskExecutor.hpp (4)
26-32
: Add documentation for TaskExecutorState enumConsider adding documentation to describe:
- The purpose of each state
- Valid state transitions
- When each state is set
Example:
/// Represents the current state of a task execution enum class TaskExecutorState : std::uint8_t { Running, ///< Task is currently executing Waiting, ///< Task is waiting for resources or input Succeed, ///< Task completed successfully Error, ///< Task encountered an error Cancelled, ///< Task was cancelled by user request };
89-92
: Document exceptions in get_result()The template method
get_result()
could throw exceptions during deserialization but doesn't document this behaviour.Add documentation:
/** * Retrieves the result of the task execution. * @tparam T The expected result type * @return The task result if available * @throws msgpack::type_error If result cannot be deserialized to type T */ template <class T> auto get_result() -> std::optional<T>
85-85
: Add timeout parameter to wait()The
wait()
method could potentially block indefinitely. Consider adding a timeout parameter./** * Wait for task completion with timeout. * @param timeout Maximum duration to wait * @return true if completed, false if timed out */ template <typename Duration = std::chrono::seconds> bool wait(Duration timeout = Duration{30});
99-100
: Ensure thread-safe state transitionsWhile there's a mutex for state protection, the state transitions themselves might not be atomic. Consider using std::atomic for m_state or implementing a state machine pattern.
tests/worker/test-TaskExecutor.cpp (2)
17-38
: Consider enhancing environment and library path handling.Several improvements could make these utility functions more robust:
- Consider preserving existing PATH value and appending to it
- Add existence check for libworker_test.so
- Extract common path resolution logic to reduce duplication
Here's a suggested improvement:
+ auto get_executable_dir() -> boost::filesystem::path { + return boost::dll::program_location().parent_path(); + } auto get_environment_variable() -> absl::flat_hash_map< boost::process::v2::environment::key, boost::process::v2::environment::value> { - boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path(); + boost::filesystem::path const executable_dir = get_executable_dir(); boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider"; absl::flat_hash_map< boost::process::v2::environment::key, boost::process::v2::environment::value> environment_variables; - environment_variables.emplace("PATH", src_dir); + // Preserve existing PATH + auto current_path = std::getenv("PATH"); + std::string new_path = src_dir.string(); + if (current_path) { + new_path += ":" + std::string(current_path); + } + environment_variables.emplace("PATH", new_path); return environment_variables; } auto get_libraries() -> std::vector<std::string> { - boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path(); + boost::filesystem::path const executable_dir = get_executable_dir(); boost::filesystem::path const lib_path = executable_dir / "libworker_test.so"; + if (!boost::filesystem::exists(lib_path)) { + throw std::runtime_error("Required library not found: " + lib_path.string()); + } return {lib_path.string()}; }
46-90
: Consider improving test robustness and type safety.
- Add io_context cleanup between tests to prevent state leakage
- Consider using a more type-safe approach for error handling instead of std::tuple
Here's a suggested improvement:
+ struct TaskError { + spider::core::FunctionInvokeError error_type; + std::string message; + }; TEST_CASE("Task execute fail", "[worker]") { // ... setup code ... boost::asio::io_context context; + context.restart(); // Ensure clean state spider::worker::TaskExecutor executor{context, "error_test", get_libraries(), environment_variable, 2}; context.run(); executor.wait(); REQUIRE(executor.error()); - std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error(); - REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error)); + TaskError error = executor.get_error(); + REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == error.error_type); }src/spider/worker/message_pipe.cpp (3)
19-19
: Consider documenting the header size formatThe constant
cHeaderSize
is used for message size encoding, but its format requirements aren't documented. Consider adding a comment explaining that it represents a 16-digit zero-padded decimal number.
34-46
: Consider reducing code duplication in send_message overloadsBoth
send_message
overloads share identical logic. Consider extracting the common functionality into a template or helper function.Example implementation:
template<typename Stream> auto send_message_impl(Stream& stream, msgpack::sbuffer const& request) -> bool { try { size_t const size = request.size(); if (size > 999999999999999) { spdlog::error("Message size {} exceeds maximum allowed size", size); return false; } std::string const size_str = fmt::format("{:016d}", size); boost::asio::write(stream, boost::asio::buffer(size_str)); boost::asio::write(stream, boost::asio::buffer(request.data(), size)); return true; } catch (boost::system::system_error const& e) { spdlog::error("Failed to send message: {}", e.what()); return false; } }
85-85
: Document the reason for NOLINT directiveThe NOLINT directive suppresses a clang-analyzer warning, but the reason isn't documented. Add a comment explaining why this warning can be safely ignored.
src/spider/worker/FunctionManager.hpp (4)
103-109
: Replace magic numbers with named constantsThe array size check uses magic number '3'. Consider defining a constant to improve maintainability and clarity.
+static constexpr std::size_t ERROR_RESPONSE_SIZE = 3; + if (msgpack::type::ARRAY != object.type || - 3 != object.via.array.size) { + ERROR_RESPONSE_SIZE != object.via.array.size) {
152-154
: Define constant for result response sizeSimilar to error handling, magic number '2' should be replaced with a named constant.
+static constexpr std::size_t RESULT_RESPONSE_SIZE = 2; + if (msgpack::type::ARRAY != object.type || - 2 != object.via.array.size) { + RESULT_RESPONSE_SIZE != object.via.array.size) {
Line range hint
255-259
: Enhance error message with exception detailsThe generic error message loses valuable debugging information. Consider including the exception message.
return create_error_response( FunctionInvokeError::FunctionExecutionError, - "Function execution error" + fmt::format("Function execution error: {}", e.what()) );
302-308
: Optimize list_functions implementationThe current implementation could be more efficient using ranges with a direct transform.
[[nodiscard]] auto list_functions() const -> std::vector<std::string> { - std::vector<std::string> functions; - for (std::string const& func_name : std::views::keys(m_map)) { - functions.emplace_back(func_name); - } - return functions; + return std::vector<std::string>(m_map.keys().begin(), m_map.keys().end()); }src/spider/worker/TaskExecutor.cpp (1)
82-85
: Refactor repeated code inprocess_output_handler()
into a helper methodTo reduce code duplication and improve maintainability, consider extracting the repeated blocks of code that acquire the mutex, update
m_state
, write tom_result_buffer
, andco_return
into a helper method.Apply this diff to implement the refactor:
+void TaskExecutor::update_state_and_buffer(TaskExecutorState new_state, const msgpack::sbuffer& response) { + std::lock_guard const lock(m_state_mutex); + m_state = new_state; + m_result_buffer.write(response.data(), response.size()); +} ... case TaskExecutorResponseType::Error: { - std::lock_guard const lock(m_state_mutex); - m_state = TaskExecutorState::Error; - m_result_buffer.write(response.data(), response.size()); + update_state_and_buffer(TaskExecutorState::Error, response); co_return; } ... case TaskExecutorResponseType::Result: { - std::lock_guard const lock(m_state_mutex); - m_state = TaskExecutorState::Succeed; - m_result_buffer.write(response.data(), response.size()); + update_state_and_buffer(TaskExecutorState::Succeed, response); co_return; } ... case TaskExecutorResponseType::Cancel: { - std::lock_guard const lock(m_state_mutex); - m_state = TaskExecutorState::Cancelled; - m_result_buffer.write(response.data(), response.size()); + update_state_and_buffer(TaskExecutorState::Cancelled, response); co_return; }Also applies to: 90-93, 96-99
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
src/spider/worker/DllLoader.cpp
(1 hunks)src/spider/worker/FunctionManager.hpp
(7 hunks)src/spider/worker/TaskExecutor.cpp
(1 hunks)src/spider/worker/TaskExecutor.hpp
(1 hunks)src/spider/worker/message_pipe.cpp
(1 hunks)tests/worker/test-MessagePipe.cpp
(1 hunks)tests/worker/test-TaskExecutor.cpp
(1 hunks)tests/worker/worker-test.cpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/worker/test-MessagePipe.cpp
- tests/worker/worker-test.cpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-TaskExecutor.cpp
[error] 40-40: syntax error
(syntaxError)
🔇 Additional comments (14)
src/spider/worker/DllLoader.cpp (4)
1-13
: LGTM! Headers and namespace organization is clean
The includes are well-organized and necessary for the implementation. Good use of the spider::worker namespace.
32-37
:
Add thread-safe function registration mechanism
The function registration process remains thread-unsafe. Multiple DLLs loading concurrently could lead to race conditions when accessing the global FunctionManager instance.
Consider implementing a thread-safe registration mechanism:
+std::lock_guard<std::mutex> lock(FunctionManager::get_instance().get_mutex());
for (auto const& func_iter : function_map) {
core::FunctionManager::get_instance().register_function_invoker(
func_iter.first,
func_iter.second
);
}
40-46
: 🛠️ Refactor suggestion
Implement specific exception handling and cleanup
The current error handling could be more robust and specific.
Add specific handlers and cleanup:
-} catch (std::exception& e) {
+} catch (boost::dll::library_load_error const& e) {
+ spdlog::error("DLL loading error for {}: {}", dll_path.string(), e.what());
+ cleanup_partial_registrations();
+ return false;
+} catch (std::runtime_error const& e) {
spdlog::error("Failed to load library {}: {}", dll_path.string(), e.what());
+ cleanup_partial_registrations();
return false;
} catch (...) {
spdlog::error("Failed to load library {}", dll_path.string());
+ cleanup_partial_registrations();
return false;
}
39-39
: Verify library uniqueness before storage
The current implementation doesn't check if the library is already loaded, which could lead to duplicate loading and resource waste.
Consider adding a uniqueness check:
+if (m_libraries.contains(path_str)) {
+ spdlog::warn("Library {} is already loaded", path_str);
+ return true;
+}
m_libraries.emplace(path_str, library);
src/spider/worker/TaskExecutor.hpp (2)
1-24
: LGTM! Well-organized header structure
The includes are properly organized, necessary headers are present, and include guards are correctly implemented.
67-67
:
Ensure coroutine lifetime management
The coroutine detachment could lead to undefined behaviour if TaskExecutor is destroyed while the coroutine is still running.
tests/worker/test-TaskExecutor.cpp (2)
1-15
: LGTM! Well-organized header inclusions.
The headers are properly organized with project-specific headers first, followed by external dependencies. The IWYU pragma for BoostAsio is appropriate.
40-90
: 🛠️ Refactor suggestion
Add more comprehensive error and edge case testing.
While the current test cases cover basic functionality, consider adding tests for:
- Library loading failures
- Task execution timeout
- Invalid task name
- Memory allocation failures
- Edge cases for argument values (e.g., INT_MAX, negative numbers)
Would you like me to help implement these additional test cases?
🧰 Tools
🪛 cppcheck (2.10-2)
[error] 40-40: syntax error
(syntaxError)
src/spider/worker/message_pipe.cpp (1)
21-32
: 🛠️ Refactor suggestion
Verify message size limits to prevent buffer overflow
The message size is formatted into a 16-digit string, which implies a maximum size of 9,999,999,999,999,999 bytes. Consider adding a size limit check to prevent potential buffer overflow.
Apply this diff:
auto send_message(boost::asio::writable_pipe& pipe, msgpack::sbuffer const& request) -> bool {
try {
size_t const size = request.size();
+ if (size > 999999999999999) {
+ spdlog::error("Message size {} exceeds maximum allowed size", size);
+ return false;
+ }
std::string const size_str = fmt::format("{:016d}", size);
src/spider/worker/FunctionManager.hpp (3)
24-28
: Consider modernizing macro usage with constexpr alternatives
Previous feedback about replacing these macros with modern C++ alternatives remains relevant. Consider using constexpr functions or inline variables for improved type safety and maintainability.
321-322
: LGTM: DLL alias is properly configured
The BOOST_DLL_ALIAS is correctly set up to export the singleton instance getter.
295-300
: Consider thread safety for function map access
The function map is accessed without synchronization. If the manager is used in a multi-threaded context, consider adding thread safety mechanisms.
Also applies to: 310-310
src/spider/worker/TaskExecutor.cpp (2)
19-22
: Function completed()
correctly determines task completion status
The implementation accurately checks if the task has either succeeded or encountered an error, ensuring correct reporting of the task's completion state.
43-44
: Conditional logic corrected to use '&&' operator
The condition now properly checks that the task state is neither Cancelled
nor Error
before updating the state to Error
. This fixes the previous logical error where the condition always evaluated to true
.
template <class... Args> | ||
explicit TaskExecutor( | ||
boost::asio::io_context& context, | ||
std::string const& func_name, | ||
std::vector<std::string> const& libs, | ||
absl::flat_hash_map< | ||
boost::process::v2::environment::key, | ||
boost::process::v2::environment::value> const& environment, | ||
Args&&... args | ||
) | ||
: m_read_pipe(context), | ||
m_write_pipe(context) { | ||
std::vector<std::string> process_args{"--func", func_name, "--libs"}; | ||
process_args.insert(process_args.end(), libs.begin(), libs.end()); | ||
boost::filesystem::path const exe = boost::process::v2::environment::find_executable( | ||
"spider_task_executor", | ||
environment | ||
); | ||
m_process = std::make_unique<boost::process::v2::process>( | ||
context, | ||
exe, | ||
process_args, | ||
boost::process::v2::process_stdio{ | ||
.in = m_write_pipe, | ||
.out = m_read_pipe, | ||
.err = {/*stderr to default*/} | ||
}, | ||
boost::process::v2::process_environment{environment} | ||
); | ||
|
||
// Set up handler for output file | ||
boost::asio::co_spawn(context, process_output_handler(), boost::asio::detached); | ||
|
||
// Send args | ||
msgpack::sbuffer args_request = core::create_args_request(std::forward<Args>(args)...); | ||
send_message(m_write_pipe, args_request); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Constructor needs error handling and could be simplified
The constructor performs multiple operations that could fail:
- Executable path resolution
- Process creation
- Message sending
Consider:
- Adding error handling for process creation
- Moving process setup to a separate method
- Using RAII patterns for cleanup on failure
class TaskExecutor {
void setup_process(
boost::asio::io_context& context,
std::string const& func_name,
std::vector<std::string> const& libs,
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const& environment);
public:
template <class... Args>
explicit TaskExecutor(
boost::asio::io_context& context,
std::string const& func_name,
std::vector<std::string> const& libs,
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const& environment,
Args&&... args) {
setup_process(context, func_name, libs, environment);
send_args(std::forward<Args>(args)...);
}
};
auto receive_message_async(std::reference_wrapper<boost::asio::readable_pipe> pipe | ||
) -> boost::asio::awaitable<std::optional<msgpack::sbuffer>> { | ||
std::array<char, cHeaderSize> header_buffer{0}; | ||
// NOLINTNEXTLINE(clang-analyzer-core.NullDereference) | ||
auto [header_ec, header_n] = co_await boost::asio::async_read( | ||
pipe.get(), | ||
boost::asio::buffer(header_buffer), | ||
boost::asio::as_tuple(boost::asio::use_awaitable) | ||
); | ||
if (header_ec) { | ||
if (boost::asio::error::eof != header_ec) { | ||
spdlog::error( | ||
"Cannot read header from pipe {}: {}", | ||
header_ec.value(), | ||
header_ec.message() | ||
); | ||
} | ||
co_return std::nullopt; | ||
} | ||
size_t response_size = 0; | ||
try { | ||
response_size = std::stoul(std::string{header_buffer.data(), cHeaderSize}); | ||
} catch (std::exception& e) { | ||
spdlog::error( | ||
"Cannot parse header: {} {}", | ||
e.what(), | ||
std::string{header_buffer.data(), cHeaderSize} | ||
); | ||
co_return std::nullopt; | ||
} | ||
if (response_size == 0) { | ||
co_return std::nullopt; | ||
} | ||
std::vector<char> response_buffer(response_size); | ||
auto [response_ec, response_n] = co_await boost::asio::async_read( | ||
pipe.get(), | ||
boost::asio::buffer(response_buffer), | ||
boost::asio::as_tuple(boost::asio::use_awaitable) | ||
); | ||
if (response_ec) { | ||
if (boost::asio::error::eof != response_ec) { | ||
spdlog::error( | ||
"Cannot read response body from pipe {}: {}", | ||
response_ec.value(), | ||
response_ec.message() | ||
); | ||
} | ||
co_return std::nullopt; | ||
} | ||
if (response_size != response_n) { | ||
spdlog::error( | ||
"Response read size not match. Expect {}. Got {}.", | ||
response_size, | ||
response_n | ||
); | ||
co_return std::nullopt; | ||
} | ||
msgpack::sbuffer buffer; | ||
buffer.write(response_buffer.data(), response_buffer.size()); | ||
co_return buffer; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling consistency
The asynchronous implementation includes additional checks that aren't present in the synchronous version:
- Zero size check (line 112)
- Size mismatch check (line 131)
Consider adding these checks to the synchronous version for consistency.
Apply this diff to the synchronous version:
}
+ if (body_size == 0) {
+ return std::nullopt;
+ }
std::vector<char> body_buffer(body_size);
try {
boost::asio::read(fd, boost::asio::buffer(body_buffer));
+ if (body_size != body_buffer.size()) {
+ spdlog::error(
+ "Response read size not match. Expect {}. Got {}.",
+ body_size,
+ body_buffer.size()
+ );
+ return std::nullopt;
+ }
} catch (boost::system::system_error& e) {
Committable suggestion skipped: line range outside the PR's diff.
auto receive_message(boost::asio::posix::stream_descriptor& fd) -> std::optional<msgpack::sbuffer> { | ||
std::array<char, cHeaderSize> header_buffer{0}; | ||
try { | ||
boost::asio::read(fd, boost::asio::buffer(header_buffer)); | ||
} catch (boost::system::system_error& e) { | ||
if (boost::asio::error::eof != e.code()) { | ||
spdlog::error("Fail to read header: {}", e.what()); | ||
} | ||
return std::nullopt; | ||
} | ||
size_t body_size = 0; | ||
try { | ||
body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize}); | ||
} catch (std::exception& e) { | ||
spdlog::error( | ||
"Cannot parse header: {} {}", | ||
e.what(), | ||
std::string{header_buffer.data(), cHeaderSize} | ||
); | ||
return std::nullopt; | ||
} | ||
|
||
std::vector<char> body_buffer(body_size); | ||
try { | ||
boost::asio::read(fd, boost::asio::buffer(body_buffer)); | ||
} catch (boost::system::system_error& e) { | ||
spdlog::error("Fail to read response body: {}", e.what()); | ||
return std::nullopt; | ||
} | ||
msgpack::sbuffer buffer; | ||
buffer.write(body_buffer.data(), body_buffer.size()); | ||
return buffer; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add size validation in receive_message
The function should validate the parsed message size before allocating the buffer to prevent potential memory exhaustion attacks.
Apply this diff:
try {
body_size = std::stoul(std::string{header_buffer.data(), cHeaderSize});
+ if (body_size > 999999999999999) {
+ spdlog::error("Received message size {} exceeds maximum allowed size", body_size);
+ return std::nullopt;
+ }
} catch (std::exception& e) {
Committable suggestion skipped: line range outside the PR's diff.
return std::make_tuple( | ||
object.via.array.ptr[1].as<FunctionInvokeError>(), | ||
object.via.array.ptr[2].as<std::string>() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add bounds checking for array access
Direct array access without bounds checking could lead to undefined behaviour if the array structure is malformed.
+ if (object.via.array.size < 3) {
+ return std::nullopt;
+ }
return std::make_tuple(
object.via.array.ptr[1].as<FunctionInvokeError>(),
object.via.array.ptr[2].as<std::string>()
);
Committable suggestion skipped: line range outside the PR's diff.
template <class... Args> | ||
auto create_args_request(Args&&... args) -> msgpack::sbuffer { | ||
msgpack::sbuffer buffer; | ||
msgpack::packer packer{buffer}; | ||
packer.pack_array(2); | ||
packer.pack(worker::TaskExecutorRequestType::Arguments); | ||
packer.pack_array(sizeof...(args)); | ||
([&] { packer.pack(args); }(), ...); | ||
return buffer; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for argument serialization
The function should handle potential serialization failures for the input arguments.
template <class... Args>
auto create_args_request(Args&&... args) -> msgpack::sbuffer {
msgpack::sbuffer buffer;
msgpack::packer packer{buffer};
+ try {
packer.pack_array(2);
packer.pack(worker::TaskExecutorRequestType::Arguments);
packer.pack_array(sizeof...(args));
([&] { packer.pack(args); }(), ...);
return buffer;
+ } catch (const msgpack::type_error& e) {
+ throw std::runtime_error("Failed to serialize arguments");
+ }
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
template <class... Args> | |
auto create_args_request(Args&&... args) -> msgpack::sbuffer { | |
msgpack::sbuffer buffer; | |
msgpack::packer packer{buffer}; | |
packer.pack_array(2); | |
packer.pack(worker::TaskExecutorRequestType::Arguments); | |
packer.pack_array(sizeof...(args)); | |
([&] { packer.pack(args); }(), ...); | |
return buffer; | |
} | |
template <class... Args> | |
auto create_args_request(Args&&... args) -> msgpack::sbuffer { | |
msgpack::sbuffer buffer; | |
msgpack::packer packer{buffer}; | |
try { | |
packer.pack_array(2); | |
packer.pack(worker::TaskExecutorRequestType::Arguments); | |
packer.pack_array(sizeof...(args)); | |
([&] { packer.pack(args); }(), ...); | |
return buffer; | |
} catch (const msgpack::type_error& e) { | |
throw std::runtime_error("Failed to serialize arguments"); | |
} | |
} |
auto response_get_result(msgpack::sbuffer const& buffer) -> std::optional<T> { | ||
// NOLINTBEGIN(cppcoreguidelines-pro-type-union-access,cppcoreguidelines-pro-bounds-pointer-arithmetic) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add constraints to template parameter
The template parameter T lacks constraints. Consider adding concepts or static_assert to ensure T is msgpack-serializable.
-template <class T>
+template <class T>
+ requires std::is_default_constructible_v<T>
Committable suggestion skipped: line range outside the PR's diff.
} | ||
|
||
void TaskExecutor::wait() { | ||
int const exit_code = m_process->wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add exception handling for process control methods
To enhance robustness, consider wrapping calls to m_process->wait()
and m_process->terminate()
with exception handling to catch and handle any potential exceptions thrown by these methods.
Apply this diff to add exception handling:
void TaskExecutor::wait() {
+ try {
int const exit_code = m_process->wait();
+ } catch (const std::exception& e) {
+ std::lock_guard const lock(m_state_mutex);
+ m_state = TaskExecutorState::Error;
+ core::create_error_buffer(
+ core::FunctionInvokeError::FunctionExecutionError,
+ fmt::format("Exception during wait(): {}", e.what()),
+ m_result_buffer
+ );
+ return;
+ }
if (exit_code != 0) {
// existing code...
}
}
void TaskExecutor::cancel() {
+ try {
m_process->terminate();
+ } catch (const std::exception& e) {
+ std::lock_guard const lock(m_state_mutex);
+ m_state = TaskExecutorState::Error;
+ core::create_error_buffer(
+ core::FunctionInvokeError::FunctionExecutionError,
+ fmt::format("Exception during terminate(): {}", e.what()),
+ m_result_buffer
+ );
+ return;
+ }
std::lock_guard const lock(m_state_mutex);
m_state = TaskExecutorState::Cancelled;
msgpack::packer packer{m_result_buffer};
packer.pack("Task cancelled");
}
Also applies to: 55-55
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (8)
src/spider/CMakeLists.txt (2)
65-73
: Fix typo in CACHE INTERNAL comment"filees" should be "files" in the CACHE INTERNAL description.
65-85
: Consider using object libraries for shared components
DllLoader
andmessage_pipe
are shared betweenspider_worker
andspider_task_executor
. Consider creating an object library for these shared components to:
- Avoid duplicate compilation
- Improve maintainability
- Make the shared nature of these components explicit
+add_library(spider_worker_common OBJECT + worker/DllLoader.cpp + worker/DllLoader.hpp + worker/message_pipe.cpp + worker/message_pipe.hpp +) + +target_link_libraries(spider_worker_common + PRIVATE + spider_core + Boost::filesystem + Boost::system +) + add_executable(spider_worker) -target_sources(spider_worker PRIVATE ${SPIDER_WORKER_SOURCES}) +target_sources(spider_worker PRIVATE worker/worker.cpp) +target_link_libraries(spider_worker PRIVATE spider_worker_common) add_executable(spider_task_executor) -target_sources(spider_task_executor PRIVATE ${SPIDER_TASK_EXECUTOR_SOURCES}) +target_sources(spider_task_executor PRIVATE worker/task_executor.cpp) +target_link_libraries(spider_task_executor PRIVATE spider_worker_common)src/spider/worker/FunctionManager.cpp (1)
14-39
: Consider refactoring for improved readability and error handling.The function logic is sound, but could benefit from these improvements:
- Use early returns to reduce nesting
- Add specific error handling for different msgpack exceptions
- Define a constant for the expected array size
Here's a suggested improvement:
+constexpr size_t ERROR_RESPONSE_SIZE = 3; + auto response_get_error(msgpack::sbuffer const& buffer ) -> std::optional<std::tuple<FunctionInvokeError, std::string>> { try { msgpack::object_handle const handle = msgpack::unpack(buffer.data(), buffer.size()); msgpack::object const object = handle.get(); - if (msgpack::type::ARRAY != object.type || 3 != object.via.array.size) { + if (msgpack::type::ARRAY != object.type || ERROR_RESPONSE_SIZE != object.via.array.size) { return std::nullopt; } - if (worker::TaskExecutorResponseType::Error - != object.via.array.ptr[0].as<worker::TaskExecutorResponseType>()) - { + if (worker::TaskExecutorResponseType::Error != + object.via.array.ptr[0].as<worker::TaskExecutorResponseType>()) { return std::nullopt; } return std::make_tuple( object.via.array.ptr[1].as<FunctionInvokeError>(), object.via.array.ptr[2].as<std::string>()); - } catch (msgpack::type_error& e) { + } catch (const msgpack::type_error& e) { + return std::nullopt; + } catch (const msgpack::unpack_error& e) { return std::nullopt; }src/spider/worker/FunctionManager.hpp (5)
15-20
: Consider reorganizing header includesThe header includes should be organized in groups: standard library headers, third-party library headers, and project headers. Each group should be separated by a blank line and sorted alphabetically within their group.
#include <tuple> #include <type_traits> #include <utility> + #include <absl/container/flat_hash_map.h> #include <fmt/format.h> + #include "../core/MsgPack.hpp" // IWYU pragma: keep #include "TaskExecutorMessage.hpp"
105-122
: Enhance error messages in response validationThe error handling could be more informative by providing specific details about the validation failures.
if (msgpack::type::ARRAY != object.type || 2 != object.via.array.size) { - return std::nullopt; + return std::nullopt; // Consider adding: "Invalid response format: expected array of size 2" } if (worker::TaskExecutorResponseType::Result != object.via.array.ptr[0].as<worker::TaskExecutorResponseType>()) { - return std::nullopt; + return std::nullopt; // Consider adding: "Invalid response type" }
199-204
: Enhance error message with exception detailsConsider including the exception message in the error response to aid in debugging.
return create_error_response( FunctionInvokeError::ArgumentParsingError, - fmt::format("Cannot parse arguments.") + fmt::format("Cannot parse arguments: {}", e.what()) );
214-219
: Consider catching more specific exceptionsThe generic std::exception catch block could be split to handle different types of exceptions more specifically.
- } catch (std::exception& e) { + } catch (const std::runtime_error& e) { return create_error_response( FunctionInvokeError::FunctionExecutionError, - "Function execution error" + fmt::format("Runtime error: {}", e.what()) ); + } catch (const std::exception& e) { + return create_error_response( + FunctionInvokeError::FunctionExecutionError, + fmt::format("Unexpected error: {}", e.what()) + ); }
253-253
: Add documentation for get_function methodConsider adding a documentation comment explaining the method's behaviour, especially for the case when the function is not found.
+ /** + * Retrieves a function by name from the function map. + * @param name The name of the function to retrieve + * @return Pointer to the function if found, nullptr otherwise + */ [[nodiscard]] auto get_function(std::string const& name) const -> Function const*;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
src/spider/CMakeLists.txt
(2 hunks)src/spider/worker/FunctionManager.cpp
(1 hunks)src/spider/worker/FunctionManager.hpp
(6 hunks)
🔇 Additional comments (6)
src/spider/CMakeLists.txt (3)
31-31
:
Switching to Boost::headers alone is insufficient
The codebase requires several non-header-only Boost components that must be explicitly linked:
- Boost.Filesystem
- Boost.Process
- Boost.Program_Options
- Boost.System
These components are actively used throughout the codebase and cannot be satisfied by header-only inclusion.
54-63
: 🛠️ Refactor suggestion
Add Boost version requirement to ensure compatibility
The project specifies spdlog as a dependency but lacks a specific version requirement for Boost. While Boost components are correctly specified, adding a minimum version requirement would prevent potential compatibility issues.
2-7
: Verify architectural separation of concerns
The inclusion of worker/FunctionManager
in the core library sources suggests tight coupling between core and worker components. Consider whether this violates separation of concerns and if the FunctionManager should be moved to the worker library instead.
Also applies to: 15-16
✅ Verification successful
Let me gather more information about the architectural dependencies to make a more informed decision.
The architectural separation is appropriate
The current placement of FunctionManager.cpp
in the core sources is justified based on the following findings:
- No core components directly depend on
FunctionManager
, maintaining loose coupling MsgPack.hpp
is a core utility used by both core components (Data.hpp
,Serializer.hpp
) and worker components, following a shared infrastructure pattern- The worker components appropriately depend on core utilities, not the other way around, maintaining a clean dependency direction
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for dependencies between core and worker components
# Check for worker-related includes in core files
echo "Checking core files for worker dependencies..."
rg -l "include.*worker" src/spider/core/
# Check FunctionManager's dependencies
echo -e "\nChecking FunctionManager's dependencies..."
rg "^#include" src/spider/worker/FunctionManager.{hpp,cpp}
Length of output: 1705
Script:
#!/bin/bash
# Check if FunctionManager is used by core components
echo "Checking core components using FunctionManager..."
rg "FunctionManager" src/spider/core/
# Check TaskExecutorMessage dependencies
echo -e "\nChecking TaskExecutorMessage dependencies..."
rg "^#include" src/spider/worker/TaskExecutorMessage.hpp
# Check if MsgPack.hpp is used by other core components
echo -e "\nChecking MsgPack usage in core..."
rg -l "MsgPack\.hpp" src/spider/core/
Length of output: 561
src/spider/worker/FunctionManager.cpp (2)
1-12
: LGTM! Well-organized includes with proper dependency management.
The includes are logically grouped and the IWYU pragma indicates thoughtful include management.
63-73
: LGTM! Clean implementation of function lookup.
The get_function implementation is concise and efficient. The BOOST_DLL_ALIAS usage is appropriate for dynamic loading capabilities.
src/spider/worker/FunctionManager.hpp (1)
149-158
: Add error handling for argument serialization
The function should handle potential serialization failures for the input arguments.
set(SPIDER_WORKER_SOURCES | ||
worker/DllLoader.hpp | ||
worker/DllLoader.cpp | ||
worker/TaskExecutor.hpp | ||
worker/TaskExecutor.cpp | ||
worker/TaskExecutorMessage.hpp | ||
worker/message_pipe.cpp | ||
worker/message_pipe.hpp | ||
CACHE INTERNAL | ||
"spider worker source files" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Security measures for DLL loading need improvement
The current implementation lacks sufficient security controls for dynamic library loading:
- Only basic existence check is performed on the DLL path
- No path sanitization or validation beyond filesystem checks
- Missing library signature or integrity verification
- No configuration for allowed library paths or names
- Basic error handling is present but security-specific checks are absent
Required improvements:
- Implement a whitelist of allowed library paths or names
- Add path sanitization to prevent directory traversal
- Implement library signature verification
- Add configuration for security settings
- Consider implementing library load-time integrity checks
🔗 Analysis chain
Verify security measures for dynamic library loading
The addition of DllLoader
introduces potential security risks. Ensure that:
- Only authorized libraries can be loaded
- The loading path is properly sanitized
- Error handling is robust
- Library validation is implemented
Let me gather more information about the DLL loading implementation and path handling.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check security measures in DllLoader
# Check for security-related validations
echo "Checking DllLoader implementation..."
rg -A 5 "load|LoadLibrary|dlopen" src/spider/worker/DllLoader.cpp
# Check for error handling
echo -e "\nChecking error handling..."
rg "throw|error|exception" src/spider/worker/DllLoader.cpp
Length of output: 1063
Script:
#!/bin/bash
# Check path sanitization and validation
echo "Checking path handling..."
ast-grep --pattern 'path_str' src/spider/worker/DllLoader.cpp
# Check if there's any path validation or security checks
echo -e "\nChecking security validations..."
rg -A 5 "validate|check|verify|sanitize" src/spider/worker/DllLoader.cpp
# Look for library validation mechanisms
echo -e "\nChecking library validation..."
rg -A 5 "verify|signature|checksum" src/spider/worker/DllLoader.cpp
# Check for any configuration or allowed paths list
echo -e "\nChecking for configuration..."
fd -e hpp -e cpp -e h "DllLoader|Config" src/spider/worker/
Length of output: 808
auto create_error_response(FunctionInvokeError error, std::string const& message) | ||
-> msgpack::sbuffer { | ||
msgpack::sbuffer buffer; | ||
msgpack::packer packer{buffer}; | ||
packer.pack_array(3); | ||
packer.pack(worker::TaskExecutorResponseType::Error); | ||
packer.pack(error); | ||
packer.pack(message); | ||
return buffer; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling and use consistent constants.
The function should use the same constant for array size as response_get_error and include error handling for packing operations.
auto create_error_response(FunctionInvokeError error, std::string const& message)
-> msgpack::sbuffer {
msgpack::sbuffer buffer;
msgpack::packer packer{buffer};
- packer.pack_array(3);
+ try {
+ packer.pack_array(ERROR_RESPONSE_SIZE);
+ packer.pack(worker::TaskExecutorResponseType::Error);
+ packer.pack(error);
+ packer.pack(message);
+ } catch (const std::exception& e) {
+ // Create an empty buffer to indicate packing failure
+ buffer.clear();
+ }
- packer.pack(worker::TaskExecutorResponseType::Error);
- packer.pack(error);
- packer.pack(message);
return buffer;
}
Committable suggestion skipped: line range outside the PR's diff.
void create_error_buffer( | ||
FunctionInvokeError error, | ||
std::string const& message, | ||
msgpack::sbuffer& buffer | ||
) { | ||
msgpack::packer packer{buffer}; | ||
packer.pack_array(2); | ||
packer.pack(error); | ||
packer.pack(message); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add documentation and error handling for buffer creation.
The function uses a different array size than create_error_response which could lead to confusion. Consider:
- Adding documentation to explain the different formats
- Including error handling for packing operations
- Using a constant for the array size
+constexpr size_t ERROR_BUFFER_SIZE = 2;
+
+/**
+ * Creates an error buffer with 2 elements: error code and message.
+ * Note: This differs from create_error_response which includes response type.
+ */
void create_error_buffer(
FunctionInvokeError error,
std::string const& message,
msgpack::sbuffer& buffer
) {
msgpack::packer packer{buffer};
- packer.pack_array(2);
- packer.pack(error);
- packer.pack(message);
+ try {
+ packer.pack_array(ERROR_BUFFER_SIZE);
+ packer.pack(error);
+ packer.pack(message);
+ } catch (const std::exception& e) {
+ buffer.clear();
+ }
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
void create_error_buffer( | |
FunctionInvokeError error, | |
std::string const& message, | |
msgpack::sbuffer& buffer | |
) { | |
msgpack::packer packer{buffer}; | |
packer.pack_array(2); | |
packer.pack(error); | |
packer.pack(message); | |
} | |
constexpr size_t ERROR_BUFFER_SIZE = 2; | |
/** | |
* Creates an error buffer with 2 elements: error code and message. | |
* Note: This differs from create_error_response which includes response type. | |
*/ | |
void create_error_buffer( | |
FunctionInvokeError error, | |
std::string const& message, | |
msgpack::sbuffer& buffer | |
) { | |
msgpack::packer packer{buffer}; | |
try { | |
packer.pack_array(ERROR_BUFFER_SIZE); | |
packer.pack(error); | |
packer.pack(message); | |
} catch (const std::exception& e) { | |
buffer.clear(); | |
} | |
} |
auto register_function_invoker(std::string const& name, Function f) -> bool { | ||
return m_map.emplace(name, f).second; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add validation for function registration
The register_function_invoker method should validate the function object before registration.
auto register_function_invoker(std::string const& name, Function f) -> bool {
+ if (!f) {
+ return false;
+ }
return m_map.emplace(name, f).second;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
auto register_function_invoker(std::string const& name, Function f) -> bool { | |
return m_map.emplace(name, f).second; | |
} | |
auto register_function_invoker(std::string const& name, Function f) -> bool { | |
if (!f) { | |
return false; | |
} | |
return m_map.emplace(name, f).second; | |
} |
Description
Implement task executor is worker as a separate subprocess. Add function manager and dll loader that is used to run task in subprocess.
Validation performed
Summary by CodeRabbit
Release Notes
New Features
DllLoader
class for dynamic library loading.TaskExecutor
class for managing task execution in an asynchronous environment.Bug Fixes
Documentation
Tests
TaskExecutor
and message pipe functionalities.FunctionManager
tests.Chores