feat: Add task executor #28

sitaowang1998 · 2024-11-30T17:59:55Z

Description

Add task executor that spawns a subprocess and executes a task.

Validation performed

lint:check
Unit test in devcontainer

Summary by CodeRabbit

New Features
- Introduced a DllLoader class for dynamic library loading.
- Added a TaskExecutor class to manage task execution in a concurrent environment.
- Implemented command-line task execution capabilities.
Bug Fixes
- Enhanced error handling for task execution and library loading.
Tests
- Added unit tests for the TaskExecutor class to ensure proper functionality and error handling.
- Created test functions for task execution scenarios.
Documentation
- Updated documentation to reflect new classes and methods for user guidance.

coderabbitai · 2024-11-30T18:00:02Z

Walkthrough

This pull request introduces updates to the Spider project, particularly in the CMakeLists.txt file and the addition of new source files for the worker module. Key changes include the introduction of the DllLoader and TaskExecutor classes, which facilitate dynamic library loading and task management, respectively. The changes also enhance the testing framework by adding unit tests for the TaskExecutor class and creating a new shared library for testing purposes.

Changes

File Path	Change Summary
src/spider/CMakeLists.txt	- Updated `SPIDER_WORKER_SOURCES` to include new source files. - Added `SPIDER_TASK_EXECUTOR_SOURCES` and defined `spider_task_executor` executable. - Marked source/header variables as `CACHE INTERNAL`.
src/spider/worker/DllLoader.cpp	- Implemented `DllLoader` class with `load_dll` method for loading dynamic libraries.
src/spider/worker/DllLoader.hpp	- Introduced `DllLoader` class with singleton pattern and `load_dll` method.
src/spider/worker/TaskExecutor.cpp	- Implemented `TaskExecutor` class with methods for task management and subprocess handling.
src/spider/worker/TaskExecutor.hpp	- Introduced `TaskExecutor` class with state management and asynchronous execution capabilities.
src/spider/worker/task_executor.cpp	- Implemented command-line task executor logic with argument parsing and library loading.
tests/CMakeLists.txt	- Added new test source `worker/test-TaskExecutor.cpp` and created `worker_test` shared library.
tests/worker/test-TaskExecutor.cpp	- Added unit tests for `TaskExecutor` class covering success, argument errors, and failures.
tests/worker/worker-test.cpp	- Introduced `sum_test` and `error_test` functions for testing purposes.

Possibly related PRs

build: Add CMake config and libraries #13: Related to changes in CMakeLists.txt for library management and configuration settings.
build: Split core into spearate library #15: Updates CMakeLists.txt with new source files and library linking, similar to the main PR.
test: Add tests for data storage #21: Introduces unit tests for data storage, relevant to the organization of components in the main PR.
test: Add unit tests for metadata storage and fix bugs in MySQL storage backend #23: Adds unit tests for metadata storage, relating to the main PR's source file organization.
feat: Add function manager for register and run function by function name #25: Introduces FunctionManager, connecting with organizational changes in the main PR.
feat: Add message pipe #27: Adds message pipe functionality, relevant to the organizational changes in CMakeLists.txt.

Suggested reviewers

kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)

tests/worker/worker-test.cpp (1)

15-18: Provide justification for NOLINT suppression.

The NOLINTBEGIN(cert-err58-cpp) and NOLINTEND(cert-err58-cpp) directives suppress warnings related to exception handling. It's advisable to include a comment explaining why this suppression is necessary to aid future maintainers.

tests/CMakeLists.txt (1)

35-37: Review the visibility scope in target definitions.

In target_sources(worker_test PUBLIC worker/worker-test.cpp), consider changing PUBLIC to PRIVATE since the source file is only used within the worker_test library and does not need to be exposed to dependent targets.

Similarly, in target_link_libraries(worker_test PUBLIC spider_core), evaluate if PRIVATE linkage is sufficient. Using PRIVATE reduces unnecessary propagation of include directories and compile definitions to dependents.
src/spider/worker/DllLoader.cpp (1)
39-39: Optimise library insertion by moving the shared_library object

To avoid unnecessary copying of the boost::dll::shared_library object, consider moving it into m_libraries using std::move(library).

Apply this diff to improve performance:
-            m_libraries.emplace(path_str, library);
+            m_libraries.emplace(path_str, std::move(library));
src/spider/CMakeLists.txt (1)
72-72: Correct the typo in the cache variable description

There's a minor typo: "filees" should be "files".

Apply this diff to fix the typo:
-        "spider task executor source filees"
+        "spider task executor source files"
src/spider/worker/task_executor.cpp (1)
76-78: Catch all parsing-related exceptions

Catching boost::bad_any_cast may not handle all potential errors during command-line argument parsing. Consider catching boost::program_options::error& e to handle all parsing-related exceptions.

Apply this diff to update the exception handling:
 } catch (boost::bad_any_cast& e) {
     return cCmdArgParseErr;
+} catch (boost::program_options::error& e) {
+    spdlog::error("Command-line parsing error: {}", e.what());
+    return cCmdArgParseErr;
 }
src/spider/worker/TaskExecutor.cpp (2)
19-22: Mark State Query Methods as const

The methods completed(), waiting(), succeed(), and error() do not modify the state of the object and can be marked as const. This improves const-correctness and allows these methods to be called on const instances of TaskExecutor.

Apply this diff to mark the methods as const:
-auto TaskExecutor::completed() -> bool {
+auto TaskExecutor::completed() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Succeed == m_state || TaskExecutorState::Error == m_state;
 }
-auto TaskExecutor::waiting() -> bool {
+auto TaskExecutor::waiting() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Waiting == m_state;
 }
-auto TaskExecutor::succeed() -> bool {
+auto TaskExecutor::succeed() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Succeed == m_state;
 }
-auto TaskExecutor::error() -> bool {
+auto TaskExecutor::error() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Error == m_state;
 }
Also applies to: 24-27, 29-32, 34-37

101-103: Handle Unknown Response Types Appropriately

The current implementation ignores unknown response types, which might lead to unhandled scenarios. Consider logging a warning or handling this case to aid in debugging.

Apply this diff to handle unknown response types:
 case TaskExecutorResponseType::Unknown:
-    break;
+    // Log a warning or handle the unknown response type
+    // Example: log_warning("Received unknown response type.");
+    break;

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c0c9fbc and dc55830.

📒 Files selected for processing (9)

src/spider/CMakeLists.txt (2 hunks)
src/spider/worker/DllLoader.cpp (1 hunks)
src/spider/worker/DllLoader.hpp (1 hunks)
src/spider/worker/TaskExecutor.cpp (1 hunks)
src/spider/worker/TaskExecutor.hpp (1 hunks)
src/spider/worker/task_executor.cpp (1 hunks)
tests/CMakeLists.txt (2 hunks)
tests/worker/test-TaskExecutor.cpp (1 hunks)
tests/worker/worker-test.cpp (1 hunks)

🧰 Additional context used

🪛 cppcheck (2.10-2)

tests/worker/test-TaskExecutor.cpp

[error] 40-40: syntax error

(syntaxError)

🔇 Additional comments (4)

tests/worker/worker-test.cpp (2)

6-8: Function sum_test is correctly implemented.

The function accurately returns the sum of the two input integers.

10-12: Exception handling in error_test is appropriate.

The function correctly throws a std::runtime_error as intended, simulating an error scenario for testing purposes.

tests/CMakeLists.txt (1)

10-10: Test file test-TaskExecutor.cpp correctly added to test sources.

The new test file is properly included in SPIDER_TEST_SOURCES, ensuring it will be compiled and linked with the test executable.

src/spider/worker/TaskExecutor.hpp (1)

58-62: Ensure C++20 support for designated initialisers

The use of designated initialisers (e.g., .in = m_write_pipe) requires C++20 support. Please verify that the project is compiled with C++20 enabled. If not, consider initialising process_stdio members directly without designated initialisers.

coderabbitai · 2024-11-30T18:08:53Z

src/spider/worker/DllLoader.hpp

+    static auto get_instance() -> DllLoader& {
+        // Explicitly use new because DllLoader instance should not be destroyed
+        // NOLINTNEXTLINE(cppcoreguidelines-owning-memory)
+        static auto* instance = new DllLoader();
+        return *instance;
+    }


🛠️ Refactor suggestion

Avoid intentional memory leak in singleton pattern.

The current implementation of get_instance() uses new to allocate the DllLoader instance without a corresponding delete, leading to an intentional memory leak:

static auto* instance = new DllLoader();

Consider using a local static instance to prevent the memory leak and ensure proper destruction upon program termination:

static DllLoader instance; return instance;

This approach is thread-safe in C++11 and later and eliminates the need for dynamic memory allocation and NOLINT suppression.

coderabbitai · 2024-11-30T18:08:53Z

tests/worker/test-TaskExecutor.cpp

+            boost::process::v2::environment::value>
+            environment_variables;
+
+    environment_variables.emplace("PATH", src_dir);


🛠️ Refactor suggestion

Avoid overwriting the PATH environment variable

Overwriting the PATH environment variable may cause unexpected behaviour by removing existing paths. Consider appending src_dir to the existing PATH instead.

Apply this diff to modify the code:

environment_variables.emplace("PATH", src_dir); + char const* existing_path_cstr = std::getenv("PATH"); + std::string existing_path = existing_path_cstr ? existing_path_cstr : ""; + existing_path += ":" + src_dir.string(); + environment_variables["PATH"] = existing_path;

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-30T18:08:53Z

tests/worker/test-TaskExecutor.cpp

+namespace {
+auto get_environment_variable() -> absl::flat_hash_map<
+                                        boost::process::v2::environment::key,
+                                        boost::process::v2::environment::value> {
+    boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
+    boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider";
+
+    absl::flat_hash_map<
+            boost::process::v2::environment::key,
+            boost::process::v2::environment::value>
+            environment_variables;
+
+    environment_variables.emplace("PATH", src_dir);
+
+    return environment_variables;
+}
+
+auto get_libraries() -> std::vector<std::string> {
+    boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
+    boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
+    return {lib_path.string()};
+}
+
+TEST_CASE("Task execute success", "[worker]") {
+    absl::flat_hash_map<
+            boost::process::v2::environment::key,
+            boost::process::v2::environment::value> const environment_variable
+            = get_environment_variable();
+
+    boost::asio::io_context context;
+
+    spider::worker::TaskExecutor
+            executor{context, "sum_test", get_libraries(), environment_variable, 2, 3};
+    context.run();
+    executor.wait();
+    REQUIRE(executor.succeed());
+    std::optional<int> const result_option = executor.get_result<int>();
+    REQUIRE(result_option.has_value());
+    REQUIRE(5 == result_option.value_or(0));
+}
+
+TEST_CASE("Task execute wrong number of arguments", "[worker]") {
+    absl::flat_hash_map<
+            boost::process::v2::environment::key,
+            boost::process::v2::environment::value> const environment_variable
+            = get_environment_variable();
+
+    boost::asio::io_context context;
+
+    spider::worker::TaskExecutor
+            executor{context, "sum_test", get_libraries(), environment_variable, 2};
+    context.run();
+    executor.wait();
+    REQUIRE(executor.error());
+    std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
+    REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments == std::get<0>(error));
+}
+
+TEST_CASE("Task execute fail", "[worker]") {
+    absl::flat_hash_map<
+            boost::process::v2::environment::key,
+            boost::process::v2::environment::value> const environment_variable
+            = get_environment_variable();
+
+    boost::asio::io_context context;
+
+    spider::worker::TaskExecutor
+            executor{context, "error_test", get_libraries(), environment_variable, 2};
+    context.run();
+    executor.wait();
+    REQUIRE(executor.error());
+    std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
+    REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error));
+}
+}  // namespace
+
+// NOLINTEND(cert-err58-cpp,cppcoreguidelines-avoid-do-while,readability-function-cognitive-complexity,cppcoreguidelines-avoid-non-const-global-variables,cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)


⚠️ Potential issue

Move TEST_CASE definitions outside the anonymous namespace

The TEST_CASE macros should be in the global scope. Defining them within an anonymous namespace can cause compilation errors or prevent the tests from being discovered by the Catch2 framework.

Apply this diff to move the TEST_CASE definitions outside the anonymous namespace:

namespace { // Helper functions +} // Close the anonymous namespace before the TEST_CASEs TEST_CASE("Task execute success", "[worker]") { // Test code } // Additional TEST_CASEs ... -} // Remove this closing brace as the namespace is already closed

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

namespace {

auto get_environment_variable() -> absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> {

boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();

boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider";

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value>

environment_variables;

environment_variables.emplace("PATH", src_dir);

return environment_variables;

}

auto get_libraries() -> std::vector<std::string> {

boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();

boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";

return {lib_path.string()};

}

TEST_CASE("Task execute success", "[worker]") {

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> const environment_variable

= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor

executor{context, "sum_test", get_libraries(), environment_variable, 2, 3};

context.run();

executor.wait();

REQUIRE(executor.succeed());

std::optional<int> const result_option = executor.get_result<int>();

REQUIRE(result_option.has_value());

REQUIRE(5 == result_option.value_or(0));

}

TEST_CASE("Task execute wrong number of arguments", "[worker]") {

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> const environment_variable

= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor

executor{context, "sum_test", get_libraries(), environment_variable, 2};

context.run();

executor.wait();

REQUIRE(executor.error());

std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();

REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments == std::get<0>(error));

}

TEST_CASE("Task execute fail", "[worker]") {

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> const environment_variable

= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor

executor{context, "error_test", get_libraries(), environment_variable, 2};

context.run();

executor.wait();

REQUIRE(executor.error());

std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();

REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error));

}

} // namespace

// NOLINTEND(cert-err58-cpp,cppcoreguidelines-avoid-do-while,readability-function-cognitive-complexity,cppcoreguidelines-avoid-non-const-global-variables,cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)

namespace {

auto get_environment_variable() -> absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> {

boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();

boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider";

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value>

environment_variables;

environment_variables.emplace("PATH", src_dir);

return environment_variables;

}

auto get_libraries() -> std::vector<std::string> {

boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();

boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";

return {lib_path.string()};

}

} // namespace

TEST_CASE("Task execute success", "[worker]") {

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> const environment_variable

= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor

executor{context, "sum_test", get_libraries(), environment_variable, 2, 3};

context.run();

executor.wait();

REQUIRE(executor.succeed());

std::optional<int> const result_option = executor.get_result<int>();

REQUIRE(result_option.has_value());

REQUIRE(5 == result_option.value_or(0));

}

TEST_CASE("Task execute wrong number of arguments", "[worker]") {

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> const environment_variable

= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor

executor{context, "sum_test", get_libraries(), environment_variable, 2};

context.run();

executor.wait();

REQUIRE(executor.error());

std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();

REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments == std::get<0>(error));

}

TEST_CASE("Task execute fail", "[worker]") {

absl::flat_hash_map<

boost::process::v2::environment::key,

boost::process::v2::environment::value> const environment_variable

= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor

executor{context, "error_test", get_libraries(), environment_variable, 2};

context.run();

executor.wait();

REQUIRE(executor.error());

std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();

REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error));

}

// NOLINTEND(cert-err58-cpp,cppcoreguidelines-avoid-do-while,readability-function-cognitive-complexity,cppcoreguidelines-avoid-non-const-global-variables,cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)

🧰 Tools

🪛 cppcheck (2.10-2)

[error] 40-40: syntax error

(syntaxError)

coderabbitai · 2024-11-30T18:08:53Z

src/spider/worker/task_executor.cpp

+        if (nullptr == function) {
+            spider::worker::send_message(
+                    out,
+                    spider::core::create_error_response(
+                            spider::core::FunctionInvokeError::FunctionExecutionError,
+                            fmt::format("Function {} not found.", func_name)
+                    )
+            );
+            return cResultSendErr;
+        }


🛠️ Refactor suggestion

Use appropriate error code when function is not found

Returning FunctionInvokeError::FunctionExecutionError when the function is not found may be misleading. Consider using a more specific error code, such as FunctionInvokeError::FunctionNotFound, to accurately represent the error condition.

Apply this diff to update the error code:

if (nullptr == function) { spider::worker::send_message( out, spider::core::create_error_response( - spider::core::FunctionInvokeError::FunctionExecutionError, + spider::core::FunctionInvokeError::FunctionNotFound, fmt::format("Function {} not found.", func_name) ) ); return cResultSendErr; }

Ensure that FunctionInvokeError includes a FunctionNotFound value.

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-30T18:08:53Z

src/spider/worker/TaskExecutor.cpp

+void TaskExecutor::cancel() {
+    m_process->terminate();
+    std::lock_guard const lock(m_state_mutex);
+    m_state = TaskExecutorState::Cancelled;
+    msgpack::packer packer{m_result_buffer};
+    packer.pack("Task cancelled");
+}


🛠️ Refactor suggestion

Ensure Consistent Error Handling in cancel() Method

In the cancel() method, the cancellation message is directly packed into m_result_buffer without following the same error formatting used elsewhere. For consistency and to aid in error parsing, consider using core::create_error_buffer() or a similar mechanism.

Apply this diff to standardize error handling:

void TaskExecutor::cancel() { m_process->terminate(); std::lock_guard const lock(m_state_mutex); m_state = TaskExecutorState::Cancelled; - msgpack::packer packer{m_result_buffer}; - packer.pack("Task cancelled"); + core::create_error_buffer( + core::FunctionInvokeError::FunctionCancelled, + "Task cancelled", + m_result_buffer + ); }

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-30T18:08:53Z

src/spider/worker/TaskExecutor.cpp

+void TaskExecutor::wait() {
+    int const exit_code = m_process->wait();
+    if (exit_code != 0) {
+        std::lock_guard const lock(m_state_mutex);
+        if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) {
+            m_state = TaskExecutorState::Error;
+            core::create_error_buffer(
+                    core::FunctionInvokeError::FunctionExecutionError,
+                    fmt::format("Subprocess exit with {}", exit_code),
+                    m_result_buffer
+            );
+        }
+    }
+}


🛠️ Refactor suggestion

Add Exception Handling to wait() Method

If m_process->wait() throws an exception, it could lead to unhandled exceptions and potential crashes. Consider adding exception handling to ensure proper resource cleanup and error reporting.

Apply this diff to add exception handling:

void TaskExecutor::wait() { + int exit_code = 0; + try { + exit_code = m_process->wait(); + } catch (const std::exception& e) { + std::lock_guard const lock(m_state_mutex); + m_state = TaskExecutorState::Error; + core::create_error_buffer( + core::FunctionInvokeError::FunctionExecutionError, + fmt::format("Exception during wait(): {}", e.what()), + m_result_buffer + ); + return; + } - - int const exit_code = m_process->wait(); if (exit_code != 0) { std::lock_guard const lock(m_state_mutex); if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) { m_state = TaskExecutorState::Error; core::create_error_buffer( core::FunctionInvokeError::FunctionExecutionError, fmt::format("Subprocess exit with {}", exit_code), m_result_buffer ); } } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

void TaskExecutor::wait() {

int const exit_code = m_process->wait();

if (exit_code != 0) {

std::lock_guard const lock(m_state_mutex);

if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) {

m_state = TaskExecutorState::Error;

core::create_error_buffer(

core::FunctionInvokeError::FunctionExecutionError,

fmt::format("Subprocess exit with {}", exit_code),

m_result_buffer

);

}

}

}

void TaskExecutor::wait() {

int exit_code = 0;

try {

exit_code = m_process->wait();

} catch (const std::exception& e) {

std::lock_guard const lock(m_state_mutex);

m_state = TaskExecutorState::Error;

core::create_error_buffer(

core::FunctionInvokeError::FunctionExecutionError,

fmt::format("Exception during wait(): {}", e.what()),

m_result_buffer

);

return;

}

if (exit_code != 0) {

std::lock_guard const lock(m_state_mutex);

if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) {

m_state = TaskExecutorState::Error;

core::create_error_buffer(

core::FunctionInvokeError::FunctionExecutionError,

fmt::format("Subprocess exit with {}", exit_code),

m_result_buffer

);

}

}

}

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (7)

src/spider/CMakeLists.txt (2)
39-43: Consider reorganizing message-related components

The worker module's source organization could be improved. Currently, TaskExecutorMessage.hpp stands alone without a corresponding implementation file, while other components follow the header/implementation pattern.

Consider either:

Moving the TaskExecutorMessage implementation into a .cpp file for consistency

Creating a dedicated messages subdirectory to group all message-related components

78-85: Consolidate Boost dependencies

The Boost dependencies for spider_task_executor could be better organized.

Consider:

Creating a variable for common Boost dependencies

Using find_package with specific components needed
set(SPIDER_COMMON_BOOST_LIBS
    Boost::filesystem
    Boost::program_options
    Boost::system
)

target_link_libraries(
    spider_task_executor
    PRIVATE
        ${SPIDER_COMMON_BOOST_LIBS}
        spdlog::spdlog
)
src/spider/worker/task_executor.cpp (4)
28-46: Enhance argument parsing robustness and documentation

The argument parser could benefit from additional validation and improved help documentation:
 auto parse_arg(int const argc, char** const& argv) -> boost::program_options::variables_map {
     boost::program_options::options_description desc;
-    desc.add_options()("help", "spider task executor");
+    desc.add_options()("help", "Spider task executor - Executes functions from dynamically loaded libraries");
     desc.add_options()("func", boost::program_options::value<std::string>(), "function to run");
     desc.add_options()(
             "libs",
-            boost::program_options::value<std::vector<std::string>>(),
+            boost::program_options::value<std::vector<std::string>>()->multitoken(),
             "dynamic libraries that include the spider tasks"
     );
 
     boost::program_options::variables_map variables;
     boost::program_options::store(
             boost::program_options::parse_command_line(argc, argv, desc),
             variables
     );
+    if (variables.count("help")) {
+        std::cout << desc << "\n";
+        std::exit(0);
+    }
     boost::program_options::notify(variables);
     return variables;
 }
50-54: Add documentation for error constants

The error constants would benefit from documentation explaining their specific use cases.
+// Error codes returned by the task executor
+// Command-line argument parsing failed
 constexpr int cCmdArgParseErr = 1;
+// Dynamic library loading failed
 constexpr int cDllErr = 2;
+// Function argument parsing failed
 constexpr int cFuncArgParseErr = 3;
+// Error sending result back to parent process
 constexpr int cResultSendErr = 4;
+// Unexpected errors during execution
 constexpr int cOtherErr = 5;
92-103: Improve error messages for request handling

The error messages could be more descriptive to aid in debugging.
         std::optional<msgpack::sbuffer> request_buffer_option = spider::worker::receive_message(in);
         if (!request_buffer_option.has_value()) {
-            spdlog::error("Cannot read args buffer request");
+            spdlog::error("Failed to read argument buffer from stdin: buffer is empty or malformed");
             return cFuncArgParseErr;
         }
         msgpack::sbuffer const& request_buffer = request_buffer_option.value();
         if (spider::worker::TaskExecutorRequestType::Arguments
             != spider::worker::get_request_type(request_buffer))
         {
-            spdlog::error("Expect args request.");
+            spdlog::error("Invalid request type: expected Arguments request type");
             return cFuncArgParseErr;
         }
1-131: Consider implementing a consistent error handling strategy

The current implementation mixes different error handling approaches (return codes, exceptions, and error responses). Consider implementing a unified error handling strategy across the codebase.

Suggestions:

Define a clear hierarchy of error types

Implement consistent error propagation patterns

Document error handling expectations for each component

Consider using a Result type pattern for error handling
tests/worker/test-TaskExecutor.cpp (1)
46-96: Consider adding test cleanup handling

The tests create resources (io_context, executor) but don't explicitly handle cleanup. While RAII handles most cleanup, consider adding a test fixture to ensure consistent setup/teardown, especially for more complex scenarios that might be added later.

Example fixture structure:
class TaskExecutorTest {
protected:
    boost::asio::io_context context;
    absl::flat_hash_map<boost::process::v2::environment::key,
                       boost::process::v2::environment::value> environment_variable;
    
    TaskExecutorTest() : environment_variable(get_environment_variable()) {}
    
    ~TaskExecutorTest() {
        context.stop();
    }
};

TEST_CASE_METHOD(TaskExecutorTest, "Task execute success", "[worker]") {
    spider::worker::TaskExecutor executor{context, "sum_test", get_libraries(), 
                                        environment_variable, 2, 3};
    // ... rest of the test
}
🧰 Tools

🪛 cppcheck (2.10-2)

[error] 46-46: syntax error

(syntaxError)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between dc55830 and 1db25b8.

📒 Files selected for processing (5)

src/spider/CMakeLists.txt (2 hunks)
src/spider/worker/DllLoader.cpp (1 hunks)
src/spider/worker/task_executor.cpp (1 hunks)
tests/CMakeLists.txt (2 hunks)
tests/worker/test-TaskExecutor.cpp (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

tests/CMakeLists.txt
src/spider/worker/DllLoader.cpp

🧰 Additional context used

🪛 cppcheck (2.10-2)

tests/worker/test-TaskExecutor.cpp

[error] 46-46: syntax error

(syntaxError)

🔇 Additional comments (4)

src/spider/CMakeLists.txt (1)

65-73: Verify build system integration

The task executor component appears to be properly configured, but we should verify its integration with the build system.

✅ Verification successful

Build system integration is properly configured

The CMake configuration for the spider_task_executor component is correctly integrated with the build system:

All source files specified in SPIDER_TASK_EXECUTOR_SOURCES exist in the correct locations
The executable target is properly defined with appropriate dependencies
Required libraries (Boost, spdlog) are correctly linked
No circular dependencies were found
Source files have consistent include dependencies

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify CMake configuration and dependency resolution

# Check if all required source files exist
echo "Verifying source files..."
for file in worker/DllLoader.cpp worker/DllLoader.hpp worker/message_pipe.cpp worker/message_pipe.hpp worker/task_executor.cpp; do
    if [ ! -f "src/spider/$file" ]; then
        echo "Missing file: $file"
        exit 1
    fi
done

# Check for potential circular dependencies in CMake files
echo "Checking for circular dependencies..."
fd -e cmake -e txt -x grep -l "spider_task_executor" {}

Length of output: 1131

Script:

#!/bin/bash
# Let's check the CMakeLists.txt content to understand how spider_task_executor is used
echo "Checking CMakeLists.txt content..."
cat src/spider/CMakeLists.txt

# Check for any other CMake targets that might depend on these files
echo -e "\nChecking for other related targets..."
rg "add_library|add_executable" src/spider/CMakeLists.txt

# Look for any potential include dependencies
echo -e "\nChecking for include dependencies..."
rg "#include" src/spider/worker/

Length of output: 10464

src/spider/worker/task_executor.cpp (1)

112-121: ⚠️ Potential issue

Use appropriate error code when function is not found

Returning FunctionInvokeError::FunctionExecutionError when the function is not found may be misleading.

         if (nullptr == function) {
             spider::worker::send_message(
                     out,
                     spider::core::create_error_response(
-                            spider::core::FunctionInvokeError::FunctionExecutionError,
+                            spider::core::FunctionInvokeError::FunctionNotFound,
                             fmt::format("Function {} not found.", func_name)
                     )
             );
             return cResultSendErr;
         }

tests/worker/test-TaskExecutor.cpp (2)

1-16: LGTM! Well-organized header inclusions.

The header organization follows best practices with project-specific headers first, followed by third-party and standard library headers. The IWYU pragma for BoostAsio.hpp is appropriately used.

46-96: Move TEST_CASE definitions outside anonymous namespace

The TEST_CASE macros should be in the global namespace scope for proper test discovery by Catch2.

🧰 Tools

🪛 cppcheck (2.10-2)

[error] 46-46: syntax error

(syntaxError)

coderabbitai · 2024-11-30T20:28:26Z

src/spider/CMakeLists.txt

+set(SPIDER_TASK_EXECUTOR_SOURCES
+    worker/DllLoader.cpp
+    worker/DllLoader.hpp
+    worker/message_pipe.cpp
+    worker/message_pipe.hpp
+    worker/task_executor.cpp
+    CACHE INTERNAL
+    "spider task executor source files"
+)
+
+add_executable(spider_task_executor)
+target_sources(spider_task_executor PRIVATE ${SPIDER_TASK_EXECUTOR_SOURCES})
+target_link_libraries(spider_task_executor PRIVATE spider_core)
+target_link_libraries(
+    spider_task_executor
+    PRIVATE
+        Boost::filesystem
+        Boost::program_options
+        Boost::system
+        spdlog::spdlog
+)


🛠️ Refactor suggestion

Reduce code duplication by creating a shared library

The task executor and worker components share several source files (DllLoader, message_pipe). This duplication could lead to maintenance issues.

Consider creating a shared library for common components:

# Create shared library for common worker components set(SPIDER_WORKER_COMMON_SOURCES worker/DllLoader.cpp worker/DllLoader.hpp worker/message_pipe.cpp worker/message_pipe.hpp CACHE INTERNAL "spider worker common source files" ) add_library(spider_worker_common) target_sources(spider_worker_common PRIVATE ${SPIDER_WORKER_COMMON_SOURCES}) target_link_libraries(spider_worker_common PUBLIC spider_core) # Update existing targets to use common library target_link_libraries(spider_worker PRIVATE spider_worker_common) target_link_libraries(spider_task_executor PRIVATE spider_worker_common)

coderabbitai · 2024-11-30T20:28:26Z

src/spider/worker/task_executor.cpp

+        spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance();
+        for (std::string const& lib : libs) {
+            if (false == dll_loader.load_dll(lib)) {
+                return cDllErr;
+            }
+        }


🛠️ Refactor suggestion

Enhance DLL loading validation and error handling

The DLL loading process could benefit from more detailed error reporting and validation.

spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance(); for (std::string const& lib : libs) { + if (lib.empty()) { + spdlog::error("Empty library path provided"); + return cDllErr; + } if (false == dll_loader.load_dll(lib)) { + spdlog::error("Failed to load library: {}", lib); return cDllErr; } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance();

for (std::string const& lib : libs) {

if (false == dll_loader.load_dll(lib)) {

return cDllErr;

}

}

spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance();

for (std::string const& lib : libs) {

if (lib.empty()) {

spdlog::error("Empty library path provided");

return cDllErr;

}

if (false == dll_loader.load_dll(lib)) {

spdlog::error("Failed to load library: {}", lib);

return cDllErr;

}

}

coderabbitai · 2024-11-30T20:28:26Z

src/spider/worker/task_executor.cpp

+    // NOLINTNEXTLINE(misc-include-cleaner)
+    spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));
+


🛠️ Refactor suggestion

Add error handling for logger initialization

The logger initialization should be wrapped in a try-catch block to handle potential initialization failures.

// Set up spdlog to write to stderr - spdlog::set_default_logger(spdlog::stderr_color_mt("stderr")); + try { + spdlog::set_default_logger(spdlog::stderr_color_mt("stderr")); + } catch (const spdlog::spdlog_ex& ex) { + std::cerr << "Logger initialization failed: " << ex.what() << std::endl; + return cOtherErr; + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// NOLINTNEXTLINE(misc-include-cleaner)

spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));

// NOLINTNEXTLINE(misc-include-cleaner)

try {

spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));

} catch (const spdlog::spdlog_ex& ex) {

std::cerr << "Logger initialization failed: " << ex.what() << std::endl;

return cOtherErr;

}

coderabbitai · 2024-11-30T20:28:27Z

tests/worker/test-TaskExecutor.cpp

+    std::string path_env = nullptr == path_env_str ? path_env_str : "";
+    path_env.append(src_dir.string());


⚠️ Potential issue

Critical: Fix PATH environment variable handling

There are two issues in the PATH environment variable handling:

The condition is reversed: nullptr == path_env_str ? path_env_str : "" will return null when PATH exists

The path is appended without a separator

Apply this diff to fix the issues:

- std::string path_env = nullptr == path_env_str ? path_env_str : ""; - path_env.append(src_dir.string()); + std::string path_env = path_env_str ? std::string(path_env_str) + ":" : ""; + path_env += src_dir.string();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

std::string path_env = nullptr == path_env_str ? path_env_str : "";

path_env.append(src_dir.string());

std::string path_env = path_env_str ? std::string(path_env_str) + ":" : "";

path_env += src_dir.string();

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)

tests/worker/test-TaskExecutor.cpp (1)
47-97: Enhance test coverage and resource management

Consider the following improvements:

Add cleanup of resources after tests

Add test cases for timeout scenarios

Verify error messages in failure cases

Example test case for timeout:
TEST_CASE("Task execute timeout", "[worker]") {
    auto const environment_variable = get_environment_variable();
    boost::asio::io_context context;
    
    spider::worker::TaskExecutor
            executor{context, "long_running_test", get_libraries(), 
                    environment_variable, std::chrono::milliseconds{100}, 2};
    context.run();
    executor.wait();
    REQUIRE(executor.error());
    auto error = executor.get_error();
    REQUIRE(spider::core::FunctionInvokeError::Timeout == std::get<0>(error));
}
🧰 Tools

🪛 cppcheck (2.10-2)

[error] 47-47: syntax error

(syntaxError)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1db25b8 and 5f8d2fb.

📒 Files selected for processing (1)

tests/worker/test-TaskExecutor.cpp (1 hunks)

🧰 Additional context used

🪛 cppcheck (2.10-2)

tests/worker/test-TaskExecutor.cpp

[error] 47-47: syntax error

(syntaxError)

🔇 Additional comments (3)

tests/worker/test-TaskExecutor.cpp (3)

1-16: LGTM! Headers and includes are well-organized.

The includes are properly organized and all necessary headers are present.

27-29: Improve PATH environment variable handling

The current PATH handling could be more idiomatic and efficient.

47-97: Move TEST_CASE definitions outside anonymous namespace

Test cases should be in the global scope for proper discovery by Catch2.

🧰 Tools

🪛 cppcheck (2.10-2)

[error] 47-47: syntax error

(syntaxError)

coderabbitai · 2024-11-30T20:36:29Z

tests/worker/test-TaskExecutor.cpp

+auto get_libraries() -> std::vector<std::string> {
+    boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
+    boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
+    return {lib_path.string()};
+}


🛠️ Refactor suggestion

Add validation for library existence

The function should verify that the library file exists before returning its path.

Apply this diff to add validation:

auto get_libraries() -> std::vector<std::string> { boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path(); boost::filesystem::path const lib_path = executable_dir / "libworker_test.so"; + if (!boost::filesystem::exists(lib_path)) { + throw std::runtime_error("Library not found: " + lib_path.string()); + } return {lib_path.string()}; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

auto get_libraries() -> std::vector<std::string> {

boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();

boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";

return {lib_path.string()};

}

auto get_libraries() -> std::vector<std::string> {

boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();

boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";

if (!boost::filesystem::exists(lib_path)) {

throw std::runtime_error("Library not found: " + lib_path.string());

}

return {lib_path.string()};

}

sitaowang1998 added 3 commits November 30, 2024 12:11

Add task executor with tests and dll loader

3fc4236

Add worker test dll

df43365

Update macro name in worker test dll

dc55830

coderabbitai bot reviewed Nov 30, 2024

View reviewed changes

sitaowang1998 added 4 commits November 30, 2024 18:39

Improve code based on code rabbit suggestion

18e46a5

Remove const qualifer for task executor state checking

aa36b4a

Append to exising path variable instead of replacing with new ones

9b6e93e

Fix clang-tidy

1db25b8

coderabbitai bot reviewed Nov 30, 2024

View reviewed changes

Fix PATH env

5f8d2fb

coderabbitai bot reviewed Nov 30, 2024

View reviewed changes

sitaowang1998 merged commit 262ce58 into y-scope:main Nov 30, 2024
2 checks passed

sitaowang1998 deleted the task_executor branch November 30, 2024 21:13

This was referenced Dec 4, 2024

feat: Add scheduler scheduling #31

Merged

feat: Add variable size message sending and receiving through socket #32

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add task executor #28

feat: Add task executor #28

sitaowang1998 commented Nov 30, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 30, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot left a comment

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot Nov 30, 2024

coderabbitai bot left a comment

coderabbitai bot Nov 30, 2024

		// NOLINTNEXTLINE(misc-include-cleaner)
		spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));

		std::string path_env = nullptr == path_env_str ? path_env_str : "";
		path_env.append(src_dir.string());

feat: Add task executor #28

feat: Add task executor #28

Conversation

sitaowang1998 commented Nov 30, 2024 • edited by coderabbitai bot Loading

Description

Validation performed

Summary by CodeRabbit

coderabbitai bot commented Nov 30, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 30, 2024

Choose a reason for hiding this comment

sitaowang1998 commented Nov 30, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 30, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)