Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add task executor #28

Merged
merged 8 commits into from
Nov 30, 2024
Merged

Conversation

sitaowang1998
Copy link
Collaborator

@sitaowang1998 sitaowang1998 commented Nov 30, 2024

Description

Add task executor that spawns a subprocess and executes a task.

Validation performed

  • lint:check
  • Unit test in devcontainer

Summary by CodeRabbit

  • New Features

    • Introduced a DllLoader class for dynamic library loading.
    • Added a TaskExecutor class to manage task execution in a concurrent environment.
    • Implemented command-line task execution capabilities.
  • Bug Fixes

    • Enhanced error handling for task execution and library loading.
  • Tests

    • Added unit tests for the TaskExecutor class to ensure proper functionality and error handling.
    • Created test functions for task execution scenarios.
  • Documentation

    • Updated documentation to reflect new classes and methods for user guidance.

Copy link

coderabbitai bot commented Nov 30, 2024

Walkthrough

This pull request introduces updates to the Spider project, particularly in the CMakeLists.txt file and the addition of new source files for the worker module. Key changes include the introduction of the DllLoader and TaskExecutor classes, which facilitate dynamic library loading and task management, respectively. The changes also enhance the testing framework by adding unit tests for the TaskExecutor class and creating a new shared library for testing purposes.

Changes

File Path Change Summary
src/spider/CMakeLists.txt - Updated SPIDER_WORKER_SOURCES to include new source files.
- Added SPIDER_TASK_EXECUTOR_SOURCES and defined spider_task_executor executable.
- Marked source/header variables as CACHE INTERNAL.
src/spider/worker/DllLoader.cpp - Implemented DllLoader class with load_dll method for loading dynamic libraries.
src/spider/worker/DllLoader.hpp - Introduced DllLoader class with singleton pattern and load_dll method.
src/spider/worker/TaskExecutor.cpp - Implemented TaskExecutor class with methods for task management and subprocess handling.
src/spider/worker/TaskExecutor.hpp - Introduced TaskExecutor class with state management and asynchronous execution capabilities.
src/spider/worker/task_executor.cpp - Implemented command-line task executor logic with argument parsing and library loading.
tests/CMakeLists.txt - Added new test source worker/test-TaskExecutor.cpp and created worker_test shared library.
tests/worker/test-TaskExecutor.cpp - Added unit tests for TaskExecutor class covering success, argument errors, and failures.
tests/worker/worker-test.cpp - Introduced sum_test and error_test functions for testing purposes.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)
tests/worker/worker-test.cpp (1)

15-18: Provide justification for NOLINT suppression.

The NOLINTBEGIN(cert-err58-cpp) and NOLINTEND(cert-err58-cpp) directives suppress warnings related to exception handling. It's advisable to include a comment explaining why this suppression is necessary to aid future maintainers.

tests/CMakeLists.txt (1)

35-37: Review the visibility scope in target definitions.

In target_sources(worker_test PUBLIC worker/worker-test.cpp), consider changing PUBLIC to PRIVATE since the source file is only used within the worker_test library and does not need to be exposed to dependent targets.

Similarly, in target_link_libraries(worker_test PUBLIC spider_core), evaluate if PRIVATE linkage is sufficient. Using PRIVATE reduces unnecessary propagation of include directories and compile definitions to dependents.

src/spider/worker/DllLoader.cpp (1)

39-39: Optimise library insertion by moving the shared_library object

To avoid unnecessary copying of the boost::dll::shared_library object, consider moving it into m_libraries using std::move(library).

Apply this diff to improve performance:

-            m_libraries.emplace(path_str, library);
+            m_libraries.emplace(path_str, std::move(library));
src/spider/CMakeLists.txt (1)

72-72: Correct the typo in the cache variable description

There's a minor typo: "filees" should be "files".

Apply this diff to fix the typo:

-        "spider task executor source filees"
+        "spider task executor source files"
src/spider/worker/task_executor.cpp (1)

76-78: Catch all parsing-related exceptions

Catching boost::bad_any_cast may not handle all potential errors during command-line argument parsing. Consider catching boost::program_options::error& e to handle all parsing-related exceptions.

Apply this diff to update the exception handling:

 } catch (boost::bad_any_cast& e) {
     return cCmdArgParseErr;
+} catch (boost::program_options::error& e) {
+    spdlog::error("Command-line parsing error: {}", e.what());
+    return cCmdArgParseErr;
 }
src/spider/worker/TaskExecutor.cpp (2)

19-22: Mark State Query Methods as const

The methods completed(), waiting(), succeed(), and error() do not modify the state of the object and can be marked as const. This improves const-correctness and allows these methods to be called on const instances of TaskExecutor.

Apply this diff to mark the methods as const:

-auto TaskExecutor::completed() -> bool {
+auto TaskExecutor::completed() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Succeed == m_state || TaskExecutorState::Error == m_state;
 }
-auto TaskExecutor::waiting() -> bool {
+auto TaskExecutor::waiting() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Waiting == m_state;
 }
-auto TaskExecutor::succeed() -> bool {
+auto TaskExecutor::succeed() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Succeed == m_state;
 }
-auto TaskExecutor::error() -> bool {
+auto TaskExecutor::error() const -> bool {
     std::lock_guard const lock(m_state_mutex);
     return TaskExecutorState::Error == m_state;
 }

Also applies to: 24-27, 29-32, 34-37


101-103: Handle Unknown Response Types Appropriately

The current implementation ignores unknown response types, which might lead to unhandled scenarios. Consider logging a warning or handling this case to aid in debugging.

Apply this diff to handle unknown response types:

 case TaskExecutorResponseType::Unknown:
-    break;
+    // Log a warning or handle the unknown response type
+    // Example: log_warning("Received unknown response type.");
+    break;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c0c9fbc and dc55830.

📒 Files selected for processing (9)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/worker/DllLoader.cpp (1 hunks)
  • src/spider/worker/DllLoader.hpp (1 hunks)
  • src/spider/worker/TaskExecutor.cpp (1 hunks)
  • src/spider/worker/TaskExecutor.hpp (1 hunks)
  • src/spider/worker/task_executor.cpp (1 hunks)
  • tests/CMakeLists.txt (2 hunks)
  • tests/worker/test-TaskExecutor.cpp (1 hunks)
  • tests/worker/worker-test.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-TaskExecutor.cpp

[error] 40-40: syntax error

(syntaxError)

🔇 Additional comments (4)
tests/worker/worker-test.cpp (2)

6-8: Function sum_test is correctly implemented.

The function accurately returns the sum of the two input integers.


10-12: Exception handling in error_test is appropriate.

The function correctly throws a std::runtime_error as intended, simulating an error scenario for testing purposes.

tests/CMakeLists.txt (1)

10-10: Test file test-TaskExecutor.cpp correctly added to test sources.

The new test file is properly included in SPIDER_TEST_SOURCES, ensuring it will be compiled and linked with the test executable.

src/spider/worker/TaskExecutor.hpp (1)

58-62: Ensure C++20 support for designated initialisers

The use of designated initialisers (e.g., .in = m_write_pipe) requires C++20 support. Please verify that the project is compiled with C++20 enabled. If not, consider initialising process_stdio members directly without designated initialisers.

Comment on lines +13 to +18
static auto get_instance() -> DllLoader& {
// Explicitly use new because DllLoader instance should not be destroyed
// NOLINTNEXTLINE(cppcoreguidelines-owning-memory)
static auto* instance = new DllLoader();
return *instance;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid intentional memory leak in singleton pattern.

The current implementation of get_instance() uses new to allocate the DllLoader instance without a corresponding delete, leading to an intentional memory leak:

static auto* instance = new DllLoader();

Consider using a local static instance to prevent the memory leak and ensure proper destruction upon program termination:

static DllLoader instance;
return instance;

This approach is thread-safe in C++11 and later and eliminates the need for dynamic memory allocation and NOLINT suppression.

boost::process::v2::environment::value>
environment_variables;

environment_variables.emplace("PATH", src_dir);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid overwriting the PATH environment variable

Overwriting the PATH environment variable may cause unexpected behaviour by removing existing paths. Consider appending src_dir to the existing PATH instead.

Apply this diff to modify the code:

     environment_variables.emplace("PATH", src_dir);
+    char const* existing_path_cstr = std::getenv("PATH");
+    std::string existing_path = existing_path_cstr ? existing_path_cstr : "";
+    existing_path += ":" + src_dir.string();
+    environment_variables["PATH"] = existing_path;

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 17 to 93
namespace {
auto get_environment_variable() -> absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider";

absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value>
environment_variables;

environment_variables.emplace("PATH", src_dir);

return environment_variables;
}

auto get_libraries() -> std::vector<std::string> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
return {lib_path.string()};
}

TEST_CASE("Task execute success", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor
executor{context, "sum_test", get_libraries(), environment_variable, 2, 3};
context.run();
executor.wait();
REQUIRE(executor.succeed());
std::optional<int> const result_option = executor.get_result<int>();
REQUIRE(result_option.has_value());
REQUIRE(5 == result_option.value_or(0));
}

TEST_CASE("Task execute wrong number of arguments", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor
executor{context, "sum_test", get_libraries(), environment_variable, 2};
context.run();
executor.wait();
REQUIRE(executor.error());
std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments == std::get<0>(error));
}

TEST_CASE("Task execute fail", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();

boost::asio::io_context context;

spider::worker::TaskExecutor
executor{context, "error_test", get_libraries(), environment_variable, 2};
context.run();
executor.wait();
REQUIRE(executor.error());
std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error));
}
} // namespace

// NOLINTEND(cert-err58-cpp,cppcoreguidelines-avoid-do-while,readability-function-cognitive-complexity,cppcoreguidelines-avoid-non-const-global-variables,cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Move TEST_CASE definitions outside the anonymous namespace

The TEST_CASE macros should be in the global scope. Defining them within an anonymous namespace can cause compilation errors or prevent the tests from being discovered by the Catch2 framework.

Apply this diff to move the TEST_CASE definitions outside the anonymous namespace:

 namespace {
     // Helper functions
+}  // Close the anonymous namespace before the TEST_CASEs

 TEST_CASE("Task execute success", "[worker]") {
     // Test code
 }

 // Additional TEST_CASEs ...

-}  // Remove this closing brace as the namespace is already closed
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
namespace {
auto get_environment_variable() -> absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider";
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value>
environment_variables;
environment_variables.emplace("PATH", src_dir);
return environment_variables;
}
auto get_libraries() -> std::vector<std::string> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
return {lib_path.string()};
}
TEST_CASE("Task execute success", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();
boost::asio::io_context context;
spider::worker::TaskExecutor
executor{context, "sum_test", get_libraries(), environment_variable, 2, 3};
context.run();
executor.wait();
REQUIRE(executor.succeed());
std::optional<int> const result_option = executor.get_result<int>();
REQUIRE(result_option.has_value());
REQUIRE(5 == result_option.value_or(0));
}
TEST_CASE("Task execute wrong number of arguments", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();
boost::asio::io_context context;
spider::worker::TaskExecutor
executor{context, "sum_test", get_libraries(), environment_variable, 2};
context.run();
executor.wait();
REQUIRE(executor.error());
std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments == std::get<0>(error));
}
TEST_CASE("Task execute fail", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();
boost::asio::io_context context;
spider::worker::TaskExecutor
executor{context, "error_test", get_libraries(), environment_variable, 2};
context.run();
executor.wait();
REQUIRE(executor.error());
std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error));
}
} // namespace
// NOLINTEND(cert-err58-cpp,cppcoreguidelines-avoid-do-while,readability-function-cognitive-complexity,cppcoreguidelines-avoid-non-const-global-variables,cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)
namespace {
auto get_environment_variable() -> absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const src_dir = executable_dir.parent_path() / "src" / "spider";
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value>
environment_variables;
environment_variables.emplace("PATH", src_dir);
return environment_variables;
}
auto get_libraries() -> std::vector<std::string> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
return {lib_path.string()};
}
} // namespace
TEST_CASE("Task execute success", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();
boost::asio::io_context context;
spider::worker::TaskExecutor
executor{context, "sum_test", get_libraries(), environment_variable, 2, 3};
context.run();
executor.wait();
REQUIRE(executor.succeed());
std::optional<int> const result_option = executor.get_result<int>();
REQUIRE(result_option.has_value());
REQUIRE(5 == result_option.value_or(0));
}
TEST_CASE("Task execute wrong number of arguments", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();
boost::asio::io_context context;
spider::worker::TaskExecutor
executor{context, "sum_test", get_libraries(), environment_variable, 2};
context.run();
executor.wait();
REQUIRE(executor.error());
std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
REQUIRE(spider::core::FunctionInvokeError::WrongNumberOfArguments == std::get<0>(error));
}
TEST_CASE("Task execute fail", "[worker]") {
absl::flat_hash_map<
boost::process::v2::environment::key,
boost::process::v2::environment::value> const environment_variable
= get_environment_variable();
boost::asio::io_context context;
spider::worker::TaskExecutor
executor{context, "error_test", get_libraries(), environment_variable, 2};
context.run();
executor.wait();
REQUIRE(executor.error());
std::tuple<spider::core::FunctionInvokeError, std::string> error = executor.get_error();
REQUIRE(spider::core::FunctionInvokeError::FunctionExecutionError == std::get<0>(error));
}
// NOLINTEND(cert-err58-cpp,cppcoreguidelines-avoid-do-while,readability-function-cognitive-complexity,cppcoreguidelines-avoid-non-const-global-variables,cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)
🧰 Tools
🪛 cppcheck (2.10-2)

[error] 40-40: syntax error

(syntaxError)

Comment on lines +107 to +116
if (nullptr == function) {
spider::worker::send_message(
out,
spider::core::create_error_response(
spider::core::FunctionInvokeError::FunctionExecutionError,
fmt::format("Function {} not found.", func_name)
)
);
return cResultSendErr;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use appropriate error code when function is not found

Returning FunctionInvokeError::FunctionExecutionError when the function is not found may be misleading. Consider using a more specific error code, such as FunctionInvokeError::FunctionNotFound, to accurately represent the error condition.

Apply this diff to update the error code:

             if (nullptr == function) {
                 spider::worker::send_message(
                         out,
                         spider::core::create_error_response(
-                                spider::core::FunctionInvokeError::FunctionExecutionError,
+                                spider::core::FunctionInvokeError::FunctionNotFound,
                                 fmt::format("Function {} not found.", func_name)
                         )
                 );
                 return cResultSendErr;
             }

Ensure that FunctionInvokeError includes a FunctionNotFound value.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +54 to +60
void TaskExecutor::cancel() {
m_process->terminate();
std::lock_guard const lock(m_state_mutex);
m_state = TaskExecutorState::Cancelled;
msgpack::packer packer{m_result_buffer};
packer.pack("Task cancelled");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure Consistent Error Handling in cancel() Method

In the cancel() method, the cancellation message is directly packed into m_result_buffer without following the same error formatting used elsewhere. For consistency and to aid in error parsing, consider using core::create_error_buffer() or a similar mechanism.

Apply this diff to standardize error handling:

 void TaskExecutor::cancel() {
     m_process->terminate();
     std::lock_guard const lock(m_state_mutex);
     m_state = TaskExecutorState::Cancelled;
-    msgpack::packer packer{m_result_buffer};
-    packer.pack("Task cancelled");
+    core::create_error_buffer(
+            core::FunctionInvokeError::FunctionCancelled,
+            "Task cancelled",
+            m_result_buffer
+    );
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +39 to +52
void TaskExecutor::wait() {
int const exit_code = m_process->wait();
if (exit_code != 0) {
std::lock_guard const lock(m_state_mutex);
if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) {
m_state = TaskExecutorState::Error;
core::create_error_buffer(
core::FunctionInvokeError::FunctionExecutionError,
fmt::format("Subprocess exit with {}", exit_code),
m_result_buffer
);
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add Exception Handling to wait() Method

If m_process->wait() throws an exception, it could lead to unhandled exceptions and potential crashes. Consider adding exception handling to ensure proper resource cleanup and error reporting.

Apply this diff to add exception handling:

 void TaskExecutor::wait() {
+    int exit_code = 0;
+    try {
+        exit_code = m_process->wait();
+    } catch (const std::exception& e) {
+        std::lock_guard const lock(m_state_mutex);
+        m_state = TaskExecutorState::Error;
+        core::create_error_buffer(
+                core::FunctionInvokeError::FunctionExecutionError,
+                fmt::format("Exception during wait(): {}", e.what()),
+                m_result_buffer
+        );
+        return;
+    }
-
-    int const exit_code = m_process->wait();
     if (exit_code != 0) {
         std::lock_guard const lock(m_state_mutex);
         if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) {
             m_state = TaskExecutorState::Error;
             core::create_error_buffer(
                     core::FunctionInvokeError::FunctionExecutionError,
                     fmt::format("Subprocess exit with {}", exit_code),
                     m_result_buffer
             );
         }
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void TaskExecutor::wait() {
int const exit_code = m_process->wait();
if (exit_code != 0) {
std::lock_guard const lock(m_state_mutex);
if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) {
m_state = TaskExecutorState::Error;
core::create_error_buffer(
core::FunctionInvokeError::FunctionExecutionError,
fmt::format("Subprocess exit with {}", exit_code),
m_result_buffer
);
}
}
}
void TaskExecutor::wait() {
int exit_code = 0;
try {
exit_code = m_process->wait();
} catch (const std::exception& e) {
std::lock_guard const lock(m_state_mutex);
m_state = TaskExecutorState::Error;
core::create_error_buffer(
core::FunctionInvokeError::FunctionExecutionError,
fmt::format("Exception during wait(): {}", e.what()),
m_result_buffer
);
return;
}
if (exit_code != 0) {
std::lock_guard const lock(m_state_mutex);
if (m_state != TaskExecutorState::Cancelled && m_state != TaskExecutorState::Error) {
m_state = TaskExecutorState::Error;
core::create_error_buffer(
core::FunctionInvokeError::FunctionExecutionError,
fmt::format("Subprocess exit with {}", exit_code),
m_result_buffer
);
}
}
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (7)
src/spider/CMakeLists.txt (2)

39-43: Consider reorganizing message-related components

The worker module's source organization could be improved. Currently, TaskExecutorMessage.hpp stands alone without a corresponding implementation file, while other components follow the header/implementation pattern.

Consider either:

  1. Moving the TaskExecutorMessage implementation into a .cpp file for consistency
  2. Creating a dedicated messages subdirectory to group all message-related components

78-85: Consolidate Boost dependencies

The Boost dependencies for spider_task_executor could be better organized.

Consider:

  1. Creating a variable for common Boost dependencies
  2. Using find_package with specific components needed
set(SPIDER_COMMON_BOOST_LIBS
    Boost::filesystem
    Boost::program_options
    Boost::system
)

target_link_libraries(
    spider_task_executor
    PRIVATE
        ${SPIDER_COMMON_BOOST_LIBS}
        spdlog::spdlog
)
src/spider/worker/task_executor.cpp (4)

28-46: Enhance argument parsing robustness and documentation

The argument parser could benefit from additional validation and improved help documentation:

 auto parse_arg(int const argc, char** const& argv) -> boost::program_options::variables_map {
     boost::program_options::options_description desc;
-    desc.add_options()("help", "spider task executor");
+    desc.add_options()("help", "Spider task executor - Executes functions from dynamically loaded libraries");
     desc.add_options()("func", boost::program_options::value<std::string>(), "function to run");
     desc.add_options()(
             "libs",
-            boost::program_options::value<std::vector<std::string>>(),
+            boost::program_options::value<std::vector<std::string>>()->multitoken(),
             "dynamic libraries that include the spider tasks"
     );
 
     boost::program_options::variables_map variables;
     boost::program_options::store(
             boost::program_options::parse_command_line(argc, argv, desc),
             variables
     );
+    if (variables.count("help")) {
+        std::cout << desc << "\n";
+        std::exit(0);
+    }
     boost::program_options::notify(variables);
     return variables;
 }

50-54: Add documentation for error constants

The error constants would benefit from documentation explaining their specific use cases.

+// Error codes returned by the task executor
+// Command-line argument parsing failed
 constexpr int cCmdArgParseErr = 1;
+// Dynamic library loading failed
 constexpr int cDllErr = 2;
+// Function argument parsing failed
 constexpr int cFuncArgParseErr = 3;
+// Error sending result back to parent process
 constexpr int cResultSendErr = 4;
+// Unexpected errors during execution
 constexpr int cOtherErr = 5;

92-103: Improve error messages for request handling

The error messages could be more descriptive to aid in debugging.

         std::optional<msgpack::sbuffer> request_buffer_option = spider::worker::receive_message(in);
         if (!request_buffer_option.has_value()) {
-            spdlog::error("Cannot read args buffer request");
+            spdlog::error("Failed to read argument buffer from stdin: buffer is empty or malformed");
             return cFuncArgParseErr;
         }
         msgpack::sbuffer const& request_buffer = request_buffer_option.value();
         if (spider::worker::TaskExecutorRequestType::Arguments
             != spider::worker::get_request_type(request_buffer))
         {
-            spdlog::error("Expect args request.");
+            spdlog::error("Invalid request type: expected Arguments request type");
             return cFuncArgParseErr;
         }

1-131: Consider implementing a consistent error handling strategy

The current implementation mixes different error handling approaches (return codes, exceptions, and error responses). Consider implementing a unified error handling strategy across the codebase.

Suggestions:

  1. Define a clear hierarchy of error types
  2. Implement consistent error propagation patterns
  3. Document error handling expectations for each component
  4. Consider using a Result type pattern for error handling
tests/worker/test-TaskExecutor.cpp (1)

46-96: Consider adding test cleanup handling

The tests create resources (io_context, executor) but don't explicitly handle cleanup. While RAII handles most cleanup, consider adding a test fixture to ensure consistent setup/teardown, especially for more complex scenarios that might be added later.

Example fixture structure:

class TaskExecutorTest {
protected:
    boost::asio::io_context context;
    absl::flat_hash_map<boost::process::v2::environment::key,
                       boost::process::v2::environment::value> environment_variable;
    
    TaskExecutorTest() : environment_variable(get_environment_variable()) {}
    
    ~TaskExecutorTest() {
        context.stop();
    }
};

TEST_CASE_METHOD(TaskExecutorTest, "Task execute success", "[worker]") {
    spider::worker::TaskExecutor executor{context, "sum_test", get_libraries(), 
                                        environment_variable, 2, 3};
    // ... rest of the test
}
🧰 Tools
🪛 cppcheck (2.10-2)

[error] 46-46: syntax error

(syntaxError)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between dc55830 and 1db25b8.

📒 Files selected for processing (5)
  • src/spider/CMakeLists.txt (2 hunks)
  • src/spider/worker/DllLoader.cpp (1 hunks)
  • src/spider/worker/task_executor.cpp (1 hunks)
  • tests/CMakeLists.txt (2 hunks)
  • tests/worker/test-TaskExecutor.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/CMakeLists.txt
  • src/spider/worker/DllLoader.cpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-TaskExecutor.cpp

[error] 46-46: syntax error

(syntaxError)

🔇 Additional comments (4)
src/spider/CMakeLists.txt (1)

65-73: Verify build system integration

The task executor component appears to be properly configured, but we should verify its integration with the build system.

✅ Verification successful

Build system integration is properly configured

The CMake configuration for the spider_task_executor component is correctly integrated with the build system:

  • All source files specified in SPIDER_TASK_EXECUTOR_SOURCES exist in the correct locations
  • The executable target is properly defined with appropriate dependencies
  • Required libraries (Boost, spdlog) are correctly linked
  • No circular dependencies were found
  • Source files have consistent include dependencies
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify CMake configuration and dependency resolution

# Check if all required source files exist
echo "Verifying source files..."
for file in worker/DllLoader.cpp worker/DllLoader.hpp worker/message_pipe.cpp worker/message_pipe.hpp worker/task_executor.cpp; do
    if [ ! -f "src/spider/$file" ]; then
        echo "Missing file: $file"
        exit 1
    fi
done

# Check for potential circular dependencies in CMake files
echo "Checking for circular dependencies..."
fd -e cmake -e txt -x grep -l "spider_task_executor" {}

Length of output: 1131


Script:

#!/bin/bash
# Let's check the CMakeLists.txt content to understand how spider_task_executor is used
echo "Checking CMakeLists.txt content..."
cat src/spider/CMakeLists.txt

# Check for any other CMake targets that might depend on these files
echo -e "\nChecking for other related targets..."
rg "add_library|add_executable" src/spider/CMakeLists.txt

# Look for any potential include dependencies
echo -e "\nChecking for include dependencies..."
rg "#include" src/spider/worker/

Length of output: 10464

src/spider/worker/task_executor.cpp (1)

112-121: ⚠️ Potential issue

Use appropriate error code when function is not found

Returning FunctionInvokeError::FunctionExecutionError when the function is not found may be misleading.

         if (nullptr == function) {
             spider::worker::send_message(
                     out,
                     spider::core::create_error_response(
-                            spider::core::FunctionInvokeError::FunctionExecutionError,
+                            spider::core::FunctionInvokeError::FunctionNotFound,
                             fmt::format("Function {} not found.", func_name)
                     )
             );
             return cResultSendErr;
         }
tests/worker/test-TaskExecutor.cpp (2)

1-16: LGTM! Well-organized header inclusions.

The header organization follows best practices with project-specific headers first, followed by third-party and standard library headers. The IWYU pragma for BoostAsio.hpp is appropriately used.


46-96: Move TEST_CASE definitions outside anonymous namespace

The TEST_CASE macros should be in the global namespace scope for proper test discovery by Catch2.

🧰 Tools
🪛 cppcheck (2.10-2)

[error] 46-46: syntax error

(syntaxError)

Comment on lines +65 to +85
set(SPIDER_TASK_EXECUTOR_SOURCES
worker/DllLoader.cpp
worker/DllLoader.hpp
worker/message_pipe.cpp
worker/message_pipe.hpp
worker/task_executor.cpp
CACHE INTERNAL
"spider task executor source files"
)

add_executable(spider_task_executor)
target_sources(spider_task_executor PRIVATE ${SPIDER_TASK_EXECUTOR_SOURCES})
target_link_libraries(spider_task_executor PRIVATE spider_core)
target_link_libraries(
spider_task_executor
PRIVATE
Boost::filesystem
Boost::program_options
Boost::system
spdlog::spdlog
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Reduce code duplication by creating a shared library

The task executor and worker components share several source files (DllLoader, message_pipe). This duplication could lead to maintenance issues.

Consider creating a shared library for common components:

# Create shared library for common worker components
set(SPIDER_WORKER_COMMON_SOURCES
    worker/DllLoader.cpp
    worker/DllLoader.hpp
    worker/message_pipe.cpp
    worker/message_pipe.hpp
    CACHE INTERNAL
    "spider worker common source files"
)

add_library(spider_worker_common)
target_sources(spider_worker_common PRIVATE ${SPIDER_WORKER_COMMON_SOURCES})
target_link_libraries(spider_worker_common PUBLIC spider_core)

# Update existing targets to use common library
target_link_libraries(spider_worker PRIVATE spider_worker_common)
target_link_libraries(spider_task_executor PRIVATE spider_worker_common)

Comment on lines +73 to +78
spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance();
for (std::string const& lib : libs) {
if (false == dll_loader.load_dll(lib)) {
return cDllErr;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance DLL loading validation and error handling

The DLL loading process could benefit from more detailed error reporting and validation.

         spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance();
         for (std::string const& lib : libs) {
+            if (lib.empty()) {
+                spdlog::error("Empty library path provided");
+                return cDllErr;
+            }
             if (false == dll_loader.load_dll(lib)) {
+                spdlog::error("Failed to load library: {}", lib);
                 return cDllErr;
             }
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance();
for (std::string const& lib : libs) {
if (false == dll_loader.load_dll(lib)) {
return cDllErr;
}
}
spider::worker::DllLoader& dll_loader = spider::worker::DllLoader::get_instance();
for (std::string const& lib : libs) {
if (lib.empty()) {
spdlog::error("Empty library path provided");
return cDllErr;
}
if (false == dll_loader.load_dll(lib)) {
spdlog::error("Failed to load library: {}", lib);
return cDllErr;
}
}

Comment on lines +58 to +60
// NOLINTNEXTLINE(misc-include-cleaner)
spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for logger initialization

The logger initialization should be wrapped in a try-catch block to handle potential initialization failures.

     // Set up spdlog to write to stderr
-    spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));
+    try {
+        spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));
+    } catch (const spdlog::spdlog_ex& ex) {
+        std::cerr << "Logger initialization failed: " << ex.what() << std::endl;
+        return cOtherErr;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// NOLINTNEXTLINE(misc-include-cleaner)
spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));
// NOLINTNEXTLINE(misc-include-cleaner)
try {
spdlog::set_default_logger(spdlog::stderr_color_mt("stderr"));
} catch (const spdlog::spdlog_ex& ex) {
std::cerr << "Logger initialization failed: " << ex.what() << std::endl;
return cOtherErr;
}

Comment on lines 27 to 28
std::string path_env = nullptr == path_env_str ? path_env_str : "";
path_env.append(src_dir.string());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical: Fix PATH environment variable handling

There are two issues in the PATH environment variable handling:

  1. The condition is reversed: nullptr == path_env_str ? path_env_str : "" will return null when PATH exists
  2. The path is appended without a separator

Apply this diff to fix the issues:

-    std::string path_env = nullptr == path_env_str ? path_env_str : "";
-    path_env.append(src_dir.string());
+    std::string path_env = path_env_str ? std::string(path_env_str) + ":" : "";
+    path_env += src_dir.string();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::string path_env = nullptr == path_env_str ? path_env_str : "";
path_env.append(src_dir.string());
std::string path_env = path_env_str ? std::string(path_env_str) + ":" : "";
path_env += src_dir.string();

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
tests/worker/test-TaskExecutor.cpp (1)

47-97: Enhance test coverage and resource management

Consider the following improvements:

  1. Add cleanup of resources after tests
  2. Add test cases for timeout scenarios
  3. Verify error messages in failure cases

Example test case for timeout:

TEST_CASE("Task execute timeout", "[worker]") {
    auto const environment_variable = get_environment_variable();
    boost::asio::io_context context;
    
    spider::worker::TaskExecutor
            executor{context, "long_running_test", get_libraries(), 
                    environment_variable, std::chrono::milliseconds{100}, 2};
    context.run();
    executor.wait();
    REQUIRE(executor.error());
    auto error = executor.get_error();
    REQUIRE(spider::core::FunctionInvokeError::Timeout == std::get<0>(error));
}
🧰 Tools
🪛 cppcheck (2.10-2)

[error] 47-47: syntax error

(syntaxError)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1db25b8 and 5f8d2fb.

📒 Files selected for processing (1)
  • tests/worker/test-TaskExecutor.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
tests/worker/test-TaskExecutor.cpp

[error] 47-47: syntax error

(syntaxError)

🔇 Additional comments (3)
tests/worker/test-TaskExecutor.cpp (3)

1-16: LGTM! Headers and includes are well-organized.

The includes are properly organized and all necessary headers are present.


27-29: Improve PATH environment variable handling

The current PATH handling could be more idiomatic and efficient.


47-97: Move TEST_CASE definitions outside anonymous namespace

Test cases should be in the global scope for proper discovery by Catch2.

🧰 Tools
🪛 cppcheck (2.10-2)

[error] 47-47: syntax error

(syntaxError)

Comment on lines +41 to +45
auto get_libraries() -> std::vector<std::string> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
return {lib_path.string()};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for library existence

The function should verify that the library file exists before returning its path.

Apply this diff to add validation:

 auto get_libraries() -> std::vector<std::string> {
     boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
     boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
+    if (!boost::filesystem::exists(lib_path)) {
+        throw std::runtime_error("Library not found: " + lib_path.string());
+    }
     return {lib_path.string()};
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto get_libraries() -> std::vector<std::string> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
return {lib_path.string()};
}
auto get_libraries() -> std::vector<std::string> {
boost::filesystem::path const executable_dir = boost::dll::program_location().parent_path();
boost::filesystem::path const lib_path = executable_dir / "libworker_test.so";
if (!boost::filesystem::exists(lib_path)) {
throw std::runtime_error("Library not found: " + lib_path.string());
}
return {lib_path.string()};
}

@sitaowang1998 sitaowang1998 merged commit 262ce58 into y-scope:main Nov 30, 2024
2 checks passed
@sitaowang1998 sitaowang1998 deleted the task_executor branch November 30, 2024 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant