feat: Add client driver implementation #42

sitaowang1998 · 2025-01-02T18:30:49Z

Description

Add client driver implementation except the job start functionality.

Validation performed

GitHub workflows pass.
Unit tests pass in dev container.
Integration tests pass in dev container.

Summary by CodeRabbit

Release Notes

New Features
- Added a new Driver class for managing connections to MySQL storage
- Enhanced task graph management with improved input and output task tracking
- Introduced more robust function and task management capabilities
Improvements
- Refined task binding and execution mechanisms
- Updated key-value store interactions
- Improved error handling for task and driver operations
Bug Fixes
- Resolved issues with task graph dependencies and task identification
Chores
- Updated build scripts and library installation processes
- Refactored internal code structure for better modularity

coderabbitai · 2025-01-02T18:30:57Z

Walkthrough

The pull request introduces significant architectural changes to the Spider framework, focusing on reorganizing source files, enhancing task management, and improving the driver and storage components. Key modifications include restructuring header and source file locations, adding new methods to task and driver classes, and updating the function and task graph management. The changes aim to improve modularity, error handling, and flexibility in task graph creation and execution.

Changes

File	Change Summary
`src/spider/CMakeLists.txt`	Reorganized header and source file assignments between core and client libraries
`src/spider/client/Data.hpp`	Added `Driver` and `TaskContext` classes, modified `Builder` constructor and memory management
`src/spider/client/Driver.cpp`	Implemented new `Driver` class with MySQL storage connection and key-value store methods
`src/spider/client/Driver.hpp`	Enhanced `Driver` class with new methods and error handling
`src/spider/client/TaskGraph.hpp`	Added private constructor and implementation details for `TaskGraph`
`src/spider/client/task.hpp`	Modified `TaskFunction` definition and added new type handling structures
`src/spider/core/Task.hpp`	Added new constructors and methods for `TaskInput` and `Task` classes
`src/spider/core/TaskGraph.hpp`	Introduced methods for managing input and output tasks
`src/spider/core/TaskGraphImpl.hpp`	New implementation for task graph binding and creation
`src/spider/storage/MysqlStorage.cpp`	Updated job addition logic using new task graph methods
`src/spider/worker/FunctionManager.cpp`	Renamed variables and added function name retrieval method
`src/spider/worker/FunctionManager.hpp`	Added function name map and enhanced function registration
`tests/CMakeLists.txt`	Added new test source for `Driver`
`tests/client/test-Driver.cpp`	New test suite for `Driver` functionality
`tests/scheduler/*`	Updated task graph methods in scheduling tests
`tests/storage/*`	Modified task graph and identifier handling
`tests/worker/test-FunctionManager.cpp`	Updated function signatures and added name retrieval test
`tools/scripts/lib_install/boost.sh`	Changed Boost library source retrieval method

Sequence Diagram

sequenceDiagram
    participant Driver
    participant TaskGraph
    participant TaskGraphImpl
    participant Task
    participant FunctionManager

    Driver->>TaskGraph: bind task
    TaskGraph->>TaskGraphImpl: create task
    TaskGraphImpl->>FunctionManager: register function
    TaskGraphImpl->>Task: create task instance
    TaskGraphImpl-->>TaskGraph: return task graph
    Driver-->>Driver: store task graph

The sequence diagram illustrates the high-level interaction between components during task binding, showcasing how the Driver coordinates with TaskGraph, TaskGraphImpl, Task, and FunctionManager to create and register tasks.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (31)

src/spider/client/task.hpp (5)

4-4: Consider including only necessary headers
You introduced <type_traits> here. Verify if this header is necessary or if it can be relocated to a more appropriate place to reduce compile times.

48-48: Clarify concept naming
The name RunnableOrTaskIo might be more explicit if described as something like IsRunnableOrTaskIo, aligning better with typical concept naming patterns.

49-55: Consolidate template specializations
You've added ConcatTaskGraphType to handle different parameter combinations. This approach is fine, but verify if the partial specializations can be safely merged to reduce code duplication.

69-75: Double-check duplication between TaskGraph merges
Concatenating two TaskGraphs expands the parameter packs. Ensure users do not accidentally create ambiguous tasks if repeated types exist.

76-78: Avoid empty primary templates
If feasible, avoid empty definitions like template <class...> struct MergeTaskGraphTypes; by combining it with the initial partial specialization for improved clarity.

src/spider/client/Driver.cpp (2)

1-9: Header inclusions
The included headers appear consistent with the required functionality (threads, optional, string, stops, etc.). Good practice to confirm all are strictly needed to keep compilation lean.

63-99: Uniform handling for second constructor
Same suggestions for using an initialiser list and thorough error-handling of core::get_address() to match the first constructor. Ensuring consistent usage helps maintain code viability.

🧰 Tools

🪛 cppcheck (2.10-2)

[performance] 95-95: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

src/spider/core/TaskGraph.hpp (3)

7-7: Potential leftover
You added <tuple>. Confirm it is used. If not, consider removing to clean up includes.

12-12: Unused absl/flat_hash_map import
Check if you retained all references to absl::flat_hash_map. If the container changed, remove this include.

102-105: Prevent duplicates in input/output tasks
When adding tasks to m_input_tasks or m_output_tasks, consider whether duplicates should be disallowed. If yes, insert a check or use a set structure.

src/spider/client/Driver.hpp (1)

20-20: Check unreferenced ‘utility’
Review whether #include "utility" is required or if you can remove it to declutter.

src/spider/core/TaskGraphImpl.hpp (5)

1-3: File introduction
New file for TaskGraphImpl. Good separation of concerns, but ensure documentation clarifies how TaskGraphImpl fits into the system’s architecture.

23-25: Cognitive complexity
bind can become quite large with multiple expansions. Consider extracting internal portions into smaller helper methods if it grows more.

71-74: Manufacturing tasks
You add every task from parent_graph into graph.m_graph. Confirm potential name collisions or ID collisions (though you do reset).

72-92: Task Io type consistency
The checks inside this block rely on matching types with typeid(...). Provide clearer error messages or logs if mismatches occur frequently.

168-190: Merging task outputs
Mapping each output from output_task to an input in task is a powerful approach, but be sure to handle index mismatch elegantly.
src/spider/worker/FunctionManager.hpp (3)
52-52: Consider alternative ways of storing function pointers.
Storing function pointers in a void* may lead to subtle portability concerns. For a more robust design, consider using a dedicated pointer-to-function type or a wrapper structure to avoid any aliasing issues.

387-391: Provide logging or diagnostic output on duplicate registrations.
While returning false effectively prevents duplicate registrations, it might be advantageous to log an error or warning message indicating that the function name is already registered, thereby helping with troubleshooting.
 if (m_function_map.contains(name)) {
+    spdlog::warn("Function \"{}\" is already registered. Skipping duplicate registration.", name);
     return false;
 }
406-406: Check for potential name conflicts in function invoker.
Consider performing a similar name conflict check under register_function_invoker, ensuring consistent duplicate handling analogous to register_function.
src/spider/storage/MysqlStorage.cpp (1)

542-542: Confirm readiness handling.
Marking tasks returned by get_input_tasks() as ready is logical if these tasks indeed have all inputs satisfied from the outset. Please double-check that no other tasks erroneously require a similar update to their initial state, particularly in partial or dynamic graphs.

Do you need a script to confirm no other tasks have zero dependencies or are partially handled?

src/spider/client/TaskGraph.hpp (1)

24-35: Check constructor’s privacy and friend relationships.
Granting friendship to multiple classes like core::TaskGraphImpl, Driver, and TaskContext is valid if they truly require direct access to m_impl. If not, reduce the friend scope or provide other controlled accessor methods to maintain strict encapsulation.

src/spider/worker/FunctionManager.cpp (1)

70-75: Safeguard potential pointer collisions.
The m_name_map uses void* as keys. Ensure that no dynamic library boundary or other compiler nuance can cause pointer identity changes. If you plan to load or unload function definitions, consider storing a stable ID instead of raw pointers.

tests/client/test-Driver.cpp (3)

35-37: Extend test coverage for boundary conditions.
While sum is straightforward, consider adding test inputs that involve negative, zero, or extreme integer values to exercise edge cases.

39-41: Validate concurrency with Data<int>&.
Ensure the test covers scenarios where multiple tasks might concurrently read from the same data. Although the design might be inherently single-threaded for test scenarios, concurrency tests can help confirm correctness with real usage.

54-60: Consider verifying final results of bound tasks.
The tests do not appear to assert the final returned value from the task graphs. Extend the tests to confirm correct outputs, e.g., verifying that sum(1, 1) + test_driver(...) indeed yields the expected results.

src/spider/client/Data.hpp (1)

167-167: Granting TaskGraphImpl friend visibility appears consistent with design.
Before finalizing, ensure that TaskGraphImpl only accesses necessary internals to preserve encapsulation.

src/spider/core/Task.hpp (2)

19-19: Slight repetition in multiple constructors.
Evaluate unifying or delegating constructors to reduce maintenance overhead.

147-148: get_input_ref improves in-place manipulations.
Be mindful of concurrency if multiple threads modify the same input reference.

tests/worker/test-FunctionManager.cpp (1)

50-54: Verifying retrieval of registered function name
Good approach to confirm an actual function name is returned. The usage of reinterpret_cast is consistent with the FunctionManager design.

Consider clarifying in a comment why the reinterpret_cast is necessary for those unfamiliar with your function pointer approach.

tests/storage/test-MetadataStorage.cpp (1)

259-261: Adding input/output tasks for concurrency
Assigning parent tasks as inputs and the child task as output fosters a clear test scenario. Keep an eye on concurrency or multi-thread usage if tasks eventually become parallel.

tools/scripts/lib_install/boost.sh (1)

33-37: Shifting from tarball to git-based Boost retrieval
This approach is more flexible and can simplify version management. Ensure the checked-out version is correct and that future updates are well-documented.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 636b8d3 and 1769a95.

📒 Files selected for processing (20)

src/spider/CMakeLists.txt (2 hunks)
src/spider/client/Data.hpp (4 hunks)
src/spider/client/Driver.cpp (1 hunks)
src/spider/client/Driver.hpp (6 hunks)
src/spider/client/TaskGraph.hpp (1 hunks)
src/spider/client/task.hpp (3 hunks)
src/spider/core/Task.hpp (4 hunks)
src/spider/core/TaskGraph.hpp (2 hunks)
src/spider/core/TaskGraphImpl.hpp (1 hunks)
src/spider/storage/MysqlStorage.cpp (2 hunks)
src/spider/worker/FunctionManager.cpp (1 hunks)
src/spider/worker/FunctionManager.hpp (3 hunks)
tests/CMakeLists.txt (1 hunks)
tests/client/test-Driver.cpp (1 hunks)
tests/scheduler/test-SchedulerPolicy.cpp (3 hunks)
tests/scheduler/test-SchedulerServer.cpp (1 hunks)
tests/storage/test-DataStorage.cpp (2 hunks)
tests/storage/test-MetadataStorage.cpp (4 hunks)
tests/worker/test-FunctionManager.cpp (1 hunks)
tools/scripts/lib_install/boost.sh (1 hunks)

🧰 Additional context used

🪛 cppcheck (2.10-2)

tests/client/test-Driver.cpp

[error] 30-30: syntax error

(syntaxError)

tests/worker/test-FunctionManager.cpp

[error] 43-43: syntax error

(syntaxError)

src/spider/client/Driver.cpp

[performance] 95-95: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

[performance] 24-24: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

🔇 Additional comments (62)

src/spider/client/task.hpp (6)

31-31: Change from std::function to function pointer
Switching from std::function<ReturnType(TaskContext, TaskParams...)> to a function pointer could enhance performance, but be cautious about the potential loss in flexibility (e.g., capturing lambdas).

57-61: Confirm type composition correctness
This overload ensures that adding a single InputType does not change the existing graph parameter pack. Please confirm that skipping new parameters for plain input types is intentional and covers all use cases.

62-68: Ensure extended pack correctness
When merging a TaskFunction with a TaskGraph, confirm you are applying the correct parameter expansions for TaskParams.... This logic can be fragile with complicated packs.

79-83: Preserve existing type ordering
Merging the graph into a single type is helpful. Confirm that no initial type ordering is lost that might alter scheduling or invocation sequences downstream.

84-92: Recursive approach is well-structured
Your recursion approach, using MergeTaskGraphTypes + ConcatTaskGraphType, is quite elegant. Just ensure that the recursion terminates cleanly for large input sets.

93-95: Alias usage
TaskGraphType nicely abstracts away the param expansions. This is a good pattern for simpler user-facing APIs.
src/spider/client/Driver.cpp (5)

10-12: Boost UUID usage
Using Boost’s random generator is acceptable here. If generation performance or custom seeding are concerns, evaluate an alternative or ensure robust randomness.

22-61: Consider constructor init lists
You assign m_id and then create storages in the constructor’s body. Using an initialiser list might be more efficient while maintaining clarity:
[performance]
-Driver::Driver(std::string const& storage_url) {
-    boost::uuids::random_generator gen;
-    m_id = gen();
-    ...
+Driver::Driver(std::string const& storage_url)
+    : m_id(boost::uuids::random_generator()()),
+      m_metadata_storage(std::make_shared<core::MySqlMetadataStorage>()),
+      m_data_storage(std::make_shared<core::MySqlDataStorage>())
+{
+    ...
🧰 Tools

🪛 cppcheck (2.10-2)

[performance] 24-24: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)

37-41: Check machine address retrieval
If core::get_address() might fail or return an empty address under specific network conditions, consider logging deeper diagnostic info.

101-107: Insert operation
kv_store_insert logic is straightforward. Good job verifying that the storage call succeeded before returning.

109-119: Optional usage
Correctly returning std::nullopt when not found. This suits the expected pattern for key-value retrieval. Good clarity in error handling for other failure types.

🧰 Tools

🪛 cppcheck (2.10-2)

[performance] 114-114: Variable 'm_id' is assigned in constructor body. Consider performing initialization in initialization list.

(useInitializationList)
src/spider/core/TaskGraph.hpp (4)

4-5: Header expansions
The addition of <cstddef> and <cstdint> is consistent with the usage of size/type definitions.

94-101: New getters for input/output tasks
get_input_tasks() and get_output_tasks() returning references is good for avoiding copies. Be sure no external code modifies these vectors unexpectedly.

111-152: Deep references replaced
Your reset_ids() method thoroughly updates all references in tasks, dependencies, and input/output vectors. Consider locking or concurrency-safety if multiple threads might read/modify graph data simultaneously.

153-153: Structure
Adding m_input_tasks and m_output_tasks improves clarity. Great approach to delineate them from the main tasks.

src/spider/client/Driver.hpp (6)

4-8: Header usage
Added <memory>, <stdexcept>, <thread>, etc. The new includes adequately cover the expanded functionality for storage, concurrency, and error handling.

13-13: Check circular dependencies
Noting the new includes TaskGraphImpl.hpp and others. Watch out for potential circular includes if you expand cross-dependencies in the future.

39-42: Forward declarations for storages
Forward declaring MetadataStorage and DataStorage is a good approach, ensuring minimal coupling here.

70-73: get_data_builder clarity
This method’s usage with Data<T>::Builder is straightforward, returning a fully prepared builder referencing this driver’s ID.

82-82: Explicit return type
Using auto ... -> void is consistent with the style used across your codebase. Retaining clarity on function signatures fosters maintainability.

110-122: Template binding checks
Validating that core::TaskGraphImpl::bind(...) returns an actual graph ensures robust error handling. Throwing an exception if binding fails is user-friendly for debugging.

src/spider/core/TaskGraphImpl.hpp (9)

4-9: Essential includes
The logic depends on <cstdint>, <optional>, <string>, <tuple>, <type_traits>. These appear necessary, ideal for a templated utility file.

39-42: Early exit
You check if (fail) inside the loop. This short-circuit approach is fine. Just confirm clients are aware any partially bound items are discarded if one fails.

57-61: Efficient usage of parent_graph.reset_ids()
Resetting ID ensures uniqueness of merged tasks. Verify that tasks in parent_graph haven’t been scheduled or run prior to merging to avoid conflicts.

82-85: Data references
When setting input.set_data_id(...), confirm that the data is guaranteed to exist within the same context or storage.

86-90: Serialisable inputs
Great approach packaging the input into a msgpack buffer. This fosters a flexible, consistent approach to serialisation.

98-100: Consume all inputs
If the user provides fewer or more inputs than the task parameters, returning std::nullopt is a neat fail-fast approach.

109-135: create_task method
This method’s reflection-based approach is interesting. Ensure FunctionManager consistently identifies function pointers across compilation units.

137-166: Adding parent tasks
The logic for building and connecting parent tasks is correct, though quite advanced. Keep an eye on possible index out-of-bounds with carefully validated position.

192-193: Encapsulation
m_graph as a private member ensures usage is funnelled through bind. Good job preserving internal states.

src/spider/worker/FunctionManager.hpp (1)

411-417: Ensure accessor const-correctness aligns with usage.
Accessors returning const& preserve immutability. Verify that no modifications to m_function_map or m_name_map are essential outside the class. This design is consistent with read-only usage but ensure that external consumers do not require mutation.

src/spider/storage/MysqlStorage.cpp (1)

489-493: Validate logic for head task retrieval.
The new approach builds a set of heads based on get_input_tasks(). Confirm that the DAG definition of input tasks matches your design intent. If any tasks outside those inputs ever need to be included as heads, ensure consistency with the new method.

src/spider/worker/FunctionManager.cpp (1)

64-64: Ensure at-callsite error handling remains unaffected.
Renaming m_map to m_function_map clarifies the usage. Confirm that all references are updated throughout the codebase, and no external calls rely on the old name.

tests/client/test-Driver.cpp (2)

30-30: Ignore potential false positive syntax error.
The static analysis tool flags a syntax error here, but the code appears valid. Treat this as a false positive unless local compilation fails.

🧰 Tools

🪛 cppcheck (2.10-2)

[error] 30-30: syntax error

(syntaxError)

46-52: Double-check multi-task binding logic.
Your usage of driver.bind(&sum, &sum, &sum, ...) is flexible, but confirm the correct ordering of parameters. If the tasks require sequential or parallel execution, ensure the underlying infrastructure supports the corresponding scheduling strategy.

tests/scheduler/test-SchedulerServer.cpp (1)

75-76: No concerns with the use of add_input_task and add_output_task.
These calls properly clarify the flow of input and output tasks in this test.

src/spider/client/Data.hpp (3)

25-28: Forward declarations contribute to improved compilation times and better decoupling.
These class forward declarations help avoid unnecessary includes in this header.

126-126: Use of move semantics is exemplary for performance.
Transferring ownership here prevents copying large buffers and maintains efficiency.

135-137: Constructor parameters neatly align with typical usage.
The parameters for Builder (data store, source ID, data source) enable flexible creation of data with distinct metadata.

src/spider/core/Task.hpp (4)

18-18: New constructor fosters clarity in instantiating TaskInput.
This fits well with existing constructors, enabling simpler creation of typed inputs.

47-50: Use set_output carefully to keep consistent output references.
Ensure that the newly set output aligns with the overall task dependencies.

51-51: Empty line for readability.
No issues here.

123-124: Allowing Task ID reassignment might introduce confusion.
Confirm that changing IDs will not disrupt references or stored states.

tests/scheduler/test-SchedulerPolicy.cpp (4)

46-47: Explicitly adding input and output tasks helps define the graph's scope.
Maintaining these calls ensures that job creation in the metadata store is correct.

54-55: Consistent usage of add_input_task and add_output_task.
This continuity clarifies the FIFO policy’s ability to acknowledge full task definitions.

102-103: Solidifies the task’s boundaries within the graph.
Marking tasks as input and output fosters better clarity for scheduling logic.

149-150: Ensuring uniform application of input and output references.
This helps the scheduler handle tasks consistently, especially when locality is considered.

tests/worker/test-FunctionManager.cpp (7)

22-24: Changing TaskContext to a non-const reference
This change is appropriate if the intention is to modify the TaskContext from within this function. Ensure it is necessary from the design standpoint, as it can allow side effects that increase complexity.

26-29: Updating signature of tuple_ret_test
Same as above, switching to a non-const reference enables modifications to the TaskContext. Validate whether these modifications are genuinely required.

31-33: Switching data_test to accept non-const reference
Confirm that side effects on the TaskContext are expected. If not, reverting to const reference may improve code safety.

35-37: New function not_registered introduced
Looks good for testing unregistered function flows. Ensure it is not inadvertently called in production code.

43-43: False-positive syntax error from static analysis
No evident syntax error found in this line. Likely a false alarm.

🧰 Tools

🪛 cppcheck (2.10-2)

[error] 43-43: syntax error

(syntaxError)

44-45: Testing get_instance for FunctionManager
The approach is straightforward and does not present obvious issues.

46-48: Verifying behaviour for non-registered function name
The check for std::nullopt is correct. This test helps ensure that only registered functions are recognized.

tests/storage/test-DataStorage.cpp (2)

91-92: Adding input and output tasks
Ensures that the test covers a complete lifecycle of a task. This clarifies input/output data flow but confirm that all tasks added in future require these calls.

132-134: Extending task structure with dependency and I/O tasks
It enhances the relationships among tasks. Validate that cyclical dependencies cannot occur unintentionally.

tests/storage/test-MetadataStorage.cpp (4)

141-143: Registering input and output tasks
Adding input/output explicitly clarifies the data flow for parent and child tasks. The approach seems logically correct.

146-149: Using get_input_tasks for vector-based head tasks
Changing from sets to vectors is a fine approach. Just confirm that the order of heads is stable and reliable for your use case.

159-160: Defining input and output tasks for simple_graph
Consistently mirrors above usage. This ensures the test environment remains uniform.

Also applies to: 162-162, 164-164

323-325: Setting up input and output tasks for resetting job
Ensures the reset feature is thoroughly tested. No issues spotted with this addition.
src/spider/CMakeLists.txt (2)

125-129: LGTM! Proper placement of the Driver implementation.

The Driver source file is correctly added to the shared client library sources, aligning with the PR's objective of implementing a client driver.

139-141: Verify architectural impact of header reorganization.

Moving implementation headers from core to client shared scope represents a significant architectural change. Let's ensure this reorganization doesn't introduce unintended coupling or break existing dependencies.

✅ Verification successful

Header reorganization appears safe with limited dependencies

Based on the analysis, the header reorganization has a contained impact:

DataImpl.hpp is only referenced in worker/FunctionManager.hpp

TaskContextImpl.hpp is used in worker/FunctionManager.hpp and its test file

TaskGraphImpl.hpp is only referenced in client/Driver.hpp

The directory structure shows that the implementation headers still exist in the core directory, suggesting this is a modification to the build system rather than an actual file move. The limited usage pattern indicates these implementation headers are appropriately encapsulated and the change doesn't introduce unintended coupling.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the impact of header reorganization

# Check for any remaining references to these headers in core
echo "Checking for remaining core references..."
rg -l "DataImpl\.hpp|TaskContextImpl\.hpp|TaskGraphImpl\.hpp" --type cpp --type hpp

# Verify include hierarchy
echo "Checking include dependencies..."
for header in DataImpl.hpp TaskContextImpl.hpp TaskGraphImpl.hpp; do
  echo "Analyzing dependencies for $header:"
  rg -l "include.*$header" --type cpp --type hpp
done
Length of output: 1037

Script:
#!/bin/bash
# Let's try with correct file extensions for C++
echo "Checking for remaining core references..."
rg -l "DataImpl\.hpp|TaskContextImpl\.hpp|TaskGraphImpl\.hpp" --type-add 'cpp:*.{cpp,hpp,h,cc,cxx}' --type cpp

echo -e "\nChecking include dependencies..."
for header in DataImpl.hpp TaskContextImpl.hpp TaskGraphImpl.hpp; do
  echo -e "\nAnalyzing dependencies for $header:"
  rg -l "include.*$header" --type-add 'cpp:*.{cpp,hpp,h,cc,cxx}' --type cpp
done

# Let's also check the directory structure to understand the reorganization
echo -e "\nChecking current directory structure:"
fd -e hpp -e cpp -e h
Length of output: 3584

coderabbitai · 2025-01-02T18:35:37Z

src/spider/client/Driver.cpp

+    // Start a thread to send heartbeats
+    // NOLINTNEXTLINE(performance-unnecessary-value-param)
+    m_heartbeat_thread = std::jthread([this](std::stop_token stoken) {
+        while (!stoken.stop_requested()) {
+            std::this_thread::sleep_for(std::chrono::seconds(1));
+            core::StorageErr const err = m_metadata_storage->update_heartbeat(m_id);
+            if (!err.success()) {
+                throw ConnectionException(err.description);
+            }
+        }
+    });


🛠️ Refactor suggestion

Heartbeat thread usage
Starting a continuous heartbeat in the constructor is helpful, but confirm that the destructor or move logic properly stops and joins the thread to prevent resource leaks.

coderabbitai · 2025-01-02T18:35:37Z

tests/CMakeLists.txt

@@ -11,6 +11,7 @@ set(SPIDER_TEST_SOURCES
    io/test-MsgpackMessage.cpp
    scheduler/test-SchedulerPolicy.cpp
    scheduler/test-SchedulerServer.cpp
+    client/test-Driver.cpp


💡 Codebase verification

Test coverage needs improvement for the Driver implementation

The test file covers basic functionality but is missing tests for several key features:

Missing test coverage for:

start() method for both task and task graph execution

get_jobs() method for retrieving scheduled/running jobs

Error handling scenarios (e.g., ConnectionException, DriverIdInUseException)

Driver constructor with custom UUID

Task timeout registration via SPIDER_REGISTER_TASK_TIMEOUT

Current tests only cover:

Basic key-value store operations

Data building

Task binding scenarios

🔗 Analysis chain

LGTM! Proper test integration for the new Driver implementation.

The test file is correctly integrated into the test suite. However, let's verify the test coverage for the new Driver implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Description: Verify test coverage for the Driver implementation # Check if the test file exists and contains test cases fd "test-Driver.cpp" | xargs grep -l "SCENARIO\|TEST_CASE" # Check for comprehensive testing of Driver's key features rg "TEST_CASE|SECTION" client/test-Driver.cpp

Length of output: 198

Script:

#!/bin/bash # Let's check the content of the test file to verify test coverage cat tests/client/test-Driver.cpp # Also check for the actual Driver implementation to cross-reference tested features fd "Driver\.(cpp|hpp)" --type f

Length of output: 2653

Script:

#!/bin/bash # Let's check the Driver implementation to ensure all key features are tested cat src/spider/client/Driver.hpp

Length of output: 5687

sitaowang1998 added 19 commits December 31, 2024 00:40

Add function ptr to name map in function manager

9ca2b7d

Add basic driver impl

338c624

Add task binding

ba34a8b

Add tests for driver and fix some bugs

d166536

Fix function pointer problem

9559e0c

Fix task graph template deduction

b0467b5

Fix tests and task function matching

24142d8

Bump boost version try to solve github action install crash

ad450a6

Use boost official url for dependency

5486f02

Add more print to debug github workflow

1a0358d

Remove z flag for tar

cbe9644

Try fix the tar flags

fd43074

Use github for boost download instead of packaged tarball

0e418a3

Bug fix

aa806dd

Pull submodule for boost

60a486b

Revert boost version back to 1.86.0

e2ac3e0

Fix test storage tag

8d4f2b5

Fix clang tidy

58e85bd

Fix clang tidy

1769a95

coderabbitai bot reviewed Jan 2, 2025

View reviewed changes

sitaowang1998 merged commit c7da559 into y-scope:main Jan 2, 2025
5 checks passed

sitaowang1998 deleted the client branch January 2, 2025 20:40

This was referenced Jan 3, 2025

feat: Add job implementation and driver start job implementation #43

Merged

build: Add headers specification in CMakeLists.txt and remove spdlog in public headers #46

Merged

coderabbitai bot mentioned this pull request Jan 13, 2025

feat: Add host address as scheduler and worker command line argument #47

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add client driver implementation #42

feat: Add client driver implementation #42

sitaowang1998 commented Jan 2, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 2, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Jan 2, 2025

coderabbitai bot Jan 2, 2025

feat: Add client driver implementation #42

feat: Add client driver implementation #42

Conversation

sitaowang1998 commented Jan 2, 2025 • edited by coderabbitai bot Loading

Description

Validation performed

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Jan 2, 2025 • edited Loading

Walkthrough

Changes

Sequence Diagram

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 2, 2025

Choose a reason for hiding this comment

coderabbitai bot Jan 2, 2025

Choose a reason for hiding this comment

sitaowang1998 commented Jan 2, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 2, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)