-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add job implementation and driver start job implementation #43
Conversation
WalkthroughThis pull request introduces significant enhancements to the Spider framework's client library, storage management, and testing infrastructure. The changes focus on improving type safety, error handling, and dependency management across multiple components. Key modifications include restructuring the client library, adding new job status tracking mechanisms, enhancing metadata storage interfaces, and expanding the testing framework with new integration and unit tests. Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (19)
src/spider/client/Job.hpp (1)
61-79
: wait_complete method
- The loop continuously polling with a short sleep is straightforward, though consider a more efficient approach (e.g., condition variables or server-side notifications) if scaling is needed.
- Throwing
ConnectionException
if the retrieval fails is consistent with your error handling design.For the current scope, this polling method is acceptable.
src/spider/core/JobMetadata.hpp (1)
38-43
: Consider adding descriptive doxygen docs
TheJobStatus
enum is straightforward, yet it would benefit from brief doxygen-style comments, describing each status (e.g., whetherFailed
covers all error conditions, etc.) for consistent usage across the codebase.tests/worker/worker-test.cpp (2)
16-16
: Enforce error context usage
Throwing astd::runtime_error
is fine, but capturing and logging more context (e.g., input parameters) can help with debugging.
24-24
: Optimise random seed usage
std::random_device
may be slow or non-deterministic. Consider more deterministic approaches in test code if reproducibility is required.tests/integration/test_client.py (5)
14-16
: Function docstring
start_scheduler_worker
would benefit from a docstring describing its logic and any assumptions about the executables.
17-27
: Consider robust error handling
Ifscheduler_process
fails to start, the code continues silently. Consider capturing stderr or re-throwing on external command failures.
39-40
: Centralise port constants
scheduler_port = 6103
is a magic number. Consider referencing a global constant or config file for clarity.
42-52
: Wait rationale
A fixed 5-second wait can be fragile if the system is under load or slower. Consider implementing a more robust readiness check or a retry loop.
54-64
: Validate client test
You are assertingp.returncode == 0
. Consider verifying logs or CLI output to confirm that the client truly performed the expected tasks.src/spider/client/type_utils.hpp (6)
5-5
: Validate necessary includes
<type_traits>
might be used for compile-time checks. Confirm that no unneeded includes slow down build time.
33-33
: Clear naming
cIsSpecializationV
is descriptive but ensure clarity on the naming convention (c
prefix).
34-38
: Helpful template doc
Num
is a handy compile-time container. Brief usage docs would help other devs understand its purpose quickly.
39-43
: Ensure clarity infor_n
for_n
sections can boost clarity if a comment explains the usage ofstd::initializer_list
trick.
49-53
: Potential generalisation
ExtractTemplateParam
might be expanded to handle multiple template parameters. If that is anticipated, plan for it.
54-58
: Template doc
template <template<class> class t, class P>
is a powerful pattern. Document it for maintainability.src/spider/storage/MetadataStorage.hpp (1)
40-43
: Verify the retrieve output tasks logic.This method is well-defined, but ensure that it accounts for scenarios with no output tasks or partial failures. You could consider returning an empty vector in those cases and ensuring upstream logic handles it gracefully.
tests/client/client-test.cpp (1)
1-11
: Review the includes for potential redundancy.It might be beneficial to remove any unused headers, if they are not needed for this file’s functionality.
tests/CMakeLists.txt (1)
63-72
: Well-structured test target addition, beauty!The new
client_test
executable is properly organized with appropriate dependencies. Consider adding it to theintegrationTest
target's dependencies for comprehensive testing coverage.Add the following after line 72:
+add_dependencies(integrationTest client_test)
src/spider/CMakeLists.txt (1)
134-136
: Standardize header file naming conventionThere's an inconsistency in the naming convention:
Job.hpp
uses PascalCasetask.hpp
uses snake_casespider.hpp
uses snake_casePlease standardize the naming convention across all headers.
Consider renaming to maintain consistency:
- client/Job.hpp - client/task.hpp - client/spider.hpp + client/job.hpp + client/task.hpp + client/spider.hpp
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
src/spider/CMakeLists.txt
(3 hunks)src/spider/client/Driver.hpp
(4 hunks)src/spider/client/Job.hpp
(4 hunks)src/spider/client/type_utils.hpp
(2 hunks)src/spider/core/JobMetadata.hpp
(2 hunks)src/spider/core/TaskGraphImpl.hpp
(4 hunks)src/spider/storage/MetadataStorage.hpp
(1 hunks)src/spider/storage/MysqlStorage.cpp
(12 hunks)src/spider/storage/MysqlStorage.hpp
(1 hunks)src/spider/worker/FunctionManager.hpp
(0 hunks)test-tasks.yaml
(1 hunks)tests/CMakeLists.txt
(2 hunks)tests/client/client-test.cpp
(1 hunks)tests/integration/test_client.py
(1 hunks)tests/worker/worker-test.cpp
(2 hunks)tests/worker/worker-test.hpp
(1 hunks)
💤 Files with no reviewable changes (1)
- src/spider/worker/FunctionManager.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
src/spider/storage/MysqlStorage.cpp
[performance] 95-95: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
🔇 Additional comments (55)
src/spider/client/Driver.hpp (5)
9-10
: Headers look appropriate and relevant
These new headers (<tuple>
,<type_traits>
,<boost/uuid/random_generator.hpp>
,<fmt/format.h>
,"../core/Error.hpp"
, and"Exception.hpp"
) are consistent with your new functionality and error handling logic.Also applies to: 13-13, 15-15, 17-17, 22-22
48-49
: Forward declarations verified
Declaringclass Task;
andclass TaskGraph;
insidenamespace spider::core
is clear and avoids unnecessary includes.
137-137
: Enhanced start method for single tasks
- The compile-time check using
static_assert
ensures the number of parameters matches the number of provided inputs, which is a sound type safety approach.- The loop checking type compatibility between
Inputs...
andParams...
is also valuable, though be mindful that tasks might need more flexible matching if you later introduce typeless logic or dynamic conversions.- The thrown exceptions (
std::invalid_argument
,ConnectionException
) are appropriate for user-facing errors and database connection issues.- The approach of creating a single-task
TaskGraph
on the fly is sensible. Consider clarifying concurrency or ordering semantics in the documentation if tasks can interact with each other.Overall, this function is well-designed for correctness and readability.
Also applies to: 143-182
189-189
: Enhanced start method for task graphs
- The code mirrors the single-task approach well. The static assertions and type checks similarly guard correct usage.
- Throwing
std::invalid_argument
upon mismatch or failure is consistent and helps unify error handling.- Reusing the implementation in
TaskGraphImpl
fosters maintainability.No immediate issues spotted here.
Also applies to: 195-195, 197-228
238-245
: Improved error handling in get_jobs
The new check that throws aConnectionException
if fetching job IDs fails ensures robust error handling. This improves reliability when listing active jobs.src/spider/client/Job.hpp (5)
4-5
: Includes and forward references
The inclusion of<chrono>
,<thread>
,<tuple>
,<fmt/format.h>
, etc. is consistent with the new functionality inJob
. The presence of advanced data structures and message packing/unpacking is suitable.Also applies to: 7-8, 10-11, 13-13, 15-17, 18-20, 21-23, 25-25
28-36
: Enum class for JobStatus
Defining your own enum class is a good design choice to keep statuses strongly typed. TheRunning
,Succeeded
,Failed
, andCancelled
states provide a clear job lifecycle.
92-111
: get_status method
- The logic to detect running tasks, check for failed/cancelled tasks, or conclude success is methodical.
- It would be helpful to extend docstrings clarifying how partial failures are handled (e.g., if one sub-task fails, the entire job is reported as
Failed
).Implementation is otherwise sound.
121-243
: get_result method
- Correct approach for handling multiple outputs and verifying each output type.
- Good usage of
msgpack
to unpack non-Data
values.- The loop logic to pair tasks with outputs neatly addresses multiple outputs using
for_n
.- The layered error throwing (
ConnectionException
) is beneficial for diagnosing user mistakes in job usage.Everything seems consistent with the rest of the design.
259-264
: Constructor usage
This constructor is properly using an initialization list form_id
,m_metadata_storage
, andm_data_storage
, which is both efficient and clear.src/spider/core/TaskGraphImpl.hpp (6)
4-10
: New #includes
The new includes (<tuple>
,<type_traits>
,<vector>
, etc.) reflect the advanced template and container usage for binding tasks.
57-57
: bind method logic
- The step to unify tasks/graphs by linking dependencies is helpful.
- The short-circuit approach (
fail
flag) on mismatch is straightforward. Thoroughly logging or throwing might further aid debugging.- Checking if all inputs are consumed near the end helps detect user mistakes early.
Implementation is quite robust.
Also applies to: 60-60, 68-68, 76-81, 85-85, 92-92
149-179
: task_add_input method
- The method ensures that the number of inputs matches the number of parameters, returning false otherwise.
- The inline type check with
typeid().name()
is correct but be mindful of possible cross-platform differences in type names.- In the future, consider more explicit typed structures or friendly type metadata to reduce reliance on raw type names.
This is a good start for input management.
181-235
: add_inputs method
- Reusing the
m_graph.get_input_tasks()
to fetch relevant tasks is a creative approach.- The final check that all inputs are consumed (
return input_task_index == input_task_ids.size() && task_input_index == 0;
) is prudent.No major concerns.
239-242
: reset_ids and get_graph
These utility methods help maintain clarity and facilitate reusability.
257-257
: add_task_input
Some tasks have multiple outputs, and you set the relevant output position for each input. This is a neat approach to connect tasks.Also applies to: 265-265
src/spider/storage/MysqlStorage.cpp (7)
3-3
: Rearranged includes
The newly included<algorithm>
,<array>
, etc., are consistent with the advanced usage such asstd::array
forcCreateStorage
and container operations.
92-101
: Creating input_tasks and output_tasks tables
- The addition of
input_tasks
andoutput_tasks
is a logical extension for referencing a job’s tasks.- The foreign key constraints help maintain referential integrity.
Implementation is acceptable.
Also applies to: 102-111
🧰 Tools
🪛 cppcheck (2.10-2)
[performance] 95-95: Variable 'id' is assigned in constructor body. Consider performing initialization in initialization list.
(useInitializationList)
Line range hint
202-214
: Expanding cCreateStorage
Appending new table creation strings tocCreateStorage
ensures the schema includesinput_tasks
andoutput_tasks
.
Line range hint
494-606
: add_job method
- The BFS-like approach builds tasks in layers, ensuring the correct insertion order.
- The rollback on
Task graph inconsistent
is a good pattern to maintain data integrity.- Marking head tasks as
ready
is a convenient way to begin job execution.Implementation is thorough.
917-934
: get_job_complete
The logic for checking incomplete states (NOT IN ('success', 'cancel', 'fail')
) is straightforward and consistent with your job lifecycle.
936-978
: get_job_status
- Checking for any running tasks first, then scanning for failures or cancellations, is methodical.
- Returning
JobStatus::Succeeded
if no tasks remain in a running state is logically consistent.Well-structured.
980-1001
: get_job_output_tasks
Selecting fromoutput_tasks
by job ID in ascending position is a helpful approach to preserve intended output ordering.tests/worker/worker-test.hpp (1)
1-2
: New test functions
- These test functions (
sum_test
,error_test
,data_test
,random_fail_test
) look straightforward and are well-suited for verifying the correctness of spider’s worker behaviour.- Consider adding quick docstrings explaining each function’s purpose for more clarity.
Also applies to: 4-5, 7-14
src/spider/core/JobMetadata.hpp (1)
5-5
: Include usage is appropriate
Including<cstdint>
is advisable when using fixed-width integral types likestd::uint8_t
, ensuring robust portability across various platforms.tests/worker/worker-test.cpp (5)
1-1
: Header include is coherent
Including"worker-test.hpp"
locally keeps test definitions neatly scoped.
11-13
: Use caution with non-const reference
Changingspider::TaskContext
from const to non-const allows modifications but also opens risk of unintended side-effects. Ensure concurrency protection if this context is shared.
20-20
: Confirm data ownership
spider::Data<int>& data
implies possible in-place modifications. Ensure no concurrent usage conflicts.
39-42
: Refactor macros
The switch toSPIDER_REGISTER_TASK
suggests a standardised approach. Double-check for side-effects in dynamic task registration.
8-8
: Validate external dependency
You are including"../../src/spider/client/Driver.hpp"
. Confirm the path is correct in all build environments and that you are not introducing relative path fragility.✅ Verification successful
Relative path usage is consistent across test files
The relative path
"../../src/spider/client/Driver.hpp"
is valid and follows the established pattern in the codebase. Multiple test files consistently use the same relative path pattern (../../src/spider/
) to include files from the source directory. This appears to be a deliberate and standardized approach across the test suite, reducing the risk of path fragility.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Checking references to ensure the path is valid across the build environment. fd "Driver.hpp" | xargs ls -lLength of output: 183
Script:
#!/bin/bash # Check the actual content of the test file to verify the include path cat tests/worker/worker-test.cppLength of output: 1380
Script:
#!/bin/bash # Check if other test files follow the same include pattern fd -e cpp -e hpp -t f . tests/ -x grep -l "../../src/spider"Length of output: 598
tests/integration/test_client.py (4)
1-5
: Ensure cross-platform compatibility
Relying onsubprocess
withPath
is fine, but watch for path separators on non-*nix systems.
6-7
: Pytest import
Importingpytest
is good. Ensure the environment you run in uses the correct venv or system packages.
8-13
: Check import from .client
Confirm that the localclient
package is resolvable by Python in all test contexts, including CI/CD.
28-36
: Re-check worker management
Similar to the scheduler, consider capturing the return code of the worker process if it exits prematurely.src/spider/client/type_utils.hpp (4)
4-4
: Usecstddef
carefully
No issues here. Good practice when working with size-related types.
7-7
: Utility includes
<utility>
is used forstd::make_index_sequence
and related. This usage looks fine.
44-48
: Consider use-case
When callingfor_n(func)
, confirm that capturing the indices and function usage is intuitive to future maintainers.
[approve]
59-61
: Appropriate alias
ExtractTemplateParamT
is concise and improves readability when extracting nested types.src/spider/storage/MetadataStorage.hpp (2)
38-38
: Looks good.This addition to retrieve job completion status aligns well with the rest of the interface.
39-39
: Ensure consistent enum usage.Confirm that the returned
JobStatus
enum is used uniformly in the codebase, including within the job orchestration logic.tests/client/client-test.cpp (8)
12-17
: Data and driver locations.The relative paths suggest a specific directory structure. Confirm the folder layout is stable to avoid broken references in future reorganizations.
18-36
: Argument parsing is efficient.This chunk effectively sets up argument parsing. Good usage of descriptive options and storing variables in a map.
38-40
: Constants declared for error codes.These constants are clearly named and simplify code readability. Good approach.
43-44
: Excellent main function structure.Clear initialization of the logger pattern. This sets a consistent debug format across the application.
51-64
: Validate command-line error cases.Your exception handling logic properly addresses missing arguments and type casting issues. This approach will help ensure robust input validation.
66-69
: Instantiation of Driver is straightforward.No issues noted here. The driver is created with the provided storage URL, which is consistent with the rest of the code.
70-110
: Comprehensive job testing.Nice coverage by testing a successful job, a guaranteed fail job, and a random fail job. This approach ensures different edge cases are exercised, helping to uncover hidden issues.
111-113
: Clean exit.Returning 0 explicitly at the end clarifies the application’s successful termination.
src/spider/storage/MysqlStorage.hpp (3)
43-43
: Check potential race conditions with concurrent reads.
get_job_complete
focuses on reading a job’s completion status. Ensure internal locking or transaction consistency is present to handle concurrent job updates.
44-44
: Confirm consistent usage of JobStatus.Ensure the newly fetched
JobStatus
is utilized consistently across other modules to avoid mismatch issues.
45-46
: Validate output tasks retrieval.Ensure the method returns a correct, well-formed vector. Edge cases (e.g., no tasks) should properly reflect the state by returning an empty vector without errors.
test-tasks.yaml (1)
43-43
: Expanded integration test coverage.Including the
client_test
target in the integration section is a good practice. It ensures comprehensive testing of the new client and job functionality.tests/CMakeLists.txt (2)
39-39
: Library dependency update looks good, eh!The replacement of
spider_client_lib
withspider_client
maintains consistency with the library restructuring.
54-61
: Verify the necessity of exposing worker-test.hpp as publicMaking the worker test header public might expose implementation details. Consider whether this visibility is required, as test headers typically don't need public visibility.
✅ Verification successful
Public visibility of worker-test.hpp is appropriate
The public visibility of worker-test.hpp is justified as it contains test utility functions that are shared between worker and client tests. The header exposes clean test interfaces that are intentionally used by client-test.cpp for integration testing purposes.
- The header contains well-defined test functions (sum_test, error_test, data_test, random_fail_test) that appear to be part of the test infrastructure
- These functions are used by both worker-test.cpp and client-test.cpp, demonstrating proper test code reuse
- The exposed interfaces are test-specific and don't leak implementation details
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if any other targets depend on worker_test's public headers rg -l "include.*worker-test\.hpp" --type cppLength of output: 104
Script:
#!/bin/bash # Check the content of client-test.cpp to understand the dependency rg -C 5 "worker-test\.hpp" tests/client/client-test.cppLength of output: 486
Script:
#!/bin/bash # Check the worker-test.hpp content to understand what's being exposed cat tests/worker/worker-test.hppLength of output: 514
src/spider/CMakeLists.txt (1)
90-90
: Dependency addition looks properThe addition of
spider_client
as a private dependency for the task executor is appropriate and consistent with the library restructuring.
add_library(spider_client) | ||
target_sources(spider_client PRIVATE ${SPIDER_CLIENT_SHARED_SOURCES}) | ||
target_sources(spider_client PUBLIC ${SPIDER_CLIENT_SHARED_HEADERS}) | ||
target_link_libraries( | ||
spider_client_lib | ||
spider_client | ||
PUBLIC | ||
spider_core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add the missing library alias
According to the summary, an alias spider::spider
should be defined for the spider_client
library. This alias is missing from the implementation.
Add the following after the target_link_libraries block:
+add_library(spider::spider ALIAS spider_client)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
add_library(spider_client) | |
target_sources(spider_client PRIVATE ${SPIDER_CLIENT_SHARED_SOURCES}) | |
target_sources(spider_client PUBLIC ${SPIDER_CLIENT_SHARED_HEADERS}) | |
target_link_libraries( | |
spider_client_lib | |
spider_client | |
PUBLIC | |
spider_core | |
add_library(spider_client) | |
target_sources(spider_client PRIVATE ${SPIDER_CLIENT_SHARED_SOURCES}) | |
target_sources(spider_client PUBLIC ${SPIDER_CLIENT_SHARED_HEADERS}) | |
target_link_libraries( | |
spider_client | |
PUBLIC | |
spider_core | |
add_library(spider::spider ALIAS spider_client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/CMakeLists.txt (2)
54-61
: Consider the implications of exposing test functionalityMaking worker-test.hpp public suggests test functionality is meant to be reused. While this enables test code sharing, it could create unwanted dependencies on test implementations. Consider whether this test functionality should be part of the public interface.
63-72
: Review test organization and dependenciesThe new client_test executable's dependencies look appropriate for testing client functionality. However:
- The dependency on worker_test creates a coupling between test suites
- Consider organizing test utilities into a separate test_utils library to avoid direct dependencies between test suites
src/spider/CMakeLists.txt (1)
90-90
: Review architectural coupling between task executor and clientThe addition of spider_client as a dependency to spider_task_executor suggests increased coupling between these components. Consider whether this dependency is necessary or if the required functionality could be abstracted through spider_core.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/spider/CMakeLists.txt
(3 hunks)tests/CMakeLists.txt
(3 hunks)
🔇 Additional comments (5)
tests/CMakeLists.txt (2)
39-39
: LGTM: Library dependency update is correctThe change from spider_client_lib to spider_client aligns with the architectural changes in src/spider/CMakeLists.txt.
83-87
: LGTM: Integration test dependencies are properly configuredThe addition of worker_test and client_test as dependencies ensures proper build order for integration testing.
src/spider/CMakeLists.txt (3)
134-136
: LGTM: New headers align with job implementation featureThe addition of Job.hpp and spider.hpp to shared headers is appropriate for the job implementation feature.
148-154
: LGTM: Library definition and dependencies are well-structuredThe replacement of spider_client_lib with spider_client and its dependencies are properly configured. The public dependency on spider_core is appropriate as it's likely required by the public interface.
159-159
: LGTM: Library alias is properly definedThe spider::spider alias for spider_client is now correctly defined.
Description
As title.
Validation performed
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests
Refactoring
These updates improve the overall robustness and flexibility of the Spider framework, with a focus on type safety and error handling.