feat: Add client interface. #16

Merged
merged 88 commits into from
Nov 30, 2024
Changes from 8 commits
Commits
6ffea6e
feat: Add client code structure and interface for Data, Future and ex…
sitaowang1998 Oct 31, 2024
94af15b
feat: Split client into two libraries and add interface
sitaowang1998 Oct 31, 2024
f69523a
fix: Add boost library for spider_client_lib
sitaowang1998 Nov 1, 2024
ccf6cc8
style: Improve code style for data based on pr comments
sitaowang1998 Nov 1, 2024
5e26f58
fix: Add absl as public library for core
sitaowang1998 Nov 1, 2024
020093c
style: Improve code style for client interface based on pr reviea pri…
sitaowang1998 Nov 1, 2024
ee222f0
fix: Try fix clang-tidy find nout found
sitaowang1998 Nov 1, 2024
1b0ccac
docs: Add quick start doc
sitaowang1998 Nov 1, 2024
b3a2e1e
style: Change markdown headings to sentence style and hard wrap markd…
sitaowang1998 Nov 3, 2024
ec8f500
docs: Update doc according to pr comments
sitaowang1998 Nov 3, 2024
5dc12cb
docs: Remove the worker note section and put the content in run task …
sitaowang1998 Nov 3, 2024
0d59c62
docs: Return a Job instead of Future for run and support user to pass…
sitaowang1998 Nov 5, 2024
4cd6233
Merge branch 'main' into interface
sitaowang1998 Nov 6, 2024
fec5e73
Change future to job
sitaowang1998 Nov 14, 2024
a5e799b
Change task to context
sitaowang1998 Nov 14, 2024
1b15b5d
Remove TaskGraph::run to simplify interface
sitaowang1998 Nov 16, 2024
98104f1
Add separate key-value store interface
sitaowang1998 Nov 16, 2024
cb369fd
Edit some docstrings.
kirkrodrigues Nov 19, 2024
b4a6f36
Fix include guard
sitaowang1998 Nov 19, 2024
f3de2ca
Merge branch 'main' into interface
sitaowang1998 Nov 19, 2024
70547ae
Add serialzable concept
sitaowang1998 Nov 20, 2024
525311c
Merge remote-tracking branch 'origin/interface' into interface
sitaowang1998 Nov 20, 2024
c776376
Fix clang-tidy
sitaowang1998 Nov 20, 2024
49c571e
Fix typo
sitaowang1998 Nov 20, 2024
43d7e16
Fix clang-tidy
sitaowang1998 Nov 20, 2024
5e8e1dd
Remove macOS build
sitaowang1998 Nov 20, 2024
fe23c3c
Change driver constructor
sitaowang1998 Nov 20, 2024
064edd8
Add exception to interface
sitaowang1998 Nov 20, 2024
e7c5240
Change run to start
sitaowang1998 Nov 20, 2024
b0b414e
Add get jobs to driver
sitaowang1998 Nov 20, 2024
97761e1
Add get jobs in context
sitaowang1998 Nov 20, 2024
91d36f2
Update doc with new interface
sitaowang1998 Nov 20, 2024
84c2f41
Fix clang-tidy
sitaowang1998 Nov 20, 2024
f7ab013
Refactor Context.hpp.
kirkrodrigues Nov 21, 2024
302e68a
style: Fix header guard name
sitaowang1998 Nov 21, 2024
a2dc8bc
style: Rename Context to TaskContext
sitaowang1998 Nov 21, 2024
046e740
style: Add missing class docstring
sitaowang1998 Nov 21, 2024
92d6489
feat: Add concepts for task argument
sitaowang1998 Nov 21, 2024
d27f042
Refactor Context.hpp.
kirkrodrigues Nov 21, 2024
2b49746
feat: Change the arguments from Serializable to TaskArgument
sitaowang1998 Nov 21, 2024
c4ee015
style: Update docstring for Driver
sitaowang1998 Nov 21, 2024
0cc231b
style: Update docstring for Data and Job
sitaowang1998 Nov 21, 2024
069a7a7
style: Update clang-format for library headers
sitaowang1998 Nov 21, 2024
9069030
style: Clean up unused headers and Change TaskGraph template
sitaowang1998 Nov 21, 2024
0fe063f
doc: Update quick start guide
sitaowang1998 Nov 21, 2024
e089107
style: Fix clang-tidy
sitaowang1998 Nov 21, 2024
159aa08
Rename TaskArgument to TaskIo
sitaowang1998 Nov 21, 2024
8e035b7
feat: Add Runnable concept and TaskFunction type
sitaowang1998 Nov 21, 2024
9d90c37
refactor: Rename insert_kv and get_kv to kv_store_insert and kv_store…
sitaowang1998 Nov 21, 2024
8de26ec
fix: Fix the template instantiation of TaskFunction
sitaowang1998 Nov 22, 2024
b9bfcdd
style: Fix clang-tidy
sitaowang1998 Nov 22, 2024
351c8b5
docs: Move cluster setup after run task and change all mentions of da…
sitaowang1998 Nov 22, 2024
7dae8d4
docs: Add task graph to group task example
sitaowang1998 Nov 22, 2024
5d15b22
Refactor Data.hpp
kirkrodrigues Nov 25, 2024
1588b51
Refactor Driver.hpp
kirkrodrigues Nov 25, 2024
1e1e41d
Refactor Exception.hpp
kirkrodrigues Nov 25, 2024
c0a6e6f
Refactor Job.hpp.
kirkrodrigues Nov 25, 2024
99c5935
Refactor TaskContext.hpp.
kirkrodrigues Nov 25, 2024
80f314a
Refactor TaskGraph.hpp.
kirkrodrigues Nov 25, 2024
7796e15
Refactor Concepts.hpp.
kirkrodrigues Nov 25, 2024
036dd51
Add absl to libraray list and sort library list
sitaowang1998 Nov 26, 2024
8affa10
Rename template types to satisfy clang-tidy
sitaowang1998 Nov 26, 2024
77d2458
Change set_cleanup to set_cleanup_func
sitaowang1998 Nov 26, 2024
026b6f1
Change set_cleanup to set_cleanup_func
sitaowang1998 Nov 26, 2024
448f693
Change job state enum name and error docstring
sitaowang1998 Nov 26, 2024
769a708
Restruct all the concepts
sitaowang1998 Nov 26, 2024
3555d2e
Add todo for task registration with timeout
sitaowang1998 Nov 26, 2024
a0c5b3a
Fix circular dependency
sitaowang1998 Nov 26, 2024
e185bf3
Restruct quick start guide
sitaowang1998 Nov 26, 2024
f0d79e9
Fix clang-tidy
sitaowang1998 Nov 26, 2024
f444772
Remove all cpp files in client
sitaowang1998 Nov 26, 2024
530da78
Move driver id section after task restart
sitaowang1998 Nov 26, 2024
db21fe7
Add Job::cancel
sitaowang1998 Nov 28, 2024
165eb84
Fix typo
sitaowang1998 Nov 29, 2024
06de774
Fix clean up function signature
sitaowang1998 Nov 29, 2024
c7a07b1
Fix set_locality argument in docstring example
sitaowang1998 Nov 29, 2024
f8c623a
Add void return type for kv_store_insert
sitaowang1998 Nov 29, 2024
488eaa3
Add noreturn and void return type for TaskContext::abort
sitaowang1998 Nov 29, 2024
f0729d9
Fix some header guards.
kirkrodrigues Nov 29, 2024
4d2aa6c
Edit some docstrings and comments.
kirkrodrigues Nov 29, 2024
88ed638
Fix typo in Data docstring example.
kirkrodrigues Nov 29, 2024
bd55552
Add exception in docstring
sitaowang1998 Nov 29, 2024
9897995
Remove pImpl in interface
sitaowang1998 Nov 29, 2024
73eabef
Fix clang-tidy
sitaowang1998 Nov 29, 2024
85d2475
Fix exception what
sitaowang1998 Nov 29, 2024
61be939
Fix docstring job state name
sitaowang1998 Nov 29, 2024
5bffeee
Refactor exceptions.
kirkrodrigues Nov 29, 2024
22f370d
Remove quick start guide
sitaowang1998 Nov 29, 2024
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -12,6 +12,12 @@ project(
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# AppleClang complains that a file has no symbols and aborts the build.
if(APPLE)
    set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> Scr <TARGET> <LINK_FLAGS> <OBJECTS>")
    set(CMAKE_CXX_ARCHIVE_FINISH "<CMAKE_RANLIB> -no_warning_for_no_symbols -c <TARGET>")
endif()
Member

Do we need this anymore considering we've dropped support for building on macOS?


# Enable exporting compile commands
set(CMAKE_EXPORT_COMPILE_COMMANDS
ON
204 changes: 204 additions & 0 deletions docs/QuickStart.md
Member

Our naming convention is to use kebab-case for markdown files except for the main readme in a repo which is called README.md. So this file should be called quick-start.md. I've added this to our internal guidelines.

Member

I will add a linter to do this later, but treat Markdown as code and wrap lines to 100 characters. This makes it easier to review and read (not everyone has a good Markdown renderer on their local machine).

@@ -0,0 +1,204 @@
# Spider Quick Start Guide
Member

Our convention is to use sentence case for headings rather than capitalizing every word. So this should be written as # Spider quick start guide.

Collaborator Author

I've changed all titles, but I treat Spider as a special term and keep it capitalized.


## Set Up Spider
To get started, first start a database supported by Spider, e.g. MySQL. Second, start a scheduler and connect it to the database by running `spider start --scheduler --db <db_url> --port <scheduler_port>`. Third, start some workers and connect them to the database by running `spider start --worker --db <db_url>`.
Member

Try to use lists rather than a block of text. Lists are faster to read than blocks of text.

Member

You have a caveat for starting the worker below. You should note that here otherwise someone will try to start the worker, it won't work, and they will give up.


## Start a Client
The client first creates a Spider client driver and connects it to the database. Spider automatically releases the driver's resources in its destructor, but you can close the driver to release them early.
Member

We should have an intro section where we describe the actors and architecture in the system (client, scheduler, worker, database, etc.). It doesn't need to be too detailed (that can be in another doc), but it should provide enough context for the user to visualize what we're talking about.

```c++
#include <spider/Spider.hpp>

auto main(int argc, char **argv) -> int {
    spider::Driver driver{};
    driver.connect("db_url");
Member

Any reason we can't or don't want to do this in the constructor?


    driver.close();
Member

We don't want to allow this if possible. If we allow the user to close the driver before the destructor is called, then most if not all public methods of the driver class need to have a check at the beginning: if (m_closed) { return error; }. This adds a lot of unnecessary code to the class and is error prone if a developer forgets to check.

Member

Also, let's not try to make the library "too" easy to use. Our goal right now is to get a robust end-to-end implementation. Since we will be the first users of this framework, it's okay if we need to write a bit of ugly code to get it to work, as long as it's robust. Through the process of using the library, I'm sure we will figure out what are the important features that we should try to simplify.

}
```
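
The review comments above argue for connecting in the constructor and releasing resources only in the destructor, so that no public method needs an `if (m_closed)` check. A minimal sketch of that shape follows. `FakeConnection` and `SketchDriver` are hypothetical names used only for illustration and are not part of Spider.

```c++
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a database connection; not part of Spider.
class FakeConnection {
public:
    explicit FakeConnection(std::string url) : m_url{std::move(url)} {
        if (m_url.empty()) {
            throw std::invalid_argument("empty database url");
        }
        ++s_open_count;
    }
    ~FakeConnection() { --s_open_count; }

    FakeConnection(FakeConnection const&) = delete;
    auto operator=(FakeConnection const&) -> FakeConnection& = delete;

    [[nodiscard]] auto url() const -> std::string const& { return m_url; }
    [[nodiscard]] static auto open_count() -> int { return s_open_count; }

private:
    std::string m_url;
    inline static int s_open_count = 0;
};

// Sketch of a driver that is connected for its entire lifetime: there is no
// connect() or close(), so no method can observe a half-initialized state.
class SketchDriver {
public:
    explicit SketchDriver(std::string const& db_url) : m_conn{db_url} {}

    [[nodiscard]] auto db_url() const -> std::string const& { return m_conn.url(); }

private:
    FakeConnection m_conn;
};

// Usage: the connection is released when the driver leaves scope.
inline auto open_connections_after_scope() -> int {
    {
        SketchDriver const driver{"fake://db"};
        assert(FakeConnection::open_count() == 1);
        assert(driver.db_url() == "fake://db");
    }
    return FakeConnection::open_count();
}
```

A caller that wants to release resources early can still do so by limiting the scope of the driver object, which keeps the class free of closed-state checks.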

## Create a Task
In Spider, a task is a non-member function whose first parameter is a `spider::Context` object. It can then take any number of additional parameters of POD type.

A task can return any POD type. If a task needs to return more than one result, use `std::tuple`.
Member

Is std::tuple a POD type?

Collaborator Author (sitaowang1998, Nov 3, 2024)

std::tuple for more than one output is a special case. Spider handles it differently and only requires all elements to be POD.
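
One way to express the rule discussed in this thread ("a POD type, or a `std::tuple` whose elements are all POD") is with a type trait. This is only a sketch: `is_pod_like_v`, `IsTaskValue` and `is_task_value_v` are hypothetical names, and "POD" is approximated here as trivial plus standard-layout. The same predicate could be wrapped in a C++20 concept.

```c++
#include <string>
#include <tuple>
#include <type_traits>

// Approximates "POD" as trivial + standard-layout (hypothetical helper).
template <class T>
inline constexpr bool is_pod_like_v = std::is_trivial_v<T> && std::is_standard_layout_v<T>;

// Primary template: a non-tuple type is accepted only if it is POD-like.
template <class T>
struct IsTaskValue : std::bool_constant<is_pod_like_v<T>> {};

// Special case: a tuple is accepted if every element is POD-like,
// regardless of whether std::tuple itself is POD.
template <class... Ts>
struct IsTaskValue<std::tuple<Ts...>> : std::bool_constant<(is_pod_like_v<Ts> && ...)> {};

template <class T>
inline constexpr bool is_task_value_v = IsTaskValue<T>::value;

static_assert(is_task_value_v<int>);
static_assert(is_task_value_v<std::tuple<int, double>>);
static_assert(!is_task_value_v<std::string>);
static_assert(!is_task_value_v<std::tuple<int, std::string>>);
```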


The `Context` object represents the context of a running task. It provides methods to get task metadata such as the task id. It also supports creating tasks inside a task. We will cover this later.
Member

I think it's clearer if you defer this point until later. In this section, you can just say "the first argument needs to be a Context. Contexts will be described in section XX."

Collaborator Author

I'm not sure if I should add a section about context. Context provides functions to get the task id and run new tasks. The latter is covered in the "Run task inside task" section. The former is only useful to get a random number or distinguish two instances to de-duplicate their output.

```c++
auto sum(spider::Context &context, int x, int y) -> int {
Member

Although I can infer these are tasks, it would still be helpful to have some description of these examples (could be as simple as "Examples of task methods."). This is especially true in your later examples which are more complicated.

    return x + y;
}

auto sort(spider::Context &context, int x, int y) -> std::tuple<int, int> {
    // Return the two values in ascending order.
    if (x <= y) {
        return { x, y };
    }
    return { y, x };
}
```

## Run a Task
Spider enables users to run a task on the cluster. First, register the functions statically so that they are known to Spider. Then call `Driver::run` and provide the arguments of the task. `Driver::run` returns a `spider::Future` object, which represents a result that will be available in the future. You can call `Future::ready` to check whether the value is available yet, and `Future::get` to block until it is available and retrieve it.
```c++
spider::register_task(sum);
spider::register_task(sort);

auto main(int argc, char **argv) -> int {
    // driver initialization skipped
    // sum takes two arguments; 2 + 2 == 4.
    spider::Future<int> sum_future = driver.run(sum, 2, 2);
    assert(4 == sum_future.get());

    // The task to run must be passed as the first argument.
    spider::Future<std::tuple<int, int>> sort_future = driver.run(sort, 4, 3);
    assert(std::tuple{3, 4} == sort_future.get());
}
```

## Group Tasks Together
In the real world, running a single task is too simple to be useful. Spider lets you bind outputs of tasks as inputs of another task, similar to `std::bind`. Binding tasks together forms dependencies among tasks, which are represented by `spider::TaskGraph`. A `TaskGraph` can be further bound into a more complicated `TaskGraph` by serving as an input for another task. You can run the task graph using `Driver::run` in the same way as running a single task.
```c++
auto square(spider::Context& context, int x) -> int {
    return x * x;
}

auto square_root(spider::Context& context, int x) -> int {
    return sqrt(x);
}
// task registration skipped
auto main(int argc, char **argv) -> int {
    // driver initialization skipped
    spider::TaskGraph<int(int, int)> sum_of_square = spider::bind(sum, square, square);
    spider::TaskGraph<int(int, int)> rss = spider::bind(square_root, sum_of_square);
    spider::Future<int> future = driver.run(rss, 3, 4);
Member

I don't really understand how arguments are passed between tasks. You mention that this interface is similar to std::bind, but if I were to replace every call to spider::bind with std::bind, I don't think this example would work. So from both a design perspective and a usage perspective, we need to clear this up.

Collaborator Author

I've updated the doc with more explanation, but I don't think my description is clear enough. I need some advice on this.

    assert(5 == future.get());
}
```
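
The review question above is how arguments flow between bound tasks. One possible reading of the example is that each child consumes its share of the inputs and the parent consumes the children's outputs, so that `rss(3, 4)` equals `square_root(sum(square(3), square(4)))`. The sketch below models that reading with plain local functions. `fan_in` and `chain` are hypothetical helpers, not Spider's `bind`, and the `Context` parameter is omitted.

```c++
#include <cassert>
#include <cmath>

// Plain functions standing in for tasks (no Spider types involved).
inline auto sum(int x, int y) -> int { return x + y; }
inline auto square(int x) -> int { return x * x; }
inline auto square_root(int x) -> int { return static_cast<int>(std::sqrt(x)); }

// Each child consumes exactly one input; the parent consumes all child outputs.
template <class Parent, class... Children>
auto fan_in(Parent parent, Children... children) {
    return [=](auto... args) {
        static_assert(sizeof...(args) == sizeof...(Children), "one input per child");
        return parent(children(args)...);
    };
}

// The child consumes every input; the parent consumes the child's single output.
template <class Parent, class Child>
auto chain(Parent parent, Child child) {
    return [=](auto... args) { return parent(child(args...)); };
}

// Mirrors the example: sum_of_square = bind(sum, square, square),
// rss = bind(square_root, sum_of_square).
inline auto root_sum_of_squares(int a, int b) -> int {
    auto const sum_of_squares = fan_in(sum, square, square);
    assert(sum_of_squares(a, b) == a * a + b * b);
    auto const rss = chain(square_root, sum_of_squares);
    return rss(a, b);
}
```

Under this reading, inputs are assigned to children in left-to-right order, which is the main difference from `std::bind`, where arguments are bound explicitly or through placeholders.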

## Run Task inside Task
A static task graph is enough to solve a lot of real-world problems, but dynamically adding tasks on the fly can be handy. As mentioned before, Spider allows you to add another task as a child of the running task by calling `Context::add_child`.

```c++
auto gcd(spider::Context& context, int x, int y) -> std::tuple<int, int> {
    if (x == y) {
        std::cout << "gcd is: " << x << std::endl;
        return { x, y };
    }
    // Subtract rather than take the remainder so that neither value reaches zero.
    if (x > y) {
        context.add_child(gcd);
        return { x - y, y };
    }
    context.add_child(gcd);
    return { x, y - x };
}
```

However, it is impossible to get the return value of the task graph from a client. One solution is to share data through a key-value store, which will be discussed later. Another solution is to run a task or task graph inside a task and wait for its value, just like a client. This solution is closer to conventional function call semantics.
Member

However, it is impossible to get the return value of the task graph from a client.

This sentence doesn't make sense since the previous section has an example where the client is retrieving the result of a task graph.

Reading the later parts of this guide, I guess you mean that you can't return the value of a dynamically created task to the client (which is intuitive since the interface doesn't provide a mechanism to do so).


```c++
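// Declared first so that gcd can refer to it.
auto gcd_impl(spider::Context& context, int x, int y) -> std::tuple<int, int>;

auto gcd(spider::Context& context, int x, int y) -> int {
    if (x < y) {
        std::swap(x, y);
    }
    // Euclid's algorithm: stop once the remainder reaches zero.
    while (y != 0) {
        spider::Future<std::tuple<int, int>> future = context.run(gcd_impl, x, y);
        std::tuple<int, int> const result = future.get();
        x = std::get<0>(result);
        y = std::get<1>(result);
    }
    return x;
}

// One Euclid step: (x, y) -> (y, x % y).
auto gcd_impl(spider::Context& context, int x, int y) -> std::tuple<int, int> {
    return { y, x % y };
}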
```

## Data on External Storage
Often simple POD data is not enough. However, passing large amounts of data around is expensive. Usually this data is stored on disk or in a distributed storage system. For example, an ETL workload usually reads data from external storage, writes temporary data to external storage, and writes final data to external storage.

Spider lets users pass the metadata of this data around in `spider::Data` objects. `Data` stores the metadata of the external data and provides crucial information to Spider for correct and efficient scheduling and failure recovery. `Data` stores a list of nodes that have locality for the external data, and the user can specify whether locality is a hard requirement, i.e. whether the task can only run on the nodes in the locality list. `Data` can include a `cleanup` function, which runs when the `Data` object is no longer referenced by any task or client. `Data` has a persist flag to indicate that the external data is persisted and does not need to be cleaned up.

```c++
Member

This example is complicated and should definitely have a description. To not have a description means that we're slowing down the user since they first need to infer what the goal of the example's code is, then they can focus on the details of using the framework.

Member

This example is also a good reason why our internal guidelines say that we should organize things from public to private. If you apply that thinking here, we would put main first and then the task methods. That is the order that a user should read the methods (the opposite order requires the user to remember a lot of context before they get to main).

struct HdfsFile {
    std::string url;
};

// The task context is needed to derive a unique output path from the task id.
auto filter(spider::Context& context, spider::Data<HdfsFile> input) -> spider::Data<HdfsFile> {
    std::string const output_path = std::format("/path/{}", context.task_id());
    std::string const input_path = input.get().url;
    // Create the HdfsFile Data first so that Spider can clean up the data if the task fails.
    spider::Data<HdfsFile> output = spider::Data<HdfsFile>::Builder()
            .cleanup([](HdfsFile const& file) { delete_hdfs_file(file); })
            .build(HdfsFile { output_path });
    auto file = hdfs_create(output_path);
    std::vector<std::string> nodes = hdfs_get_nodes(file);
    output.set_locality(nodes, false); // not hard locality

    run_filter(input_path, file);

    return output;
}

auto map(spider::Context& context, spider::Data<HdfsFile> input) -> spider::Data<HdfsFile> {
    std::string const output_path = "/path/to/output";
    std::string const input_path = input.get().url;

    spider::Data<HdfsFile> output = spider::Data<HdfsFile>::Builder()
            .cleanup([](HdfsFile const& file) { delete_hdfs_file(file); })
            .build(HdfsFile { output_path });

    run_map(input_path, output_path);

    // Now that map has finished, the file is persisted on HDFS as the output of the job.
    output.mark_persist();
    return output;
}

auto main(int argc, char** argv) -> int {
    spider::Data<HdfsFile> input = spider::Data<HdfsFile>::Builder()
            .mark_persist(true)
            .build(HdfsFile { "/path/to/input" });
    spider::Future<spider::Data<HdfsFile>> future = spider::run(
            spider::bind(map, filter),
            input);
Member

input is not a POD, so that violates the constraint mentioned above about task arguments being POD.

    std::string const output_path = future.get().get().url;
    std::cout << "Result is stored in " << output_path << std::endl;
}
```
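
The cleanup and persist semantics described in this section (cleanup runs once the last reference goes away, unless the data was marked as persisted) can be modeled locally with shared ownership. The sketch below is a single-process approximation under that assumption. `SketchData` is a hypothetical class and says nothing about how Spider tracks references across tasks and clients.

```c++
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical handle; mirrors only the cleanup/persist idea, not Spider's Data.
template <class T>
class SketchData {
public:
    using Cleanup = std::function<void(T const&)>;

    SketchData(T value, Cleanup cleanup)
            : m_state{std::make_shared<State>(std::move(value), std::move(cleanup))} {}

    [[nodiscard]] auto get() const -> T const& { return m_state->value; }

    // After this call the cleanup function is never run.
    void mark_persist() { m_state->persisted = true; }

private:
    struct State {
        State(T v, Cleanup c) : value{std::move(v)}, cleanup{std::move(c)} {}
        State(State const&) = delete;
        auto operator=(State const&) -> State& = delete;
        // Runs when the last handle sharing this state is destroyed.
        ~State() {
            if (!persisted && cleanup) {
                cleanup(value);
            }
        }

        T value;
        Cleanup cleanup;
        bool persisted{false};
    };

    std::shared_ptr<State> m_state;
};

// Usage: counts how many times the cleanup function runs.
inline auto count_cleanups() -> int {
    int cleaned = 0;
    auto const on_cleanup = [&cleaned](std::string const&) { ++cleaned; };
    {
        SketchData<std::string> temp{"/tmp/a", on_cleanup};
        SketchData<std::string> copy = temp;  // shares ownership with temp
        assert(copy.get() == temp.get());
        SketchData<std::string> kept{"/out/b", on_cleanup};
        kept.mark_persist();
        assert(cleaned == 0);  // nothing is cleaned while handles are alive
    }
    return cleaned;
}
```

In this sketch only the non-persisted value is cleaned up, and only once, even though two handles referred to it.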

## Data as Key-Value Store
`Data` can also be used as a key-value store. The user can specify a key when creating the data, and the data can be accessed later by its key. Note that a task can only access `Data` created by itself or passed to it, while a client can access any data by its key.
Using the key-value store, we can solve the dynamic task result problem.

```c++
auto gcd(spider::Context& context, int x, int y, std::string key)
        -> std::tuple<int, int, std::string> {
    if (x == y) {
        // Store the final result under the key so that the client can read it.
        spider::Data<int>::Builder()
                .set_key(key)
                .build(x);
        return { x, y, key };
    }
    // Subtract rather than take the remainder so that neither value reaches zero.
    if (x > y) {
        context.add_child(gcd);
        return { x - y, y, key };
    }
    context.add_child(gcd);
    return { x, y - x, key };
}

auto main(int argc, char** argv) -> int {
    // driver initialization skipped
    int const x = 48;
    int const y = 18;
    std::string const key = "random_key";
    driver.run(gcd, x, y, key);
    // Wait until the result has been stored under the key.
    while (!driver.get_data_by_key(key)) {
    }
    int value = driver.get_data_by_key(key).get();
    std::cout << "gcd of " << x << " and " << y << " is " << value << std::endl;
}
```

## Straggler Mitigation
`Driver::register_task` can take a second argument that specifies a timeout in milliseconds. If a task executes for longer than the specified timeout, Spider spawns another task instance running the same function. The task instance that finishes first wins. Other running task instances are cancelled, and their associated data is cleaned up.

The new task has a different task id, and it is the responsibility of the user to avoid any data race and deduplicate the output if necessary.
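
One way a user might deduplicate output from a straggler and its replacement is a first-writer-wins commit keyed by a logical task name rather than by task id. The sketch below shows the idea in a single process. `OutputRegistry` is a hypothetical helper, and a real deployment would need the equivalent atomic insert in shared storage.

```c++
#include <cassert>
#include <map>
#include <string>

// Hypothetical first-writer-wins registry keyed by a logical task name.
class OutputRegistry {
public:
    // Returns true only for the first instance that commits a given logical task.
    auto try_commit(std::string const& logical_task, int instance_id) -> bool {
        return m_winner.emplace(logical_task, instance_id).second;
    }

    // Returns the instance id that committed first; throws if none has.
    [[nodiscard]] auto winner(std::string const& logical_task) const -> int {
        return m_winner.at(logical_task);
    }

private:
    std::map<std::string, int> m_winner;
};

// Usage: instance 1 is the original, instance 2 is the replacement
// spawned after the timeout. Only the first commit is accepted.
inline auto example_winner() -> int {
    OutputRegistry registry;
    bool const original = registry.try_commit("filter-part-0", 1);
    bool const replacement = registry.try_commit("filter-part-0", 2);
    assert(original && !replacement);
    return registry.winner("filter-part-0");
}
```

An instance whose commit is rejected would discard its own output instead of publishing it.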

## Note on Worker Setup
The setup section said that we can start a worker by running `spider start --worker --db <db_url>`. This is oversimplified. The worker has to know the functions it will run.

When the user compiles the client code, an executable and a library are generated. The executable executes the client code as expected. The library contains all the functions registered by the user. The worker needs to run with a copy of this library. The actual command to start a worker is `spider start --worker --db <db_url> --libs [client_libraries]`.
Member

This makes sense, so in the guide above, wouldn't it be easier to just say that we need to create a task library with all the tasks the user wants to run? Then we can reference those tasks in the client code. Right now, the guide says we need to register the tasks by calling spider::register_task which sounds pretty opaque to me.

Collaborator Author

The user is responsible for creating a task library and the client executable. Actually, a good practice for the user is to put the spider::register_task calls with the tasks' declarations instead of in main, so maybe I should move task registration into the "Create a task" section?
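
To make the registration step discussed in this thread less opaque, here is a sketch of one common way static registration can work: an object at namespace scope inserts the function into a name-keyed registry before `main` runs, and a worker that loads the library looks tasks up by name. Everything here (`task_registry`, `TaskRegistrar`, `run_by_name`, and the fixed `int(int, int)` signature) is a hypothetical illustration, not Spider's implementation.

```c++
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Simplification: every task in this sketch has the same signature.
using TaskFn = std::function<int(int, int)>;

// Function-local static so the registry exists before any registrar uses it.
inline auto task_registry() -> std::map<std::string, TaskFn>& {
    static std::map<std::string, TaskFn> registry;
    return registry;
}

// Constructing one of these at namespace scope registers a task before main runs.
struct TaskRegistrar {
    TaskRegistrar(std::string name, TaskFn fn) {
        task_registry().emplace(std::move(name), std::move(fn));
    }
};

// The registration sits next to the task definition, as suggested above.
inline auto sum(int x, int y) -> int { return x + y; }
TaskRegistrar const sum_registrar{"sum", sum};

// What a worker might do: look a task up by name and invoke it.
inline auto run_by_name(std::string const& name, int x, int y) -> int {
    return task_registry().at(name)(x, y);
}

// Usage: the task is reachable by name without any call in main.
inline void example() {
    assert(run_by_name("sum", 2, 3) == 5);
}
```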

64 changes: 59 additions & 5 deletions src/spider/CMakeLists.txt
@@ -1,15 +1,16 @@
# set variable as CACHE INTERNAL to access it from other scope
set(SPIDER_CORE_SOURCES
set(SPIDER_CORE_SOURCES storage/MysqlStorage.cpp CACHE INTERNAL "spider core source files")

set(SPIDER_CORE_HEADERS
core/Error.hpp
core/Data.hpp
core/Task.hpp
core/TaskGraph.hpp
storage/MetadataStorage.hpp
storage/DataStorage.hpp
storage/MysqlStorage.cpp
storage/MysqlStorage.hpp
CACHE INTERNAL
"spider core source files"
"spider core header files"
)

if(SPIDER_USE_STATIC_LIBS)
@@ -18,11 +19,64 @@ else()
add_library(spider_core SHARED)
endif()
target_sources(spider_core PRIVATE ${SPIDER_CORE_SOURCES})
target_link_libraries(spider_core PUBLIC Boost::boost PRIVATE absl::flat_hash_map)
target_sources(spider_core PUBLIC ${SPIDER_CORE_HEADERS})
target_link_libraries(
spider_core
PUBLIC
Boost::boost
absl::flat_hash_map
)

set(SPIDER_CLIENT_SHARED_SOURCES
client/Data.cpp
client/Task.cpp
client/TaskGraph.cpp
CACHE INTERNAL
"spider client shared source files"
)

set(SPIDER_CLIENT_SHARED_HEADERS
client/Data.hpp
client/Task.hpp
client/TaskGraph.hpp
CACHE INTERNAL
"spider client shared header files"
)

add_library(spider_client_lib)
target_sources(spider_client_lib PRIVATE ${SPIDER_CLIENT_SHARED_SOURCES})
target_sources(spider_client_lib PUBLIC ${SPIDER_CLIENT_SHARED_HEADERS})
target_link_libraries(
spider_client_lib
PUBLIC
Boost::boost
absl::flat_hash_map
)

set(SPIDER_CLIENT_SOURCES client/Future.cpp CACHE INTERNAL "spider client source files")

set(SPIDER_CLIENT_HEADERS
client/Spider.hpp
client/Future.hpp
CACHE INTERNAL
"spider client header files"
)

add_library(spider_client)
target_sources(spider_client PRIVATE ${SPIDER_CLIENT_SOURCES})
target_sources(spider_client PUBLIC ${SPIDER_CLIENT_HEADERS})
target_link_libraries(spider_client PRIVATE spider_core)
target_link_libraries(spider_client PUBLIC spider_client_lib)
add_library(spider::spider ALIAS spider_client)

set(SPIDER_WORKER_SOURCES worker/worker.cpp CACHE INTERNAL "spider worker source files")

add_executable(spider_worker)
target_sources(spider_worker PRIVATE ${SPIDER_WORKER_SOURCES})
target_link_libraries(spider_worker PRIVATE spider_core)
target_link_libraries(
spider_worker
PRIVATE
spider_core
spider_client_lib
)
add_executable(spider::worker ALIAS spider_worker)
40 changes: 40 additions & 0 deletions src/spider/client/Data.cpp
@@ -0,0 +1,40 @@
#include "Data.hpp"

#include <functional>
#include <string>
#include <vector>

namespace spider {

class DataImpl {};

template <class T>
Member

You should document template parameters as well.

auto Data<T>::get() -> T {
    return T();
}

template <class T>
void Data<T>::set_locality(std::vector<std::string> const& /*nodes*/, bool /*hard*/) {}

// The builder setters return a reference to the builder so that calls can be chained.
template <class T>
auto Data<T>::Builder::set_key(std::string const& /*key*/) -> Data<T>::Builder& {
    return *this;
}

template <class T>
auto Data<T>::Builder::set_locality(std::vector<std::string> const& /*nodes*/, bool /*hard*/)
        -> Data<T>::Builder& {
    return *this;
}

template <class T>
auto Data<T>::Builder::set_cleanup(std::function<T const&()> const& /*f*/) -> Data<T>::Builder& {
    return *this;
}

template <class T>
auto Data<T>::Builder::build(T const& /*t*/) -> Data<T> {
    return Data<T>();
}

} // namespace spider