feat: Add client interface. #16

Merged
merged 88 commits into from
Nov 30, 2024
Changes from 2 commits
6ffea6e
feat: Add client code structure and interface for Data, Future and ex…
sitaowang1998 Oct 31, 2024
94af15b
feat: Split client into two libraries and add interface
sitaowang1998 Oct 31, 2024
f69523a
fix: Add boost library for spider_client_lib
sitaowang1998 Nov 1, 2024
ccf6cc8
style: Improve code style for data based on pr comments
sitaowang1998 Nov 1, 2024
5e26f58
fix: Add absl as public library for core
sitaowang1998 Nov 1, 2024
020093c
style: Improve code style for client interface based on pr reviea pri…
sitaowang1998 Nov 1, 2024
ee222f0
fix: Try fix clang-tidy find nout found
sitaowang1998 Nov 1, 2024
1b0ccac
docs: Add quick start doc
sitaowang1998 Nov 1, 2024
b3a2e1e
style: Change markdown headings to sentence style and hard wrap markd…
sitaowang1998 Nov 3, 2024
ec8f500
docs: Update doc according to pr comments
sitaowang1998 Nov 3, 2024
5dc12cb
docs: Remove the worker note section and put the content in run task …
sitaowang1998 Nov 3, 2024
0d59c62
docs: Return a Job instead of Future for run and support user to pass…
sitaowang1998 Nov 5, 2024
4cd6233
Merge branch 'main' into interface
sitaowang1998 Nov 6, 2024
fec5e73
Change future to job
sitaowang1998 Nov 14, 2024
a5e799b
Change task to context
sitaowang1998 Nov 14, 2024
1b15b5d
Remove TaskGraph::run to simplify interface
sitaowang1998 Nov 16, 2024
98104f1
Add separate key-value store interface
sitaowang1998 Nov 16, 2024
cb369fd
Edit some docstrings.
kirkrodrigues Nov 19, 2024
b4a6f36
Fix include guard
sitaowang1998 Nov 19, 2024
f3de2ca
Merge branch 'main' into interface
sitaowang1998 Nov 19, 2024
70547ae
Add serialzable concept
sitaowang1998 Nov 20, 2024
525311c
Merge remote-tracking branch 'origin/interface' into interface
sitaowang1998 Nov 20, 2024
c776376
Fix clang-tidy
sitaowang1998 Nov 20, 2024
49c571e
Fix typo
sitaowang1998 Nov 20, 2024
43d7e16
Fix clang-tidy
sitaowang1998 Nov 20, 2024
5e8e1dd
Remove macOS build
sitaowang1998 Nov 20, 2024
fe23c3c
Change driver constructor
sitaowang1998 Nov 20, 2024
064edd8
Add exception to interface
sitaowang1998 Nov 20, 2024
e7c5240
Change run to start
sitaowang1998 Nov 20, 2024
b0b414e
Add get jobs to driver
sitaowang1998 Nov 20, 2024
97761e1
Add get jobs in context
sitaowang1998 Nov 20, 2024
91d36f2
Update doc with new interface
sitaowang1998 Nov 20, 2024
84c2f41
Fix clang-tidy
sitaowang1998 Nov 20, 2024
f7ab013
Refactor Context.hpp.
kirkrodrigues Nov 21, 2024
302e68a
style: Fix header guard name
sitaowang1998 Nov 21, 2024
a2dc8bc
style: Rename Context to TaskContext
sitaowang1998 Nov 21, 2024
046e740
style: Add missing class docstring
sitaowang1998 Nov 21, 2024
92d6489
feat: Add concepts for task argument
sitaowang1998 Nov 21, 2024
d27f042
Refactor Context.hpp.
kirkrodrigues Nov 21, 2024
2b49746
feat: Change the arguments from Serializable to TaskArgument
sitaowang1998 Nov 21, 2024
c4ee015
style: Update docstring for Driver
sitaowang1998 Nov 21, 2024
0cc231b
style: Update docstring for Data and Job
sitaowang1998 Nov 21, 2024
069a7a7
style: Update clang-format for library headers
sitaowang1998 Nov 21, 2024
9069030
style: Clean up unused headers and Change TaskGraph template
sitaowang1998 Nov 21, 2024
0fe063f
doc: Update quick start guide
sitaowang1998 Nov 21, 2024
e089107
style: Fix clang-tidy
sitaowang1998 Nov 21, 2024
159aa08
Rename TaskArgument to TaskIo
sitaowang1998 Nov 21, 2024
8e035b7
feat: Add Runnable concept and TaskFunction type
sitaowang1998 Nov 21, 2024
9d90c37
refactor: Rename insert_kv and get_kv to kv_store_insert and kv_store…
sitaowang1998 Nov 21, 2024
8de26ec
fix: Fix the template instantiation of TaskFunction
sitaowang1998 Nov 22, 2024
b9bfcdd
style: Fix clang-tidy
sitaowang1998 Nov 22, 2024
351c8b5
docs: Move cluster setup after run task and change all mentions of da…
sitaowang1998 Nov 22, 2024
7dae8d4
docs: Add task graph to group task example
sitaowang1998 Nov 22, 2024
5d15b22
Refactor Data.hpp
kirkrodrigues Nov 25, 2024
1588b51
Refactor Driver.hpp
kirkrodrigues Nov 25, 2024
1e1e41d
Refactor Exception.hpp
kirkrodrigues Nov 25, 2024
c0a6e6f
Refactor Job.hpp.
kirkrodrigues Nov 25, 2024
99c5935
Refactor TaskContext.hpp.
kirkrodrigues Nov 25, 2024
80f314a
Refactor TaskGraph.hpp.
kirkrodrigues Nov 25, 2024
7796e15
Refactor Concepts.hpp.
kirkrodrigues Nov 25, 2024
036dd51
Add absl to libraray list and sort library list
sitaowang1998 Nov 26, 2024
8affa10
Rename template types to satisfy clang-tidy
sitaowang1998 Nov 26, 2024
77d2458
Change set_cleanup to set_cleanup_func
sitaowang1998 Nov 26, 2024
026b6f1
Change set_cleanup to set_cleanup_func
sitaowang1998 Nov 26, 2024
448f693
Change job state enum name and error docstring
sitaowang1998 Nov 26, 2024
769a708
Restruct all the concepts
sitaowang1998 Nov 26, 2024
3555d2e
Add todo for task registration with timeout
sitaowang1998 Nov 26, 2024
a0c5b3a
Fix circular dependency
sitaowang1998 Nov 26, 2024
e185bf3
Restruct quick start guide
sitaowang1998 Nov 26, 2024
f0d79e9
Fix clang-tidy
sitaowang1998 Nov 26, 2024
f444772
Remove all cpp files in client
sitaowang1998 Nov 26, 2024
530da78
Move driver id section after task restart
sitaowang1998 Nov 26, 2024
db21fe7
Add Job::cancel
sitaowang1998 Nov 28, 2024
165eb84
Fix typo
sitaowang1998 Nov 29, 2024
06de774
Fix clean up function signature
sitaowang1998 Nov 29, 2024
c7a07b1
Fix set_locality argument in docstring example
sitaowang1998 Nov 29, 2024
f8c623a
Add void return type for kv_store_insert
sitaowang1998 Nov 29, 2024
488eaa3
Add noreturn and void return type for TaskContext::abort
sitaowang1998 Nov 29, 2024
f0729d9
Fix some header guards.
kirkrodrigues Nov 29, 2024
4d2aa6c
Edit some docstrings and comments.
kirkrodrigues Nov 29, 2024
88ed638
Fix typo in Data docstring example.
kirkrodrigues Nov 29, 2024
bd55552
Add exception in docstring
sitaowang1998 Nov 29, 2024
9897995
Remove pImpl in interface
sitaowang1998 Nov 29, 2024
73eabef
Fix clang-tidy
sitaowang1998 Nov 29, 2024
85d2475
Fix exception what
sitaowang1998 Nov 29, 2024
61be939
Fix docstring job state name
sitaowang1998 Nov 29, 2024
5bffeee
Refactor exceptions.
kirkrodrigues Nov 29, 2024
22f370d
Remove quick start guide
sitaowang1998 Nov 29, 2024
4 changes: 4 additions & 0 deletions CMakeLists.txt
@@ -12,6 +12,10 @@ project(
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# AppleClang complains that a file has no symbols and aborts the build.
set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> Scr <TARGET> <LINK_FLAGS> <OBJECTS>")
set(CMAKE_CXX_ARCHIVE_FINISH "<CMAKE_RANLIB> -no_warning_for_no_symbols -c <TARGET>")

# Enable exporting compile commands
set(CMAKE_EXPORT_COMPILE_COMMANDS
ON
55 changes: 51 additions & 4 deletions src/spider/CMakeLists.txt
@@ -1,15 +1,16 @@
# set variable as CACHE INTERNAL to access it from other scope
set(SPIDER_CORE_SOURCES
set(SPIDER_CORE_SOURCES storage/MysqlStorage.cpp CACHE INTERNAL "spider core source files")

set(SPIDER_CORE_HEADERS
core/Error.hpp
core/Data.hpp
core/Task.hpp
core/TaskGraph.hpp
storage/MetadataStorage.hpp
storage/DataStorage.hpp
storage/MysqlStorage.cpp
storage/MysqlStorage.hpp
CACHE INTERNAL
"spider core source files"
"spider core header files"
)

if(SPIDER_USE_STATIC_LIBS)
@@ -18,11 +19,57 @@ else()
add_library(spider_core SHARED)
endif()
target_sources(spider_core PRIVATE ${SPIDER_CORE_SOURCES})
target_sources(spider_core PUBLIC ${SPIDER_CORE_HEADERS})
target_link_libraries(spider_core PUBLIC Boost::boost PRIVATE absl::flat_hash_map)

set(SPIDER_CLIENT_SHARED_SOURCES
client/Data.cpp
client/Task.cpp
CACHE INTERNAL
"spider client shared source files"
)

set(SPIDER_CLIENT_SHARED_HEADERS
client/Data.hpp
client/Task.hpp
CACHE INTERNAL
"spider client shared header files"
)

add_library(spider_client_lib)
target_sources(spider_client_lib PRIVATE ${SPIDER_CLIENT_SHARED_SOURCES})
target_sources(spider_client_lib PUBLIC ${SPIDER_CLIENT_SHARED_HEADERS})

set(SPIDER_CLIENT_SOURCES
client/TaskGraph.cpp
client/Future.cpp
CACHE INTERNAL
"spider client source files"
)

set(SPIDER_CLIENT_HEADERS
client/Spider.hpp
client/TaskGraph.hpp
client/Future.hpp
CACHE INTERNAL
"spider client header files"
)

add_library(spider_client)
target_sources(spider_client PRIVATE ${SPIDER_CLIENT_SOURCES})
target_sources(spider_client PUBLIC ${SPIDER_CLIENT_HEADERS})
target_link_libraries(spider_client PRIVATE spider_core)
target_link_libraries(spider_client PUBLIC spider_client_lib)
add_library(spider::spider ALIAS spider_client)

set(SPIDER_WORKER_SOURCES worker/worker.cpp CACHE INTERNAL "spider worker source files")

add_executable(spider_worker)
target_sources(spider_worker PRIVATE ${SPIDER_WORKER_SOURCES})
target_link_libraries(spider_worker PRIVATE spider_core)
target_link_libraries(
spider_worker
PRIVATE
spider_core
spider_client_lib
)
add_executable(spider::worker ALIAS spider_worker)
40 changes: 40 additions & 0 deletions src/spider/client/Data.cpp
@@ -0,0 +1,40 @@
#include "Data.hpp"

#include <functional>
#include <string>
#include <vector>

namespace spider {

class DataImpl {};

template <class T>
Member

You should document template parameters as well.

auto Data<T>::get() -> T {
return T();
}

template <class T>
void Data<T>::set_locality(std::vector<std::string> const& /*nodes*/, bool /*hard*/) {}

template <class T>
auto Data<T>::Builder::key(std::string const& /*key*/) -> Data<T>::Builder& {
return *this;
}

template <class T>
auto Data<T>::Builder::locality(std::vector<std::string> const& /*nodes*/, bool /*hard*/)
-> Data<T>::Builder& {
return *this;
}

template <class T>
auto Data<T>::Builder::cleanup(std::function<T const&()> const& /*f*/) -> Data<T>::Builder& {
return *this;
}

template <class T>
auto Data<T>::Builder::build(T const& /*t*/) -> Data<T> {
return Data<T>();
}

} // namespace spider
81 changes: 81 additions & 0 deletions src/spider/client/Data.hpp
@@ -0,0 +1,81 @@
#ifndef SPIDER_CLIENT_DATA_HPP
#define SPIDER_CLIENT_DATA_HPP

#include <functional>
#include <memory>
#include <string>
#include <vector>

namespace spider {

kirkrodrigues marked this conversation as resolved.
class DataImpl;

template <class T>
Member

Like I mentioned in Spider.hpp, we can restrict the types using C++20 concepts.

class Data {
kirkrodrigues marked this conversation as resolved.
private:
std::unique_ptr<DataImpl> m_impl;
Member

Private declarations should come after public.


public:
/**
* Gets the values stored in Data.
* @return value stored in Data.
Member

Suggested change
* Gets the values stored in Data.
* @return value stored in Data.
* @return The stored value.

As our internal guidelines mention, for most getters, we can omit the docstring description and only describe the return value.

Collaborator Author

I think this is a special case. This is not an ordinary getter. It accesses data storage under the hood and could throw an exception, or return an error once errors are included in the interface.

Member

Since neither the exception nor the error is currently in the code, let's write the docstring according to what exists (not what will exist later). When we eventually add the error information, we can update the docstring appropriately.

*/
auto get() -> T;
/**
Member

Suggested change
auto get() -> T;
/**
auto get() -> T;

/**

Add an empty line between methods.

* Indicates that the data is persisted and should not be rollbacked
Member

rollbacked is not a word. Did your IDE report this as an error?

* on failure recovery.
*/
// Not implemented in milestone 1
// void mark_persist();
Member

In my experience, it's a bad idea to add code that we'll use in the future since that day may never come. If that day comes, we will remember what to do based on the design doc rather than this commented code.

/**
* Sets locality list of the data.
* @param nodes nodes that has locality
* @param hard true if the locality list is a hard requirement, false otherwise
Member

These docstrings are a bit unclear to me. I guess the idea is that if hard=false, the data can be accessed from any node in the cluster, but if possible, it should be accessed from the nodes specified in this list?

Collaborator Author

Yes. Hard requirement means data can only be accessed on nodes in the locality list. Will change the docstring to make it clear.
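
The hard versus soft distinction described in this thread can be sketched as a small predicate. The `Locality` struct and both helper functions are hypothetical and exist only to illustrate the semantics; they are not part of the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical record of what set_locality(nodes, hard) was given.
struct Locality {
    std::vector<std::string> nodes;
    bool hard = false;
};

// Returns true if `node` appears in the locality list.
inline auto is_preferred(Locality const& locality, std::string const& node) -> bool {
    return std::find(locality.nodes.begin(), locality.nodes.end(), node)
           != locality.nodes.end();
}

// Returns true if the data may be accessed from `node`.
// hard == true: only listed nodes qualify.
// hard == false: any node qualifies; the list is only a placement preference.
inline auto can_access(Locality const& locality, std::string const& node) -> bool {
    return !locality.hard || is_preferred(locality, node);
}

}  // namespace sketch
```

Under this reading, a scheduler would consult `can_access` as a filter and `is_preferred` as a ranking hint.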

*/
void set_locality(std::vector<std::string> const& nodes, bool hard);
Member

  • We prefer to use std::span and std::string_view instead of const vectors or strings.
  • Why do we need this method compared to the one in the builder?

Collaborator Author

Locality information might not be available at the time we create the data. For example, we create an associated data for HDFS files before we create the actual HDFS files and write to them so that Spider can gc the HDFS files on failure during write. However, we don't know the locality of the HDFS files before creating them, and thus we need a way to set locality on the fly.

Collaborator Author

nodes represents an array of addresses. I'm not sure if std::span or std::string_view is suitable for this case.


class Builder {
private:
public:
/**
* Sets the key for the data. If no key is provided, Spider generates a key.
Member

If the key is optional, how about using std::optional to make that explicit in the API itself?

Collaborator Author

Sorry. Wrong docstring. Spider doesn't generate a key if no key is provided.
The point of using builder pattern is to provide flexibility so that users don't call the function they don't need.

* @param key of the data
*/
auto key(std::string const& key) -> Data<T>::Builder&;
Member

Return value is missing from docstring.

Member

I think it would be clearer to name all these setters as set_xxx where xxx is the thing being set.
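
A minimal sketch of the builder shape discussed in these comments: setters named `set_xxx`, each returning a reference to the builder so calls can be chained, and an optional key that is simply left unset when the caller never provides one. `DataBuilder` and `BuiltData` are hypothetical names for this example and hold an `int` instead of a templated value.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical result type; the real Data<T> stores its value elsewhere.
struct BuiltData {
    std::optional<std::string> key;
    int value = 0;
};

class DataBuilder {
public:
    // Sets the key and returns the builder itself to allow chaining.
    auto set_key(std::string key) -> DataBuilder& {
        m_key = std::move(key);
        return *this;  // dereference: the return type is a reference
    }

    // Builds the data with whatever options were set so far.
    [[nodiscard]] auto build(int value) const -> BuiltData {
        return BuiltData{m_key, value};
    }

private:
    std::optional<std::string> m_key;
};

}  // namespace sketch
```

A caller that needs a key writes `DataBuilder{}.set_key("k").build(7)`; a caller that does not need one skips the setter entirely.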

/**
* Sets locality list of the data to build.
* @param nodes nodes that has locality
* @param hard true if the locality list is a hard requirement, false otherwise
* @return self
*/
auto locality(std::vector<std::string> const& nodes, bool hard) -> Data<T>::Builder&;
/**
* Indicates that the data to build is persisted and should not be rollbacked on failure
* recovery.
* @return self
*/
// Data<T>::Builder Builder& mark_persist(); // Not implemented in milestone 1
/**
* Defines clean up functions of the data to build.
* @param f clean up function of data
*/
auto cleanup(std::function<T const&()> const& f) -> Data<T>::Builder&;
/**
* Defines rollback functions of the data to build.
* @param f rollback function of data
*/
// Not implemented for milestone 1
// auto rollback(std::function<const T&()> const& f) -> Data<T>::Builder&;
/**
* Builds the data. Stores the value of data into storage with locality list, persisted
Member

How does this class store the data?

Collaborator Author

Internally all values are serialized and stored in data storage.

* flag, cleanup and rollback functions.
* @param t value of the data
* @return data object
*/
auto build(T const& /*t*/) -> Data<T>;
};
};

Member

Suggested change

} // namespace spider

#endif // SPIDER_CLIENT_DATA_HPP
30 changes: 30 additions & 0 deletions src/spider/client/Future.cpp
@@ -0,0 +1,30 @@
#include "Future.hpp"

#include <boost/uuid/uuid.hpp>
#include <boost/uuid/uuid_io.hpp>
#include <string>

namespace spider {

class FutureImpl {
// Implementation details subject to change
private:
boost::uuids::uuid m_id;

public:
auto value() -> std::string { return boost::uuids::to_string(m_id); }

auto ready() -> bool { return m_id.is_nil(); }
};

template <class T>
auto Future<T>::get() -> T {
return T();
}

template <class T>
auto Future<T>::ready() -> bool {
return true;
}

} // namespace spider
31 changes: 31 additions & 0 deletions src/spider/client/Future.hpp
@@ -0,0 +1,31 @@
#ifndef SPIDER_CLIENT_FUTURE_HPP
#define SPIDER_CLIENT_FUTURE_HPP

#include <memory>

namespace spider {

class FutureImpl;

template <class T>
class Future {
private:
std::unique_ptr<FutureImpl> m_impl;

public:
/**
* Gets the value of the future. Blocks until the value is available.
* @return value of the future
*/
auto get() -> T;

/**
* Checks if value of the future is ready.
* @return true if future is ready, false otherwise
*/
auto ready() -> bool;
};

} // namespace spider

#endif // SPIDER_CLIENT_FUTURE_HPP
45 changes: 45 additions & 0 deletions src/spider/client/Spider.hpp
Member

Since this file doesn't correspond to a class, the filename should be lowercase.

@@ -0,0 +1,45 @@
#ifndef SPIDER_CLIENT_SPIDER_HPP
#define SPIDER_CLIENT_SPIDER_HPP

#include <functional>
#include <string>

// NOLINTBEGIN(misc-include-cleaner)
Member

We should be using IWYU's pragmas, right?
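
For reference, one way an umbrella header can mark re-exported includes is with IWYU's `begin_exports` / `end_exports` pragmas in place of the `NOLINTBEGIN` / `NOLINTEND` pair. The sketch below uses standard headers as stand-ins for `Data.hpp` and the other client headers, and `default_nodes` is a hypothetical function added only so the example compiles on its own. Whether the project's clang-tidy version honors these pragmas for `misc-include-cleaner` is an assumption worth checking.

```cpp
#include <cassert>

// IWYU pragma: begin_exports
#include <string>
#include <vector>
// IWYU pragma: end_exports

namespace sketch {

// Uses the re-exported headers so the example has something to compile.
inline auto default_nodes() -> std::vector<std::string> {
    return {"node-a", "node-b"};
}

}  // namespace sketch
```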

#include "Data.hpp"
#include "Future.hpp"
#include "Task.hpp"
#include "TaskGraph.hpp"

// NOLINTEND(misc-include-cleaner)

namespace spider {
Member

Reading the functions in this file and those in KeyValueData, I feel like it makes more sense to have a Client/Driver class that encapsulates the functions (similar to Context). Otherwise, a user could call, for example, insert_kv without initializing a client and then it would fail or invoke undefined behaviour, right?
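
The encapsulation argued for here could look roughly like the following: the connection is established in the constructor, so `insert_kv` cannot be called on an unconnected client. The class layout, the in-memory map, and the empty-URL check are all assumptions for illustration; only the names `Driver`, `insert_kv`, and `get_kv` come from this PR's discussion and commit messages.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

class Driver {
public:
    // "Connecting" happens here, so a Driver object never exists unconnected.
    // The empty-URL check stands in for a real connection failure.
    explicit Driver(std::string storage_url) : m_storage_url{std::move(storage_url)} {
        if (m_storage_url.empty()) {
            throw std::invalid_argument{"storage url must not be empty"};
        }
    }

    // Inserts or overwrites a key-value pair.
    void insert_kv(std::string const& key, std::string const& value) {
        m_kv_store.insert_or_assign(key, value);
    }

    // Returns the value for `key`, or std::nullopt if the key is absent.
    [[nodiscard]] auto get_kv(std::string const& key) const -> std::optional<std::string> {
        auto const it = m_kv_store.find(key);
        if (it == m_kv_store.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::string m_storage_url;
    std::map<std::string, std::string> m_kv_store;  // in-memory stand-in for real storage
};

}  // namespace sketch
```

Because every operation is a member function, the "called before init" failure mode becomes a compile-time impossibility instead of a runtime error or undefined behaviour.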

/**
* Initializes Spider library
*/
void init();
Member

What will this function do internally?


/**
* Connects to storage
* @param url url of the storage to connect
*/
void connect(std::string const& url);

/**
* Registers function to Spider
* @param function function to register
*/
template <class R, class... Args>
Member

Let's replace R and Args with C++20 concepts that specify the exact properties of acceptable inputs. If I understand correctly, currently, return values and arguments can be a certain set of serializable values.

On that note though, if the requirement is that the value types need to be serializable, why don't we define an interface for serializable value types and then the user has the flexibility to use any serializable type they want? We could do this later, but conceptually, is it possible?

Member

Let's also use the same C++20 concepts wherever we have task inputs/outputs. This would also simplify the docstrings since we don't need to repeat what types are acceptable.

void register_task(std::function<R(Args...)> const& function);

/**
* Registers function to Spider with timeout
* @param function function to register
* @param timeout time in ms after which a task is considered a straggler and Spider replicates the
* task
*/
template <class R, class... Args>
void register_task(std::function<R(Args...)> const& function, float timeout);

} // namespace spider

#endif // SPIDER_CLIENT_SPIDER_HPP
15 changes: 15 additions & 0 deletions src/spider/client/Task.cpp
@@ -0,0 +1,15 @@
#include "Task.hpp"

#include <optional>
#include <string>

#include "Data.hpp"

namespace spider {

template <typename T>
auto get_data(std::string const& /*key*/) -> std::optional<Data<T>> {
return std::nullopt;
}

} // namespace spider