docs: Add quick-start guide and example. #36

Merged
merged 20 commits into from
Jan 10, 2025
203 changes: 203 additions & 0 deletions docs/quick-start.md
@@ -0,0 +1,203 @@
# Quick start

Spider is a distributed system for executing user-defined tasks. It is designed to achieve low
latency, high throughput, and robust fault tolerance.

The guide below briefly describes how to get started with running a task on Spider. At a high level,
you'll need to:

* Write a task
* Build the task into a shared library
* Write a client to manage the task
* Build the client
* Set up a Spider cluster
* Run the client

The example source code for this guide is in `examples/quick-start`.

> [!NOTE] In the rest of this guide:
> 1. we specify source file paths relative to `examples/quick-start`.
> 2. all CMake commands should be run from inside `examples/quick-start`.

# Requirements

In the guide below, you'll need:

* CMake 3.22.1+
* GCC 10+ or Clang 7+
* [Docker] 20.10+
  * If you're not running as root, ensure `docker` can be run
    [without superuser privileges][docker-non-root].

# Writing a task

In Spider, a task is a C++ function that satisfies the following conditions:

* It is a non-member function.
* It takes one or more parameters:
  * The first parameter must be a `TaskContext`.
  * All other parameters must have types that conform to the `Serializable` or `Data` interfaces.
* It returns a value that conforms to the `Serializable` or `Data` interfaces.

Comment on lines +34 to +41

🛠️ Refactor suggestion

Document task error handling

Consider adding information about:

  • How tasks should handle errors
  • Exception handling expectations
  • Error propagation to the client

> [!NOTE]
> You don't immediately need to understand the TaskContext, Serializable, or Data types as we'll
> explain them in other guides.

For example, the task in `src/tasks.cpp` computes and returns the sum of two integers.

> [!NOTE]
> The task is split into a header file and an implementation file so that it can be loaded as a
> library in the worker, as we'll see in later sections.

The integer parameters and return value are `Serializable` values.

The `SPIDER_REGISTER_TASK` macro at the bottom of `src/tasks.cpp` is how we inform Spider that a
function should be treated as a task.

# Building the task into a shared library

In order for Spider to run a task, the task needs to be compiled into a shared library that Spider
can load. The example's `CMakeLists.txt` demonstrates how to do this.

To build the shared library, run:

```shell
cmake -S . -B build
cmake --build build --parallel $(nproc) --target tasks
```

# Writing a client to manage the task

To make Spider run a task, we first need to write a client application. Generally, a client:

1. connects to Spider;
2. submits the task for execution;
3. waits for its completion—whether it succeeds or fails;
4. and then handles the result.

For example, the client in `src/client.cpp` runs the `sum` task from the previous section and
verifies its result.

When we submit a task to Spider, Spider returns a `Job`, which represents a scheduled, running, or
completed task (or `TaskGraph`) in a Spider cluster.

> [!NOTE]
> `Job`s and `TaskGraph`s will be explained in another guide.

# Building the client

The client can be compiled like any normal C++ application, except that we need to link it to the
Spider client library and the `tasks` library. The example's `CMakeLists.txt` demonstrates how to do
this.

To build the client executable, run:

```shell
cmake --build build --parallel $(nproc) --target client
```

# Setting up a Spider cluster

Before we can run the client, we need to start a Spider cluster. The simplest Spider cluster
consists of:

* a storage backend;
* a scheduler instance;
* and a worker instance.

## Setting up a storage backend

Spider currently supports using MySQL or MariaDB as a storage backend. In this guide, we'll start
MariaDB in a Docker container:

```shell
docker run \
  --detach \
  --rm \
  --name spider-storage \
  --env MARIADB_USER=spider \
  --env MARIADB_PASSWORD=password \
  --env MARIADB_DATABASE=spider-storage \
  --env MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=true \
  --publish 3306:3306 mariadb:latest
```

Comment on lines +118 to +122

🛠️ Refactor suggestion

Add security warning for credentials

The example uses hardcoded credentials in plain text. Consider adding a security note about:

  • Using environment variables for credentials
  • Implementing proper authentication in production
  • Using secure password practices

> [!WARNING]
> When the container above is stopped, the database will be deleted. In production, you should set
> up a database instance with some form of data persistence.

> [!WARNING]
> The container above is using hardcoded default credentials that shouldn't be used in production.

Comment on lines +125 to +131

🛠️ Refactor suggestion

Enhance security considerations

While the warnings about credentials and data persistence are good, consider adding:

  • Instructions for secure credential management
  • Network security recommendations
  • Access control best practices
  • SSL/TLS configuration guidelines

Would you like me to help draft these security guidelines?

🧰 Tools
🪛 Markdownlint (0.37.0)

248-248: null
Blank line inside blockquote

(MD028, no-blanks-blockquote)

Alternatively, if you have an existing MySQL/MariaDB instance, you can use that as well. Simply
create a database and authorize a user to access it.

## Setting up the scheduler

To build the scheduler, run:

```shell
cmake --build build --parallel $(nproc) --target spider_scheduler
```

To start the scheduler, run:

```shell
build/spider/src/spider/spider_scheduler \
  --storage_url \
  "jdbc:mariadb://localhost:3306/spider-storage?user=spider&password=password" \
  --port 6000
```

NOTE:

* If you used a different set of arguments to set up the storage backend, ensure you update the
`storage_url` argument in the command.
* If the scheduler fails to bind to port `6000`, change the port in the command and try again.

## Setting up a worker

To build the worker, run:

```shell
cmake --build build --parallel $(nproc) --target spider_worker
```

To start a worker, run:

```shell
build/spider/src/spider/spider_worker \
  --storage_url \
  "jdbc:mariadb://localhost:3306/spider-storage?user=spider&password=password" \
  --port 6000
```

NOTE:

If you used a different set of arguments to set up the storage backend, ensure you update the
`storage_url` argument in the command.

> [!TIP]
> You can start multiple workers to increase the number of concurrent tasks that can be run on the
> cluster.

# Running the client

To run the client:

```shell
build/client "jdbc:mariadb://localhost:3306/spider-storage?user=spider&password=password"
```

NOTE:

If you used a different set of arguments to set up the storage backend, ensure you update the
storage backend URL in the command.

# Next steps

In future guides, we'll explain how to write more complex tasks, as well as how to leverage Spider's
support for fault tolerance.

[Docker]: https://docs.docker.com/engine/install/
[docker-non-root]: https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user
20 changes: 20 additions & 0 deletions examples/quick-start/.clang-format
@@ -0,0 +1,20 @@
BasedOnStyle: "InheritParentConfig"

IncludeCategories:
  # NOTE: A header is grouped by first matching regex
  # Project headers
  - Regex: "^\""
    Priority: 4
  # Library headers. Update when adding new libraries.
  # NOTE: clang-format retains leading white-space on a line in violation of the YAML spec.
  # Ex:
  # - Regex: "<(fmt|spdlog)"
  #   Priority: 3
  - Regex: "^<(absl|boost|catch2|fmt|mariadb|msgpack|spdlog|spider)"
    Priority: 3
  # C system headers
  - Regex: "^<.+\\.h>"
    Priority: 1
  # C++ standard libraries
  - Regex: "^<.+>"
    Priority: 2
27 changes: 27 additions & 0 deletions examples/quick-start/CMakeLists.txt
@@ -0,0 +1,27 @@
cmake_minimum_required(VERSION 3.22.1)
project(spider_getting_started)

# Add the Spider library
add_subdirectory(../../ spider EXCLUDE_FROM_ALL)

# Add the tasks library
add_library(
        tasks
        SHARED
        src/tasks.cpp
        src/tasks.hpp
)

# Link the Spider library to the tasks library
target_link_libraries(tasks PRIVATE spider::spider)

# Add the client
add_executable(client src/client.cpp)

# Link the Spider and tasks library to the client
target_link_libraries(
        client
        PRIVATE
        spider::spider
        tasks
)
61 changes: 61 additions & 0 deletions examples/quick-start/src/client.cpp
@@ -0,0 +1,61 @@
#include <iostream>
#include <string>
#include <type_traits>
#include <utility>

#include <spider/client/spider.hpp>

#include "tasks.hpp"

// NOLINTBEGIN(bugprone-exception-escape)
auto main(int argc, char const* argv[]) -> int {
    // Parse the storage backend URL from the command line arguments
    if (argc < 2) {
        std::cerr << "Usage: ./client <storage-backend-url>" << '\n';
        return 1;
    }
    // NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic)
    std::string const storage_url{argv[1]};
    if (storage_url.empty()) {
        std::cerr << "storage-backend-url cannot be empty." << '\n';
        return 1;
    }

    // Create a driver that connects to the Spider cluster
    spider::Driver driver{storage_url};
Comment on lines +24 to +25

🛠️ Refactor suggestion

Add error handling for driver initialization.

The Spider driver initialization might throw exceptions. Consider wrapping it in a try-catch block:

     // Create a driver that connects to the Spider cluster
+    try {
         spider::Driver driver{storage_url};
+    } catch (const std::exception& e) {
+        std::cerr << "Failed to initialize Spider driver: " << e.what() << '\n';
+        return 1;
+    }

Committable suggestion skipped: line range outside the PR's diff.


    // Submit the task for execution
    int const x = 2;
    int const y = 3;
    spider::Job<int> job = driver.start(&sum, x, y);

Comment on lines +27 to +31

⚠️ Potential issue

Add error handling for job submission

The start method could throw exceptions. Consider wrapping it in a try-catch block:

     // Submit the task for execution
     int const x = 2;
     int const y = 3;
+    spider::Job<int> job;
+    try {
         job = driver.start(&sum, x, y);
+    } catch (const std::exception& e) {
+        std::cerr << "Failed to submit job: " << e.what() << '\n';
+        return 1;
+    }
🧰 Tools
🪛 GitHub Actions: code-linting-checks

[error] 1: Code should be clang-formatted

    // Wait for the job to complete
    job.wait_complete();

    // Handle the job's success/failure
    switch (auto job_status = job.get_status()) {
        case spider::JobStatus::Succeeded: {
            auto result = job.get_result();
            int const expected = x + y;
            if (expected == result) {
                return 0;
            }
            std::cerr << "`sum` returned unexpected result. Expected: " << expected
                      << ". Actual: " << result << '\n';
            return 1;
        }
        case spider::JobStatus::Failed: {
            std::pair<std::string, std::string> const error_and_fn_name = job.get_error();
            std::cerr << "Job failed in function " << error_and_fn_name.second << " - "
                      << error_and_fn_name.first << '\n';
            return 1;
        }
        default:
            std::cerr << "Job is in unexpected state - "
                      << static_cast<std::underlying_type_t<decltype(job_status)>>(job_status)
                      << '\n';
            return 1;
    }
}

// NOLINTEND(bugprone-exception-escape)
12 changes: 12 additions & 0 deletions examples/quick-start/src/tasks.cpp
@@ -0,0 +1,12 @@
#include "tasks.hpp"

#include <spider/client/spider.hpp>

// Task function implementation
auto sum(spider::TaskContext& /*context*/, int x, int y) -> int {
    return x + y;
}

// Register the task with Spider
// NOLINTNEXTLINE(cert-err58-cpp)
SPIDER_REGISTER_TASK(sum);
15 changes: 15 additions & 0 deletions examples/quick-start/src/tasks.hpp
@@ -0,0 +1,15 @@
#ifndef TASKS_HPP
#define TASKS_HPP

#include <spider/client/spider.hpp>

// Task function prototype
/**
 * @param context
 * @param x
 * @param y
 * @return The sum of x and y.
 */
auto sum(spider::TaskContext& context, int x, int y) -> int;

#endif // TASKS_HPP