This repository has been archived by the owner on Nov 4, 2024. It is now read-only.

Commit

Documents update and annotation completion (#346)
Signed-off-by: Jiayu Wu <[email protected]>
JiayuZzz authored Sep 30, 2022
1 parent 61bb464 commit 5cff991
Showing 11 changed files with 446 additions and 163 deletions.
11 changes: 10 additions & 1 deletion doc/benchmark.md
@@ -4,6 +4,15 @@ To test performance of KVDK, you can run our benchmark tool "bench", the tool is

You can manually run an individual benchmark by following the examples shown below, or simply run our basic benchmark script "scripts/run_benchmark.py" to test all the basic read/write performance.

To run the script, you should first build KVDK, then run:

```
scripts/run_benchmark.py [data_type] [key distribution]
```

data_type: Which data type to benchmark; it can be string/sorted/hash/list/blackhole/all.

key distribution: Distribution of keys in the benchmark workloads; it can be random/zipf/all.

## Fill data to new instance

To test performance, we need to first fill key-value pairs into the KVDK instance. Since KVDK does not support cross-socket access yet, we need to bind the bench program to a NUMA node:
@@ -20,7 +29,7 @@ Explanation of arguments:

-space: PMem space allocated to the KVDK instance.

-max_access_threads: Max concurrent access threads of the KVDK instance, set it to the number of the hyper-threads for performance consideration.
-max_access_threads: Max concurrent access threads in the KVDK instance; set it to the number of hyper-threads for best performance. You can call the KVDK API from any number of threads, but if more than max_access_threads threads run in parallel, performance will degrade due to synchronization cost.

-type: Type of key-value pairs to benchmark, it can be "string", "hash" or "sorted".

98 changes: 68 additions & 30 deletions doc/user_doc.md
@@ -1,9 +1,9 @@
KVDK
=======

KVDK(Key-Value Development Kit) is a Key-Value store for Persistent memory(PMem).
KVDK(Key-Value Development Kit) is a Key-Value store for Persistent Memory (PMem).

KVDK supports both sorted and unsorted KV-Pairs.
KVDK supports basic read and write operations on both sorted and unsorted KV-Pairs. It also supports some advanced features, such as **backup**, **checkpoint**, **expire key**, **atomic batch write** and **transactions**.

Code snippets in this user document are from `./examples/tutorial/cpp_api_tutorial.cpp`, which is built as `./build/examples/tutorial/cpp_api_tutorial`.

@@ -70,7 +70,7 @@ int main()
`kvdk::Status` indicates status of KVDK function calls.
Functions return `kvdk::Status::Ok` if such a function call is a success.
If exceptions are raised during function calls, other `kvdk::Status` is returned,
such as `kvdk::Status::MemoryOverflow`.
such as `kvdk::Status::MemoryOverflow` when there is not enough memory to allocate.

## Close a KVDK instance

@@ -97,26 +97,45 @@ int main()
```

## Data types
KVDK currently supports string type for both keys and values.
### Strings
All keys and values in a KVDK instance are strings.
KVDK currently supports four data types: raw string, sorted collection, hash collection and list.

### Raw String

All keys and values in a KVDK instance are strings. You can directly store and read key-value pairs in the global namespace via Get, Put, Delete and Modify operations; we call this string type data in KVDK.

Keys are limited to have a maximum size of 64KB.

A string value can be at max 64MB in length by default. The maximum length can be configured when initializing a KVDK instance.
A value can be at max 64MB in length by default. The maximum length can be configured when initializing a KVDK instance.
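A minimal caller-side sketch of these limits, assuming "64KB" and "64MB" mean `64 * 1024` and `64 * 1024 * 1024` bytes (the engine's exact bound may differ slightly). `CheckKVSize` and `SizeCheck` are hypothetical helpers, not part of the KVDK API; the value limit is a parameter because the real maximum is configurable.

```c++
#include <cassert>
#include <cstddef>
#include <string>

// Assumed readings of the documented limits (hypothetical constants).
constexpr std::size_t kMaxKeySize = 64 * 1024;
constexpr std::size_t kDefaultMaxValueSize = 64 * 1024 * 1024;

enum class SizeCheck { Ok, KeyTooLarge, ValueTooLarge };

// Hypothetical pre-Put check. max_value_size stands in for the
// configurable value limit and defaults to the documented 64MB.
SizeCheck CheckKVSize(const std::string& key, const std::string& value,
                      std::size_t max_value_size = kDefaultMaxValueSize) {
  if (key.size() > kMaxKeySize) return SizeCheck::KeyTooLarge;
  if (value.size() > max_value_size) return SizeCheck::ValueTooLarge;
  return SizeCheck::Ok;
}
```

Rejecting oversized input before calling Put keeps the error handling on the caller's side explicit instead of relying on the returned status alone.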

### Collections

Instead of using raw strings, you can organize key-value pairs into a collection; each collection has its own namespace.

Currently we have three types of collection:

#### Sorted Collection

In a Sorted Collection, KV pairs are stored in a defined order (lexicographical by default). They can be iterated forward or backward by an iterator starting from an arbitrary point (at a key or between two keys), and they can also be accessed directly via SortedGet, SortedPut and SortedDelete operations.
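The ordering and seek behavior can be illustrated with `std::map`, which also keeps keys in lexicographical order. This is an analogy for the iterator semantics only, not KVDK's implementation (the `kv_engine.cpp` diff shows sorted collections are indexed as `PointerType::Skiplist`). `ForwardFrom` and `BackwardFrom` are hypothetical helpers: seeking to an absent key lands between two keys, so a forward scan starts at the next larger key and a backward scan at the next smaller one.

```c++
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using SortedKVs = std::map<std::string, std::string>;

// Forward scan from the first key >= start. If start is absent, the scan
// begins "between two keys", at the next larger key.
std::vector<std::string> ForwardFrom(const SortedKVs& kvs,
                                     const std::string& start) {
  std::vector<std::string> keys;
  for (auto it = kvs.lower_bound(start); it != kvs.end(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}

// Backward scan from the last key <= start.
std::vector<std::string> BackwardFrom(const SortedKVs& kvs,
                                      const std::string& start) {
  std::vector<std::string> keys;
  // upper_bound gives the first key > start; a reverse iterator built from
  // it dereferences to the element just before, i.e. the last key <= start.
  for (auto it = std::make_reverse_iterator(kvs.upper_bound(start));
       it != kvs.rend(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```

With keys `a`, `c`, `e`, seeking forward from `b` yields `c, e`, and seeking backward from `d` yields `c, a`.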

#### Hash Collection

A Hash Collection is like Raw String with its own namespace; you can access KV pairs via HashGet, HashPut, HashDelete and HashModify operations.

In the current version, operations on a hash collection perform similarly to those on a sorted collection, which is much slower than raw string, so we recommend preferring raw string or sorted collection.

#### List

## Collections
All Key-Value pairs(KV-Pairs) are organized into collections.
A List is a list of string elements. You can access elements at the front or back via ListPushFront, ListPushBack, ListPopFront and ListPopBack, or operate on elements by index via ListInsertAt, ListInsertBefore, ListInsertAfter and ListErase. Note that operations by index take O(n) time, while operations at the front and back take only O(1).
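To see where the O(n) comes from, here is a sketch on `std::list`, a doubly linked list: pushing at either end is constant time, but reaching index i means walking i nodes first. The function names only echo the KVDK operations above; they are hypothetical, and this is not a description of how KVDK stores lists internally.

```c++
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <utility>

using ToyList = std::list<std::string>;

// O(1): both ends of a linked list are directly reachable.
void PushFront(ToyList& l, std::string v) { l.push_front(std::move(v)); }
void PushBack(ToyList& l, std::string v) { l.push_back(std::move(v)); }

// O(n): std::next walks `index` nodes before the O(1) link-in.
// index == size() appends; anything larger is rejected.
bool InsertAt(ToyList& l, std::size_t index, std::string v) {
  if (index > l.size()) return false;
  l.insert(std::next(l.begin(), static_cast<std::ptrdiff_t>(index)),
           std::move(v));
  return true;
}

// O(n) for the same reason; rejects out-of-range indexes.
bool EraseAt(ToyList& l, std::size_t index) {
  if (index >= l.size()) return false;
  l.erase(std::next(l.begin(), static_cast<std::ptrdiff_t>(index)));
  return true;
}
```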

There is an anonymous global collection with KV-Pairs directly accessible via Get, Put, Delete operations. The anonymous global collection is unsorted.
### Namespace

Users can also create named collections.
Each collection has its own namespace, so you can store the same key in every collection. However, collection names and raw string keys share the same namespace, so you can't use the same name for a collection and a string key; otherwise an error status (Status::WrongType) will be returned.
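The namespace rule can be modeled with one top-level map shared by string keys and collection names, plus a private map per collection. `ToyEngine`, `ToyStatus` and their methods (including `HashCreate`) are hypothetical stand-ins that borrow KVDK's operation names; only the `WrongType` outcome on a name clash comes from the text above, and returning `NotFound` for a missing collection is an assumption.

```c++
#include <cassert>
#include <map>
#include <string>

enum class ToyStatus { Ok, NotFound, WrongType };

class ToyEngine {
 public:
  // String keys and collection names compete for the same top-level names.
  ToyStatus Put(const std::string& key, const std::string& value) {
    if (collections_.count(key)) return ToyStatus::WrongType;
    strings_[key] = value;
    return ToyStatus::Ok;
  }

  ToyStatus HashCreate(const std::string& name) {
    if (strings_.count(name)) return ToyStatus::WrongType;
    collections_[name];  // creates an empty per-collection namespace
    return ToyStatus::Ok;
  }

  // Keys inside a collection live in that collection's own map.
  ToyStatus HashPut(const std::string& name, const std::string& key,
                    const std::string& value) {
    auto coll = collections_.find(name);
    if (coll == collections_.end()) return ToyStatus::NotFound;
    coll->second[key] = value;
    return ToyStatus::Ok;
  }

  ToyStatus HashGet(const std::string& name, const std::string& key,
                    std::string* value) const {
    auto coll = collections_.find(name);
    if (coll == collections_.end()) return ToyStatus::NotFound;
    auto kv = coll->second.find(key);
    if (kv == coll->second.end()) return ToyStatus::NotFound;
    *value = kv->second;
    return ToyStatus::Ok;
  }

 private:
  std::map<std::string, std::string> strings_;
  std::map<std::string, std::map<std::string, std::string>> collections_;
};
```

The same key `k` can hold different values in two collections, while reusing the string key `user` as a collection name is rejected.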

KVDK currently supports sorted named collections. Users can iterate forward or backward starting from an arbitrary point(at a key or between two keys) by an iterator. Elements can also be directly accessed via SortedGet, SortedPut, SortedDelete operations.
## API Examples

## Reads and Writes in Anonymous Global Collection
### Reads and Writes with String type

A KVDK instance provides Get, Put, Delete methods to query/modify/delete entries.
A KVDK instance provides Get, Put, Delete methods to query/modify/delete raw string KV pairs.

The following code performs a series of Get, Put and Delete operations.

@@ -125,7 +144,7 @@ int main()
{
... Open a KVDK instance as described in "Open a KVDK instance" ...

// Reads and Writes on Anonymous Global Collection
// Reads and Writes String KV
{
std::string key1{"key1"};
std::string key2{"key2"};
@@ -173,11 +192,11 @@ int main()
}
```

## Reads and Writes in a Named Collection
### Reads and Writes in a Sorted Collection

A KVDK instance provides SortedGet, SortedPut, SortedDelete methods to query/modify/delete sorted entries.

The following code performs a series of SortedGet, SortedPut and SortedDelete operations, which also initialize a named collection implicitly.
The following code performs a series of SortedGet, SortedPut and SortedDelete operations on a sorted collection.

```c++
int main()
@@ -194,9 +213,13 @@ int main()
std::string value2{"value2"};
std::string v;

// You must create sorted collections before you do any operations on them
status = engine->SortedCreate(collection1);
assert(status == kvdk::Status::Ok);
status = engine->SortedCreate(collection2);
assert(status == kvdk::Status::Ok);

// Insert key1-value1 into "my_collection_1".
// Implicitly create a collection named "my_collection_1" in which
// key1-value1 is stored.
status = engine->SortedPut(collection1, key1, value1);
assert(status == kvdk::Status::Ok);

@@ -206,8 +229,6 @@ int main()
assert(v == value1);

// Insert key1-value2 into "my_collection_2".
// Implicitly create a collection named "my_collection_2" in which
// key1-value2 is stored.
status = engine->SortedPut(collection2, key1, value2);
assert(status == kvdk::Status::Ok);

@@ -236,8 +257,13 @@ int main()
status = engine->SortedDelete(collection1, key1);
assert(status == kvdk::Status::Ok);

printf("Successfully performed SortedGet, SortedPut, SortedDelete operations on named "
"collections.\n");
// Destroy sorted collections
status = engine->SortedDestroy(collection1);
assert(status == kvdk::Status::Ok);
status = engine->SortedDestroy(collection2);
assert(status == kvdk::Status::Ok);

printf("Successfully performed SortedGet, SortedPut, SortedDelete operations.\n");
}

... Do something else with KVDK instance ...
@@ -246,17 +272,18 @@ int main()
}
```

## Iterating a Named Collection
The following example demonstrates how to iterate through a named collection. It also demonstrates how to iterate through a range defined by Key.
### Iterating a Sorted Collection
The following example demonstrates how to iterate through a sorted collection on a consistent view of the data. It also demonstrates how to iterate through a range defined by keys.

```c++
int main()
{
... Open a KVDK instance as described in "Open a KVDK instance" ...

// Iterating a Sorted Named Collection
// Iterating a Sorted Collection
{
std::string sorted_collection{"my_sorted_collection"};
engine->SortedCreate(sorted_collection);
// Create toy keys and values.
std::vector<std::pair<std::string, std::string>> kv_pairs;
for (int i = 0; i < 10; ++i) {
@@ -282,7 +309,9 @@ int main()
// Sort kv_pairs for checking the order of "my_sorted_collection".
std::sort(kv_pairs.begin(), kv_pairs.end());

// Iterate through collection "my_sorted_collection"
// Iterate through collection "my_sorted_collection". The iter is
// created on a consistent view of the data at creation time, i.e. any
// modifications made after the iter is created won't be observed
auto iter = engine->SortedIteratorCreate(sorted_collection);
iter->SeekToFirst();
{
@@ -320,7 +349,7 @@ int main()
}
}

printf("Successfully iterated through a sorted named collections.\n");
printf("Successfully iterated through a sorted collection.\n");
engine->SortedIteratorRelease(iter);
}

@@ -330,7 +359,7 @@ int main()
}
```
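One simple way to obtain the "consistent view" behavior described in the iterator comment above is for an iterator to pin the data that existed when it was created. The sketch does this by copying a `std::map` into shared ownership, which costs O(n) per iterator; it only illustrates the observable semantics, whereas the `kv_engine.cpp` diff suggests KVDK relies on a version controller with snapshots (`version_controller_.GetLocalSnapshotHolder()`). `SnapshotIterator` is hypothetical, and apart from `SeekToFirst` its method names are assumed rather than taken from KVDK.

```c++
#include <cassert>
#include <map>
#include <memory>
#include <string>

using KVMap = std::map<std::string, std::string>;

// Hypothetical iterator over a frozen copy of the collection. The copy is
// held by shared_ptr so the stored iterator stays valid if this is copied.
class SnapshotIterator {
 public:
  explicit SnapshotIterator(const KVMap& live)
      : view_(std::make_shared<const KVMap>(live)), it_(view_->begin()) {}

  void SeekToFirst() { it_ = view_->begin(); }
  bool Valid() const { return it_ != view_->end(); }
  void Next() { ++it_; }
  const std::string& Key() const { return it_->first; }
  const std::string& Value() const { return it_->second; }

 private:
  std::shared_ptr<const KVMap> view_;  // declared first: initialized first
  KVMap::const_iterator it_;
};
```

Writes to the live map after the iterator is created (adding `c`, erasing `a`) are not seen by the iteration.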

## Atomic Updates
### Atomic Updates
KVDK supports organizing a series of Put and Delete operations into a `kvdk::WriteBatch` object as an atomic operation. If KVDK fails to apply the `kvdk::WriteBatch` object as a whole, e.g. because the system shuts down while applying the batch, it will roll back to the state right before applying the `kvdk::WriteBatch`.
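The all-or-nothing contract can be sketched in memory by staging every operation of a batch against a private copy and publishing the copy only after all operations succeed, so a failure midway leaves the live data untouched. `ToyWriteBatch`, `ToyOp` and `ApplyBatch` are hypothetical names; KVDK itself has to survive crashes on PMem and appears to use rollback logs for this (see `batchWriteRollbackLogs` in the `kv_engine.cpp` diff), which this sketch does not model.

```c++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KVMap = std::map<std::string, std::string>;

// One staged operation; is_delete selects Delete instead of Put.
struct ToyOp {
  std::string key;
  std::string value;
  bool is_delete;
};

// Hypothetical stand-in for a write batch: it only records operations.
struct ToyWriteBatch {
  std::vector<ToyOp> ops;
  void Put(std::string key, std::string value) {
    ops.push_back({std::move(key), std::move(value), false});
  }
  void Delete(std::string key) { ops.push_back({std::move(key), "", true}); }
};

// Applies every op to a private copy and swaps it in only at the end.
// fail_at simulates a failure just before op #fail_at (-1 = never fail).
bool ApplyBatch(KVMap& live, const ToyWriteBatch& batch, int fail_at = -1) {
  KVMap staged = live;
  for (std::size_t i = 0; i < batch.ops.size(); ++i) {
    if (static_cast<int>(i) == fail_at) return false;  // live never changed
    const ToyOp& op = batch.ops[i];
    if (op.is_delete) {
      staged.erase(op.key);
    } else {
      staged[op.key] = op.value;
    }
  }
  live.swap(staged);  // publish all changes at once
  return true;
}
```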

```c++
@@ -387,7 +416,12 @@ A KVDK instance can be accessed by multiple read and write threads safely. Synch
Users can configure KVDK to adapt to their system environment by setting up a `kvdk::Configs` object and passing it to `kvdk::Engine::Open` when initializing a KVDK instance.

### Max Access Threads
Maximum number of access threads is specified by `kvdk::Configs::max_access_threads`. Defaulted to 48. It's recommended to set this number to the number of threads provided by CPU.
The maximum number of internal access threads in KVDK is specified by `kvdk::Configs::max_access_threads`. It defaults to 64. It's recommended to set this number to the number of hardware threads provided by the CPU.

You can call the KVDK API from any number of threads, but if more than max_access_threads threads run in parallel, performance will degrade due to synchronization cost.
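A small sketch of picking a value for `max_access_threads` from the hardware thread count, as recommended above. `std::thread::hardware_concurrency()` reports hardware threads (hyper-threads included) and may return 0 when it can't tell, so the sketch falls back to the documented default of 64. `SuggestMaxAccessThreads` is a hypothetical helper, not a KVDK function.

```c++
#include <cassert>
#include <cstdint>
#include <thread>

constexpr std::uint64_t kDocumentedDefault = 64;

// Maps a reported hardware thread count to a max_access_threads value.
// Taking the count as a parameter makes the fallback path testable.
std::uint64_t SuggestMaxAccessThreads(unsigned reported) {
  return reported == 0 ? kDocumentedDefault
                       : static_cast<std::uint64_t>(reported);
}

// Convenience overload that queries the current machine.
std::uint64_t SuggestMaxAccessThreads() {
  return SuggestMaxAccessThreads(std::thread::hardware_concurrency());
}
```

You would assign the result to the `max_access_threads` field of your `kvdk::Configs` object before passing it to `kvdk::Engine::Open`.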

### Clean Threads
KVDK reclaims the space of updated/deleted data in the background with a dynamic number of clean threads. You can specify the maximum number of clean threads with `kvdk::Configs::clean_threads`, which defaults to 8; configure more clean threads for delete-intensive workloads to avoid running out of space.

### PMem File Size
`kvdk::Configs::pmem_file_size` specifies the space allocated to a KVDK instance. It defaults to 2^38 bytes = 256 GiB.
@@ -418,3 +452,7 @@ Specified by `kvdk::Configs::hash_bucket_num`. Greater number will improve perfo
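As a quick check of the default: 2^38 bytes = 2^8 × 2^30 bytes, i.e. 256 binary gigabytes (GiB). When sizing `pmem_file_size` by hand, shifting by 30 keeps the units straight; `GiB` below is a hypothetical helper, and the `static_assert` verifies the arithmetic at compile time.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical helper: n binary gigabytes (GiB) expressed in bytes.
constexpr std::uint64_t GiB(std::uint64_t n) { return n << 30; }

// The documented default pmem_file_size: 2^38 bytes.
constexpr std::uint64_t kDefaultPMemFileSize = std::uint64_t{1} << 38;

static_assert(kDefaultPMemFileSize == GiB(256), "2^38 bytes is 256 GiB");
```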

### Buckets per Slot
Specified by `kvdk::Configs::num_buckets_per_slot`. Smaller number will improve performance by reducing lock contentions and improving caching at the cost of greater DRAM space. Please read Architecture Documentation for details before tuning this parameter.

## Advanced features and more API

Please read examples/tutorial for more APIs and advanced features of KVDK.
11 changes: 11 additions & 0 deletions engine/hash_collection/hash_list.hpp
@@ -82,6 +82,17 @@ class HashList : public Collection {
// Notice: the key being deleted should already have been locked by the engine
WriteResult Delete(const StringView& key, TimestampType timestamp);

// Modify value of "key" in the hash list
//
// Args:
// * modify_func: customized function to modify existing value of key. See
// definition of ModifyFunc (types.hpp) for more details.
// * modify_args: customized arguments of modify_func.
//
// Return:
// Status::Ok if the modification succeeds.
// Status::Abort if modify_func aborts the modification.
// Return other non-Ok status on any error.
WriteResult Modify(const StringView key, ModifyFunc modify_func,
void* modify_args, TimestampType timestamp);
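To make the `modify_func` / `modify_args` contract concrete, here is an in-memory sketch. The callback shape `(const std::string* old_val, std::string* new_val, void* args)` returning a `ModifyOperation` is copied from the lambda that `KVEngine::Expire` passes to `Modify` in this commit, and the Ok/Abort outcomes follow the comment above. The `Abort` enumerator, passing `nullptr` as `old_val` for a missing key, the function-pointer form of `ModifyFunc` and the `ToyModify` driver are assumptions for illustration, not KVDK's definitions in types.hpp.

```c++
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical mirrors; KVDK's real types may differ (more enumerators,
// std::function instead of a raw function pointer).
enum class ModifyOperation { Write, Abort };
enum class ToyStatus { Ok, Abort };
using ModifyFunc = ModifyOperation (*)(const std::string* old_val,
                                       std::string* new_val, void* args);

// Calls modify_func with the current value (nullptr if the key is absent)
// and stores new_val unless the callback aborts.
ToyStatus ToyModify(std::unordered_map<std::string, std::string>& kvs,
                    const std::string& key, ModifyFunc modify_func,
                    void* modify_args) {
  auto it = kvs.find(key);
  const std::string* old_val = (it == kvs.end()) ? nullptr : &it->second;
  std::string new_val;
  if (modify_func(old_val, &new_val, modify_args) == ModifyOperation::Abort) {
    return ToyStatus::Abort;  // stored value left unchanged
  }
  kvs[key] = new_val;
  return ToyStatus::Ok;
}

// Example callback: treats the value as a decimal counter and adds the long
// pointed to by args; aborts if the old value is not a whole number.
ModifyOperation AddToCounter(const std::string* old_val, std::string* new_val,
                             void* args) {
  const long delta = *static_cast<const long*>(args);
  long current = 0;
  if (old_val != nullptr) {
    std::size_t parsed = 0;
    try {
      current = std::stol(*old_val, &parsed);
    } catch (const std::exception&) {
      return ModifyOperation::Abort;
    }
    if (parsed != old_val->size()) return ModifyOperation::Abort;
  }
  *new_val = std::to_string(current + delta);
  return ModifyOperation::Write;
}
```

Keeping `modify_args` as `void*` lets one callback serve different payloads, at the cost of the callback trusting the caller about the pointed-to type.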

18 changes: 9 additions & 9 deletions engine/kv_engine.cpp
@@ -1226,10 +1226,10 @@ Status KVEngine::batchWriteRollbackLogs() {
return Status::Ok;
}

Status KVEngine::GetTTL(const StringView str, TTLType* ttl_time) {
Status KVEngine::GetTTL(const StringView key, TTLType* ttl_time) {
*ttl_time = kInvalidTTL;
auto ul = hash_table_->AcquireLock(str);
auto res = lookupKey<false>(str, ExpirableRecordType);
auto ul = hash_table_->AcquireLock(key);
auto res = lookupKey<false>(key, ExpirableRecordType);

if (res.s == Status::Ok) {
ExpireTimeType expire_time;
@@ -1266,15 +1266,15 @@ Status KVEngine::TypeOf(StringView key, ValueType* type) {
if (res.s == Status::Ok) {
switch (res.entry_ptr->GetIndexType()) {
case PointerType::Skiplist: {
*type = ValueType::SortedSet;
*type = ValueType::SortedCollection;
break;
}
case PointerType::List: {
*type = ValueType::List;
break;
}
case PointerType::HashList: {
*type = ValueType::HashSet;
*type = ValueType::HashCollection;
break;
}
case PointerType::StringRecord: {
@@ -1289,7 +1289,7 @@ Status KVEngine::TypeOf(StringView key, ValueType* type) {
return res.s == Status::Outdated ? Status::NotFound : res.s;
}

Status KVEngine::Expire(const StringView str, TTLType ttl_time) {
Status KVEngine::Expire(const StringView key, TTLType ttl_time) {
auto thread_holder = AcquireAccessThread();

int64_t base_time = TimeUtils::millisecond_time();
@@ -1298,10 +1298,10 @@ Status KVEngine::Expire(const StringView str, TTLType ttl_time) {
}

ExpireTimeType expired_time = TimeUtils::TTLToExpireTime(ttl_time, base_time);
auto ul = hash_table_->AcquireLock(str);
auto ul = hash_table_->AcquireLock(key);
auto snapshot_holder = version_controller_.GetLocalSnapshotHolder();
// TODO: maybe have a wrapper function(lookupKeyAndMayClean).
auto lookup_result = lookupKey<false>(str, ExpirableRecordType);
auto lookup_result = lookupKey<false>(key, ExpirableRecordType);
if (lookup_result.s == Status::Outdated) {
return Status::NotFound;
}
@@ -1313,7 +1313,7 @@ Status KVEngine::Expire(const StringView str, TTLType ttl_time) {
ul.unlock();
version_controller_.ReleaseLocalSnapshot();
lookup_result.s = Modify(
str,
key,
[](const std::string* old_val, std::string* new_val, void*) {
new_val->assign(*old_val);
return ModifyOperation::Write;
4 changes: 2 additions & 2 deletions engine/kv_engine.hpp
@@ -77,13 +77,13 @@ class KVEngine : public Engine {
// 1. Expire assumes that str is not duplicated among all types, which is not
// implemented yet
// 2. Expire is not compatible with checkpoint for now
Status Expire(const StringView str, TTLType ttl_time) final;
Status Expire(const StringView key, TTLType ttl_time) final;
// Get time to expire of str
//
// Notice:
// Expire assumes that str is not duplicated among all types, which is not
// implemented yet
Status GetTTL(const StringView str, TTLType* ttl_time) final;
Status GetTTL(const StringView key, TTLType* ttl_time) final;

Status TypeOf(StringView key, ValueType* type) final;

9 changes: 4 additions & 5 deletions include/kvdk/configs.hpp
@@ -19,9 +19,6 @@ enum class LogLevel : uint8_t {
None,
};

// A snapshot indicates a immutable view of a KVDK engine at a certain time
struct Snapshot {};

// Configs of created sorted collection
// For correctness of encoding, please add new config field in the end of the
// existing fields
@@ -31,12 +28,14 @@ struct SortedCollectionConfigs {
};

struct Configs {
// TODO: rename to concurrent internal threads
//
// Max number of concurrent threads read/write the kvdk instance internally.
// Set it >= your CPU core number to get best performance
// Set it to the number of hyper-threads to get the best performance
//
// Notice: you can call the KVDK API from any number of threads, but if more
// than max_access_threads threads run in parallel, performance will be
// damaged due to synchronization cost
// degraded due to synchronization cost
uint64_t max_access_threads = 64;

// Size of PMem space to store KV data, this is not scalable in current
