add Timeplus APIs #18

Merged
merged 1 commit into from
Aug 15, 2024

yuzifeng1984 (Contributor) commented Aug 12, 2024

  1. Idempotent insert
  2. Thread-safe insert APIs
  3. Better-controlled retry on exceptions
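To illustrate why idempotent inserts make retries safe, here is a minimal, self-contained sketch. `FakeServer` and `InsertWithRetry` are hypothetical stand-ins, not part of the Timeplus SDK; the fake server simply remembers every non-empty `idempotent_id` it has already applied and simulates a lost acknowledgment by throwing after applying the insert.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical in-memory server that deduplicates inserts by idempotent_id.
// This is NOT the Timeplus server or SDK API.
class FakeServer {
public:
    // Applies the insert unless this non-empty idempotent_id was already seen.
    // If lose_ack is true, the rows are applied but the ack is "lost",
    // which the caller observes as an exception.
    void Insert(const std::string& idempotent_id, std::size_t rows, bool lose_ack) {
        if (idempotent_id.empty() || seen_.insert(idempotent_id).second) {
            total_rows_ += rows;
        }
        if (lose_ack) {
            throw std::runtime_error("connection reset before ack");
        }
    }
    std::size_t total_rows() const { return total_rows_; }

private:
    std::unordered_set<std::string> seen_;
    std::size_t total_rows_ = 0;
};

// Retries up to max_attempts times, reusing the same idempotent_id so that a
// retry after a lost ack does not duplicate rows. The first `failures`
// attempts lose their ack.
bool InsertWithRetry(FakeServer& server, const std::string& idempotent_id,
                     std::size_t rows, int max_attempts, int failures) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        try {
            server.Insert(idempotent_id, rows, attempt < failures);
            return true;
        } catch (const std::runtime_error&) {
            // A real client would back off and reconnect here.
        }
    }
    return false;
}
```

With a non-empty id, every attempt reaches the server but the rows are counted once; with an empty id, each retry after a lost ack duplicates them.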

@@ -0,0 +1,243 @@
#pragma once
Contributor Author

Cloned from proton.

yuzifeng1984 force-pushed the feature/huatai-requirements branch from 388c528 to 7568fe7 on August 12, 2024 14:29

namespace timeplus {

struct TimeplusConfig {
Contributor

Could TimeplusConfig contain a ClientConfig directly? Would that simplify the code?

Contributor Author

yuzifeng1984 Aug 13, 2024

We override some client config items and do not want users to change them, so only a portion of the client parameters is exposed in the config. The drawback is that users cannot change the unexposed parameters (such as ssl).

I am still thinking about the best way to do this.

Refer to https://github.com/timeplus-io/timeplus-cpp/pull/18/files#diff-cab949bec8607457ecea447bb6500ef4f75df665c6b193a77a12afce1a48417fR74

PS: Keeping users from configuring the client directly might make future extensions easier (e.g., automatic node host/port discovery).
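A minimal sketch of the approach described in this reply, assuming hypothetical `ClientOptions` and `TimeplusConfig` types (the real SDK's structs and field names may differ): the user-facing config exposes only a subset of fields, and the wrapper pins the fields it owns when it builds per-client options. It also shows the drawback mentioned: `ssl` is not exposed, so it always keeps its default.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical low-level client options (not the SDK's real ClientOptions).
struct ClientOptions {
    std::string host;
    uint16_t port = 0;
    std::string user;
    std::string password;
    bool ssl = false;
    bool rethrow_exceptions = false;  // owned by the wrapper
    uint32_t retry_count = 5;         // owned by the wrapper
};

// Hypothetical user-facing config: only a subset of ClientOptions is exposed.
struct TimeplusConfig {
    std::string host;
    uint16_t port = 0;
    std::string user;
    std::string password;
    uint32_t max_connections = 1;
};

// Copies the exposed fields and pins the fields the wrapper must control.
ClientOptions MakeClientOptions(const TimeplusConfig& cfg) {
    ClientOptions opts;
    opts.host = cfg.host;
    opts.port = cfg.port;
    opts.user = cfg.user;
    opts.password = cfg.password;
    opts.rethrow_exceptions = true;  // errors surface to the wrapper's retry logic
    opts.retry_count = 0;            // the wrapper implements its own retries
    return opts;
}
```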

std::string password;

/// Max number of connections maintained in pool.
unsigned int max_connections = 1;
Contributor

Let's stick to the newer style: uint32_t, etc.

Contributor

Is max_connections per host/port?

Contributor Author

It is the total number of clients in the pool, so at most max_connections clients will be created. Each client picks a host/port in round-robin order when it connects or reconnects, starting from the first host/port.
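The connection model described in this reply can be sketched as follows, assuming a hypothetical `EndpointCursor` per client and a hypothetical `MakePoolCursors` helper; neither is part of the SDK. `max_connections` caps the total number of clients across all endpoints, and each client walks the endpoint list independently, starting from the first entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Endpoint = std::pair<std::string, uint16_t>;

// Hypothetical per-client cursor: every (re)connect attempt takes the next
// endpoint in round-robin order, starting from the first one.
class EndpointCursor {
public:
    explicit EndpointCursor(std::vector<Endpoint> endpoints)
        : endpoints_(std::move(endpoints)) {
        if (endpoints_.empty()) {
            throw std::invalid_argument("no endpoints configured");
        }
    }

    const Endpoint& Next() {
        const Endpoint& ep = endpoints_[next_];
        next_ = (next_ + 1) % endpoints_.size();
        return ep;
    }

private:
    std::vector<Endpoint> endpoints_;
    std::size_t next_ = 0;
};

// max_connections bounds the total number of clients, not clients per endpoint.
std::vector<EndpointCursor> MakePoolCursors(const std::vector<Endpoint>& endpoints,
                                            uint32_t max_connections) {
    return std::vector<EndpointCursor>(max_connections, EndpointCursor(endpoints));
}
```

With two clients and three endpoints, both clients first try the first endpoint; a client only advances along the list when it reconnects.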

block.AppendColumn("i", col_i);
block.AppendColumn("s", col_s);

class Inserter {
Contributor

Let's remove this class and use a plain loop in main; that shows the usage much more directly.

inserter.InsertBlock();
}

while (inserter.BlockInserted() != INSERT_BLOCKS) {
Contributor

We could show slightly more advanced usage by using atomic wait (https://en.cppreference.com/w/cpp/atomic/atomic/wait) instead of spinning.

Contributor Author

The SDK is currently based on C++17, and std::atomic::wait requires C++20. I will add a notification mechanism later.
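Since the SDK targets C++17, one way to avoid the busy-wait loop without `std::atomic::wait` is a mutex plus `std::condition_variable`. The `CompletionCounter` class below is a hypothetical sketch, not SDK code: insert callbacks would call `Increment()`, and the main thread would block in `WaitFor(INSERT_BLOCKS)` instead of polling `BlockInserted()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical completion counter usable from C++17.
class CompletionCounter {
public:
    // Called from any thread (e.g. an insert callback).
    void Increment() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            ++count_;
        }
        cv_.notify_all();  // wake waiters after releasing the lock
    }

    // Blocks until at least `target` completions have been recorded.
    void WaitFor(std::size_t target) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return count_ >= target; });
    }

    std::size_t Count() {
        std::lock_guard<std::mutex> lock(mu_);
        return count_;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::size_t count_ = 0;
};
```

The predicate overload of `wait` handles spurious wakeups, so the waiting thread sleeps until the target is actually reached.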

}

} else {
std::cout << "[" << timestamp() << "]\t Failed to insert block: insert_id=" << id << std::endl;
Contributor

Do we have a failure message or reason?

Contributor Author

Added code to print result.err_msg.

}

void InsertBlock() {
tp_.InsertAsync(table_name_, block_, [this](uint64_t id, const auto& result) {
Contributor

If the id is passed via the callback, what can users do with it?

Contributor Author

yuzifeng1984 Aug 14, 2024

Removed the internal id and changed the insert APIs in timeplus.h.
The InsertResult contains the original parameters for user convenience.

auto handle_insert_result = [&](size_t user_block_id, const InsertResult& result) { /* ... */ };

tp.InsertAsync("table", block,
    [user_block_id, &handle_insert_result](const BaseResult& result) {
        const auto& insert_result = static_cast<const InsertResult&>(result);
        handle_insert_result(user_block_id, insert_result);
    });
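A self-contained sketch of the pattern in this reply: the caller keeps its own block id in the lambda capture, so the SDK does not need to hand an id back. `BaseResult`, `Callback`, and `FakeInsertAsync` here are hypothetical simplifications (the fake completes synchronously), not the SDK's declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified result and callback types.
struct BaseResult {
    bool ok = true;
    std::string err_msg;
};
using Callback = std::function<void(const BaseResult&)>;

// Hypothetical stand-in for an async insert: it completes immediately and
// invokes the callback; a real implementation would run on a worker thread.
void FakeInsertAsync(const std::string& table, Callback cb) {
    BaseResult result;
    if (table.empty()) {
        result.ok = false;
        result.err_msg = "empty table name";
    }
    cb(result);
}

// The caller's own block id travels in the lambda capture.
std::vector<std::string> InsertBlocks(const std::vector<std::string>& tables) {
    std::vector<std::string> log;
    for (std::size_t block_id = 0; block_id < tables.size(); ++block_id) {
        FakeInsertAsync(tables[block_id], [block_id, &log](const BaseResult& r) {
            log.push_back(std::to_string(block_id) + (r.ok ? ":ok" : ":" + r.err_msg));
        });
    }
    return log;
}
```

With a truly asynchronous insert, anything captured by reference (like `log` here) must outlive the callbacks, which is why the sketch keeps the call synchronous.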

Query query("INSERT INTO " + table_name + " ( " + fields_section.str() + " ) VALUES", query_id);
std::string settings;
if (!idempotent_id.empty()) {
settings = " SETTINGS idempotent_id='" + idempotent_id + "'";
Contributor

For a bit more performance:

settings.reserve(128);
settings.append(" SETTINGS idempotent_id='").append(idempotent_id).append("'");

settings = " SETTINGS idempotent_id='" + idempotent_id + "'";
}

Query query("INSERT INTO " + table_name + " ( " + fields_section.str() + " )" + settings + " VALUES", query_id);
Contributor

Similarly, let's try to minimize string memory allocations.

Contributor

In general, this looks like very verbose string concatenation with lower performance. Is it not even using the native wire protocol?

Contributor Author

Changed to use the query settings serialization.


using InsertResult = BaseResult;

using Callback = std::function<void(uint64_t id, const BaseResult&)>;
Contributor

chenziliang Aug 14, 2024

If we pass an id to the callback, what can users do with it? By that point it seems too late.

Contributor Author

Changed the insert arguments and return values. Please check the update.


namespace timeplus {

struct TimeplusConfig {
Contributor

Nit: we could consider creating a separate timeplus_config.h file.


std::optional<std::pair<ClientPtr, bool>> Acquire(int64_t timeout_ms);

void Release(ClientPtr& client, bool valid) {
Contributor

Let's take ClientPtr by value: it is more convenient, performance is still fine, and the caller can std::move(...) it here.

Contributor Author

Changed.
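A minimal sketch of the by-value `Release` suggested in this thread, with hypothetical `Client`, `ClientPtr`, and `SimplePool` types. Taking `ClientPtr` by value lets the caller hand ownership back with `std::move`, so the pool receives the pointer without an extra reference-count increment. Replacing an invalid client with a fresh, unconnected one is an assumption made here so the pool size stays constant.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Client {};  // hypothetical placeholder for the SDK client
using ClientPtr = std::shared_ptr<Client>;

class SimplePool {
public:
    // Takes ownership by value; callers pass std::move(client).
    void Release(ClientPtr client, bool valid) {
        if (!valid) {
            // Assumption: swap a broken client for a fresh, unconnected one.
            client = std::make_shared<Client>();
        }
        std::lock_guard<std::mutex> lock(mu_);
        idle_.push_back(std::move(client));
    }

    std::size_t IdleCount() {
        std::lock_guard<std::mutex> lock(mu_);
        return idle_.size();
    }

    // Returns a shared copy of the i-th idle client (for inspection only).
    ClientPtr Peek(std::size_t i) {
        std::lock_guard<std::mutex> lock(mu_);
        return idle_.at(i);
    }

private:
    std::mutex mu_;
    std::vector<ClientPtr> idle_;
};
```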


~ClientPool() { assert(clients_.size() == pool_size_); }

std::optional<std::pair<ClientPtr, bool>> Acquire(int64_t timeout_ms);
Contributor

Do we need to return a bool? Can Acquire just return a valid ClientPtr?

Contributor Author

I wanted to distinguish between two cases: the blocking queue is empty, and the returned client cannot connect within the timeout.

I changed the behavior to throw different exceptions when the client cannot be acquired.
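A sketch of the "different exceptions" behavior under stated assumptions: `BlockingPool`, `PoolExhausted`, `ConnectFailed`, and the `connect` hook are hypothetical, not the SDK's names. An empty pool and a failed connection become two distinct exception types, so `Acquire` can return a plain `ClientPtr` instead of a `std::optional<std::pair<ClientPtr, bool>>`. For brevity, the connect hook is not bounded by the timeout here; a real implementation would pass it down.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>

struct Client { bool connected = false; };  // hypothetical placeholder
using ClientPtr = std::shared_ptr<Client>;

// Hypothetical exception types for the two failure modes.
struct PoolExhausted : std::runtime_error { using std::runtime_error::runtime_error; };
struct ConnectFailed : std::runtime_error { using std::runtime_error::runtime_error; };

class BlockingPool {
public:
    // `connect` is a hypothetical hook returning true if the client connected.
    BlockingPool(std::size_t size, std::function<bool(Client&)> connect)
        : connect_(std::move(connect)) {
        for (std::size_t i = 0; i < size; ++i) idle_.push_back(std::make_shared<Client>());
    }

    ClientPtr Acquire(std::chrono::milliseconds timeout) {
        ClientPtr client;
        {
            std::unique_lock<std::mutex> lock(mu_);
            if (!cv_.wait_for(lock, timeout, [&] { return !idle_.empty(); })) {
                throw PoolExhausted("no idle client within timeout");
            }
            client = std::move(idle_.front());
            idle_.pop_front();
        }
        // Lazy connect outside the lock; on failure, return the client first.
        if (!client->connected && !connect_(*client)) {
            Release(std::move(client));
            throw ConnectFailed("client could not connect");
        }
        client->connected = true;
        return client;
    }

    void Release(ClientPtr client) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            idle_.push_back(std::move(client));
        }
        cv_.notify_one();
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<ClientPtr> idle_;
    std::function<bool(Client&)> connect_;
};
```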


auto& [client, valid] = maybe_client.value();
try {
/// Lazy init client
Contributor

We could initialize the client eagerly.

Contributor Author

Client initialization creates a connection to the server, which is relatively heavy and can take even longer when the connection times out. So I put the initialization where the user actually uses the client.

Contributor

The challenge with lazy init is that we have no good way to report back "your host/port configuration is wrong; we can't connect." To me, an initial connection that surfaces this configuration issue (which happens often) is worthwhile.
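The eager-validation idea from this comment can be sketched as a constructor that connects once and fails fast with the offending host and port in the error message. `EagerClientFactory` and the `ProbeFn` hook are hypothetical; a real client would perform an actual connection attempt instead of calling a callback.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical probe hook: returns true if a connection to host:port succeeds.
using ProbeFn = std::function<bool(const std::string& host, uint16_t port)>;

// Connects once at construction time so that a wrong host/port is reported
// immediately instead of on the first insert.
class EagerClientFactory {
public:
    EagerClientFactory(std::string host, uint16_t port, ProbeFn probe)
        : host_(std::move(host)), port_(port), probe_(std::move(probe)) {
        if (!probe_(host_, port_)) {
            throw std::invalid_argument("cannot connect to " + host_ + ":" +
                                        std::to_string(port_) +
                                        "; check the host/port configuration");
        }
    }

    const std::string& host() const { return host_; }

private:
    std::string host_;
    uint16_t port_;
    ProbeFn probe_;
};
```

This keeps the cost of one connection at startup while still letting later clients connect lazily.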

timeplus/timeplus.cpp
auto task_id = next_task_id_.fetch_add(1);
auto task = std::make_shared<InsertTask>();
task->task_id = task_id;
task->table_name = std::move(table_name);
Contributor

task->table_name.swap(table_name);
task->block.swap(block);
task->idempotent_id.swap(idempotent_id);

Or provide an InsertTask constructor that takes these as && parameters.

Contributor Author

Added a constructor.
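A minimal sketch of such a constructor, assuming a simplified `InsertTask` and using `std::vector<int>` as a hypothetical stand-in for `Block`. Taking the arguments as `&&` makes the moves explicit at the call site, and `std::make_shared` forwards them straight into the constructor.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Block = std::vector<int>;  // hypothetical stand-in for the SDK's Block

struct InsertTask {
    // Members are initialized in declaration order, each by move.
    InsertTask(uint64_t id, std::string&& table, Block&& blk, std::string&& idem)
        : task_id(id),
          table_name(std::move(table)),
          block(std::move(blk)),
          idempotent_id(std::move(idem)) {}

    uint64_t task_id;
    std::string table_name;
    Block block;
    std::string idempotent_id;
};
```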

settings = " SETTINGS idempotent_id='" + idempotent_id + "'";
Query query("INSERT INTO " + table_name + " ( " + fields_section.str() + " ) VALUES", query_id);

if (!idempotent_id.empty() && versionNumber(server_info_.version_major, server_info_.version_minor, server_info_.version_patch) >=
Contributor

Can we cache the version number? Also, in our case we probably don't need this version guard at all, since all of our production enterprise users run versions newer than this.

Contributor Author

Removed the server version check for idempotent inserts.

void Insert(const std::string& table_name, const Block& block);
void Insert(const std::string& table_name, const std::string& query_id, const Block& block);
/// Insertion will be idempotent when `idempotent_id` is not empty.
void Insert(const std::string& table_name, const Block& block, const std::string & idempotent_id = "");
Contributor

It seems good to introduce BlockPtr, a std::shared_ptr<const Block>; that way we don't need to copy the block during ingestion or on retry.

Contributor

BTW, the current legacy interface and implementation of this SDK are not necessarily optimal; we can revise them when needed.

Contributor Author

I changed the new Timeplus APIs to take BlockPtr, while keeping the legacy client interface unchanged: for usability and performance reasons it does not need to own the block, so users have more flexibility when using client::insert.

Contributor

Does the legacy client copy the Block internally?

Contributor Author

Just double-checked: no copy as far as I can see. All sub-functions access and hold the block by const reference.
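A small sketch of the ownership split discussed in this thread, with `std::vector<int>` standing in for `Block` (an assumption for brevity): the new APIs hold a `BlockPtr` (`std::shared_ptr<const Block>`), so queued retries share one immutable block, while a legacy-style function such as the hypothetical `SendBlock` only borrows it by const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Block = std::vector<int>;                 // hypothetical stand-in for Block
using BlockPtr = std::shared_ptr<const Block>;  // shared, immutable block

// Legacy-style API: borrows the block by const reference; no ownership, no copy.
std::size_t SendBlock(const Block& block) { return block.size(); }

// Hypothetical retry loop: every attempt reads the same shared block.
std::size_t SendWithRetries(const BlockPtr& block, int attempts) {
    std::size_t rows_sent = 0;
    for (int i = 0; i < attempts; ++i) {
        rows_sent += SendBlock(*block);
    }
    return rows_sent;
}
```

Copying a `BlockPtr` only bumps a reference count, so holding it in several pending tasks never duplicates the rows themselves.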

size_t query_text_size = INSERT_INTO.size() + table_name.size() + LEFT_PAREN.size() + fields_section_size + RIGHT_PAREN_VALUES.size();

std::string query_text;
query_text.reserve(query_text_size);
Contributor

We could reserve, for example, query_text_size * 1.2 for safety.

Contributor Author

Changed.

Also refactored TimeplusConfig to let users set client options directly.
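A sketch of building the INSERT prefix with a single up-front reservation, using integer arithmetic for the roughly 20% headroom suggested in this thread. `BuildInsertQuery` is hypothetical; the constant names follow the snippet (`INSERT_INTO`, `LEFT_PAREN`, `RIGHT_PAREN_VALUES`), but their string values are assumed from the earlier `"INSERT INTO " + table_name + " ( " + ...` code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Values assumed from the earlier query-building snippet.
constexpr std::string_view INSERT_INTO = "INSERT INTO ";
constexpr std::string_view LEFT_PAREN = " ( ";
constexpr std::string_view RIGHT_PAREN_VALUES = " ) VALUES";

std::string BuildInsertQuery(std::string_view table_name, std::string_view fields_section) {
    const std::size_t query_text_size = INSERT_INTO.size() + table_name.size() +
                                        LEFT_PAREN.size() + fields_section.size() +
                                        RIGHT_PAREN_VALUES.size();
    std::string query_text;
    query_text.reserve(query_text_size + query_text_size / 5);  // ~1.2x headroom
    query_text.append(INSERT_INTO)
        .append(table_name)
        .append(LEFT_PAREN)
        .append(fields_section)
        .append(RIGHT_PAREN_VALUES);
    return query_text;
}
```

The headroom matters mainly when more text (such as settings) is appended after this prefix; for the prefix alone, reserving the exact size already avoids reallocation.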

yuzifeng1984 force-pushed the feature/huatai-requirements branch 2 times, most recently from 041ac46 to a0446c5, on August 15, 2024 06:36
yuzifeng1984 force-pushed the feature/huatai-requirements branch from 1299dc7 to cfbd06b on August 15, 2024 08:14
yuzifeng1984 merged commit ccd4178 into master on Aug 15, 2024
9 checks passed
yuzifeng1984 deleted the feature/huatai-requirements branch on August 15, 2024 09:21
4 participants