-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ffi: Add class for key-value pair log events. #507
Conversation
if (false == value.has_value()) { | ||
// Empty value | ||
if (SchemaTreeNode::Type::Obj != type) { | ||
return std::errc::protocol_error; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda feel like this can be moved into the value type validation method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer to leave it outside for two reasons:
- empty isn't a
Value
type - This simplifies the helper function so it doesn't have to check
has_value()
and we don't need to symbols inside the helper (one for thestd::optional<Value>
and the over for the reference of the underlyingValue
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just quickly skimmed through the class and haven't read the unit tests yet.
* `node_id_value_pairs`. A node is considered a leaf if none of its descendants appear in the | ||
* `node_id_value_pairs`. | ||
*/ | ||
[[nodiscard]] auto is_leaf_node( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: I only have a rough understanding on how schema tree works.
This function is bit confusing for me.
"A node is considered a leaf if none of its descendants appear in the node_id_value_pairs
." <- This doesn't seem to be the normal definition for a leaf node. Is this expected?
In addition, "the sub schema tree defined by node_id_value_pairs
." Is there any guarantee that nodes in the node_id_value_pairs
will form a subtree? Just looking at the argument without thinking about how it is generated, it is possible node_id_value_pairs contains a list of unconnected nodes.
And lastly a highlevel comment is that I feel this function should belong to the Schema tree class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a validation method: if an object type node appears in the key value pairs, it must be a leaf node. The way we store key value pairs is to store the schema (all as leaf nodes), so the path from the root to the leaves are implicitly indicated. In the merged schema tree, usually an object type node is a non leaf node. But in a key value pairs' schema, this node can be a leaf node as long as its value is {}
or null
. Our goal is to validate that if such a node appears, all its descendants in the merged schema tree can't appear in the schema of the given key value pair log event. it is possible node_id_value_pairs contains a list of unconnected nodes
isn't true since everything will be connected to the root if they are valid leaf nodes. This function is used to validate pairs using schema tree's information. I think we should split it from a schema tree's method.
auto const& node{schema_tree.get_node(node_id)}; | ||
auto const node_type{node.get_type()}; | ||
if (false == value.has_value()) { | ||
// Value is an empty object (`{}`, which is not the same as `null`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this comment tries to explain why the line doesn't use value.is_null();
in a way similar to line 68?
I don't think I follow the comment though. Can you clarify a bit?
And we can combine the two if statements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{}
andnull
are two different types of values.- I'd prefer to handle them separately. Combining these two statements will make it very hard to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't get when there will be {} and when there will be null (and why we only check for {} but not null in this function)
Is it because null is already handled in the validation? If so, I think you should leave this information in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{}
is considered as a special case, where null
is a valid primitive value (like an int or a string). These are user-input values.
m_utc_offset{utc_offset} {} | ||
|
||
// Variables | ||
std::shared_ptr<SchemaTree> m_schema_tree; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Techinically it is possible to modify the original schema tree using this pointer in the class, right?
Should we consider using std::shared_ptr<SchemaTree const>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense
KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs | ||
) -> std::optional<std::errc> { | ||
std::optional<std::errc> ret_val; | ||
std::unordered_map<SchemaTreeNode::id_t, std::unordered_set<std::string_view>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this into the try to reduce its scope.
* @param node_id_value_pairs | ||
* @return `std::nullopt` if the inputs are valid, or an error code indicating the failure: | ||
* - std::errc::operation_not_permitted if a node ID doesn't represent a valid node in the | ||
* schema tree, or a non-leaf node ID is paired with a value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A non-leaf node paired with a value actually returns protocol_not_supported
. That said, I think it should return operation_not_permitted
since if it was permitted, node_id_value_pairs
wouldn't be a tree.
} | ||
} | ||
|
||
auto clone_value(Value const& value) -> Value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I missed this in the previous PRs but it seems like we should just have copy constructors for Value
and EncodedTextAst
. It feels like an overoptimization to force people to never use copy construction unless they write a method manually like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a copy constructor for EncodedTextAst
, however, I'd prefer to not have a copy constructor for Value
to enforce move semantics. How about making this clone
method as a member of Value
, so that people can explicitly copy a value when they really need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's bad about copying a Value
? Is it not similar to copying a string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying string and EncodedTextAst
can be expensive in deserialization. With copy disabled, we are more confident that there's no copy overhead in deserialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding move constructor for EncodedTextAst
and Value
} | ||
|
||
{ | ||
// Empty invalid_node_id_value_pairs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overzealous refactoring.
} | ||
|
||
// NOLINTNEXTLINE(readability-function-cognitive-complexity) | ||
TEST_CASE("ffi_KeyValuePairLogEvent_create", "[ffi]") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can reduce the amount of code for this test by using SECTION
and even nested SECTION
macros. Recall that CMake runs every SECTION
from the beginning of the test, so the schema tree, node_id_value_pairs would be reset for each outer SECTION
.
E.g., instead of cloning valid_node_id_value_pairs
, you could just create it outside a SECTION
and use it inside different SECTION
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually thought about using SECTION
but I didn't. Two reasons:
- I don't want to rerun everything from the beginning
- I want to test that the same schema tree is used across different log events (which have different life cycles)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- But by using sections, you'd only run the things outside the section(s) from the beginning, right?
- Testing the same instance of the schema tree across all of the tests seems a bit extra since it should be const in the class anyway (and if it wasn't const, you're still relying on finding a bug in the implementation with a set of test cases that isn't exhaustive). Imo, having less code in the test is more valuable than testing that corner case.
From my reading, you could do something like:
- Create schema tree
- Section (optional): Test empty id-value pairs
- Section (optional): Test mismatched types
- Section (optional): Test valid id-value pairs
- Section: Test duplicate key conflicts
- Section: Test implicit key conflicts
- Section (optional): Test out of bound node ID
That would allow you to at least get rid of the clone_node_id_value_pairs
function (and make the code a bit easier to read).
Co-authored-by: kirkrodrigues <[email protected]>
auto operator=(Value const&) -> Value& = delete; | ||
|
||
// Default move constructor and assignment operator | ||
// Default copy/move constructor and assignment operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since everything except the constructor is default, we don't need to say they're default explicitly, right?
auto operator=(EncodedTextAst const&) -> EncodedTextAst& = delete; | ||
|
||
// Default move constructor and assignment operator | ||
// Default copy/move constructor and assignment operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about default.
REQUIRE_FALSE(result.has_error()); | ||
|
||
SECTION("Test duplicated key conflict on node #3") { | ||
auto conflict_pairs{valid_node_id_value_pairs}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the idea of using nested sections (besides breaking the code up) is that you wouldn't need to duplicate valid_node_id_value_pairs
since it would start with a fresh state in each section, no? (You'd obviously call it something other than valid_node_id_value_pairs
, but still.
Co-authored-by: kirkrodrigues <[email protected]>
Co-authored-by: kirkrodrigues <[email protected]>
Description
This PR introduces a new class
KeyValuePairLogEvent
. This class represents a key-value pair log event serialized by the key-value pair IR format. It consists of the following members:Value
, where the id refers to the schema tree node id.An instance must be created using the factory function. The factory function will validate whether the value's type matches the corresponding key. The input of the factory function should come from the deserialization methods, which will be used in the coming PRs.
Validation performed