Adding derive macros to parse structs to bins and values #126
base: master
Conversation
…ospike wrapper crate
# Conflicts:
#   aerospike-core/src/client.rs
#   aerospike-core/src/cluster/mod.rs
#   aerospike-core/src/commands/admin_command.rs
#   aerospike-core/src/commands/stream_command.rs
I'm going to improve this one to make it less memory-heavy. Did you already start to implement a method for generic input on the value part? Otherwise I would add it here to reduce it to one function again.
Thanks @jonas32. I have not done anything in this regard, but I don't think this approach would yield the desired outcome, although any improvements are welcome. I think the best way to go about it is to have traits like e.g.
Something that just occurred to me is using const generics to return an array instead of a Vec. That would be allocated on the stack and theoretically be much more efficient. The issue with that is longer compile times and bloated code.
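A minimal, self-contained sketch of that idea, with `Bin`, the trait, and the field types as simplified stand-ins rather than the client's real types:

```rust
// Because a derive knows the number of fields at compile time, the generated
// method can return a fixed-size array on the stack instead of a heap Vec.
struct Bin {
    name: &'static str,
    value: i64, // placeholder for the client's Value type
}

trait ToBinsArray<const N: usize> {
    fn to_bins_array(&self) -> [Bin; N];
}

struct Counter {
    hits: i64,
    misses: i64,
}

// What a derive could generate for a two-field struct.
impl ToBinsArray<2> for Counter {
    fn to_bins_array(&self) -> [Bin; 2] {
        [
            Bin { name: "hits", value: self.hits },
            Bin { name: "misses", value: self.misses },
        ]
    }
}
```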
The BinWireWriter idea sounds good to me. Sounds possible, but I'm not sure if that's in scope for this PR. Maybe it's worth a second PR to first introduce the traits and afterwards patch this one. If you want, I can start to work on the traits first.
Exactly, although I guess we probably have to come up with better names.
I agree. I was just bouncing ideas off you and the rest of the community, not particularly asking for changes in this PR. With this new scheme, we could even alias the struct fields to custom bin names, and gain a lot more capabilities.
Yes, I think we could use the serde crate and its derive features as a reference.
Any other suggestions?
How should we handle structs that can't be parsed by the derive macro or need custom logic?
Yes, it needs the buffer, and that's the best part. For writes, we could provide a
I looked around yesterday and found this library very useful in understanding how this could be implemented:
I'm thinking about adding a second abstraction layer over the buffer. That could provide "easy" functions for wire writing instead.
I don't see how the user would have access to any of the lower-level APIs on the buffer. Aren't they all private? The only public APIs would be those required to read and write
No, a lot of them are public. The encoder for operations works that way. If we give the user the ability to directly interact with the buffer instance, they can also access them.
Wow, what a noob mistake on my part. We need to mark them
Now that I think of it, we probably do not need to pass
I tested a bit how this could be done. Maybe you want to give it some thought before I continue to implement it everywhere. In this version, I completely work around the original encoders and do the encoding entirely in the trait implementations. The focus was primarily on deprecating Bins and Value and working only in this style. But to keep the API more or less stable, it's easily possible to also implement the traits on Bins and Values like I did in traits.rs. Currently this is only implemented for WriteCommand. There is a unit test that passes for it, but most other tests are probably broken now. Here you see the traits for Bins and Values as writers. At first I wanted to make it one trait, but that gives a lot of potential for passing wrong arguments to some functions. Client is modified to directly accept the trait instances: This is the derive part that builds the functions. Nearly all allocations happen at compile time now. In the end, it can be used like this: The code is not beautiful yet and not really optimized. It's just a "working PoC" version.
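A hedged sketch of the rough shape being described (not the PR's actual code; the trait names, method names, and the suggested `put` signature are assumptions):

```rust
use std::io::{self, Write};

// One writer trait for values and one for whole structs-as-bins, each
// encoding straight onto the wire instead of building intermediate types.
trait WritableValue {
    // Encode this value directly into the command buffer, returning the
    // number of bytes written.
    fn write_as_value<W: Write>(&self, out: &mut W) -> io::Result<usize>;
}

trait WritableBins {
    // Encode the whole struct as bins directly into the command buffer.
    fn write_as_bins<W: Write>(&self, out: &mut W) -> io::Result<()>;
    // Bin count is known at compile time for derived structs, so commands
    // can be sized without allocating a Vec<Bin>.
    fn bin_count(&self) -> usize;
}

// The write path would then be generic over the trait, e.g.:
// pub async fn put(&self, policy: &WritePolicy, key: &Key, bins: &impl WritableBins) -> Result<()>
```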
Thank you @jonas32, this is more or less what I had in mind.
I don't think there's a clean way around this, but you could have one writer with a parameter passed to it that selects one or the other, if you wanted. I prefer the current distinction myself.
Great. I don't like the idea of having a different set of APIs; we should avoid that as much as we can.
I believe this is a step in the right direction and we are closer to the final design now. We have to come up with better names for the macros and traits though. There are a few more ideas I have that we may want to pursue. Off the top of my head:
P.S. Just a little nit: I changed the
Yes, definitely. This will be needed.
I'm not sure if that's in scope for the normal lib. I feel like this is rather the job of a web server, etc., than of the DB layer.
Yes, that will come too. I worked on it a little already. I think I'll introduce another function to check if a bin should be parsed at all. That would make Result or Option easy to build.
Maybe as an additional field modifier? I'm not a big fan of that, since I think parsing structs from and to maps is a huge use case. For me it's probably the primary one, since building nested structures manually on Values is not very friendly right now.
@jonas32 can
Great idea, I didn't think of that yet. Will come!
I just found another problem while working on the read macro. As far as I can see, the required full size information should be included in this part of the buffer
Just to clarify: do you mean that you keep the buffer engaged until the record is dropped, to avoid memory allocation at all costs?
No, the record is not getting dropped. I just need it until the Record is fully parsed. But I already came to the conclusion that it's better the other way around, changing the normal read command to use smaller buffers. Now the smaller buffers with just the value data inside get cloned, since they are allocated anyway and the original buffer will discard them right after the clone. This should result in nearly the same memory usage and makes it a lot easier.
@jonas32 I see, you want to move to a streaming paradigm to reduce the memory usage. That's fantastic, but with caveats:
I would love to see the streaming API implemented; that would allow us to have constant-sized, small buffers and get rid of one of the most annoying parts of the client tuning. Just as great would be to support this in sending commands as well, circumventing the buffer allocation, sizing, resizing, pooling and deallocation. My initial design of the client was to support this, that's why the buffer struct is there, but we deviated from that idea at some point. P.S. Calling
Per this post, the first issue is easily fixable: https://stackoverflow.com/questions/19076719/how-do-i-convert-a-vector-of-bytes-u8-to-a-string
I'm still fully relying on the Buffer and encoder/decoder methods that are in place, so this should already solve it: I also thought about the consequences of reading so often in small pieces. I'll test some things and check whether we can increase the performance there.
This still requires an intermediary buffer to read into. My solution would allow reading directly into the string builder's buffer, and would remove the need for a big intermediary buffer completely. All other data types would require mere bytes of buffer to read, and would work with buffers allocated on the stack.
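A minimal sketch of that idea, assuming a plain `std::io::Read` source rather than this client's actual connection and buffer API:

```rust
use std::io::{self, Read};

// Allocate only the Vec that will back the String, read the exact number of
// bytes into it, and hand it to String::from_utf8, which takes ownership of
// the Vec without copying. No large intermediary buffer is needed.
fn read_string<R: Read>(reader: &mut R, len: usize) -> io::Result<String> {
    let mut bytes = vec![0u8; len];
    reader.read_exact(&mut bytes)?;
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```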
I think you should first take the naive approach. Improve after you hit performance issues.
Ah OK, now I get what you mean with the buffer strings.
I think we are talking past each other due to my inability to internalize the problem you are facing. When you have your initial PoC (even if it does not work), I'll take a look and then we can talk about the specifics. Does that sound good?
ReadableBins is working now, too. It supports the basic stuff like ints and strings, lists, nested lists, options, and field annotations (only rename) so far. While doing that, I ran into three questions.
My full list is currently this (Read/Write):
I'll probably also have a look at Bitwise/HLL later on, but I'm not sure yet if I want to introduce custom types or annotations.
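For illustration, a hypothetical shape of the rename annotation and `Option` handling mentioned above; the attribute name and syntax are assumptions, and it relies on the `ReadableBins` derive from this branch:

```rust
// Hypothetical usage; the actual attribute name and syntax in the PR may differ.
#[derive(ReadableBins)]
struct Profile {
    // Read this field from a bin with a different name.
    #[aerospike(rename = "user_name")]
    name: String,
    // Option fields could simply become None when the bin is absent.
    age: Option<i64>,
}
```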
The way that works is that it allocates the data for one command, then trims back the buffer and releases the excess memory. Not the most efficient, but it works if tweaked well by setting
That's because of our less-than-standard use of Msgpack. Don't mind it, it is what it is.
I believe it was and still is a bad idea to automatically cast/convert data, outside of integers. I'm not sure why f32 was even supported. I'd rather remove it. I believe we should not venture outside of what we already have with Values, and for custom converters, it should happen in a rigid and lossless manner.
It can be cast down at runtime; it's lossless. For the ones that don't match the size, a parsing error should be returned.
Same as above.
We probably should remove this completely. Let the users cast up/down at their own risk.
Why no read?
Is this for GeoJSON? Now that you have made the change, where do I have to look to understand the point you were making before regarding non-streaming buffers?
OK, limiting the data types sounds good to me. I want to keep supporting Value, since it's not that easy to build every data layout. Since Aerospike does not care about data types inside of CDT contexts, it could become hard to build Lists/Maps otherwise. It also serves as a compatibility layer so we don't completely break everything. About the buffers:
This PR adds the `ToBins` and `ToValue` derive macros. Manually parsing large structs into Aerospike Bins or Values is a lot of boilerplate work currently. The macros can simply be attached to any struct and will add a `to_bins` or `to_value` function.

The `to_bins` function returns a `Vec<Bin>` that can be passed to write functions like `client::put`. The `to_value` function builds a HashMap `Value` for the struct. `ToBins` is based on `ToValue`, so sub-structs are also parsed. The `ToValue` trait has implementations for the common data types, including variations with HashMaps or Vecs (if I forgot any, please tell me).

I also added a `client::put_struct` method to make this more convenient.

Caution: this branch is based on #108 (the async client).
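A rough sketch of the intended usage (the derive import is omitted, and the async client signatures from #108 are assumptions that may differ on this branch):

```rust
use aerospike::{Client, Key, WritePolicy};

#[derive(ToBins, ToValue)]
struct User {
    name: String,
    age: i64,
}

async fn store(client: &Client, key: &Key, user: &User) -> aerospike::Result<()> {
    let policy = WritePolicy::default();

    // to_bins() is generated by ToBins: one Bin per struct field.
    let bins = user.to_bins();
    client.put(&policy, key, &bins).await?;

    // Or let the new convenience method do the conversion internally.
    client.put_struct(&policy, key, user).await?;
    Ok(())
}
```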