Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Types design #17

Open
CthulhuDen opened this issue Dec 28, 2018 · 10 comments
Open

Types design #17

CthulhuDen opened this issue Dec 28, 2018 · 10 comments

Comments

@CthulhuDen
Copy link

CthulhuDen commented Dec 28, 2018

Why should the Trie itself be parameterized over key type? Currently both having K as type parameter and needing it passed as owned to insert and (especially) to entry pretty much forces us to make copy of key for each invocation. I think we only need key when first inserting into the trie, which could be better achieved with ToOwned (btw this comment makes sense for me:

// A node in the trie. `K` must be `ToOwned` because the `Owned` version is what we store.
). This way, insert and entry would require Borrow + ToOwned and would clone only when really necessary.

Then if we stored [u8] in some form in node and parameterize functions like insert and entry themselves which would allow passing them &'a [u8] with different 'a which is currently impossible.

@sdleffler
Copy link
Owner

Storing [u8] in a node can only be done by some sort of heap indirection (minus small_vec and similar optimizations.) Using ToOwned/Borrow escapes the need for any such thing when using small or fixed-size keys, for example small byte arrays ([u8; 4].)

@sdleffler
Copy link
Owner

In addition, Trie is (despite having a byte slice interface) effectively a key-value map. I don't like the idea of allowing users to insert keys of multiple different types into the same Trie; that seems like a good recipe for a type error, and something that should be solved by making an enum of different key types. Do you have a use case for this lack of parametricity over K?

@CthulhuDen
Copy link
Author

CthulhuDen commented Dec 28, 2018

But isn't it the case that the keys are usually non-fixed-length?

My use case involves storing (small number of) different strings and finding/inserting (if new) them frequently (like 1 mil rps). I really don't want to clone key on every request, and I cannot pass string ownership to entry() either. The fact that Trie (persistent) is parameterized over K and lifetime of key must be at least that of Trie I cannot specify K to reference type.

@CthulhuDen
Copy link
Author

Now imagine if Trie wasn't parameterized (I'm new to rust so not sure on exact signature)

struct Trie(Vec<u8>);

impl Trie {
    fn insert<K: ?Sized>(&mut self, key: &K)
    where
        Vec<u8>: Borrow<K>,
        K: ToOwned<Owned = Vec<u8>>,
    {
        self.0 = key.to_owned();
    }
}

This would allow me pass &[u8] as key and it would only be copied when necessary.

@sdleffler
Copy link
Owner

It does not ring true for me that cloning the string is necessary. It's been a while since I looked at this code but Borrow<K> should allow you to pass in a reference instead and have it not be cloned unless it is not in the trie. Have you looked through the source? I'll need to dig through to check this.

@CthulhuDen
Copy link
Author

I could pass reference (instantiate K to reference type), but I would have to fix lifetime of references which I can pass to insert/entry for each Trie instance, which is exactly why I have a problem with parameterizing the Trie data type.

@CthulhuDen
Copy link
Author

CthulhuDen commented Dec 28, 2018

Ok so maybe example:

let mut trie: Trie<&[u8], u16> = Trie::new();
{
    let keyFromSomewhere = [a, b, c];
    trie.entry(&keyFromSomewhere[..]);
}
{
    let keyFromSomewhereElse = [d, e, f];
    trie.entry(&keyFromSomewhereElse[..]);
}

This currently won't work because lifetimes of two references will not match. It would not be a problem if the function entry was parameterized instead of whole Trie.

@sdleffler
Copy link
Owner

Mm, I see your problem.

This does not need to be solved by removing the K parameter (and I am immediately against that solution, I think it destroys part of the type safety which is preserved by its presence.) It should be possible to add a new type variable in insert, call L: ToOwned<Owned = K>, for which you can pass in either K or &K (or other types implementing ToOwned<Owned = K>. Calling to_owned on K would be a noop and simply pass the value through, while calling to_owned on &K would end in a clone.

I don't remember if there's a reason I didn't implement it this way. It may have been an oversight, or caused other issues with types. Will try to take a look tonight and see if I can figure it out.

@sdleffler
Copy link
Owner

If these keys will never be removed, and it is acceptable to keep their memory for the lifetime of your binary, I believe there are libraries which can be used to intentionally leak memory, getting you an &'static [u8] which will not have the lifetime issue. Could be a usable temporary solution.

@CthulhuDen
Copy link
Author

Thanks for having a look at the issue. However considering I want to avoid copying the keys because of high intensity of requests, I dont think leaking the keys memory forever is a solution for me =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants