Add pv naming #22
Conversation
When printing the trie, it will show the keys in a hierarchical way, like a file system.
Signed-off-by: Pavel Abramov <[email protected]>
So if you want to save /my/awesome/file.txt, it will create the /my/awesome folders if they don't exist; otherwise it would return an error on writing.
Signed-off-by: Pavel Abramov <[email protected]>
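For illustration, a minimal sketch of that behaviour using std::fs; the helper name write_with_parents is hypothetical, not the actual implementation:

use std::fs;
use std::io;
use std::path::Path;

// Illustrative helper: create missing parent directories,
// then write the value to the file.
fn write_with_parents(path: &Path, data: &[u8]) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        // e.g. creates /my/awesome for /my/awesome/file.txt
        fs::create_dir_all(parent)?;
    }
    fs::write(path, data)
}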
Force-pushed from dd8a615 to 5a9e6c2
@deitch ready for review
I don't get what this does. It replaces the second parameter to |
The converter changes a string into the appropriate format for your StorageEngine. So the key could be keyspace1.keyspace2.key, but in reality you want to store it as keyspace1/keyspace2/key.json on a file system, or your currently running system already uses this structure. In theory I can make it part of FileStorage and say that if you want other formatting, just rewrite FileStorage as you wish. Or I can parametrise it, saying that maybe you want your key to be converted to an appropriate format before using the driver. I'm not sure which approach is best here.
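A minimal sketch of such a converter, assuming the dot-separated notation above; the function name and the .json suffix are illustrative:

use std::path::{Path, PathBuf};

// Hypothetical converter: turns "keyspace1.keyspace2.key"
// into "<root>/keyspace1/keyspace2/key.json".
fn key_to_path(root: &Path, key: &str) -> PathBuf {
    let mut path = root.to_path_buf();
    for part in key.split('.') {
        path.push(part);
    }
    path.set_extension("json");
    path
}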
I think keep it part of StorageEngine. It is fundamental to a storage engine that it knows how to store things. The interface between core and storage is: "here is a key and a value, please store it". How it decides to do that is up to the engine. Much simpler to think about.
Force-pushed from 5a9e6c2 to 406aba5
Rewrote it :)
Looks cleaner. This part here looks like NkvCore is reading from files? Why is that in core and not part of the storage engine?
@deitch we need to restore all the persisted values from the checkpoints on the file system. If we should make that part of the storage engine, I can abstract it and return some kind of vector of StorageEngine.
What do you think about StorageEngine: instead of using this API

pub trait StorageEngine: PartialEq {
    fn new(data: Box<[u8]>, key: &str, root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;
    fn from_checkpoint(filepath: &Path, root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;
    fn update(&mut self, new_data: Box<[u8]>) -> std::io::Result<()>;
    fn delete_checkpoint(&self) -> std::io::Result<()>;
    fn data(&self) -> Arc<[u8]>;
    fn key(&self) -> String;
}

it would use this

pub trait StorageEngine: PartialEq {
    // Restore if exists
    fn new(root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;
    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> Arc<[u8]>;
    fn key_to_nkv_notation(&self, key: &str) -> String; // not sure how to name this right
}

Meaning that StorageEngine actually oversees the whole collection of entries, rather than us creating a StorageEngine object for each entry. So on a high level I'm saying: this thing handles all my entries, and it takes care of part of the nkv API.

Edit: I think this looks much cleaner from a developer's perspective.
Why is checkpoint outside of the storage engine? More fundamentally, I am not convinced you need checkpoints at all at this stage. It is an advanced feature. Having the ability to get/put/delete/subscribe/unsubscribe is plenty for now. Assuming you do need it, it belongs entirely within the storage engine, and should not leak out of the storage abstraction. For example:

fn from_checkpoint(filepath: &Path, root: PathBuf) -> std::io::Result<Self>

says where on the filesystem it is stored. Who says it is stored locally, and not in some remote database server? Adding checkpoints also gives you a whole slew of additional APIs you need to add. What defines a checkpoint? How many can I have? Are they timestamped? How do I list them? How do I choose them? It opens up a can of worms. I think it is enough to have the basic functionality.
Ideally it should be some form of URI to support that. By checkpoint I mean that we assume the StorageEngine might have persistent storage, let's say, meaning whenever we reboot the system, its state should still be there. So using this interface

pub trait StorageEngine: PartialEq {
    // Restore if exists
    fn new(root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;
    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> Arc<[u8]>;
    fn key_to_nkv_notation(&self, key: &str) -> String; // not sure how to name this right
}

actually abstracts it to the point we want. Meaning persistence is hidden behind new; the only thing left is to switch PathBuf to a URI or a String, so that we don't assume it's somewhere on a file system. We do want to be able to restore from a checkpoint, because if we are to integrate this into EVE, it would be really great if we are backward compatible (because there are some scripts depending on it). Of course we can write some kind of patches, but that implementation won't cost us anything.
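A rough sketch of what "persistence hidden behind new" could look like for a file-backed engine; FileStorage and its fields are hypothetical and only illustrate restoring state from the root directory on startup:

use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::Arc;

// Hypothetical file-backed engine: new() walks the root directory
// and reloads whatever was persisted before the reboot.
struct FileStorage {
    root: PathBuf,
    entries: HashMap<String, Arc<[u8]>>,
}

impl FileStorage {
    // Restore if exists: reload every file under `root` into memory
    // (nested keyspaces omitted for brevity).
    fn new(root: PathBuf) -> io::Result<Self> {
        let mut entries = HashMap::new();
        if root.is_dir() {
            for entry in fs::read_dir(&root)? {
                let path = entry?.path();
                if path.is_file() {
                    if let Some(stem) = path.file_stem() {
                        let key = stem.to_string_lossy().into_owned();
                        entries.insert(key, Arc::from(fs::read(&path)?));
                    }
                }
            }
        }
        Ok(Self { root, entries })
    }
}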
You are saying that it isn't just a future thing, but EVE uses checkpoints right now, so nkv would need checkpoints to be able to be used in EVE? Do you mind explaining how and where it is used?
I don't quite get this part. If I have a database and an app in front of it, I don't use checkpoints. I expect the database to provide persistent reliable storage. If the database shuts down, I expect that I start it up again, and it picks up where it was. This relates to the first part of this comment. What is the use case (even just inside EVE) where I need to be able to use checkpoints and restore from them? As opposed to, "I write something to my storage, it is there now and in the future until I delete it. If the storage is persistent (files or database as opposed to just in-memory), then it is there even after I shut down and start up again." Is there some different use case in EVE?
Assuming checkpoints are necessary (see above), I would think it would be entirely inside the storage engine, and I would leave it up to the storage engine. Something like:

pub trait StorageEngine: PartialEq {
    fn new() -> std::io::Result<Self>
    where
        Self: Sized;
    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn get(&self, key: &str) -> Arc<[u8]>;
    // checkpoint functions; left out return values
    fn set_checkpoint(&self) -> // create a checkpoint, return a string. Even better, return a timestamp
    fn delete_checkpoint(&self, key: &str) -> // delete a checkpoint
    fn list_checkpoints(&self) ->
    fn restore_checkpoint(&self, checkpoint: &str) -> // or maybe a timestamp
}

The abstraction is that a storage engine creates a checkpoint, deletes it, lists them, and restores from them. How it does that is entirely up to it. We can quibble over whether checkpoints should be keyed on an arbitrary string vs a timestamp (I think timestamp is better).
Okay I think that I used the wrong word "checkpoint" in this case. I only mean this:
To me there's only the latest checkpoint; it's not like we have to store a time series and be able to restore to a certain point in time. In EVE we just need to be able to restore from persistent storage, which has a certain file structure. So I suggest we just drop the word "checkpoints", because it's the wrong word to use. What I wanted to say is that StorageEngine abstracts over

pub trait StorageEngine {
    fn new(uri: String) -> std::io::Result<Self>
    where
        Self: Sized;
    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> std::io::Result<Arc<[u8]>>;
}

And that way caching is also hidden behind the StorageEngine trait. Also, the trait defined above doesn't particularly guarantee persistence; you can easily implement an in-memory one. Let me know if it makes sense to you.
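For illustration, a minimal in-memory implementation along the lines of the trait above; InMemoryStorage and its internals are hypothetical:

use std::collections::HashMap;
use std::io::{Error, ErrorKind, Result};
use std::sync::Arc;

// Hypothetical in-memory engine: no persistence at all,
// everything lives in a HashMap and dies with the process.
struct InMemoryStorage {
    entries: HashMap<String, Arc<[u8]>>,
}

impl InMemoryStorage {
    fn new(_uri: String) -> Result<Self> {
        // The URI is ignored; there is nothing to restore.
        Ok(Self { entries: HashMap::new() })
    }

    fn put(&mut self, key: &str, data: Box<[u8]>) -> Result<()> {
        self.entries.insert(key.to_string(), Arc::from(data));
        Ok(())
    }

    // Note: the trait above takes &self here; a faithful impl
    // would need interior mutability (e.g. a Mutex) for that.
    fn delete(&mut self, key: &str) -> Result<()> {
        self.entries.remove(key);
        Ok(())
    }

    fn data(&self, key: &str) -> Result<Arc<[u8]>> {
        self.entries
            .get(key)
            .cloned()
            .ok_or_else(|| Error::new(ErrorKind::NotFound, "no such key"))
    }
}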
Oh, ok that is totally different. Yeah, checkpoint is the wrong word. IMHO, this is a storage engine decision. You can have a storage engine that is 100% ephemeral in-memory. Stop the process, lose the data. Don't want that? Use a different storage engine.
Got it. So that's just a property of storage engine implementations. There is no per-key or per-anything. The storage engine is expected to handle everything.
Exactly!
I am mixed on this. I can see it going both ways. I might want a storage engine with caching: think about SSDs with faster memory caches on board in front of the SSD (fewer of those now with NVMe speeds). Or I might do software caching in front of my SSDs via filesystem drivers. But you can punt on this for now.
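A sketch of one way caching could stay hidden inside a storage engine: a hypothetical CachedStorage that wraps any inner engine and memoizes reads (the trait here is simplified so the sketch is self-contained, and all names are illustrative):

use std::collections::HashMap;
use std::io::Result;
use std::sync::Arc;

// Simplified engine interface, just for this sketch.
trait StorageEngine {
    fn put(&mut self, key: &str, data: Box<[u8]>) -> Result<()>;
    fn delete(&mut self, key: &str) -> Result<()>;
    fn data(&self, key: &str) -> Result<Arc<[u8]>>;
}

// Hypothetical decorator: callers never know whether a value
// came from the cache or from the wrapped backend.
struct CachedStorage<E: StorageEngine> {
    inner: E,
    cache: HashMap<String, Arc<[u8]>>,
}

impl<E: StorageEngine> StorageEngine for CachedStorage<E> {
    fn put(&mut self, key: &str, data: Box<[u8]>) -> Result<()> {
        self.cache.insert(key.to_string(), Arc::from(data.clone()));
        self.inner.put(key, data)
    }

    fn delete(&mut self, key: &str) -> Result<()> {
        self.cache.remove(key);
        self.inner.delete(key)
    }

    fn data(&self, key: &str) -> Result<Arc<[u8]>> {
        // Only values written through this wrapper are cached;
        // caching on read misses would need &mut self or a Mutex.
        if let Some(hit) = self.cache.get(key) {
            return Ok(hit.clone());
        }
        self.inner.data(key)
    }
}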
Much clearer, easier to understand.
Okay, we actually should relax the interface even more and make it this:

pub trait StorageEngine {
    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> std::io::Result<Arc<[u8]>>;
}

The beauty of it is that

let storage = InMemoryStorage::new(); // why should I specify a URI when I don't need one for in-memory
let mut nkv = NkvCore::new(storage)?;

let storage = FileStorage::<CachingPolicy::LRU>::new(path_buf); // I need a PathBuf, because it'll be stored on a FS
let mut nkv = NkvCore::new(storage)?;

We won't change a thing inside NkvCore; we only care about put, get, delete, and that's that.
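A sketch of what that buys on the NkvCore side, assuming the core is generic over the trait; the trait and NkvCore bodies here are illustrative, not the actual code:

use std::io::Result;
use std::sync::Arc;

// Assumed trait shape from the discussion above.
pub trait StorageEngine {
    fn put(&mut self, key: &str, data: Box<[u8]>) -> Result<()>;
    fn delete(&self, key: &str) -> Result<()>;
    fn data(&self, key: &str) -> Result<Arc<[u8]>>;
}

// Hypothetical core: generic over any engine, so swapping
// InMemoryStorage for FileStorage changes nothing in here.
pub struct NkvCore<S: StorageEngine> {
    storage: S,
}

impl<S: StorageEngine> NkvCore<S> {
    pub fn new(storage: S) -> Result<Self> {
        Ok(Self { storage })
    }

    pub fn put(&mut self, key: &str, data: Box<[u8]>) -> Result<()> {
        // Notification/subscription handling would sit here;
        // storage details stay behind the trait.
        self.storage.put(key, data)
    }

    pub fn get(&self, key: &str) -> Result<Arc<[u8]>> {
        self.storage.data(key)
    }

    pub fn delete(&self, key: &str) -> Result<()> {
        self.storage.delete(key)
    }
}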
This is to be compatible with EVE-OS PubSub.
Signed-off-by: Pavel Abramov <[email protected]>
Force-pushed from 406aba5 to 5571c7d
@deitch got the PR to a much cleaner state. Now traits contain only the trait; there's no such thing as a Value which combines notification and StorageEngine. The only implicit requirement for StorageEngine now is that it supports keyspaces. We store and iterate twice over two different Tries, but for now it looks cleaner that way.
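For illustration, "supports keyspaces" could be as simple as prefix queries over dot-separated keys; this helper and the key names are hypothetical:

// Hypothetical helper: return all keys that live under a keyspace,
// e.g. keyspace "service.config" matches "service.config.timeout".
fn keys_in_keyspace<'a>(keys: &'a [String], keyspace: &str) -> Vec<&'a str> {
    let prefix = format!("{keyspace}.");
    keys.iter()
        .map(String::as_str)
        .filter(|k| *k == keyspace || k.starts_with(&prefix))
        .collect()
}

fn main() {
    let keys = vec![
        "service.config.timeout".to_string(),
        "service.config.retries".to_string(),
        "service.state".to_string(),
    ];
    // Prints the two keys under service.config
    println!("{:?}", keys_in_keyspace(&keys, "service.config"));
}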
And there is no cache for values any more.
Yeah, seriously a lot cleaner. Nice.
superseding #20