Add pv naming #22

uncleDecart · 2024-10-14T13:32:06Z

superseding #20

When printing trie it'll show the keys in hierarchical way like file system Signed-off-by: Pavel Abramov <[email protected]>

So if you want to save /my/awesome/file.txt it will create /my/awesome folders if they don't exist, otherwise it would return error on writing Signed-off-by: Pavel Abramov <[email protected]>

uncleDecart · 2024-10-14T13:40:53Z

@deitch ready for review

deitch · 2024-10-15T08:09:52Z

I don't get what this does. It replaces the second parameter to NkvCore with a trait Converter, but what is Converter? Converts between what and what, and why does core need it? Doesn't it just receive bytes on the channel and pass them to the storage engine? Ok, key string and value bytes, but same thing?

uncleDecart · 2024-10-15T08:13:55Z

Converter is to change string to appropriate format for your StorageEngine. So key could be keyspace1.keyspace2.key but in reality you want to store this as keyspace1/keyspace2/key.json on a file system or your current running system is using this structure. In theory I can make it part of FileStorage and say that if you want other formatting, just rewrite FileStorage as you wish. Or I can parametrise that saying that maybe you want your key to be converted to an appropriate format before using the driver. I'm not sure which approach is the best one here.

deitch · 2024-10-15T08:16:00Z

I think keep it part of StorageEngine. It is fundamental to a storage engine that it knows how to store things. The interface between core and storage is, "here is a key and value, please store it". How it decides to do it is up to the engine. Much simpler to think about.

uncleDecart · 2024-10-15T11:56:11Z

Rewrote it :)

deitch · 2024-10-15T13:37:31Z

Looks cleaner.

This part here looks like NkvCore is reading from files? Why is that in core and not part of the storage engine?

uncleDecart · 2024-10-15T15:37:17Z

@deitch we need to restore all the persist values from the checkpoints from file system, we should make it part of storage engine, I can abstract it and return some kind of vector of StorageEngine

uncleDecart · 2024-10-15T15:51:25Z

What do you think about StroageEngine instead of using this API

pub trait StorageEngine: PartialEq {
    fn new(data: Box<[u8]>, key: &str, root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;

    fn from_checkpoint(filepath: &Path, root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;
    fn update(&mut self, new_data: Box<[u8]>) -> std::io::Result<()>;
    fn delete_checkpoint(&self) -> std::io::Result<()>;
    fn data(&self) -> Arc<[u8]>;
    fn key(&self) -> String;
}

would use this

pub trait StorageEngine: PartialEq {
    // Restore if exists
    fn new(root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> Arc<[u8]>;
    fn key_to_nkv_notation(&self, key: &str) -> String; // not sure how to name this right
}

Meaning that StorageEngine actually oversees the whole collection of entries rather than we create an object of StorageEngine for each entry. So on a high level I'm saying this thing handles all my entries and it takes care of part of nkv API

Edit: I think this looks much cleaner from developer perspective

deitch · 2024-10-15T18:00:21Z

Why is checkpoint outside of storage engine?

More fundamentally, I am not convinced you need checkpoint at all at this stage. It is an advanced feature. Having the ability to get/put/delete/subscribe/unsubscribe is plenty for now.

Assuming you do need it, it is entirely within the storage engine, and should not leak out the abstraction for storage. For example:

    fn from_checkpoint(filepath: &Path, root: PathBuf) -> std::io::Result<Self>

Says where on the filesystem it is stored? Who says it is stored locally, and not in some remote database server?

Adding checkpoints also gives you a whole slew of additional APIs you need to add. What defines a checkpoint? How many can I have? Are they timestamped? How do I list them? How do I choose them?

It opens up a can of worms. I think it is enough to have the basic functionality.

uncleDecart · 2024-10-15T18:49:38Z

Says where on the filesystem it is stored? Who says it is stored locally, and not in some remote database server?

Ideally it should be some form of URI to support it. By checkpoint I mean that we assume that StorageEngine might have a persistent storage let's say, meaning whenever we reboot system, it's state should be there. So using this interfaces

pub trait StorageEngine: PartialEq {
    // Restore if exists
    fn new(root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> Arc<[u8]>;
    fn key_to_nkv_notation(&self, key: &str) -> String; // not sure how to name this right
}

actually abstracts it to the point we want it. Meaning persistence is hidden behind new, they only thing is to switch PathBuf to URI or a String, so that we don't assume that it's somewhere on a file system

We do want to be able to restore from a checkpoint because if we are to integrate this to EVE, would be really great if we are backward compatible (because there're some scripts depending on it) of course, we can write some kind of patches, but that implementation won't cost us anything

deitch · 2024-10-16T09:48:35Z

We do want to be able to restore from a checkpoint because if we are to integrate this to EVE, would be really great if we are backward compatible (because there're some scripts depending on it) of course, we can write some kind of patches, but that implementation won't cost us anything

You are saying that it isn't just a future thing, but EVE uses checkpoints right now, so nkv would need checkpoints to be able to be used in EVE? Do you mind explaining how and where it is used?

Ideally it should be some form of URI to support it. By checkpoint I mean that we assume that StorageEngine might have a persistent storage let's say, meaning whenever we reboot system, it's state should be there

I don't quite get this part. If I have a database and an app in front of it, I don't use checkpoints. I expect the database to provide persistent reliable storage. If the database shuts down, I expect that I start it up again, and it picks up where it was.

This relates to the first part of this comment. What is the use case (even just inside EVE) where I need to be able to use checkpoints and restore from them? As opposed to, "I write something to my storage, it is there now and in the future until I delete it. If the storage is persistent (files or database as opposed to just in-memory), then it is there even after I shut down and start up again." Is there some different use case in EVE?

actually abstracts it to the point we want it. Meaning persistence is hidden behind new, they only thing is to switch PathBuf to URI or a String, so that we don't assume that it's somewhere on a file system

Assuming checkpoints are necessary (see above), then I would think it would be entirely inside the storage engine, and I would leave it up to the storage engine. Something like:

pub trait StorageEngine: PartialEq {
    fn new() -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn get(&self, key: &str) -> Arc<[u8]>;

    // checkpoint functions; left out return values
    fn set_checkpoint(&self) ->  // create a checkpoint, return a string. Even better, return a timestamp
    fn delete_checkpoint(&self, key: &str) ->  // delete a checkpoint
    fn list_checkpoints(&self) -> 
    fn restore_checkpoint(&self, checkpoint: &str) -> // or maybe a timestamp 
}

The abstraction is that a storage engine creates a checkpoint, deletes it, lists them, and restores from them. How it does it is entirely up to it. We can quibble as to whether checkpoints should be keyed on arbitrary string vs timestamp (I think timestamp is better).

uncleDecart · 2024-10-16T12:36:11Z

Okay I think that I used the wrong word "checkpoint" in this case. I only mean this:

I expect the database to provide persistent reliable storage. If the database shuts down, I expect that I start it up again, and it picks up where it was.

To me there's only latest checkpoint, not like we have to store time-series and be able to restore to a certain point in time, not. In EVE we just need to be able to restore from persistent storage, which has a certain file structure. So I suggest we just remove "checkpoints", because it's the wrong word to use.

So what I wanted to say is that StorageEngine abstracts over get, put, delete and it handles "collection" of data, not just a particular data entry. So in NkvCore I'll create only one StorageEntry object through which I'll be interacting with how I want to store my values. With that said, the only API I'll need is this

pub trait StorageEngine {
    fn new(uri: String) -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> std::io::Result<Arc<[u8]>>;
}

And that way caching is also hidden behind the StorageEngine abstraction, meaning from NkvCore perspective I don't care how and where you store those values, just give me the value with this key. You want to do tricks with LRU, Full-cache, partial-cache, etc.etc. please feel free to do so, just implement StorageEngine trait so that I can access values. I think that it makes things cleaner.

Also this trait defined above doesn't particularly guarantee you persistence, you can easily implement in-memory StorageEngine or any other that you want.

Let me know if it makes sense to you

deitch · 2024-10-16T12:42:08Z

To me there's only latest checkpoint, not like we have to store time-series and be able to restore to a certain point in time, not. In EVE we just need to be able to restore from persistent storage, which has a certain file structure. So I suggest we just remove "checkpoints", because it's the wrong word to use.

Oh, ok that is totally different. Yeah, checkpoint is the wrong word.

IMHO, this is a storage engine decision. You can have a storage engine that is 100% ephemeral in-memory. Stop the process, lose the data. Don't want that? Use a different storage engine.

So what I wanted to say is that StorageEngine abstracts over get, put, delete and it handles "collection" of data, not just a particular data entry. So in NkvCore I'll create only one StorageEntry object through which I'll be interacting with how I want to store my values. With that said, the only API I'll need is this

Got it. So that just a property of storage engine implementations. There is no per-key or per-anything. Storage engine is expected to handle everything.

Also this trait defined above doesn't particularly guarantee you persistence, you can easily implement in-memory StorageEngine or any other that you want.

Exactly!

And that way caching is also hidden behind the StorageEngine abstraction, meaning from NkvCore perspective I don't care how and where you store those values, just give me the value with this key. You want to do tricks with LRU, Full-cache, partial-cache, etc.etc. please feel free to do so, just implement StorageEngine trait so that I can access values.

I am mixed on this. I can see it go both ways. I might want a storage engine with caching, like think about SSDs with faster memory caches on-board in front of the SSD (less of those now with NVMe speeds), or I might do software caching in front of my SSDs via filesystem drivers. But you can punt on this for now.

I think that it makes things cleaner.

Much clearer, easier to understand.

uncleDecart · 2024-10-16T12:56:52Z

I am mixed on this. I can see it go both ways. I might want a storage engine with caching, like think about SSDs with faster memory caches on-board in front of the SSD (less of those now with NVMe speeds), or I might do software caching in front of my SSDs via filesystem drivers. But you can punt on this for now.

Okay, we actually should relax the interface more and make it this:

pub trait StorageEngine {
    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> std::io::Result<Arc<[u8]>>;
}

The beauty of it is that NkvCore cares only about implementing trait for StorageEngine, so in future what we can do is this:

let storage = InMemoryStorage::new()  // why should I specify URI when I don't need one for in-memory
let mut nkv = NkvCore::new(storage)?;

let storage = FileStorage::<CachingPolicy::LRU>new(pathBuf); // I need PathBuf, because it'll be stored on a FS 
let mut nkv = NkvCore::new(storage)?;

We won't change a thing inside NkvCore, we only care about put, get, delete, and that's that

This is to be compatible with EVE-OS PubSub Signed-off-by: Pavel Abramov <[email protected]>

uncleDecart · 2024-10-17T09:29:29Z

@deitch got the PR to much cleaner state. Now traits contain only trait, there's no such thing as Value which combines notification and StorageEngine, the only implicit requirement fro StorageEngine now is that it supports keyspaces, we store and iterate twice over two different Tries, but for now looks cleaner that way

uncleDecart · 2024-10-17T09:29:42Z

And cache is no more for values

deitch

Yeah, seriously a lot cleaner. Nice.

uncleDecart added 2 commits October 14, 2024 12:06

Add pretty Display for Trie

5c33c7f

When printing trie it'll show the keys in hierarchical way like file system Signed-off-by: Pavel Abramov <[email protected]>

Create parent folders if they are specified for FileStorage

e729905

So if you want to save /my/awesome/file.txt it will create /my/awesome folders if they don't exist, otherwise it would return error on writing Signed-off-by: Pavel Abramov <[email protected]>

uncleDecart mentioned this pull request Oct 14, 2024

Pv naming #20

Closed

uncleDecart force-pushed the add-pv-naming branch from dd8a615 to 5a9e6c2 Compare October 14, 2024 13:40

uncleDecart force-pushed the add-pv-naming branch from 5a9e6c2 to 406aba5 Compare October 15, 2024 11:55

FileStorage: make keyspaces as folders and save files as json

5571c7d

This is to be compatible with EVE-OS PubSub Signed-off-by: Pavel Abramov <[email protected]>

uncleDecart force-pushed the add-pv-naming branch from 406aba5 to 5571c7d Compare October 17, 2024 07:20

deitch approved these changes Oct 17, 2024

View reviewed changes

uncleDecart merged commit 78c3fd6 into main Oct 17, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add pv naming #22

Add pv naming #22

uncleDecart commented Oct 14, 2024

uncleDecart commented Oct 14, 2024

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

uncleDecart commented Oct 15, 2024 •

edited

Loading

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

deitch commented Oct 16, 2024

uncleDecart commented Oct 16, 2024

deitch commented Oct 16, 2024

uncleDecart commented Oct 16, 2024

uncleDecart commented Oct 17, 2024

uncleDecart commented Oct 17, 2024

deitch left a comment

Add pv naming #22

Add pv naming #22

Conversation

uncleDecart commented Oct 14, 2024

uncleDecart commented Oct 14, 2024

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

uncleDecart commented Oct 15, 2024 • edited Loading

deitch commented Oct 15, 2024

uncleDecart commented Oct 15, 2024

deitch commented Oct 16, 2024

uncleDecart commented Oct 16, 2024

deitch commented Oct 16, 2024

uncleDecart commented Oct 16, 2024

uncleDecart commented Oct 17, 2024

uncleDecart commented Oct 17, 2024

deitch left a comment

Choose a reason for hiding this comment

uncleDecart commented Oct 15, 2024 •

edited

Loading