Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pv naming #22

Merged
merged 3 commits into from
Oct 17, 2024
Merged

Add pv naming #22

merged 3 commits into from
Oct 17, 2024

Conversation

uncleDecart
Copy link
Owner

superseding #20

When printing trie it'll show the keys in hierarchical
way like file system

Signed-off-by: Pavel Abramov <[email protected]>
So if you want to save /my/awesome/file.txt it will create
/my/awesome folders if they don't exist, otherwise it would
return error on writing

Signed-off-by: Pavel Abramov <[email protected]>
@uncleDecart
Copy link
Owner Author

@deitch ready for review

@deitch
Copy link
Collaborator

deitch commented Oct 15, 2024

I don't get what this does. It replaces the second parameter to NkvCore with a trait Converter, but what is Converter? Converts between what and what, and why does core need it? Doesn't it just receive bytes on the channel and pass them to the storage engine? Ok, key string and value bytes, but same thing?

@uncleDecart
Copy link
Owner Author

Converter is to change string to appropriate format for your StorageEngine. So key could be keyspace1.keyspace2.key but in reality you want to store this as keyspace1/keyspace2/key.json on a file system or your current running system is using this structure. In theory I can make it part of FileStorage and say that if you want other formatting, just rewrite FileStorage as you wish. Or I can parametrise that saying that maybe you want your key to be converted to an appropriate format before using the driver. I'm not sure which approach is the best one here.

@deitch
Copy link
Collaborator

deitch commented Oct 15, 2024

I think keep it part of StorageEngine. It is fundamental to a storage engine that it knows how to store things. The interface between core and storage is, "here is a key and value, please store it". How it decides to do it is up to the engine. Much simpler to think about.

@uncleDecart
Copy link
Owner Author

Rewrote it :)

@deitch
Copy link
Collaborator

deitch commented Oct 15, 2024

Looks cleaner.

This part here looks like NkvCore is reading from files? Why is that in core and not part of the storage engine?

@uncleDecart
Copy link
Owner Author

@deitch we need to restore all the persist values from the checkpoints from file system, we should make it part of storage engine, I can abstract it and return some kind of vector of StorageEngine

@uncleDecart
Copy link
Owner Author

uncleDecart commented Oct 15, 2024

What do you think about StroageEngine instead of using this API

pub trait StorageEngine: PartialEq {
    fn new(data: Box<[u8]>, key: &str, root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;

    fn from_checkpoint(filepath: &Path, root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;
    fn update(&mut self, new_data: Box<[u8]>) -> std::io::Result<()>;
    fn delete_checkpoint(&self) -> std::io::Result<()>;
    fn data(&self) -> Arc<[u8]>;
    fn key(&self) -> String;
}

would use this

pub trait StorageEngine: PartialEq {
    // Restore if exists
    fn new(root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> Arc<[u8]>;
    fn key_to_nkv_notation(&self, key: &str) -> String; // not sure how to name this right
}

Meaning that StorageEngine actually oversees the whole collection of entries rather than we create an object of StorageEngine for each entry. So on a high level I'm saying this thing handles all my entries and it takes care of part of nkv API

Edit: I think this looks much cleaner from developer perspective

@deitch
Copy link
Collaborator

deitch commented Oct 15, 2024

Why is checkpoint outside of storage engine?

More fundamentally, I am not convinced you need checkpoint at all at this stage. It is an advanced feature. Having the ability to get/put/delete/subscribe/unsubscribe is plenty for now.

Assuming you do need it, it is entirely within the storage engine, and should not leak out the abstraction for storage. For example:

    fn from_checkpoint(filepath: &Path, root: PathBuf) -> std::io::Result<Self>

Says where on the filesystem it is stored? Who says it is stored locally, and not in some remote database server?

Adding checkpoints also gives you a whole slew of additional APIs you need to add. What defines a checkpoint? How many can I have? Are they timestamped? How do I list them? How do I choose them?

It opens up a can of worms. I think it is enough to have the basic functionality.

@uncleDecart
Copy link
Owner Author

Says where on the filesystem it is stored? Who says it is stored locally, and not in some remote database server?

Ideally it should be some form of URI to support it. By checkpoint I mean that we assume that StorageEngine might have a persistent storage let's say, meaning whenever we reboot system, it's state should be there. So using this interfaces

pub trait StorageEngine: PartialEq {
    // Restore if exists
    fn new(root: PathBuf) -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> Arc<[u8]>;
    fn key_to_nkv_notation(&self, key: &str) -> String; // not sure how to name this right
}

actually abstracts it to the point we want it. Meaning persistence is hidden behind new, they only thing is to switch PathBuf to URI or a String, so that we don't assume that it's somewhere on a file system

We do want to be able to restore from a checkpoint because if we are to integrate this to EVE, would be really great if we are backward compatible (because there're some scripts depending on it) of course, we can write some kind of patches, but that implementation won't cost us anything

@deitch
Copy link
Collaborator

deitch commented Oct 16, 2024

We do want to be able to restore from a checkpoint because if we are to integrate this to EVE, would be really great if we are backward compatible (because there're some scripts depending on it) of course, we can write some kind of patches, but that implementation won't cost us anything

You are saying that it isn't just a future thing, but EVE uses checkpoints right now, so nkv would need checkpoints to be able to be used in EVE? Do you mind explaining how and where it is used?

Ideally it should be some form of URI to support it. By checkpoint I mean that we assume that StorageEngine might have a persistent storage let's say, meaning whenever we reboot system, it's state should be there

I don't quite get this part. If I have a database and an app in front of it, I don't use checkpoints. I expect the database to provide persistent reliable storage. If the database shuts down, I expect that I start it up again, and it picks up where it was.

This relates to the first part of this comment. What is the use case (even just inside EVE) where I need to be able to use checkpoints and restore from them? As opposed to, "I write something to my storage, it is there now and in the future until I delete it. If the storage is persistent (files or database as opposed to just in-memory), then it is there even after I shut down and start up again." Is there some different use case in EVE?

actually abstracts it to the point we want it. Meaning persistence is hidden behind new, they only thing is to switch PathBuf to URI or a String, so that we don't assume that it's somewhere on a file system

Assuming checkpoints are necessary (see above), then I would think it would be entirely inside the storage engine, and I would leave it up to the storage engine. Something like:

pub trait StorageEngine: PartialEq {
    fn new() -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn get(&self, key: &str) -> Arc<[u8]>;

    // checkpoint functions; left out return values
    fn set_checkpoint(&self) ->  // create a checkpoint, return a string. Even better, return a timestamp
    fn delete_checkpoint(&self, key: &str) ->  // delete a checkpoint
    fn list_checkpoints(&self) -> 
    fn restore_checkpoint(&self, checkpoint: &str) -> // or maybe a timestamp 
}

The abstraction is that a storage engine creates a checkpoint, deletes it, lists them, and restores from them. How it does it is entirely up to it. We can quibble as to whether checkpoints should be keyed on arbitrary string vs timestamp (I think timestamp is better).

@uncleDecart
Copy link
Owner Author

Okay I think that I used the wrong word "checkpoint" in this case. I only mean this:

I expect the database to provide persistent reliable storage. If the database shuts down, I expect that I start it up again, and it picks up where it was.

To me there's only latest checkpoint, not like we have to store time-series and be able to restore to a certain point in time, not. In EVE we just need to be able to restore from persistent storage, which has a certain file structure. So I suggest we just remove "checkpoints", because it's the wrong word to use.

So what I wanted to say is that StorageEngine abstracts over get, put, delete and it handles "collection" of data, not just a particular data entry. So in NkvCore I'll create only one StorageEntry object through which I'll be interacting with how I want to store my values. With that said, the only API I'll need is this

pub trait StorageEngine {
    fn new(uri: String) -> std::io::Result<Self>
    where
        Self: Sized;

    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> std::io::Result<Arc<[u8]>>;
}

And that way caching is also hidden behind the StorageEngine abstraction, meaning from NkvCore perspective I don't care how and where you store those values, just give me the value with this key. You want to do tricks with LRU, Full-cache, partial-cache, etc.etc. please feel free to do so, just implement StorageEngine trait so that I can access values. I think that it makes things cleaner.

Also this trait defined above doesn't particularly guarantee you persistence, you can easily implement in-memory StorageEngine or any other that you want.

Let me know if it makes sense to you

@deitch
Copy link
Collaborator

deitch commented Oct 16, 2024

To me there's only latest checkpoint, not like we have to store time-series and be able to restore to a certain point in time, not. In EVE we just need to be able to restore from persistent storage, which has a certain file structure. So I suggest we just remove "checkpoints", because it's the wrong word to use.

Oh, ok that is totally different. Yeah, checkpoint is the wrong word.

IMHO, this is a storage engine decision. You can have a storage engine that is 100% ephemeral in-memory. Stop the process, lose the data. Don't want that? Use a different storage engine.

So what I wanted to say is that StorageEngine abstracts over get, put, delete and it handles "collection" of data, not just a particular data entry. So in NkvCore I'll create only one StorageEntry object through which I'll be interacting with how I want to store my values. With that said, the only API I'll need is this

Got it. So that just a property of storage engine implementations. There is no per-key or per-anything. Storage engine is expected to handle everything.

Also this trait defined above doesn't particularly guarantee you persistence, you can easily implement in-memory StorageEngine or any other that you want.

Exactly!

And that way caching is also hidden behind the StorageEngine abstraction, meaning from NkvCore perspective I don't care how and where you store those values, just give me the value with this key. You want to do tricks with LRU, Full-cache, partial-cache, etc.etc. please feel free to do so, just implement StorageEngine trait so that I can access values.

I am mixed on this. I can see it go both ways. I might want a storage engine with caching, like think about SSDs with faster memory caches on-board in front of the SSD (less of those now with NVMe speeds), or I might do software caching in front of my SSDs via filesystem drivers. But you can punt on this for now.

I think that it makes things cleaner.

Much clearer, easier to understand.

@uncleDecart
Copy link
Owner Author

I am mixed on this. I can see it go both ways. I might want a storage engine with caching, like think about SSDs with faster memory caches on-board in front of the SSD (less of those now with NVMe speeds), or I might do software caching in front of my SSDs via filesystem drivers. But you can punt on this for now.

Okay, we actually should relax the interface more and make it this:

pub trait StorageEngine {
    fn put(&mut self, key: &str, data: Box<[u8]>) -> std::io::Result<()>;    
    fn delete(&self, key: &str) -> std::io::Result<()>;
    fn data(&self, key: &str) -> std::io::Result<Arc<[u8]>>;
}

The beauty of it is that NkvCore cares only about implementing trait for StorageEngine, so in future what we can do is this:

let storage = InMemoryStorage::new()  // why should I specify URI when I don't need one for in-memory
let mut nkv = NkvCore::new(storage)?;

let storage = FileStorage::<CachingPolicy::LRU>new(pathBuf); // I need PathBuf, because it'll be stored on a FS 
let mut nkv = NkvCore::new(storage)?;

We won't change a thing inside NkvCore, we only care about put, get, delete, and that's that

This is to be compatible with EVE-OS PubSub

Signed-off-by: Pavel Abramov <[email protected]>
@uncleDecart
Copy link
Owner Author

@deitch got the PR to much cleaner state. Now traits contain only trait, there's no such thing as Value which combines notification and StorageEngine, the only implicit requirement fro StorageEngine now is that it supports keyspaces, we store and iterate twice over two different Tries, but for now looks cleaner that way

@uncleDecart
Copy link
Owner Author

And cache is no more for values

Copy link
Collaborator

@deitch deitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seriously a lot cleaner. Nice.

@uncleDecart uncleDecart merged commit 78c3fd6 into main Oct 17, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants