This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Breaking up haret into a few libraries? #115

Open
erickt opened this issue Jun 8, 2017 · 6 comments

Comments

@erickt
Contributor

erickt commented Jun 8, 2017

In the why.md document, there is a discussion about how haret was designed to isolate off the protocol from the client-facing and data-storage parts of the system. Would there be any interest in formalizing this into multiple libraries? I'm personally interested in exploring a Zookeeper client wire-compatible frontend a la zetcd, so having a looser coupling between the subsystems would make this a bit easier to do.

@andrewjstone
Contributor

The intention of that section was to allow plugging in different protocol implementations, not to allow different APIs. We always intended the API to be a disjoint set of what Zookeeper provides, with some new primitives baked in and other things intentionally left out. E.g., instead of allowing ephemeral, auto-incrementing nodes, which are often used for leader election, we'd provide a leader election primitive instead. The goal was to provide a ready-made, opinionated system that lets users safely coordinate their systems, not to build a toolkit, as the toolkit approach makes it harder to use and debug the different combinations of setups in production.

In practice, isolating even the protocol hasn't been perfect. While most of the VR-specific code lives in /src/vr, there are artifacts of the fact that Haret uses VR littered throughout the codebase. For instance, the namespace manager knows about VrCtxs and the 3 different start modes (startup, recovery, reconfiguration) of replicas. Since implementing a consensus system like VR on top of a lightweight process architecture requires a management layer like the namespace manager (using gossip in this case) in order to start replicas on different nodes and learn of new consensus groups after a partition, it was easy to fall into this trap. Ideally this management layer would be agnostic of the consensus protocol as well, but I haven't spent the time to go back and fix it. It isn't high on the priority list right now, although it may help provide cleaner, more structured code.
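(Editorial sketch, not code from Haret.) One way to picture a management layer that is agnostic of the consensus protocol is to hide the protocol-specific start logic behind a trait. Everything below is an assumption made for illustration: the `Replica` and `ReplicaFactory` traits, the `VrFactory` type, and the rule it uses to choose a start mode are hypothetical and do not reflect Haret's actual `VrCtx` or namespace manager. Only the three start-mode names come from the comment above.

```rust
use std::collections::HashMap;

/// Hypothetical: anything the management layer can start and refer to by name.
trait Replica {
    fn name(&self) -> &str;
}

/// Hypothetical: all protocol-specific knowledge sits behind this trait,
/// so the manager never sees VR contexts or start modes.
trait ReplicaFactory {
    fn start(&mut self, group: &str, peers: &[String]) -> Box<dyn Replica>;
}

/// The three start modes named in the thread; they stay private to the VR side.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VrStartMode {
    Startup,
    Recovery,
    Reconfiguration,
}

struct VrReplica {
    name: String,
}

impl Replica for VrReplica {
    fn name(&self) -> &str {
        &self.name
    }
}

/// Hypothetical VR factory. The mode selection below is an illustrative
/// rule only, not how Haret decides.
#[derive(Default)]
struct VrFactory {
    known: HashMap<String, Vec<String>>,
}

impl ReplicaFactory for VrFactory {
    fn start(&mut self, group: &str, peers: &[String]) -> Box<dyn Replica> {
        let mode = match self.known.get(group) {
            None => VrStartMode::Startup,
            Some(old) if old.as_slice() == peers => VrStartMode::Recovery,
            Some(_) => VrStartMode::Reconfiguration,
        };
        self.known.insert(group.to_string(), peers.to_vec());
        Box::new(VrReplica {
            name: format!("{}/{:?}", group, mode),
        })
    }
}

/// Protocol-agnostic manager: it only knows groups, peers and a factory.
struct NamespaceManager<F: ReplicaFactory> {
    factory: F,
    replicas: HashMap<String, Box<dyn Replica>>,
}

impl<F: ReplicaFactory> NamespaceManager<F> {
    fn new(factory: F) -> Self {
        NamespaceManager {
            factory,
            replicas: HashMap::new(),
        }
    }

    /// Start or restart the replica for `group`, returning its name.
    fn ensure_group(&mut self, group: &str, peers: &[String]) -> String {
        let replica = self.factory.start(group, peers);
        let name = replica.name().to_string();
        self.replicas.insert(group.to_string(), replica);
        name
    }
}
```

With this shape, swapping in another protocol would mean writing another `ReplicaFactory` implementation, while `NamespaceManager` stays unchanged. Whether that boundary is practical for Haret's gossip-based manager is an open question that this sketch does not settle.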

I'm actually in the middle of a major refactoring of the FSMs that I hope to open a PR for in the next week or two. It's possible that all this code could live in its own VRR library, but I'm not sure how useful it would be outside of Haret. I'm also hesitant to independently version it at this early state, when I'm the primary developer, as it just adds another layer of management for me. The likelihood of adding another consensus protocol at this juncture is very remote, so taking the time right now to do this is not a priority.

As far as decomposition of the system, a bunch of things are already in their own libraries. Haret relies on rabble for the cluster system and lightweight processes and vertree for the trie based backend.

It is possible to also abstract out the front end API, but it is harder, as the API is heavily tied to the capabilities of the backend. It is also useless in and of itself.

It appears that zetcd is an independent proxy process that sits in front of etcd. That doesn't require splitting up the code at all, but it does require features, such as subscriptions, that aren't yet built into Haret. It also will either require emulating other non-native features such as ephemeral nodes, or not implementing them altogether. That all seems doable, but again isn't really a priority for me right now. My chief goal is building a correct and stable system. After some stability it will be much more actionable to talk about extension and different front end APIs.

In summary, I'm not fully opposed to this idea, but feel it is a bit of a distraction at this early time. If however you see specific parts of the code that you feel are not properly abstracted and should be split out into their own libraries, I am definitely willing to consider that.

erickt added a commit to erickt/haret that referenced this issue Jun 8, 2017
This starts pulling apart haret (vmware-archive#115), specifically the cli client
binary into its own module. There are a few reasons to do
this. First, it allows us to start framing out a higher
level library interface for Haret (haret-client). Second, it allows
us to shave off some dependencies if we only need a subset for a
particular application. haret-client will need a lot more attention
over time, since right now it just responds with string output.

Note that I've added a .gitignore to `haret-client`, as the
standard practice in the community is to only lock down
dependencies in the application crates, but leave it up to the
library consumers to decide what dependency versions they want
to use.
@erickt
Contributor Author

erickt commented Jun 9, 2017

Hi @andrewjstone! You are of course welcome to move at your own pace and turn all this down :) As I was starting to go through the code, it seemed like there was a natural decoupling between the internal communication between the nodes and the client/server communication. At least for me, it seemed like it'd be a little easier to contribute to those portions of haret without needing a deep understanding of how VR works. As best as I can tell, it doesn't seem too hard to pull the client/server out of the core library, and it has the nice benefit of reducing dependencies and revealing what needs to be public and private.

Regarding the zookeeper compatible client interface, that's more of a toy experiment to compare/contrast some workloads. I thought it might be a nice way to get some people from that community to pay some attention to the project. I don't think you should feel compelled to add any features to support it.

@andrewjstone
Contributor

Hi @erickt,

I can't tell you how much I appreciate you taking an interest in Haret. After looking at your recent changes to the cli-client and thinking more about this, I am less concerned about pulling things apart. I was never really that concerned about separating the code, but more about having to support multiple APIs. However, there is no reason I have to support multiple APIs :) Community projects are completely fine and reasonable.

Additionally, you are correct that the client/server API part is well isolated from the internal communication, so separation shouldn't be that hard. As you state, it is also certainly an easier way to start contributing. Furthermore, it could be very useful to have an HTTP interface using JSON in addition to protobuf. Implementing that for the admin client in particular would be most useful.

With all that said, have at it! I will be happy to review any changes you are interested in making, and from what I've seen so far will likely merge them in quickly. If you want to discuss complex things before implementation we can do that also.

Cheers!

@jrgarcia
Contributor

@erickt @andrewjstone Should this be closed now that things have been separated accordingly?

@andrewjstone
Contributor

andrewjstone commented Aug 30, 2017 via email

@jrgarcia
Contributor

Sounds good. I was just looking through here to pick something up and came across this.


3 participants