-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Valkey Command Reference Spaghetti #31
Comments
I agree with goal 2. Regarding goal 1, it's a good idea, but at the same time it's conflicting with another goal that I think is desirable: self-contained docs (goal 4). Regarding goal 3, it's sounds good but where should we store these artifacts? In yet another repo? We'll have dependencies to the artifacts and then we may as well store these artifacts in the doc repo. Here are a few things I would like the doc repo to provide:
I agree that the various (outdated) json files and other junk is unacceptable. It can definitely be improved. but first let's see the current situation and what exactly were talking about. What we haveHistorically (until now) the doc repo had this property (self-contained) at the expense of having generated content checked in. The generated content we're talking about is:
What I'd like
|
Tell me more about your goal 4. I don't think I totally grok what 'self-contained' would mean in this context. As for goal 3, I think that would be something we can store the artifacts using GitHub's artifact storage. It's still on GitHub but it's not stored in the git tree which makes far less problematic.
I think this so somewhat do-able, you and I might approach it slightly differently though. Being smart about it you can alternate the layout that generates the website; Jekyll probably falls into 'unreasonable deps' but there other options for SSGs that are just single, standalone binaries.
I don't think 'complete' is needed here really because lose potentially a lot to attain this goal. As an example, Let's say you want to include a how-to video. You can certainly do this with markdown that works on GitHub( For example, if I just pop a video into the Valkey website using the markdown that works for GitHub markdown preview, it will look something like this on the website: It's not just limited to video embeds, but it also affects nicely usable diagrams, DRYing on notices, interpolate descriptions and titles and a bunch of other things that would be a massive pain to maintain over time.
I think the
This feels like an anti-pattern. You're storing information in two places and checking for synchronization. Modern SSGs can understand the format which Valkey stores the command JSON files. Instead of a CI job that checks it, I'd rather Valkey better utilize these files and store/pull more information from them.
This isn't a bad idea, but the devil would be in the details. If you're creating a CI that makes PRs change the front matter I think you're going to be challenged to prevent people from making (or wanting to make) manual changes. I'm not a fan of having things that can't be edited by a human in GitHub (and that's partially how we got here in the first place). I can definitely see a step that adds the information to the files as part of a build step, but it's very close to the method already used today where the command JSON is just referenced. |
Now you're trolling me. Rick roll. 😆
When building docs (say man pages) locally, getting the markdown source files from a debian source package using something like What I mean by self-contained docs is that the docs package (ideally corresponding 1:1 to the doc repo) contain all of the doc content needed to render the docs. It's not 100% required. We could depend on the code repo (e.g. a distro package called
I don't think we shall have embedded videos in the docs. I even think we shall avoid images, because they don't render well in a man page in a terminal. (There are only 9 images in 459 markdown files currently.) If we want diagrams, it's better to have them as ascii art in fenced code blocks. We can have links to videos but even that's not too nice in a man page. We can have videos in other parts of the website but not in the docs.
DRYing on notices? I don't get it. Do you mean repeating code like Interpolate descriptions and titles? I don't know what you mean here. Nor do I understand "bunch of other things that would be a massive pain to maintain over time". I need concrete examples because I don't really see this.
You're right, it's not nice. The only nice solution to all these goals (1-4) would be a monorepo maybe.
The valkey-server uses those JSON files as input and computes the final ACL categories from a number of factors. If we want to use those JSON files directly and want find out the effective ACL categories, then we must duplicate the logic valkey-server uses to compute the ACL categories. That's why (I've been told) we shouldn't do that. Instead we should fetch the info from a running valkey node. That said, I think we should do it anyway if the alternatives are even worse. ;) Or perhaps better: 💡 We can store those ACL categories explicitly in those JSON files and remove that implicit juggling from valkey-server. Then it's safe to depend on the JSON files directly. (We'd need some logic to check that it's consistent though, but that can be in the python script that generates the C structs from the JSON files.) Now I feel we're getting somewhere. I'd like to make this change so we can actually use those JSON files directly. That'd save us a lot of trouble.
We have some generated content in the code repo too. In generated code, it's pretty standard to insert a comment like "This is generated. Don't edit.". There are comments in YAML so it can be used in frontmatter I think. And if a user still edits some of these fields, a CI job can easily tell them that they did something wrong, just like we have a spell checker in a CI job. I agree it's not ideal but I think it's less bad than the alternatives.
Yes, it's basically the same, just a little less scattered. Perhaps we should just make sure we keep the commands.json updated and add some CI for that first? Of even before that, just do it once manually. Command syntax doesn't change that often anyway. Sorry for a long and spaghetti-structured comment. |
Re: 'unreasonable deps' I'm still a bit unclear about this one, if the artifact is accessible via curl at a public URL that makes it unreasonable? That seems odd. At any rate, I think we can swing something with Re: self-contained docs Okay, yeah, I think we're aligned that we shouldn't need Valkey to create valkey-docs. Re: video/diagrams/images in docs My personal preference is that we don't go through a one-way door on this. Going to least common denominator for content types because of a technical reason will haunt us for years to come. I would rather have some sort of alternate presentation for text-only representations, like alternate links for video or diagrams. (And ASCII art diagrams are sadly a nightmare for accessibility and responsiveness, we probably need to avoid them. I use a screen reader: ASCII art comes out as 5 minutes of a robot saying stuff like 'dot dash dash dot dot plus dot', plus it's hard to skip. Images can have a proper description that is a few seconds and skippable). Re: DRYing on notices I wasn't clear on that. I'm not implying that we need the Hugo
This is a good idea. Right now, as I understand it, the JSON files include part of the ACL picture, which feels like a design mistake. I don't quite get how Valkey infers the other ACL categories (I've looked but not deeply). I'm all for straight lines here. Re: Committing generated code / preventing editing I'd like to minimize this if at all possible, it makes me itch especially if the generated code pulls from other repos. Re: The website doesn't currently have a dependency on it (but we also don't have ACLs either) so I'd rather not take a dependency on something that will go away if individual command JSON files tell the whole story. That being said, I know Valkey isn't being built in a day and writing a small amount of code as a stop-gap may be unavoidable. |
It seems we are from different planets. 😄 In companies (like my employer) the build servers are essentially offline, so they can't by any chance download anything that's not been verified and approved security wise and legally wise. In distros (at least in debian) packages are GPG signed and those who sign them are GPG signed by several other people, so they're highly trusted. (Assuming other distros are similar.) It may not be completely illegal to download stuff from elsewhere but I think it's a bit sketchy to do it without a good reason. It's normal for a package to depend on other package from the same distro package repo though, so for building the docs we can safely depend on valkey source for the JSON files.
If it can be done in Markdown (embedded HTML), then yes this is fine. Another option is to have it on the website outside of the core docs. Getting started guides, how to download and install, etc. don't really belong in man pages anyway. We can have some different styles in different sections.
Can't you skip code blocks? We have lots of output from that must be a nightmare for screen readers, like this: https://github.com/valkey-io/valkey-doc/blob/main/topics/cli.md#continuous-stats-mode And what would you suggest for the git-merge manual?
Thanks. I'm glad you agree on this one.
This rings 3 of my warning bells:
Yeah, it's on my unwritten todo now, but I think I'll start with documenting the JSON files and see where that leads us. Incrementally.
Yeah, I agree. I'm convinced. If we allow depending on the JSON files in the code repo, then we can remove almost all of that. I'm considering adding build step for the docs, like a Makefile, to first build the full markdown files with H1, command syntax and everything. In the next step, these can be fed to a map page converter. Probably that middle-step can be used for the website too. But if such logic is done in Jekyll (etc) it's useful only for the website.
Great! (The redis website did though.) So don't depend on them! I'll try to solve the ACL problem. Let's delete commands.json (but we can wait until we have solved what it provides). I'm not sure we can get rid of the resp3_replies.json files that easily though. That info should go back into the JSON files in the code repo, but the scaffolding (validation, etc.) is not in place yet it seems. It seems we're aligned on most of this now. 🤝 |
Okay, so you're talking about air gapped docs. Got it. I don't see any blockers in using make files and common dependencies to generate any of this.
That's the crux. It really can't. Embedded HTML will break the responsiveness of the website (see the rickroll). I really don't like embedding HTML directly into the markdown: It's likely to break the website in weird and hard to control ways.
I think this might be a really important note: man pages should be a subset of the entire documentation. We can have different rules for man vs non-man pages (which I'm generally fine with).
No, you don't really want to skip code blocks (that's often the whole point). The example you gave from the
Respectfully, I've used this very technique extensively on two very large oss/documentation projects, so the use case is clear and common. Additionally, you do often have to repeat things across documentation; it's a safe assumption that on most documentation visits, users will read a single page or two of documentation and won't explore the links. So, if something needs to be mentioned across multiple pages, you write once and present where needed. Assuming that they'll find it elsewhere is pretty flimsy and re-writing it across many pages will lead to errors and (probable) madness. Super excited when I saw valkey-io/valkey#403 ! I do think we're getting somewhere. |
In a 103 message Slack conversations with @zuiderkwast, we discovered the inherited system of generating the command references has many moving parts and worryingly duplicated and unsynchronized information.
Currently, the website repo has to submodule both
valkey-io/valkey-doc
andvalkey-io/valkey
just to generate the command reference. And if someone needs to change/add arguments to a command in the software, they would need to touch 3 repos.In
valkey-io/valkey
there is a json file for each command. This is used internally by valkey in the argument parser for things like arity, flags, and arguments. Example:ZRANGEBYLEX
individual jsonIn
valkey-io/valkey-doc
there is acommands.json
file which contains all the commands and is indirectly generated from the individual command files (by running an instance of Valkey and extracting the same data from command response with a python script). It ends up with almost the same data, but not quite (missing some pieces and has extra pieces vs the individual command). Example:ZRANGEBYLEX
fromcommands.json
. Of note,commands.json
was out-of-date even before the split.Additionally, there is
resp2_replies.json
andresp3_replies.json
which appear to be generated... some how... but these have been human edited in the redis-doc repo (before the split).The complex architecture here spreads content across many repos (docs, website, valkey itself) and increases the difficulty to do even simple things. This feels like a system that was built iteratively over years without a plan and we should take this opportunity to straighten out the spaghetti.
Goals:
The text was updated successfully, but these errors were encountered: