feat(derive): add markdown parsing of doc comments in derive #4444

azdle · 2022-11-02T22:58:19Z

This is an early draft of a PR that will resolve #2389. I wanted to get some feedback on intent in a few places to make sure I'm doing the right thing.

This largely takes the tack that was laid out in #2389.

Markdown parsing is done within derive.

This was all but explicitly stated, and is, I believe, the right place to do the parsing.

Then a StyledStr is output into the user's expanded code.

StyledStr isn't explicitly mentioned, but it's what the generated function takes and what Colorizer uses internally, so I assume this is right.

Though, StyledStr doesn't have things like bold and italic itself directly and I'm not sure if it's the right place to add it. I know you also have anstyle, but I'm not sure that makes sense to use here since StyledStr seems to denote more semantic contexts. The biggest problem there is I'm not sure what to do when some text is bold and italic or worse a heading that is partly styled or any other combination of things since it only tracks a single enum variant for the style.

This PR also makes the basic styling functions on StyledStr pub so that they can be used from the derived code. I'm not sure if that's acceptable or not. We could hide them from the docs if you'd prefer they weren't fully public.

Markdown is parsed with pulldown-cmark

pulldown-cmark isn't perfectly suited to what I'm doing here, but I think it's the only real option that will work.

pulldown-cmark seems to assume you'd only ever want to output html, and omits telling us a lot of information that would be helpful for outputting back to psuedo markdown. Doesn't let me do everything I want, but is good enough.
comrak parses as github flavored markdown, but doc comments in rust are parsed by Rust as commonmark markdown. It's much heavier than pulldown-cmark. Though it does have a somewhat better API for what we're trying to accomplish. Didn't try it though, don't know what other problems we'd run into.
minimad doesn't parse blocks and only parses some inlines. To be able to use this we'd have the write the first half of a markdown parser anyway.
mini_markdown, as mentioned in your rosetta repo, doesn't have a stable API yet and from a cursory glance doesn't seem complete enough. May be useful in the future.
[a custom parser] I actually started on this idea before realizing just how complex of a thing that would end up being. Best not to make an already large project 10x larger.

Formatting

As far as I'm aware there's not canonical representation of formatted markdown in the terminal. As part of this PR I'm essentially inventing one. A lot of things can't be represented in any native way in terminal output and will "fallback to markdown" for which I just mean output text that looks like markdown. Markdown was based on how people were already writings things in plaintext files after all.

Most of these aren't done, I wanted to get feedback before I go about implementing some of the harder ones. There are some that require a lot of other probably wide-reaching changes, pretty much anything that requires nested styling.

Block Elements

Thematic breaks

AKA a horizontal rule:

Fallback to markdown, insert a --- line.

Could potentially write -s across the whole terminal/written area if I have that info available at the time. This would require a StyledStr with a new Style and no actual str.

Headings

Unsure. Probably set to StyledStr::heading(), and ignore level.

In the future could add levels to StyledStr::heading, not sure how headings other than the first level should even be styled though.

Could also explicitly disallow more than a certain number of heading levels.

Code Blocks

Fallback to markdown but always output in the style of an indented code block.

Could consider syntax highlighting in the future, that would require setting colors on segments within the codeblock.

HTML Blocks

Probably just drop them and all of their content. I can't think of anything else to do with them.

Paragraphs

Merge into a single line of text that is wrapped as appropriate during output. (Though see also Hard/Soft Line Break.)

Blank Lines

That is extra blank lines beyond those that determine block structure.

Ignored. I believe this is essentially what already happens to doc comments.

Quotes

Fallback to markdown, prefix lines of block with > .

This is the big problem child with styling. it probably needs to be a single StyledStr::quote(), but that then means that nothing inside it can be styled in any way. This would have to be done this way so that at output time the quoted things can wrap, but still be prepended. Having any internal styling would depend on a StyledStr that can contain other internal StyledStr.

Lists

Fall back to markdown, but with proper item numbering & unified bullets.

Output plain text markdown so that inner styling works. This might cause minor formatting problems with wrapping, I haven't looked yet at how exactly that works.

Inlines

Code Span

Make bold, this seems to be how "code" in manpages gets formatted for the terminal.

Emphasis and strong emphasis

Emphasis -> italic, Strong Emphasis -> Bold

(Note: don't currently have a way to represent both at the same time, currently will just choose the innermost one. This could be solved with a special case BoldItalic style if we wanted to fix it short term.)

~~strikethrough~~

Show striked.

(Though strikethrough isn't supported in termcolor, but there is an open PR for it. This would have to wait for that or a change over to anstyle. For now just fallback to markdown

Links

I don't have a good answer for this. Currently this just removed the link syntax and URL completely so that help text doesn't get too verbose.

Could possibly keep that for short help text & for manpages convert all links to ref-style links. Long help could go either way.

Images

Removed.

Could consider keeping the alt/title text.

Autolinks

Fallback to markdown. This is just a link in < > so some terminals will detect it as a link anyway.

HTML

Again, remove.

Hard Line Break

This is when you have
two spaces at the end of a line
which forces a line break.

Insert a \n into the text.

Soft Line Break

Join the lines with a space.

Text

Is text, show text.

Stubs

This code also contains (or will contain) some stubs for future expansion:

Manual Sections

Right now the doc comment is parsed into short_about and long_about implicitly based on the first blank line. This keeps that (assuming the first non-blank line is text and not something else), but also adds splitting out anything with a heading as manual content. (Though nothing is done with that content at the moment, I'm assuming it would make sense to make it available to clap_mangen in the future.)

Markdown Manuals

There's a few things that only make sense in this context. We want to render our manpages in our reference docs site. Eventually I want to be able to build a "markdown manpage" out of the doc comment markdown without round tripping through roff first.

tl;dr

Hooray markdown!
What's the future of StyledStr? Should it morph to support nested styles?
Any issues with the proposed formatting above?
Do you have a preference between getting some basic formatting in quickly or getting as close to handling all of the formatting that markdown can possibly represent properly handled before merging?

TODO

finish translation of all desired blocks and inlines
feature flag parsing & pulldown-cmark dependency
docs
[... lots of other things]

azdle · 2022-11-02T23:14:00Z

And after all that I forgot to include the example. The example from the included test:

Source

/// A remarkable tool
///
/// This tool is remarkable becasue its help is generated from **markdown**!
///
/// It has some _italic_ text.
///
/// This is a [link](https://example.com). (But not in the help output.)
///
/// This paragraph has `inline code`.
///
/// This is also a paragraph, but unlike the other paragraphs, this one is
/// spread across multiple lines. Or at least it is in the markdown.
#[derive(Parser, Debug)]
//#[command(verbatim_doc_comment)]
struct Remarkable {
    /// Markdown _is_ __also__ handled here.
    #[arg(long)]
    foo: bool,
}

Rendered

(Note: the underlining is being filtered out of the HTML by github here.)

A remarkable tool

This tool is remarkable because its help is generated from markdown!

It has some italic text.

This is a link. (But not in the help output.)

This paragraph has inline code.

This is also a paragraph, but unlike the other paragraphs, this one is spread across multiple lines. Or at least it is in the markdown.



Usage: clap [OPTIONS]

Options:
      --foo
          Markdown is also handled here.
          
          

  -h, --help
          Print help information (use `-h` for a summary)

Obviously it has some problems but it is help output styled with markdown. :D

epage

We need to work out more details on StyledStr before we can have conversations on what to do about this generally and about how to map markdown to the terminal

epage · 2022-11-03T02:03:52Z

clap_derive/Cargo.toml

@@ -42,6 +42,7 @@ quote = "1.0.9"
 proc-macro2 = "1.0.42"
 heck = "0.4.0"
 proc-macro-error = "1"
+pulldown-cmark = "0.9"


minimad doesn't parse blocks and only parses some inlines. To be able to use this we'd have the write the first half of a markdown parser anyway.

I wonder how much of a problem that will actually be, especially compared to what we do now. The build times are significantly different than pulldown

https://github.com/rosetta-rs/md-rosetta-rs

Yep, I get that. Unfortunately, the way that it accomplishes that is by not really being a markdown parser. It's a parser for something that sorta looks like a subset of markdown.

Honestly I think that not supporting markdown is a better option than supporting minimad's version of "markdown".

There's some things that look like they're just bugs that we could fix that I can only assume would be accepted, like:

Inlines aren't parsed in lists:

* some bullet item * some _bullet_ item * some bullet item

Normal(Composite { style: ListItem, compounds: ["some bullet item"] }) Normal(Composite { style: ListItem, compounds: ["some _bullet_ item"] }) Normal(Composite { style: ListItem, compounds: ["some bullet item"] })

There's some things that I'm not sure if they would consider them to be a bug, but we could technically workaround:

Each line of a paragraph is considered a paragraph.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at tempor commodo ullamcorper a lacus vestibulum sed arcu.

Normal(Composite { style: Paragraph, compounds: ["Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do"] }) Normal(Composite { style: Paragraph, compounds: ["eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at"] }) Normal(Composite { style: Paragraph, compounds: ["tempor commodo ullamcorper a lacus vestibulum sed arcu."] })

Sub-lists (bonus points for parsing a single unconnected * as the start of an italic block)

* some bullet item * some bullet item * some bullet item

Normal(Composite { style: ListItem, compounds: ["some bullet item"] }) Normal(Composite { style: Paragraph, compounds: [" ", I" some bullet item"] }) Normal(Composite { style: ListItem, compounds: ["some bullet item"] })

Only one block level is supported:

> # foobar

Normal(Composite { style: Quote, compounds: ["# foobar"] })

There's way more, these were just the most obviously wrong of the few things I tried. Honestly I think it'd be easier and faster to write a new markdown parser than to pre-parse and re-parse around this crate to get reasonable output of any sort of basic markdown document. That would put a major dent in its lead in your stats too.

tl;dr: We should heed the warning on that crate that says:

If you're looking for a Markdown parser, this one is probably not the one you want

What you want to use it for isn't what it's meant to be used for. It's not a general purpose markdown parser. IMO, it's not even a "general purpose (when your purpose is outputting to the terminal) markdown parser". It's good for when you want some minimal style in terminal output of static, non-user-supplied, text.

epage · 2022-11-03T02:05:23Z

src/builder/styled_str.rs

@@ -28,38 +28,56 @@ impl StyledStr {
        AnsiDisplay { styled: self }
    }

-    pub(crate) fn header(&mut self, msg: impl Into<String>) {
+    /// Display string as header.
+    pub fn header(&mut self, msg: impl Into<String>) {


StyledStr's details can't be made public until after

Allow good/warning/error/hint color to be customized #3234

Allow styled text in template #1433

The header for my examples are not colored like the others #3108

At that point, we'll have a better idea of what can and should be public

👍

Though, at least in my understanding of how StyledStr gets used, having a public interface would be a prerequsite for #1433 and #3108.

Would it be helpful to you for me to write up a separate issue that pulls out my random scattered comments about StyledStr above (and makes them a lot more cohesive)?

epage · 2022-11-04T20:11:52Z

Though, at least in my understanding of how StyledStr gets used, having a public interface would be a prerequsite for #1433 and #3108.

Not with the current API for StyledStr

Would it be helpful to you for me to write up a separate issue that pulls out my random scattered comments about StyledStr above (and makes them a lot more cohesive)?

The doc comments? The ones I saw basically repeat the function name so not the highest priority /value

azdle · 2022-11-04T21:53:05Z

Not with the current API for StyledStr

I don't understand. StyledStr effectively doesn't have an API, everything besides new and ansi is private. How would a user build a StyledStr template or after_help?

Is there another way to create a StyledStr (or something that impls Into<StlyedStr>) without using using any of StyledStr's methods? If there is another way to add styling to the output text that's already possible, I'd be more than happy to change to that.

The doc comments? The ones I saw basically repeat the function name so not the highest priority /value

Not doc comments, comments as in "the things I commented on earlier in this conversation". I just meant, would if be helpful to pull out my thoughts on StyledStr from this PR's description and the limitations I found in the current API trying to implement markdown support?

epage · 2022-11-04T22:08:41Z

I don't understand. StyledStr effectively doesn't have an API, everything besides new and ansi is private. How would a user build a StyledStr template or after_help?

My plan for the future API is nothing like the current internal API and the design of those features will help guide what that API looks like, so they will need to be developed in tandem.

Not doc comments, comments as in "the things I commented on earlier in this conversation". I just meant, would if be helpful to pull out my thoughts on StyledStr from this PR's description and the limitations I found in the current API trying to implement markdown support?

Yes, this is why I prefer more of those requirement / design discussions to happen in the original issue so they don't get lost in PRs that come and go.

wip: add markdown parsing of doc comments in derive

c5cf20a

azdle changed the title ~~add markdown parsing of doc comments in derive~~ feat(derive): add markdown parsing of doc comments in derive Nov 2, 2022

epage reviewed Nov 3, 2022

View reviewed changes

WhyNotHugo mentioned this pull request Jul 10, 2023

mangen: verbatim_doc_comment has no effect #5003

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(derive): add markdown parsing of doc comments in derive #4444

feat(derive): add markdown parsing of doc comments in derive #4444

azdle commented Nov 2, 2022

azdle commented Nov 2, 2022

epage left a comment

epage Nov 3, 2022

azdle Nov 4, 2022

epage Nov 3, 2022

azdle Nov 4, 2022

epage commented Nov 4, 2022

azdle commented Nov 4, 2022

epage commented Nov 4, 2022

feat(derive): add markdown parsing of doc comments in derive #4444

Are you sure you want to change the base?

feat(derive): add markdown parsing of doc comments in derive #4444

Conversation

azdle commented Nov 2, 2022

Formatting

Block Elements

Inlines

Stubs

Other Open Questions

tl;dr

TODO

azdle commented Nov 2, 2022

Source

Rendered

epage left a comment

Choose a reason for hiding this comment

epage Nov 3, 2022

Choose a reason for hiding this comment

azdle Nov 4, 2022

Choose a reason for hiding this comment

epage Nov 3, 2022

Choose a reason for hiding this comment

azdle Nov 4, 2022

Choose a reason for hiding this comment

epage commented Nov 4, 2022

azdle commented Nov 4, 2022

epage commented Nov 4, 2022