RFC: Spec for HTML Inlines vs. HTML Blocks #102
-
If there were an API that didn't "preparse" the content of HTML elements but instead provided it as raw strings (assuming this is possible?), a user could write an HTML renderer that calls the inline parser manually to transform that raw string back into inlines. Wouldn't it still be possible for elm-pages to distill this as a data source?
-
I think this is indeed a weird situation, but
-
Hmm yeah, this example illustrates something unexpected. I assume the spec says this HTML element should be a block and wrapped in paragraphs? (Does the spec specify something like this?)
I think the least surprising thing for me would be if the rules for when HTML becomes a block were the same as the rules for when a string of text becomes its own paragraph vs. when it becomes part of a paragraph. Or, to try to specify it differently: for each HTML element, (as a thought experiment) replace all its characters (including the tags) with
Examples:
EDIT: Turns out the GitHub markdown renderer renders the ordered list here weirdly for me :) Markdown is hard, huh?
Of course I'm not suggesting this gets implemented like this (replacing an HTML element's non-whitespace characters with some gibberish and checking what that gibberish becomes), but I think it's a good mental model for what a user might expect, right?
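To make that mental model concrete, here is a minimal Elm sketch of the thought experiment. Everything in it is hypothetical (the module, `Placement`, `placement`, and the record of `before`/`element`/`after` text), and it assumes paragraphs are separated by exactly one blank line, which is much cruder than the real spec.

```elm
module PlaceholderModel exposing (Placement(..), placement)

-- Mental-model sketch only; none of these names exist in elm-markdown.


type Placement
    = OwnParagraph
    | PartOfParagraph


-- Replace every non-whitespace character with a placeholder glyph.
mask : String -> String
mask =
    String.map
        (\char ->
            if isWhitespace char then
                char

            else
                '█'
        )


isWhitespace : Char -> Bool
isWhitespace char =
    char == ' ' || char == '\n' || char == '\t' || char == '\r'


-- Simplification: only a literal "\n\n" counts as a paragraph break.
paragraphs : String -> List String
paragraphs source =
    source
        |> String.split "\n\n"
        |> List.map String.trim
        |> List.filter (\paragraph -> not (String.isEmpty paragraph))


-- The element counts as a block exactly when its masked text would
-- have ended up as a paragraph of its own.
placement : { before : String, element : String, after : String } -> Placement
placement { before, element, after } =
    let
        masked =
            mask element
    in
    if List.member (String.trim masked) (paragraphs (before ++ masked ++ after)) then
        OwnParagraph

    else
        PartOfParagraph
```

Under these assumptions, an element with a blank line on both sides comes out as `OwnParagraph`, while the same element in the middle of a sentence comes out as `PartOfParagraph`.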
-
For reference, and to contrast with a different set of semantics, here is what MDXJS does. It feels counterintuitive to me. In this example:
What I find unintuitive about that implementation is that markdown parsing turns on or off based on the context. This seems likely to cause confusion. To "turn markdown parsing on" for the Button when it's interpreted as a Block, you need to surround it with newlines.

    <Button>
    *Here is a button*
    </Button>

See the MDX playground to try it out.
Turning markdown on/off based on newlines is somewhat of a separate issue. I think that if you write an HTML renderer that is getting rendered view's (not a
Perhaps this design is attempting to be as close as possible to how you can render markdown within HTML tags in vanilla markdown? Like in this example here: https://spec.commonmark.org/0.30/#example-152

    <DIV CLASS="foo">

    *Markdown*

    </DIV>

Which renders to

    <DIV CLASS="foo">
    <p><em>Markdown</em></p>
    </DIV>

Whereas this example: https://spec.commonmark.org/0.30/#example-162

    <a href="foo">
    *bar*
    </a>

does not parse the `*bar*` as markdown, so the output is unchanged:

    <a href="foo">
    *bar*
    </a>

Given that HTML is more of a first-class citizen for
-
Background
There are some things to clarify about the specification for HTML renderers, and some related details to clarify around how to parse HTML tags.
The goal is to make HTML tags in `elm-markdown` predictable, simple, and explicit, and to give you the tools to accomplish what you need to.
Related issues: #50 and #70.
In #100, I have an implementation that changes HTML Inlines to run only the inline parser (instead of the block parser). This also simplifies the Inline and Block type definitions because it means that the `Inline` type can no longer refer back up to a `Block`.
Wrapping Paragraphs
Should there be a way to render without the paragraph wrapper? In some cases, you may want to directly get the list of rendered inlines and choose how to wrap them yourself (rather than having them implicitly wrapped in a Paragraph). In other cases, having the list of rendered children represent two different things (rendered blocks vs. rendered inlines) could lead to strange visual bugs like rendering horizontal elements vertically or vice versa.
For example,
If that were wrapped implicitly in a Paragraph, it would render the `sup` as a block and there wouldn't be a way to opt out of that display. If you're given a List representing the rendered inlines, then you could render them with an inline display and show them correctly.
Should there be a way to get the literal text within a tag?
In the case of the `<sup>`, maybe you don't want to deal with rendered markdown children at all and instead just want to get the text inside of the HTML tag and use that directly. Is this a good idea, or does it make things more confusing and make it harder to predict how things will render? For example, should someone writing `x<sup>**3**</sup>` reasonably expect to be using markdown within the HTML tag? Is it worth the extra mental overhead of having two different modes here? There are some other use cases worth considering, like rendering with different formats such as LaTeX. In cases like that, you wouldn't want markdown parsing to interfere with the raw format, so having access to the unparsed text would open up those use cases.
The types would need to change to contain the raw data. I'm confident that parsing should happen independently of HTML renderers, so you should be able to take a parsed AST and pass it to any renderer. The AST would therefore need to include both the parsed markdown children and the raw String in order to handle both.
It would also be possible to defer parsing the inner body, but I prefer to have a fully parsed structure so the data structure can be traversed without making multiple calls to the parser, and also for performance reasons for tools like `elm-pages` that want to fully parse the AST at build time and then serialize that data to avoid running the parser in the browser.
The type would need to change to something like this:
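As a rough sketch of that direction: the module below is a trimmed-down, hypothetical AST, not elm-markdown's actual definitions. The constructor names, the argument order of `HtmlElement`, and the `HtmlAttribute` record are assumptions made for illustration.

```elm
module HtmlAstSketch exposing (..)

-- Hypothetical, trimmed-down AST; not elm-markdown's actual definitions.


type alias HtmlAttribute =
    { name : String, value : String }


-- Arguments: tag name, attributes, raw unparsed body, parsed children.
type Html children
    = HtmlElement String (List HtmlAttribute) (Maybe String) children


type Block
    = Paragraph (List Inline)
    | HtmlBlock (Html (List Block))


-- Inline HTML holds only inlines, so Inline never refers back up to Block.
type Inline
    = Text String
    | Strong (List Inline)
    | HtmlInline (Html (List Inline))


-- x<sup>**3**</sup>: the raw body and the parsed children sit side by side,
-- so a renderer can pick whichever one it needs.
example : Block
example =
    Paragraph
        [ Text "x"
        , HtmlInline (HtmlElement "sup" [] (Just "**3**") [ Strong [ Text "3" ] ])
        ]
```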
The `Maybe String` would be the unparsed string inside the HTML tag (or `Nothing` if it is a self-closing tag).
Defining Inline vs. Block Renderers
Should you be able to define an HTML renderer to only handle Inline HTML or only Block HTML? For example, if you have an HTML handler for a `<Youtube id="..." />` tag, it may be designed to render properly as a block, but could look strange in the middle of a Paragraph of text.
Validation Semantics
It could potentially fit into the mental model of HTML renderers as just another validation. Just like you can have a validation error if a required HTML attribute is missing, or if there is an unhandled HTML tag, you could also give an error if you try to render a `<Youtube id="..." />` embed as an inline.
Disallowing block render could be confusing
If you could give an error when an HTML tag is used as a Block, and only allow it to be used as an Inline HTML tag, that could be confusing, because a tag can change from Inline HTML to Block HTML simply by moving from the middle of a line to the beginning. This seems likely to cause issues: allowing a tag to be rendered only as an Inline would disallow using it as the first item in a paragraph.
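A minimal sketch of what placement-as-validation could look like, assuming a hypothetical `Allowed` setting per tag and a `Placement` value supplied by the parser. None of these names are part of elm-markdown.

```elm
module PlacementValidation exposing (Allowed(..), Placement(..), check)

-- Hypothetical sketch; elm-markdown does not expose these types.


type Placement
    = AsBlock
    | AsInline


type Allowed
    = BlockOnly
    | InlineOnly
    | BlockOrInline


-- Reject a tag that appears in a position its handler does not accept.
check : String -> Allowed -> Placement -> Result String ()
check tagName allowed placement =
    case ( allowed, placement ) of
        ( BlockOnly, AsInline ) ->
            Err ("<" ++ tagName ++ "> can only be used as a block, but it appears inside a paragraph.")

        ( InlineOnly, AsBlock ) ->
            Err ("<" ++ tagName ++ "> can only be used inline, but it starts a line and was parsed as a block.")

        _ ->
            Ok ()
```

Here `check "Youtube" BlockOnly AsInline` produces an `Err`, which matches the validation idea. The `InlineOnly` branch is the confusing one: the same tag would pass or fail depending only on whether it happens to start a line.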
This could be a sign that either:
Rendering
Or if you have an HTML handler for `<DictionaryDefinition word="equestrian">`, and the HTML handler is designed to display an annotated word and render it like an inline (similar to bold or italic inlines), it may render unexpectedly if it displays as a block element, pushing the remaining text to the next line.
For example, you wouldn't want this Block HTML to push the rest of the paragraph to a new line because the HTML tag is at the beginning of the line (making it Block HTML, not Inline HTML).
Since the `DictionaryDefinition` example relies on an HTML attribute for input, not the rendered markdown children within the HTML tag, it could simply render with inline rather than block styling (using a `span` tag or CSS, for example). However, if we were relying on the rendered children, the parsing changes to block parsing vs. inline parsing.
This would parse as an UnorderedList as expected. If we use `FunFacts` in an Inline HTML, should it wrap them in a Paragraph, or should the renderedChildren be the list of rendered inlines?
Should the renderedChildren in this case be:
And should there be a way to render differently based on whether it is Block HTML or Inline HTML?
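One way to picture rendering differently per placement is to tag the rendered children with their kind instead of passing a bare list. This is only a sketch: `Children`, `renderFunFacts`, and the use of `String` as the view type are all assumptions made to keep the example self-contained.

```elm
module PlacementAwareRenderer exposing (Children(..), renderFunFacts)

-- Hypothetical sketch; `String` stands in for whatever view type a renderer uses.


type Children view
    = RenderedBlocks (List view)
    | RenderedInlines (List view)


-- The handler sees which kind of children it received and picks a wrapper,
-- so nothing gets wrapped in a Paragraph implicitly.
renderFunFacts : Children String -> String
renderFunFacts children =
    case children of
        RenderedBlocks blocks ->
            "<aside class=\"fun-facts\">" ++ String.join "\n" blocks ++ "</aside>"

        RenderedInlines inlines ->
            "<span class=\"fun-facts\">" ++ String.concat inlines ++ "</span>"
```

The trade-off is that every handler that uses its children has to deal with both cases, which is more explicit but also more work than a single `List view`.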
Is Multi-Line Inline HTML valid?
If there is a newline before the closing tag of an HTML Inline, what should happen?
A) Don't parse it as an HTML tag, parse it as plain text
B) Parse it as HTML until the closing tag?
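The two options can be written down as a policy over the raw text of a candidate inline element. This is a toy sketch: `Policy`, `Parsed`, and `classify` are hypothetical names, and the input is assumed to be the text from the opening tag through the closing tag.

```elm
module MultiLineInlineHtml exposing (Parsed(..), Policy(..), classify)

-- Toy sketch of the two options; not how elm-markdown's parser is structured.


type Policy
    = TreatAsPlainText -- option A
    | ParseUntilClosingTag -- option B


type Parsed
    = PlainText String
    | InlineHtml String


-- Single-line elements are always inline HTML; the policy only matters
-- when a newline appears before the closing tag.
classify : Policy -> String -> Parsed
classify policy rawElement =
    if String.contains "\n" rawElement then
        case policy of
            TreatAsPlainText ->
                PlainText rawElement

            ParseUntilClosingTag ->
                InlineHtml rawElement

    else
        InlineHtml rawElement
```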