Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In Markdown table cells, apply HTML escaping only to code blocks, and apply it properly #167

Merged
merged 10 commits into from
Aug 1, 2023

Conversation

tetromino
Copy link
Collaborator

@tetromino tetromino commented Jul 25, 2023

In Markdown table cells, apply HTML escaping only to code blocks, and apply it properly

Since #161 removed HTML escaping for defaults and function docstrings,
we should do the same for attribute and param docs in table cells.

The only limitations Markdown places on table cells are:

  • no pipe characters (they must be escaped with a backslash)
  • no newlines (they must be transformed into <br> or an HTML entity)

The latter restriction makes it impossible to have a fenced code block inside a table cell.

Therefore:

  • we do not escape HTML or Markdown markup outside a fenced code block
  • we keep existing logic for escaping newlines outside a fenced code block
  • we fix fence detection (e.g. allowing more than 3 fence characters to support embedded code blocks in code blocks, allowing tildes as fence characters, properly handling language names, etc.);
  • in code block content, we escape HTML, and we escape newlines as HTML entities (since <br> does not work in a <pre><code> block) - finally fixing code block newlines in table cells.

This is a followup to #161.

Partially addresses #118

@tetromino tetromino changed the title Markdown cell In Markdown table cells, apply HTML escaping only to code blocks, and apply it properly Jul 25, 2023
@tetromino
Copy link
Collaborator Author

tetromino commented Jul 25, 2023

This is a draft PR to be merged after #166 lands

Relevant commits start at b198a24 (everything before that is from #166)

@fmeum fyi

… apply it properly

Since bazelbuild#161 removed HTML escaping for defaults and function docstrings,
we should do the same for attribute and param docs in table cells.

The only limitations Markdown places on table cells are:
* no pipe characters (they must be escaped with a backslash)
* no newlines (they must be transformed into <br> or an HTML entity)

The latter restriction makes it impossible to have a fenced code block
inside a table cell.

Therefore:
* we do not escape HTML or Markdown markup outside a fenced code block
* we keep existing logic for escaping newlines outside a fenced code
  block
* we fix fence detection (e.g. allowing more than 3 fence characters to
  support embedded code blocks in code blocks, allowing tildes as fence
  characters, properly handling language names, etc.);
* in code block content, we escape HTML, and we escape newlines as HTML
  entities (since <br> does not work in a <pre><code> block) - finally
  fixing code block newlines in table cells.

This is a followup to bazelbuild#161.
@tetromino tetromino marked this pull request as ready for review July 25, 2023 02:03
@tetromino tetromino requested a review from brandjon as a code owner July 25, 2023 02:03
@tetromino tetromino marked this pull request as ready for review July 25, 2023 18:30
@tetromino
Copy link
Collaborator Author

I have refactored markdownCellFormat to make it feasible to extend it later for formatting lists etc. inside markdown tables.

Copy link
Member

@brandjon brandjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments on the parsing logic but didn't go through the whole algorithm with a fine-toothed comb.

resultString = replaceWithTag(resultString, "`", "<code>", "</code>");
// See https://github.github.com/gfm
private static final class MarkdownCellFormatter {
private final ImmutableList<String> lines;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add one-line comments to doc the invariant for these fields. (Especially that currentLine is 0-indexed.)

\r\n will be normalized to \n by Bazel's Starlark interpreter, and if it somehow ends up
in the input proto, it will be stripped in our String.lines() call regardless.
@tetromino tetromino merged commit 4736754 into bazelbuild:master Aug 1, 2023
@tetromino tetromino deleted the markdown_cell branch August 1, 2023 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants