In Markdown table cells, apply HTML escaping only to code blocks, and apply it properly #167

Merged: 10 commits, merged Aug 1, 2023
10 changes: 5 additions & 5 deletions docs/stardoc_rule.md
@@ -25,17 +25,17 @@ Generates documentation for exported starlark rule definitions in a target starl
| <a id="stardoc-deps"></a>deps | A list of bzl_library dependencies which the input depends on. | `[]` |
| <a id="stardoc-format"></a>format | The format of the output file. Valid values: 'markdown' or 'proto'. | `"markdown"` |
| <a id="stardoc-symbol_names"></a>symbol_names | A list of symbol names to generate documentation for. These should correspond to the names of rule definitions in the input file. If this list is empty, then documentation for all exported rule definitions will be generated. | `[]` |
| <a id="stardoc-semantic_flags"></a>semantic_flags | A list of canonical flags to affect Starlark semantics for the Starlark interpreter during documentation generation. This should only be used to maintain compatibility with non-default semantic flags required to use the given Starlark symbols.<br><br>For example, if <code>//foo:bar.bzl</code> does not build except when a user would specify <code>--incompatible_foo_semantic=false</code>, then this attribute should contain "--incompatible_foo_semantic=false". | `[]` |
| <a id="stardoc-stardoc"></a>stardoc | The location of the legacy Stardoc extractor. Ignored when using the native <code>starlark_doc_extract</code> rule. | `Label("//stardoc:prebuilt_stardoc_binary")` |
| <a id="stardoc-semantic_flags"></a>semantic_flags | A list of canonical flags to affect Starlark semantics for the Starlark interpreter during documentation generation. This should only be used to maintain compatibility with non-default semantic flags required to use the given Starlark symbols.<br><br>For example, if `//foo:bar.bzl` does not build except when a user would specify `--incompatible_foo_semantic=false`, then this attribute should contain "--incompatible_foo_semantic=false". | `[]` |
| <a id="stardoc-stardoc"></a>stardoc | The location of the legacy Stardoc extractor. Ignored when using the native `starlark_doc_extract` rule. | `Label("//stardoc:prebuilt_stardoc_binary")` |
| <a id="stardoc-renderer"></a>renderer | The location of the renderer tool. | `Label("//stardoc:renderer")` |
| <a id="stardoc-aspect_template"></a>aspect_template | The input file template for generating documentation of aspects | `Label("//stardoc:templates/markdown_tables/aspect.vm")` |
| <a id="stardoc-func_template"></a>func_template | The input file template for generating documentation of functions. | `Label("//stardoc:templates/markdown_tables/func.vm")` |
| <a id="stardoc-header_template"></a>header_template | The input file template for the header of the output documentation. | `Label("//stardoc:templates/markdown_tables/header.vm")` |
| <a id="stardoc-provider_template"></a>provider_template | The input file template for generating documentation of providers. | `Label("//stardoc:templates/markdown_tables/provider.vm")` |
| <a id="stardoc-rule_template"></a>rule_template | The input file template for generating documentation of rules. | `Label("//stardoc:templates/markdown_tables/rule.vm")` |
| <a id="stardoc-repository_rule_template"></a>repository_rule_template | The input file template for generating documentation of repository rules. This template is used only when using the native <code>starlark_doc_extract</code> rule. | `Label("//stardoc:templates/markdown_tables/repository_rule.vm")` |
| <a id="stardoc-module_extension_template"></a>module_extension_template | The input file template for generating documentation of module extensions. This template is used only when using the native <code>starlark_doc_extract</code> rule. | `Label("//stardoc:templates/markdown_tables/module_extension.vm")` |
| <a id="stardoc-use_starlark_doc_extract"></a>use_starlark_doc_extract | Use the native <code>starlark_doc_extract</code> rule if available. | `True` |
| <a id="stardoc-repository_rule_template"></a>repository_rule_template | The input file template for generating documentation of repository rules. This template is used only when using the native `starlark_doc_extract` rule. | `Label("//stardoc:templates/markdown_tables/repository_rule.vm")` |
| <a id="stardoc-module_extension_template"></a>module_extension_template | The input file template for generating documentation of module extensions. This template is used only when using the native `starlark_doc_extract` rule. | `Label("//stardoc:templates/markdown_tables/module_extension.vm")` |
| <a id="stardoc-use_starlark_doc_extract"></a>use_starlark_doc_extract | Use the native `starlark_doc_extract` rule if available. | `True` |
| <a id="stardoc-kwargs"></a>kwargs | Further arguments to pass to stardoc. | none |
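
In a GFM table, an unescaped `|` anywhere in a cell, including inside a backtick code span, is read as a column separator. That is why the new formatter escapes every pipe as `\|` while leaving backtick spans such as `starlark_doc_extract` in the rows above as plain Markdown. The following is a minimal sketch of that escaping step only; `TableRowSketch`, `escapeCell`, and `row` are hypothetical names, not part of Stardoc.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds one GFM table row from raw cell strings. */
public class TableRowSketch {
  // An unescaped '|' would split the cell, so every pipe is backslash-escaped,
  // even inside backtick code spans.
  static String escapeCell(String cell) {
    return cell.trim().replace("|", "\\|");
  }

  // Joins the escaped cells into a single "| a | b |" row.
  static String row(List<String> cells) {
    return cells.stream()
        .map(TableRowSketch::escapeCell)
        .collect(Collectors.joining(" | ", "| ", " |"));
  }

  public static void main(String[] args) {
    System.out.println(row(List.of("format", "Either `markdown` | `proto`.", "`\"markdown\"`")));
  }
}
```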


@@ -24,15 +24,16 @@
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.AttributeInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.AttributeType;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.FunctionParamInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionTagClassInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ProviderInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ProviderNameGroup;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.RuleInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.RepositoryRuleInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionTagClassInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.RuleInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.StarlarkFunctionInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Contains a number of utility methods for markdown rendering. */
@@ -46,55 +47,135 @@ public MarkdownUtil(String extensionBzlFile) {
}

/**
* Return a string that formats the input string so it is displayable in a markdown table cell.
* This performs the following operations:
* Formats the input string so that it is displayable in a Markdown table cell. This performs the
* following operations:
*
* <ul>
* <li>Trims the string of leading/trailing whitespace.
* <li>Transforms the string using {@link #htmlEscape}.
 * <li>Transforms multiline code (```) tags into preformatted code HTML tags.
* <li>Transforms single-tick code (`) tags into code HTML tags.
 * <li>Transforms 'new paragraph' patterns (two or more sequential newline characters) into
 * line break HTML tags.
 * <li>Turns lingering new line tags into spaces (as they generally indicate intended line wrap).
* <li>Escapes pipe characters ({@code |}) as {@code \|}.
* <li>Transforms Markdown code blocks ({@code ```}) into HTML preformatted code blocks, and
 * transforms newlines within those code blocks into character entities.
* <li>Transforms remaining 'new paragraph' patterns (two or more sequential newline characters)
* into line break HTML tags.
* <li>Turns remaining newlines into spaces (as they generally indicate intended line wrap).
* </ul>
*
* TODO(https://github.com/bazelbuild/stardoc/issues/118): also format Markdown lists as HTML.
*/
public String markdownCellFormat(String docString) {
String resultString = htmlEscape(docString.trim());
public static String markdownCellFormat(String docString) {
return new MarkdownCellFormatter(docString).format();
}

resultString = replaceWithTag(resultString, "```", "<pre><code>", "</code></pre>");
resultString = replaceWithTag(resultString, "`", "<code>", "</code>");
// See https://github.github.com/gfm
private static final class MarkdownCellFormatter {
// Lines of the input docstring, without newline terminators.
private final ImmutableList<String> lines;
Please add one-line comments to doc the invariant for these fields. (Especially that currentLine is 0-indexed.)

// Index of the current line in lines, 0-based.
int currentLine;
// Formatted result.
StringBuilder result;

return resultString.replaceAll("\n(\\s*\n)+", "<br><br>").replace('\n', ' ');
}
private static final Pattern CODE_BLOCK_OPENING_FENCE =
Pattern.compile("^ {0,3}(?<fence>```+|~~~+) *(?<lang>\\w*)[^`~]*$");

private static String replaceWithTag(
String wholeString, String stringToReplace, String openTag, String closeTag) {
String remainingString = wholeString;
StringBuilder resultString = new StringBuilder();
MarkdownCellFormatter(String docString) {
lines = docString.trim().replace("|", "\\|").lines().collect(toImmutableList());
currentLine = 0;
result = new StringBuilder();
}

boolean openTagNext = true;
int index = remainingString.indexOf(stringToReplace);
while (index > -1) {
resultString.append(remainingString, 0, index);
resultString.append(openTagNext ? openTag : closeTag);
openTagNext = !openTagNext;
remainingString = remainingString.substring(index + stringToReplace.length());
index = remainingString.indexOf(stringToReplace);
/** Consumes the input and yields the formatted result. */
String format() {
boolean prefixContentWithSpace = false;
for (; currentLine < lines.size(); currentLine++) {
if (formatParagraphBreak()) {
prefixContentWithSpace = false;
continue;
}
if (prefixContentWithSpace) {
result.append(" ");
}
prefixContentWithSpace = true;
if (formatFencedCodeBlock()) {
continue;
}
result.append(lines.get(currentLine));
}
return result.toString();
}

/**
* If a fenced code block begins at {@link #currentLine}, render to {@link #result}, update
* {@link #currentLine} to point to the closing fence, and return true.
*/
private boolean formatFencedCodeBlock() {
// See https://github.github.com/gfm/#fenced-code-blocks
Matcher opening = CODE_BLOCK_OPENING_FENCE.matcher(lines.get(currentLine));
if (!opening.matches()) {
return false;
}
Pattern closingFence = Pattern.compile("^ {0,3}" + opening.group("fence") + " *$");
for (int closingLine = currentLine + 1; closingLine < lines.size(); closingLine++) {
if (closingFence.matcher(lines.get(closingLine)).matches()) {
// We found the closing fence: format the block's contents as HTML.
String language = opening.group("lang");
if (language != null && !language.isEmpty()) {
result.append("<pre><code class=\"language-").append(language).append("\">");
} else {
result.append("<pre><code>");
}
int firstContentLine = currentLine + 1;
for (int i = firstContentLine; i < closingLine; i++) {
if (i > firstContentLine) {
result.append(newlineEscape("\n"));
}
result.append(htmlEscape(lines.get(i)));
}
result.append("</code></pre>");
currentLine = closingLine;
return true;
}
}
// We did not find the closing fence.
return false;
}

/**
* If blank lines appear at {@link #currentLine}, render to {@link #result}, update {@link
* #currentLine} to point to the last line of the break, and return true.
*/
private boolean formatParagraphBreak() {
int numEmptyLines = 0;
for (int i = currentLine; i < lines.size(); i++) {
if (lines.get(i).isEmpty()) {
numEmptyLines++;
} else {
break;
}
}
if (numEmptyLines > 0) {
result.append("<br><br>");
currentLine += numEmptyLines - 1;
return true;
}
return false;
}
resultString.append(remainingString);
return resultString.toString();
}

/**
* Return a string that escapes angle brackets for HTML.
*
 * <p>For example: 'Information with <brackets>.' becomes 'Information with &lt;brackets&gt;.'.
*/
public String htmlEscape(String docString) {
public static String htmlEscape(String docString) {
return docString.replace("<", "&lt;").replace(">", "&gt;");
}

/** Returns a string that escapes newlines with HTML entities. */
private static String newlineEscape(String docString) {
return docString.replace("\n", "&#10;");
}

private static final Pattern CONSECUTIVE_BACKTICKS = Pattern.compile("`+");

/**
@@ -164,23 +245,25 @@ public String aspectSummary(String aspectName, AspectInfo aspectInfo) {
}

/**
* Return a string representing the repository rule summary for the given repository rule with the given name.
* Return a string representing the repository rule summary for the given repository rule with the
* given name.
*
* <p>For example: 'my_repo_rule(foo, bar)'. The summary will contain hyperlinks for each attribute.
* <p>For example: 'my_repo_rule(foo, bar)'. The summary will contain hyperlinks for each
* attribute.
*/
@SuppressWarnings("unused") // Used by markdown template.
public String repositoryRuleSummary(String ruleName, RepositoryRuleInfo ruleInfo) {
ImmutableList<String> attributeNames =
ruleInfo.getAttributeList().stream()
.map(AttributeInfo::getName)
.collect(toImmutableList());
ruleInfo.getAttributeList().stream().map(AttributeInfo::getName).collect(toImmutableList());
return summary(ruleName, attributeNames);
}

/**
* Return a string representing the module extension summary for the given module extension with the given name.
* Return a string representing the module extension summary for the given module extension with
* the given name.
*
* <p>For example:
*
* <pre>
* my_ext = use_extension("//some:file.bzl", "my_ext")
* my_ext.tag1(foo, bar)
@@ -192,13 +275,19 @@ public String repositoryRuleSummary(String ruleName, RepositoryRuleInfo ruleInfo
@SuppressWarnings("unused") // Used by markdown template.
public String moduleExtensionSummary(String extensionName, ModuleExtensionInfo extensionInfo) {
StringBuilder summaryBuilder = new StringBuilder();
summaryBuilder.append(String.format("%s = use_extension(\"%s\", \"%s\")", extensionName, extensionBzlFile, extensionName));
summaryBuilder.append(
String.format(
"%s = use_extension(\"%s\", \"%s\")", extensionName, extensionBzlFile, extensionName));
for (ModuleExtensionTagClassInfo tagClass : extensionInfo.getTagClassList()) {
ImmutableList<String> attributeNames =
tagClass.getAttributeList().stream()
.map(AttributeInfo::getName)
.collect(toImmutableList());
summaryBuilder.append("\n").append(summary(String.format("%s.%s", extensionName, tagClass.getTagName()), attributeNames));
tagClass.getAttributeList().stream()
.map(AttributeInfo::getName)
.collect(toImmutableList());
summaryBuilder
.append("\n")
.append(
summary(
String.format("%s.%s", extensionName, tagClass.getTagName()), attributeNames));
}
return summaryBuilder.toString();
}
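The fence handling in `formatFencedCodeBlock` can be exercised on its own. The following is a simplified sketch, not Stardoc's code: the hypothetical `FenceSketch.closingLine` reuses the `CODE_BLOCK_OPENING_FENCE` pattern from the diff and returns the index of the matching closing fence, or -1 when the block is never closed. It wraps the fence string in `Pattern.quote`, which behaves the same as the diff's plain concatenation because backticks and tildes are not regex metacharacters.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified version of the fence matching in MarkdownCellFormatter. */
public class FenceSketch {
  // Same opening-fence pattern as CODE_BLOCK_OPENING_FENCE in the diff.
  static final Pattern OPENING =
      Pattern.compile("^ {0,3}(?<fence>```+|~~~+) *(?<lang>\\w*)[^`~]*$");

  /** Returns the index of the line closing the fence opened at {@code start}, or -1. */
  static int closingLine(List<String> lines, int start) {
    Matcher opening = OPENING.matcher(lines.get(start));
    if (!opening.matches()) {
      return -1; // the line does not open a fenced code block
    }
    // The closing fence must repeat the opening fence string exactly.
    Pattern closing =
        Pattern.compile("^ {0,3}" + Pattern.quote(opening.group("fence")) + " *$");
    for (int i = start + 1; i < lines.size(); i++) {
      if (closing.matcher(lines.get(i)).matches()) {
        return i;
      }
    }
    return -1; // unclosed: the formatter then treats the line as ordinary text
  }

  public static void main(String[] args) {
    List<String> lines = List.of("````", "```", "```", "````");
    System.out.println(closingLine(lines, 0)); // 3: the inner ``` lines are content
  }
}
```

As in the diff, a four-backtick block can therefore contain three-backtick lines as content (compare the `markdownCellFormat_codeBlocks` test). GFM itself also accepts a closing fence that is longer than the opening one, which this exact-match approach does not.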
@@ -47,4 +47,38 @@ public void markdownCodeSpan_backticksPadding() {
assertThat(MarkdownUtil.markdownCodeSpan("foo`")).isEqualTo("`` foo` ``");
assertThat(MarkdownUtil.markdownCodeSpan("foo``")).isEqualTo("``` foo`` ```");
}

@Test
public void markdownCellFormat_pipes() {
assertThat(MarkdownUtil.markdownCellFormat("foo|bar")).isEqualTo("foo\\|bar");
assertThat(MarkdownUtil.markdownCellFormat("|\\|foobar||")).isEqualTo("\\|\\\\|foobar\\|\\|");
}

@Test
public void markdownCellFormat_newlines() {
assertThat(MarkdownUtil.markdownCellFormat("\nfoo\nbar\n\nbaz\r\n\r\n\r\nqux\r\n"))
.isEqualTo("foo bar<br><br>baz<br><br>qux");
// Newline escapes are not expanded
assertThat(MarkdownUtil.markdownCellFormat("hello\\r\\nworld")).isEqualTo("hello\\r\\nworld");
}

@Test
public void markdownCellFormat_codeBlocks() {
assertThat(MarkdownUtil.markdownCellFormat("```\nhello();\n```"))
.isEqualTo("<pre><code>hello();</code></pre>");
assertThat(MarkdownUtil.markdownCellFormat("```\nhello();\n```\nor\n~~~\nbye();\n~~~"))
.isEqualTo("<pre><code>hello();</code></pre> or <pre><code>bye();</code></pre>");
assertThat(MarkdownUtil.markdownCellFormat("```bash\ncat foo.txt | cmd > /dev/null\n```"))
.isEqualTo(
"<pre><code class=\"language-bash\">cat foo.txt \\| cmd &gt; /dev/null</code></pre>");
assertThat(MarkdownUtil.markdownCellFormat("````\n```\n```\n````"))
.isEqualTo("<pre><code>```&#10;```</code></pre>");
}

@Test
public void markdownCellFormat_inlineMarkup() {
assertThat(MarkdownUtil.markdownCellFormat("<b>bold</b> <i>italic</i>"))
.isEqualTo("<b>bold</b> <i>italic</i>");
assertThat(MarkdownUtil.markdownCellFormat("**bold** _italic_")).isEqualTo("**bold** _italic_");
}
}
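The paragraph and line-wrap rules tested above can also be illustrated without the code-block branch. The hypothetical `ParagraphSketch.format` below mirrors the loop in `MarkdownCellFormatter.format()` and `formatParagraphBreak()` under that simplification: a run of empty lines becomes one `<br><br>`, and any other newline becomes a space. As in the diff, only truly empty lines count as blank; a line containing only spaces is kept as text.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified version of the MarkdownCellFormatter line loop (no fenced code). */
public class ParagraphSketch {
  static String format(String docString) {
    // String.lines() splits on \n, \r, or \r\n, so CRLF input behaves like LF input.
    List<String> lines =
        docString.trim().replace("|", "\\|").lines().collect(Collectors.toList());
    StringBuilder result = new StringBuilder();
    boolean prefixWithSpace = false;
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).isEmpty()) {
        // A run of consecutive empty lines becomes a single paragraph break.
        result.append("<br><br>");
        while (i + 1 < lines.size() && lines.get(i + 1).isEmpty()) {
          i++;
        }
        prefixWithSpace = false;
        continue;
      }
      if (prefixWithSpace) {
        result.append(' '); // a single newline is treated as a soft line wrap
      }
      prefixWithSpace = true;
      result.append(lines.get(i));
    }
    return result.toString();
  }

  public static void main(String[] args) {
    // Prints: First line wraps here.<br><br>Second \| paragraph.
    System.out.println(format("First line\nwraps here.\n\n\nSecond | paragraph."));
  }
}
```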
2 changes: 1 addition & 1 deletion test/bzlmod/docs.md.golden
@@ -17,6 +17,6 @@ Emits the constraints of the host platform to a file.

| Name | Description | Default Value |
| :------------- | :------------- | :------------- |
| <a id="write_host_constraints-name"></a>name | The name of the target. The output file will be named <code>&lt;name&gt;.txt</code>. | none |
| <a id="write_host_constraints-name"></a>name | The name of the target. The output file will be named `<name>.txt`. | none |
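
The golden-file change follows directly from the removed code path. The sketch below copies the old logic from the deleted lines (`htmlEscape` on the whole string, then `replaceWithTag` for triple and single backticks) into a hypothetical `GoldenDiffSketch` class. Assuming the attribute's docstring is the text shown in the new row, it reproduces the old `<code>&lt;name&gt;.txt</code>` output. The new formatter returns such a single-line docstring unchanged, since it contains no pipes, fences, or blank lines.

```java
/** Hypothetical class reproducing the pre-change cell formatting from the removed diff lines. */
public class GoldenDiffSketch {
  static String htmlEscape(String s) {
    return s.replace("<", "&lt;").replace(">", "&gt;");
  }

  // Old approach: alternate between the open and close tag at each occurrence of the delimiter.
  static String replaceWithTag(String whole, String delimiter, String openTag, String closeTag) {
    StringBuilder out = new StringBuilder();
    String remaining = whole;
    boolean openNext = true;
    int index = remaining.indexOf(delimiter);
    while (index > -1) {
      out.append(remaining, 0, index);
      out.append(openNext ? openTag : closeTag);
      openNext = !openNext;
      remaining = remaining.substring(index + delimiter.length());
      index = remaining.indexOf(delimiter);
    }
    out.append(remaining);
    return out.toString();
  }

  // Old pipeline: escape everything first, then turn backticks into HTML tags.
  static String oldCellFormat(String docString) {
    String s = htmlEscape(docString.trim());
    s = replaceWithTag(s, "```", "<pre><code>", "</code></pre>");
    s = replaceWithTag(s, "`", "<code>", "</code>");
    return s.replaceAll("\n(\\s*\n)+", "<br><br>").replace('\n', ' ');
  }

  public static void main(String[] args) {
    String doc = "The name of the target. The output file will be named `<name>.txt`.";
    // Prints: The name of the target. The output file will be named <code>&lt;name&gt;.txt</code>.
    System.out.println(oldCellFormat(doc));
  }
}
```

Because the old path escaped the whole string, inline HTML such as `<b>bold</b>` was also turned into `&lt;b&gt;bold&lt;/b&gt;`, whereas the new `markdownCellFormat_inlineMarkup` test expects it to pass through unchanged.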

