Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model: Add wrapWithBacktickFence, to use with quote-and-reply #179

Merged
merged 1 commit into from
Jun 16, 2023

Conversation

chrisbobbe
Copy link
Collaborator

@chrisbobbe chrisbobbe commented Jun 13, 2023

For the corresponding logic used in the web app and zulip-mobile, see web/shared/src/fenced_code.ts.

I believe the logic here differs from that in just this way: it follows the CommonMark spec more closely by disqualifying backtick-fence lines where the "info string" has a backtick, since that's not allowed:

If the info string comes after a backtick fence, it may not
contain any backtick characters. (The reason for this restriction
is that otherwise some inline code would be incorrectly
interpreted as the beginning of a fenced code block.)

Regarding the new file lib/model/compose.dart, we do have existing code that could reasonably move here, but it's pretty simple. It's the code that gives the upload-file Markdown; see
registerUploadStart and registerUploadEnd in
[ContentTextEditingController].

Related: #116

@chrisbobbe chrisbobbe added the a-compose Compose box, autocomplete, attaching files/images label Jun 13, 2023
@chrisbobbe chrisbobbe requested a review from gnprice June 13, 2023 00:50
@chrisbobbe chrisbobbe marked this pull request as draft June 13, 2023 00:51
@chrisbobbe chrisbobbe force-pushed the pr-wrap-with-backtick-fence branch 2 times, most recently from 12a1030 to 55a689f Compare June 13, 2023 00:55
final matches = _openingBacktickFenceRegex.allMatches(content);
int result = 3;
for (final match in matches) {
result = max(result, match[1]!.length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match[1]!.length + 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eep, yep, thanks for the catch! An example of why we'll want this code to have tests before we merge it. 😅

Comment on lines 96 to 98
for (int i = 0; i < fenceLength; i++) {
resultBuffer.write('`');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR Dart suports string repetition with '`' * fenceLength.

Comment on lines 7 to 10
// TODO: Find a nice compact way to write test cases. :-) The tricky bit is
// that almost all of them will involve two multiline strings that need to
// be easy to read (input and expected output). Can we e.g. read data from
// a separate file in a format that's convenient to maintain?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd start by writing the strings in a way like this:

    print('''\
```quote
words
```
''');

Note the backslash after the opening triple-quote — that allows the source to have a newline, so that the string appears in the source with the same alignment as it'd have on its own, without introducing a leading newline in the resulting string.

Copy link
Collaborator Author

@chrisbobbe chrisbobbe Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. But in DartPad:

main() {
  const stringA = '''\
```quote
words
```
''';
  
  const stringB = '''
```quote
words
```
''';
  
  print(stringA == stringB); // true
  print(stringB[0]); // `
}

Copy link
Collaborator Author

@chrisbobbe chrisbobbe Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the opening """ is followed by an optional \ and then a newline, the newline is ignored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if I want the ending """ to be set off by a newline that doesn't appear in the source, I'm not sure I can accomplish that with \:

main() {
  const stringA = '''
```quote
words
```
''';
  
  const stringB = '''
```quote
words
```\
''';
  
  print(stringA == stringB); // true
  print(stringB.endsWith('\n')); // true
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the opening """ is followed by an optional \ and then a newline, the newline is ignored.

Huh indeed! Convenient, then.

Also, if I want the ending """ to be set off by a newline that doesn't appear in the source, I'm not sure I can accomplish that with \:

From the examples, I think you mean a newline that does appear in the source but not in the resulting string.

That seems fine — I think we'll generally want these test strings to end with a newline anyway, or just won't care.

If there are some test cases for situations where the lack of a final newline affects things, then you could add a bit of code manipulating the string. For example define a chomp function like String chomp(String s) => s.replaceFirst(RegExp(r'\n$'), ''), and write:

  checkFoo(chomp('''
```quote
words
```
'''));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the examples, I think you mean a newline that does appear in the source but not in the resulting string.

Ah, yes, thanks.

@chrisbobbe chrisbobbe force-pushed the pr-wrap-with-backtick-fence branch from 55a689f to 6405798 Compare June 13, 2023 18:38
@chrisbobbe chrisbobbe marked this pull request as ready for review June 13, 2023 18:38
@chrisbobbe
Copy link
Collaborator Author

Thanks for the reviews! Revision pushed, with tests.

@chrisbobbe chrisbobbe changed the title WIP model: Add wrapWithBacktickFence, to use with quote-and-reply model: Add wrapWithBacktickFence, to use with quote-and-reply Jun 13, 2023
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I appreciate the detailed comments and references to the CommonMark spec. Comments below.

Comment on lines 216 to 137
final expectedWithInfoString = chomp('''
``````info
```
hello world
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change that I think would make it easier to read these tests is to remove the infoString: … case from almost all of them. Then there could be a couple of tests that are specifically about the infoString behavior.

That wouldn't sacrifice any meaningful degree of test coverage, I think, because there's very little interaction between the handling of infoString and the rest of what this wrapWithBacktickFence function does. And on the other hand, it would make each test case quite a bit shorter and less repetitive, which would help in scanning through them and seeing what kinds of situations are covered.

Comment on lines 297 to 164
final expected = chomp('''
```
// hello world

~~~~~~
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think a further improvement to the test code would be to use a test helper to deduplicate between content and expected, probably in all of the test cases. That would be a further big reduction in the repetitiveness of the cases, making it easier to scan through to understand what they're all doing.

The key here is that content is straightforward to infer from expected. So the test helper can take expected as an argument; remove the first and last lines; pass that to wrapWithBacktickFence; and just check that the result equals the original expected.

Then each of these test cases will have just a single copy of the string, and a simple function call around it.

Comment on lines 256 to 264
test('no panic on badly positioned fences', () {
final content = chomp('''
````javascript
// hello world
```
asdf
````
```python
''');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wouldn't even call these "bad" in any way.

There's a "javascript" code block containing:

// hello world
```
asdf

followed by a "python" code block that's empty.

That isn't well-formed JS inside the "javascript" block, but the Markdown here is perfectly fine — certainly according to CommonMark and to the Zulip server.

check(wrapWithBacktickFence(content: content, infoString: 'info')).equals(expectedWithInfoString);
});

test('backtick fences may be indented up to three spaces', () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test tests this.

In particular, if I remove that logic:

-  const lineStart = r'^ {0,3}';
+  const lineStart = r'^';

the test still passes.

Once you make the test simplifications discussed above, this test can look like:

  checkFenceWrap('''
````
 ```
````
''');
  checkFenceWrap('''
````
  ```
````
''');
  checkFenceWrap('''
````
   ```
````
''');
  // but at 4 spaces of indentation it no longer counts:
  checkFenceWrap('''
```
    ```
```
''');

Comment on lines 103 to 96
resultBuffer.write('`' * fenceLength);
return resultBuffer.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be writing a newline at the end, too.

That's necessary in order for the string to consist of complete lines, rather than end with a line fragment. And the syntax we're aiming to produce is a multi-line piece of syntax, in the context of Markdown which is quite line-oriented in general.

Concretely, if we take the output of this and it lacks a final newline, and then go on to put anything at all after it without first adding that missing newline, then that subsequent text will not be interpreted as intended, and will corrupt the closing code fence that this function intended. That's because that subsequent text (or the beginning of it) would become part of the last line this function started.

Comment on lines 61 to 63
/// This does not parse [content] to make sure its backtick fences are properly
/// paired and nested. If they aren't, it's probably not clear what
/// backtick-fenced output would be more reasonable anyway --
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for them to be improperly paired and nested?

I'm not sure there is. I think if you take one Markdown document with some fenced blocks in it, and you shuffle around the order of the fences, you get another perfectly fine Markdown document that just means something else: some blocks may be closed implicitly, and some lines that had been opening fences with info strings are now just part of the text because they can't be closing fences.

More generally, I believe Markdown is a language where there's no such thing as malformed syntax — just syntax that doesn't mean what you intended.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CommonMark does not require closing fences to be paired.

The content of the code block consists of all subsequent lines, until a closing code fence of the same type as the code block began with (backticks or tildes), and with at least as many backticks or tildes as the opening code fence.

If the end of the containing block (or document) is reached and no closing code fence has been found, the code block contains all of the lines after the opening code fence until the end of the containing block (or document).

Our ```quote construct that parses its contents as Markdown is not part of CommonMark, but if it were, the only consistent nesting behavior to specify would be that a closing fence closes the outermost block it can close under these rules. Unremarkably, we don’t currently live up to that consistency (zulip/zulip#17610).

Comment on lines 11 to 19
// Allow "up to three spaces of indentation".
// As of Zulip Server 7.0, we don't comply with that detail:
// https://chat.zulip.org/#narrow/stream/6-frontend/topic/quote-and-reply.20fence.20length/near/1588273
// and that's a bug.
const lineStart = r'^ {0,3}';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is kind of concerning on the face of it, because it sounds like it's saying that this code's behavior disagrees with the server.

I think just s/and/but/ would already help.

Also helpful would be to be more specific about "we don't comply". I think what's meant is that the server doesn't currently accept those fences. But we're not generating indented fences here, only watching out for them; and it's harmless to make our own fence longer even if the server wouldn't notice the internal fence that we're steering clear of.

@chrisbobbe chrisbobbe force-pushed the pr-wrap-with-backtick-fence branch from 6405798 to 93c0d1e Compare June 14, 2023 00:09
@chrisbobbe
Copy link
Collaborator Author

Thanks for the reviews! Revision pushed.

I'm not sure I've found the right comment here:

  // (A) Why not panic on dangling opening fences
  //     (ones without a corresponding closing fence)?
  //     Because the spec allows leaving the closing fence implicit:
  //     > If the end of the containing block (or document) is reached
  //     > and no closing code fence has been found,
  //     > the code block contains all of the lines after the opening code fence
  //     > until the end of the containing block (or document).
  //
  // (B) Why not look for dangling closing fences (ones without an opening fence)?
  //     Because technically there's no such thing:
  //     they would be indistinguishable from dangling opening fences,
  //     and parsers (and this function) will treat them that way.
  final fenceLength = getUnusedOpeningBacktickFenceLength(content);

but I did want to write something for readers who might be confused (like I was when I first read the TypeScript implementation in @zulip/shared) about why it's OK that the logic doesn't look for fence pairs.

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think the tests are quite readable now. Comments below; I think there's still one small behavior change we'll want.

Comment on lines 23 to 25
final openingFenceLength = firstLineMatch![1]!.length;
assert(RegExp(r'^`{' + openingFenceLength.toString() + r'}$').hasMatch(lines.removeAt(lines.length - 1)),
'test error: closing fence not found in `expected`');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as saying that that line is equal to the string firstLineMatch![1], right?

Comment on lines 21 to 22
assert(lines.removeAt(lines.length - 1) == '',
'test error: `expected` should end with a newline');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to avoid having things inside assert that have side effects which the non-assert code is relying on.

It's probably OK in practice here in this test, because when running tests one generally wants assertions on anyway. But there are some cases where one runs test code without assertions, too, like for benchmarks. So one can't even quite draw a bright line like "in test code, always assume assertions run and don't worry about the other case".

Comment on lines 17 to 26
final lines = expected.split('\n');
final firstLineMatch = RegExp(r'^(`{3,})[^`]*$').allMatches(lines.removeAt(0)).singleOrNull;
assert(firstLineMatch != null,
'test error: opening fence not found in `expected`');
assert(lines.removeAt(lines.length - 1) == '',
'test error: `expected` should end with a newline');
final openingFenceLength = firstLineMatch![1]!.length;
assert(RegExp(r'^`{' + openingFenceLength.toString() + r'}$').hasMatch(lines.removeAt(lines.length - 1)),
'test error: closing fence not found in `expected`');
final content = lines.join('\n');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another way to express the same set of allowed expected and resulting content:

Suggested change
final lines = expected.split('\n');
final firstLineMatch = RegExp(r'^(`{3,})[^`]*$').allMatches(lines.removeAt(0)).singleOrNull;
assert(firstLineMatch != null,
'test error: opening fence not found in `expected`');
assert(lines.removeAt(lines.length - 1) == '',
'test error: `expected` should end with a newline');
final openingFenceLength = firstLineMatch![1]!.length;
assert(RegExp(r'^`{' + openingFenceLength.toString() + r'}$').hasMatch(lines.removeAt(lines.length - 1)),
'test error: closing fence not found in `expected`');
final content = lines.join('\n');
final re = RegExp(r'^(`{3,})[^`]*?\n(.*)\n\1\n$', dotAll: true);
final content = re.firstMatch(expected)![2]!;

I think it's probably actually better, though, to simplify away this helper's knowledge of things like backticks and matching fence lengths, and let the code under test speak for itself in that respect. That then looks like this:

      final re = RegExp(r'^.*?\n(.*)\n.*\n$', dotAll: true);
      final content = re.firstMatch(expected)![1]!;

and the helper's doc can just say it removes the first and last line to get the input.

Then if the test passes something with, say, mismatched fences:

      checkFenceWrap('''

hello world

''');

the developer gets output from check like:

00:01 +0 -1: wrapWithBacktickFence single line with no code blocks [E]         
  Expected: a String that:
    equals '````
    hello world
    ```'
  Actual: '```
  hello world
  ```'
  Which: differs at offset 3:
  ````\nhello w ...
  ```\nhello wo ...
     ^
  package:checks/src/checks.dart 72:9               check.<fn>
  package:checks/src/checks.dart 717:12             _TestContext.expect
  package:checks/src/extensions/string.dart 108:13  StringChecks.equals
  test/model/compose_test.dart 20:78                main.<fn>.checkFenceWrap
  test/model/compose_test.dart 24:7                 main.<fn>.<fn>

which seems pretty clear, and appropriate — the result of the test is that the code under test did not produce the output that the test said was expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to make content consist of complete lines as discussed in my other comment, the capturing group just needs to enclose the newline:

      final re = RegExp(r'^.*?\n(.*\n).*\n$', dotAll: true);

Then in order to test that the function correctly handles the case of an incomplete final line too, we can add another flag to the helper:

    void checkFenceWrap(String expected, {String? infoString, bool chopNewline = false}) {
      final re = RegExp(r'^.*?\n(.*\n).*\n$', dotAll: true);
      String content = re.firstMatch(expected)![1]!;
      if (chopNewline) content = content.substring(0, content.length - 1);

and have a couple of test cases using that.

Comment on lines 95 to 96
resultBuffer.write(content);
resultBuffer.write('\n');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one thing this function is doing is it's effectively assuming that its argument content is missing its final newline — that it ends with an incomplete line. If the argument is instead made of complete lines, then this will add a blank line at the end before the closing fence.

If you look back at the spec: https://spec.commonmark.org/0.30/#fenced-code-blocks
and at the HTML output for the examples, the content (enclosed in <code>…</code>) always consists of complete lines — it's always either empty (so consists of zero lines, all of them complete), or ends with a newline. The only exceptions are examples 121, 138, and 145… which demonstrate syntax that does not make a fenced code block, and instead makes some inline code.

And that's reflected in the spec text: "The content of the code block consists of all subsequent lines", emphasis added.

I think the norm for this function should therefore be that the input consists of complete lines, just as the output will. Probably it's best if it also cheerfully accepts an incomplete final line, and supplies its newline. The implementation then just needs to say:

Suggested change
resultBuffer.write(content);
resultBuffer.write('\n');
resultBuffer.write(content);
if (!content.endsWith('\n')) {
resultBuffer.write('\n');
}

check(wrapWithBacktickFence(content: content, infoString: infoString)).equals(expected);
}

test('single line with no code blocks', () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other set of scenarios that would be good to test: empty content; content consisting of blank lines.

return RegExp(lineStart + backticks + trailing, multiLine: true);
})();

/// The shortest opening backtick fence that's longer than any in [content].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reasoning here can actually be simplified by removing the word "opening", from this summary and the function's name. We're finding a backtick fence that's longer than any backtick fence, whether opening or closing, in the content.

That's what the code is actually already doing, because the regex matches closing fences as well as opening fences.

It's easier to reason about, because a fence that's an opening fence in the original content in isolation could become a closing fence, if we were to try to enclose it in fences that were too short.

And it's what we need to do:

  • We need our opening fence to be long enough that it won't be closed by any fence in the content.
  • We need our closing fence to be long enough that it will close any outstanding opening fences in the content.

Comment on lines 76 to 88
// (A) Why not panic on dangling opening fences
// (ones without a corresponding closing fence)?
// Because the spec allows leaving the closing fence implicit:
// > If the end of the containing block (or document) is reached
// > and no closing code fence has been found,
// > the code block contains all of the lines after the opening code fence
// > until the end of the containing block (or document).
//
// (B) Why not look for dangling closing fences (ones without an opening fence)?
// Because technically there's no such thing:
// they would be indistinguishable from dangling opening fences,
// and parsers (and this function) will treat them that way.
final fenceLength = getUnusedOpeningBacktickFenceLength(content);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To your comment above about this comment: I think a good direction to go in commenting on this would be on the lines of @andersk's comment at #179 (comment) , after making the simplification I point out in the comment before this one.

Or perhaps just take the two "what we need to do" points from the end of that comment of mine, with a link to that comment of Anders's.

@chrisbobbe chrisbobbe force-pushed the pr-wrap-with-backtick-fence branch from 93c0d1e to fd79d07 Compare June 15, 2023 21:00
@chrisbobbe
Copy link
Collaborator Author

Thanks for the review! Revision pushed.

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Small comments.

Comment on lines 78 to 84
test('three-backtick block; incomplete final line', () {
checkFenceWrap(chopNewline: true, '''
```
hello
world
```
''');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems not what you intended.

I think it can just be dropped, though — there's already the test above with the same contents, as well as the five-backtick version below.

Comment on lines 50 to 54
/// For example, for this Markdown:
///
/// ```javascript
/// console.log('Hello world!');
/// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
/// For example, for this Markdown:
///
/// ```javascript
/// console.log('Hello world!');
/// ```
/// For example, for this Markdown:
///
/// ```javascript
/// console.log('Hello world!');
/// ```

and similarly below, so that this dartdoc renders properly, e.g. in the IDE at a call site

Comment on lines 22 to 26
test('empty content', () {
checkFenceWrap(chopNewline: true, '''
```

```
''');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this example doesn't look right. The output is a block whose contents are one empty line, not a block with empty contents.

I think the trouble is that that condition I suggested above:

  if (!content.endsWith('\n')) {
    resultBuffer.write('\n');

is incorrectly treating an empty string as ending with an incomplete line. So it should instead be like if (content.isNotEmpty && !content.endsWith('\n')).

@chrisbobbe chrisbobbe force-pushed the pr-wrap-with-backtick-fence branch from fd79d07 to 96a13e1 Compare June 16, 2023 21:09
@chrisbobbe
Copy link
Collaborator Author

Aha, thanks! Revision pushed.

For the corresponding logic used in the web app and zulip-mobile,
see web/shared/src/fenced_code.ts.

I believe the logic here differs from that in just this way: it
follows the CommonMark spec more closely by disqualifying
backtick-fence lines where the "info string" has a backtick, since
that's not allowed:

> If the info string comes after a backtick fence, it may not
> contain any backtick characters. (The reason for this restriction
> is that otherwise some inline code would be incorrectly
> interpreted as the beginning of a fenced code block.)

Regarding the new file lib/model/compose.dart, we do have existing
code that could reasonably move here, but it's pretty simple. It's
the code that gives the upload-file Markdown; see
registerUploadStart and registerUploadEnd in
[ContentTextEditingController].

Related: zulip#116
@gnprice gnprice force-pushed the pr-wrap-with-backtick-fence branch from 96a13e1 to b6556a6 Compare June 16, 2023 22:31
@gnprice gnprice merged commit b6556a6 into zulip:main Jun 16, 2023
@gnprice
Copy link
Member

gnprice commented Jun 16, 2023

Thanks! Looks good — merging 😅

@chrisbobbe chrisbobbe deleted the pr-wrap-with-backtick-fence branch June 16, 2023 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-compose Compose box, autocomplete, attaching files/images
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants