current-non-scribble-entity-handler #48

Open
wants to merge 3 commits into
base: master
tim-brown

Currently the markdown parser converts "..." (and friends) to a 'hellip symbol. This works lovely for xexpr->html, but 'hellip isn't a supported pre-content symbol.

The parameter current-non-scribble-entity-handler is called when 'hellip is generated, and is a hook to substitute other "representations" of "...". It defaults to values, which leaves the output as the current parser produces it. It could equally be:

(match-lambda ['hellip "..."]) ; literally three dot string
(match-lambda ['hellip "…"]) ; U+2026 - HORIZONTAL ELLIPSIS = three dot leader

This handler can also be used to extend any other "smart" symbol mappings.

Scribble does not include 'hellip as a recognised HTML entity in pre-content

Entities not in: 'mdash, 'ndash, 'ldquo, 'lsquo, 'rdquo, 'rsquo, 'larr, 'rarr, or 'prime are passed through current-non-scribble-entity-handler

current-non-scribble-entity-handler defaults to `values` i.e. `(current-non-scribble-entity-handler 'hellip)` -> `'hellip`
Other possibilities are:
```racket
(match-lambda ['hellip "..."]) ; literal dot dot dot replacement
(match-lambda ['hellip "…"]) ; unicode horizontal ellipsis character
```
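
A minimal sketch of how such a parameter might behave, assuming a stand-in definition rather than the PR's actual code. The `emit-ellipsis` function is hypothetical and only imitates the point where the parser would produce 'hellip; the extra `[other other]` clause is an addition so that unlisted symbols pass through instead of raising a match error.

```racket
#lang racket/base
(require racket/match)

;; Assumption: a stand-in for the parameter proposed in this PR.
(define current-non-scribble-entity-handler (make-parameter values))

;; Hypothetical call site: what the parser might do when it sees "...".
(define (emit-ellipsis)
  ((current-non-scribble-entity-handler) 'hellip))

;; With the default handler (values) the symbol passes through unchanged.
(define default-result (emit-ellipsis))

;; With a custom handler the symbol is replaced by a plain string.
(define custom-result
  (parameterize ([current-non-scribble-entity-handler
                  (match-lambda ['hellip "..."]
                                [other other])])
    (emit-ellipsis)))
```

Here `default-result` is `'hellip` and `custom-result` is `"..."`. Using `parameterize` keeps the substitution scoped to one parse, which seems to be the point of making the hook a parameter.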
@greghendershott
Owner

Thank you very much for taking time to prepare the pull request!

My first reaction is to wonder if this should actually be in the markdown library? Or at least, wonder if it should be something quite this specific to 'hellip and Scribble?

For example a user of the library already could walk the returned x-expressions and do this transformation.

However let's say that's inconvenient and maybe slow. In that case, I like your idea of a parameterizable function that defaults to values. But maybe it should be called for every x-expression symbol (a.k.a. HTML entity), not just 'hellip?

By the way, parse-markdown is already walking the x-expressions before returning them -- resolve-references has to look for footnote promises to force. So what I describe could be done as part of that existing walk, and probably not slow down things significantly.
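
A rough sketch of what "call the function for every x-expression symbol during a walk" could look like. This is not the library's `resolve-references`; `map-entities` is a hypothetical helper, and it assumes the usual x-expression shapes `(tag ((attr "val") ...) body ...)` and `(tag body ...)`.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical helper: apply `handle` to every entity symbol found in the
;; body of an x-expression, leaving tags, attributes and strings untouched.
(define (map-entities handle x)
  (match x
    ;; A bare symbol in body position is an entity such as 'hellip.
    [(? symbol? entity) (handle entity)]
    ;; Element with an attribute list: (tag ((k "v") ...) body ...)
    [(list* (? symbol? tag)
            (and attrs (list (list (? symbol?) (? string?)) ...))
            body)
     (list* tag attrs (for/list ([b (in-list body)])
                        (map-entities handle b)))]
    ;; Element without an attribute list: (tag body ...)
    [(cons (? symbol? tag) body)
     (cons tag (for/list ([b (in-list body)])
                 (map-entities handle b)))]
    ;; Strings, numbers and anything else pass through.
    [_ x]))

(define walked
  (map-entities (lambda (e) (if (eq? e 'hellip) "..." e))
                '(p ((class "x")) "Wait" hellip (em "ok" mdash))))
```

In this example `walked` is `'(p ((class "x")) "Wait" "..." (em "ok" mdash))`. A walk like this only sees the entity symbol, not the text the parser originally matched.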


To be clear, I'm just discussing this right now -- not asking you to change your PR like this. Curious what you think (independent of whether you or I would write the code).

@tim-brown
Author

First things first... what I present here is a hack that gets me past Scribble!

I, too, considered doing this from the Scribble end... but the multiple back ends (HTML/LaTeX/text) made it look like a lot of work and risk to do this properly. It's not just 'hellip and Scribble, it's any entity not in that (rather short) list I keep referencing.

There's a choice of handling all entities (symbols) or non-scribble entities. The way I went allows for something that can be left transparent with values. If you want to handle all entities, you would need a handler that always has a Scribble entities clause:

(match-lambda
  [(and x (or 'rsquo 'rdquo ...)) x] ; Scribble entities clause
  ['hellip "something special"]
  ;; maybe here we have no match -- causing early concerns?
)

which isn't as simple.

Question is... is Scribble so important a target for the markdown parser that we consider Scribble's special entity/symbols to be special? Or, as you ask, is there something special about all symbols? Personally, I came across the 'hellip problem because I'm preparing something to go to Scribble (to nicely handle some Racket code); so in that light I'll answer "Yes, No".

Oh, I haven't gotten as far as seeing how your back-referencing "back hooks" on the footnotes play with Scribble (if Scribble even sees them, that is). But again, potentially a non-scribble-entity might find its way into the xexpr.

@tim-brown tim-brown closed this Nov 21, 2014
@tim-brown
Author

Sorry, wrong button pressage

@tim-brown tim-brown reopened this Nov 21, 2014
@greghendershott
Owner

Thanks for the reply.

> Oh, I haven't gotten as far as seeing how your back-referencing "back hooks" on the footnotes play with Scribble (if Scribble even sees them, that is). But again, potentially a non-scribble-entity might find its way into the xexpr.

I think the footnotes are n/a for Scribble. I only mentioned this to say that the markdown library already does a full recursive walk of the x-expressions, for this purpose. So the incremental cost of having it look for symbol? elements would be negligible.

> It's not just 'hellip and Scribble, it's any entity not in that (rather short) list I keep referencing.

IIUC the PR handles just 'hellip. If the markdown parser ever changes to emit some other non-Scribble entity e.g. 'foo, the PR won't handle that, correct?

I guess what I had in mind is:

  • Rename the param to something like current-entity-handler. The default is values.
  • The markdown library will walk the x-expression and call the function for every entity (x-expr symbol) encountered.
  • For convenience, the markdown library could provide a function scribble-entity-handler that you could supply as current-entity-handler. It would be like the match code you sketched out. The catch? What would a reasonable "else"/_ clause be? I don't know.

TL;DR I had in mind something more general. But maybe you're right, that Scribble is a special case that's worth handling specially as in your PR.

@tim-brown
Author

> IIUC the PR handles just 'hellip. If the markdown parser ever changes to emit some other non-Scribble entity e.g. 'foo, the PR won't handle that, correct?

Correct. I'd've expected the author to wrap it, just as I wrapped your 'hellip. So we're now heading towards unDRYness.

I like the current-entity-handler -> values or a "derivative" of scribble-entity-handler.

> What would a reasonable "else"/_ clause be?

Of course, values won't have an else clause. But a choice of else clause would inform extending the handlers. What follows is cud-chewment over how to extend/combine/compose the handler.
(I expect you could bring some Clojure experience to bear?)

I think if you're naming a function scribble-entity-handler, it shouldn't handle non-scribble-entities particularly gracefully (harsh, but fair). Contract current-entity-handler out as:

(contract-out ; pseudo-code at best
 [current-entity-handler
  (parameter/c
   (-> pre-content?
       (or/c
        pre-content?
        ;; #f, if inserted into an xexpr will bust it, and could make composition easier?
        #f)))])

Then I can use... oh... is there a "composable or" combinator out there?

(define ((or-combinator f1 f2) v) (or (f1 v) (f2 v)))

Then I can use scribble-entity-handler as in:

(current-entity-handler
 (or-combinator
  scribble-entity-handler
  (match-lambda ['hellip "..."] [_ #f])))
;; actually, more likely:
(current-entity-handler
 (match-lambda
  ['hellip "..."]
  [(app scribble-entity-handler v) v]))

@greghendershott
Owner

Thinking about this more, it seems like the key concept is this: every time the markdown parser wants to do something "fancy", where it automatically replaces some x-expr (like "...") with a symbol entity (like 'hellip), it should call a function that takes both and decides which to use. That function should be current-entity-handler.

This sketch looks more complicated than it is, due to spelling out contracts for everything, but:

(require racket/contract racket/match xml) ; xml supplies xexpr/c

(provide
 (contract-out
  [current-entity-handler (parameter/c entity-handler/c)]
  [default-entity-handler entity-handler/c]
  [scribble-entity-handler entity-handler/c]))

;; Given an original xexpr and a proposed symbol entity substitution,
;; an entity-handler returns which to use.
(define entity-handler/c (-> xexpr/c symbol? xexpr/c))

;; A default entity-handler that accepts every proposed substitution.
(define (default-entity-handler _ sym)
  sym)

;; An entity-handler suitable for use to produce x-expressions that
;; you want to give to scribble, which expects only a limited list.
(define (scribble-entity-handler orig sym)
  (match sym
    [(or 'mdash 'ndash 'ldquo 'lsquo 'rdquo 'rsquo 'larr 'rarr 'prime) sym]
    [_ orig]))

(define current-entity-handler (make-parameter default-entity-handler))

The remaining change is that I should go through parse.rkt, and anyplace it's doing auto-fancy-pants stuff, run it through this handler.

Does that make sense?
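
To make the proposal concrete, here is a small sketch of how a parser call site might consult the handler. The two handlers are restated from the sketch above without contracts; `propose-entity` is a hypothetical name for the call-site helper, and the "--" to 'ndash pairing is only an illustrative assumption about what the parser replaces.

```racket
#lang racket/base
(require racket/match)

;; Handlers restated from the sketch above, minus the contracts.
(define (default-entity-handler _orig sym) sym)

(define (scribble-entity-handler orig sym)
  (match sym
    [(or 'mdash 'ndash 'ldquo 'lsquo 'rdquo 'rsquo 'larr 'rarr 'prime) sym]
    [_ orig]))

(define current-entity-handler (make-parameter default-entity-handler))

;; Hypothetical call site: the parser offers the original text together
;; with the entity it would like to substitute, and uses what comes back.
(define (propose-entity orig sym)
  ((current-entity-handler) orig sym))

;; Default handler: every proposed substitution is accepted.
(define with-default (propose-entity "..." 'hellip))

;; Scribble handler: 'hellip is rejected, 'ndash is accepted.
(define with-scribble
  (parameterize ([current-entity-handler scribble-entity-handler])
    (list (propose-entity "..." 'hellip)
          (propose-entity "--" 'ndash))))
```

Here `with-default` is `'hellip` and `with-scribble` is `'("..." ndash)`. Because the handler receives the original text, it can restore exactly what the author typed.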

greghendershott pushed a commit that referenced this pull request Nov 22, 2014
The current-entity-handler parameter enables customizing which HTML
entities are allowed in the resulting x-expressions.

The default-entity-handler allows any.

The scribble-entity-handler allows only those on Scribble's short
list (and was the motivating use case).

This is a possible alternative approach to:
  #48
@tim-brown
Author

One thing I'm not sure of is the phrase:

> Every time the markdown parser wants to do something "fancy"...

This means a repeated piece of code every time you generate (or consider generating) an entity symbol. Whereas your solutions that involve doing the work during a walk of the xexpr need only be extended in the one place.

But... Frankly I'd go with what you've just suggested. It is, at least, transparent - as opposed to the opacity of putting it in a walker people possibly don't even know they're calling. Unless you have a walker markdown-generated-xexpr->scribble-safe-xexpr. But I think that's a bit too much.

@greghendershott
Owner

> This means a repeated piece of code every time you generate (or consider generating) an entity symbol. Whereas your solutions that involve doing the work during a walk of the xexpr need only be extended in the one place.

You're right, and that concern is what tilted me towards suggesting a walk, originally.

What tilted me back was the 'hellip example. The parser replaces any of "...", ". . ." and a couple of other variations with 'hellip. A walk wouldn't know which of these originals to "un-replace" and restore.

If there were many dozens of these occurrences and a high likelihood of changes in the future, the walk is justified and I could store both the original text and the entity text, for the walk to choose later. That's the most robust. But I think that's probably over-designing things in this case.
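
For illustration only, the "store both and let the walk choose" alternative might look roughly like this. The `proposed` struct and `resolve-proposals` are hypothetical names, not anything in the library, and the walk is a plain tree traversal rather than a careful x-expression walk.

```racket
#lang racket/base

;; Hypothetical placeholder the parser would emit instead of a bare symbol:
;; it remembers the original text as well as the proposed entity.
(struct proposed (orig sym) #:transparent)

;; Walk a tree and replace each placeholder with whatever the handler picks.
(define (resolve-proposals handler x)
  (cond
    [(proposed? x) (handler (proposed-orig x) (proposed-sym x))]
    [(pair? x) (cons (resolve-proposals handler (car x))
                     (resolve-proposals handler (cdr x)))]
    [else x]))

;; A handler that always keeps the original text restores ". . ." exactly.
(define restored
  (resolve-proposals (lambda (orig sym) orig)
                     (list 'p "Wait" (proposed ". . ." 'hellip))))
```

Here `restored` is `'(p "Wait" ". . .")`. Until the walk runs, the intermediate tree contains structs and so is not a valid x-expression.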

@greghendershott
Owner

> the walk is justified and I could store both the original text and the entity text, for the walk to choose later

In fact, being careful to store both the original text and the entity text, in every appropriate such place ... would be no more robust than being careful to use ent in every appropriate such place, in my proposed pull request. Either way, if someone changes the code to insert a new entity symbol, and doesn't follow this policy, it's a problem.

But probably no one will add more entity symbols at all, much less frequently.

@tim-brown
Author

Currently parse-markdown produces a usable xexpr. If you used (what is fundamentally) a symbol/string pair to represent an entity, this would cease to be so.

If someone adds another "fancy" string to entity mapping (I have ".." in mind...), and they don't use (c-e-h) to handle it -- then the first time it goes to Scribble (or someone equally picky), you'll hear whinges.

Robust, maybe not. But close to self-healing.


2 participants