
makeup_html produces different output when compared to Pygments #3

Open
lkarthee opened this issue Jan 14, 2022 · 3 comments

Comments


lkarthee commented Jan 14, 2022

makeup_html produces different tokens when compared to Pygments.

Makeup styles recognised HTML tags with the class k (keyword) rather than nt (name_tag).

<!-- produced by pygments site -->
<pre>
  <span></span>
  <span class="p">&lt;</span>
  <span class="nt">h1</span>
  <span class="na">alt</span>
  <span class="o">=</span>
  <span class="s">"blah"</span>
  <span class="p">&gt;</span>This is heading
  <span class="p">&lt;/</span>
  <span class="nt">h1</span>
  <span class="p">&gt;</span>
</pre>

<!-- produced by makeup_html, data-group-ids deleted-->
<pre class="highlight">
  <code>
    <span class="p">&lt;</span>
    <span class="k">h1</span> <!-- "k" instead of "nt" -->
    <span class="w"> </span>
    <span class="na">alt</span>
    <span class="o">=</span>
    <span class="s">&quot;blah&quot;</span>
    <span class="p">&gt;</span>
    <span class="s">This is a heading</span>
    <span class="p">&lt;/</span>
    <span class="k">h1</span> <!-- "k" instead of "nt" -->
    <span class="p">&gt;</span>
  </code>
</pre>
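
For illustration, a minimal sketch of the kind of re-tagging being asked for here (the helper name is hypothetical; a real fix would presumably live in the lexer's postprocess/2):

# Hypothetical post-pass: re-tag tokens that makeup_html emits as :keyword so the
# formatter renders them with the "nt" (name_tag) class instead of "k".
defp keywords_to_name_tags(tokens) do
  Enum.map(tokens, fn
    {:keyword, meta, value} -> {:name_tag, meta, value}
    token -> token
  end)
end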

Makeup styles unrecognised HTML tags with the class s (string) rather than nt (name_tag). It also styles unrecognised attributes with s (string) rather than na (name_attribute).

<!-- produced by pygments site -->
<pre>
  <span></span>
  <span class="nt">&lt;.alert</span>
  <span class="na">primary=</span>
  <span class="s">"true"</span>
  <span class="nt">&gt;</span>This is heading 1
  <span class="nt">&lt;/.alert&gt;</span>
</pre>

<!-- produced by makeup_html , data-group-ids deleted -->
<pre class="highlight">
  <code>
    <span class="p">&lt;</span>
    <span class="s">.alert</span><!-- "s" instead of "nt" -->
    <span class="w"> </span>
    <span class="s">primary</span><!-- "s" instead of "na" -->
    <span class="o">=</span>
    <span class="s">&quot;true&quot;</span>
    <span class="p">&gt;</span>
    <span class="s">This is a heading</span>
    <span class="p">&lt;/</span>
    <span class="s">.alert</span><!-- "s" instead of "nt" -->
    <span class="p">&gt;</span>
  </code>
</pre>

Here are my queries after using makeup_html:

  • Is this intentional or a bug?
  • Is there a reason for lexing HTML in the strictest possible way, when the language is extended by people in many ways (for example, LiveView components)? It becomes a problem to use this lexer as an alternative to Pygments.
  • Can the data-group-ids be activated or deactivated via a flag passed in the options? (A rough workaround is sketched below.)
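
Until such a flag exists, one rough workaround is to strip the attributes from the rendered markup after highlighting; a sketch, assuming the formatter emits them as data-group-id attributes (the helper name is hypothetical):

# Hypothetical post-processing of the rendered highlight HTML: remove the
# data-group-id attributes instead of changing the formatter itself.
defp strip_group_ids(html) when is_binary(html) do
  Regex.replace(~r/\s+data-group-id="[^"]*"/, html, "")
end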
@lkarthee changed the title from "makeup_html produces different output when compared to Pygments.lexer" to "makeup_html produces different output when compared to Pygments" on Jan 14, 2022
@josevalim
Contributor

Hi @lkarthee, feel free to send PRs for those changes. I think aligning with pygments can be positive. This is mostly a community project, so if it can be made more useful, contributions are welcome!

@javiergarea
Collaborator

Hi @lkarthee, thank you so much for providing your feedback. ❤️

This project has been experimental, as it was developed with the single objective of implementing an HTML lexer for Makeup. That is one of the reasons I decided to follow HTML5 syntax: it seemed easier at first. As @josevalim said, any contribution is gladly welcome, and all three of your concerns (i.e., aligning with Pygments, relaxing the syntax, and deactivating data-group-ids via flags) are quite interesting features for the project IMHO.

@lkarthee
Author

Thank you @josevalim and @javiergarea - I appreciate your effort in maintaining this project.

I am using a fork with my changes in a project (a Bootstrap library for Phoenix components). I am fixing some bugs with attribute highlighting; I will send a PR once those changes are stable (maybe in a week).

For now I am using it with the following changes:

# Walks the token stream and rewrites tokens inside tags so that tag names
# come out as :name_tag and attribute names as :name_attribute, matching
# Pygments' classification.
def not_keywords_stringify(tokens) do
  not_keywords_stringify(tokens, {0, []}, [])
end

# Consumes tokens following a "<" or "</" punctuation token until the tag
# name is found; the first :string or :keyword token is re-tagged as :name_tag.
def skip_whitespace(tokens, token) do
  queue =
    Enum.reduce_while(tokens, [token], fn t, acc ->
      case t do
        {:string, tup, list} ->
          {:halt, acc ++ [{:name_tag, tup, list}]}

        {:keyword, tup, list} ->
          {:halt, acc ++ [{:name_tag, tup, list}]}

        _ ->
          {:cont, acc ++ [t]}
      end
    end)

  # Drop the consumed tokens (minus the punctuation token passed in) from
  # the remaining stream.
  {_, tokens} = Enum.split(tokens, length(queue) - 1)
  {queue, tokens}
end

# Opening tag with an empty queue: re-tag the tag name and continue.
def not_keywords_stringify(
      [{:punctuation, _, "<"} = token | tokens],
      {id, []},
      result
    ) do
  {queue, tokens} = skip_whitespace(tokens, token)
  not_keywords_stringify(tokens, {id + 1, []}, result ++ queue)
end

# Opening tag with pending tokens: flush them before the re-tagged tag name.
def not_keywords_stringify(
      [{:punctuation, _, "<"} = token | tokens],
      {id, orig_queue},
      result
    ) do
  {queue, tokens} = skip_whitespace(tokens, token)
  not_keywords_stringify(tokens, {id + 1, []}, result ++ orig_queue ++ queue)
end

# Closing tag with an empty queue.
def not_keywords_stringify(
      [{:punctuation, _, "</"} = token | tokens],
      {id, []},
      result
    ) do
  {queue, tokens} = skip_whitespace(tokens, token)
  not_keywords_stringify(tokens, {id + 1, []}, result ++ queue)
end

# Closing tag with pending tokens.
def not_keywords_stringify(
      [{:punctuation, _, "</"} = token | tokens],
      {id, orig_queue},
      result
    ) do
  {queue, tokens} = skip_whitespace(tokens, token)
  not_keywords_stringify(tokens, {id + 1, []}, result ++ orig_queue ++ queue)
end

# End of a tag: any :string token in the queue that opens the queue or
# follows whitespace starts an attribute, so it becomes :name_attribute.
def not_keywords_stringify(
      [{:punctuation, _, ">"} = token | tokens],
      {id, queue},
      result
    ) do
  {queue, _} =
    Enum.reduce(queue, {[], nil}, fn {type, mid, data} = curr, {acc, prev} ->
      prev_type =
        case prev do
          nil -> nil
          {prev_type, _, _} -> prev_type
        end

      cond do
        prev_type == nil and type == :string ->
          {acc ++ [{:name_attribute, mid, data}], curr}

        prev_type == :whitespace and type == :string ->
          {acc ++ [{:name_attribute, mid, data}], curr}

        true ->
          {acc ++ [curr], curr}
      end
    end)

  not_keywords_stringify(tokens, {id, []}, result ++ queue ++ [token])
end

# Any other token is buffered until the enclosing tag is closed.
def not_keywords_stringify([token | tokens], {id, queue}, result) do
  not_keywords_stringify(tokens, {id, queue ++ [token]}, result)
end

# End of the stream: flush whatever is left in the queue.
def not_keywords_stringify([], {_id, queue}, result),
  do: result ++ queue

@impl Makeup.Lexer
def postprocess(tokens, _opts \\ []) do
  tokens
  |> char_stringify()
  |> commentify()
  |> keyword_stringify()
  |> attributify()
  |> element_stringify()
  |> not_keywords_stringify()
end
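
For reference, a rough sanity check of what this pass does on a token list shaped like the ones above (the metadata maps here are just placeholders):

# A :keyword tag name between "<" and ">" should come out as :name_tag:
tokens = [
  {:punctuation, %{}, "<"},
  {:keyword, %{}, "h1"},
  {:punctuation, %{}, ">"}
]

not_keywords_stringify(tokens)
# => [{:punctuation, %{}, "<"}, {:name_tag, %{}, "h1"}, {:punctuation, %{}, ">"}]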
