ID_Start and ID_Continue specification and regular expression #331

flip111 · 2018-10-18T12:53:20Z

I'm looking at this

Lines 718 to 740 in a514465

    
    <production name="IdentifierStart" rr:inline="true" oc:lexer="true">
      <description>
        Based on the unicode identifier and pattern syntax
          (http://www.unicode.org/reports/tr31/)
        And extended with a few characters.
      </description>
      <alt>
        <character set="ID_Start"/>
        <character set="Pc"/> <!-- Punctuation connectors (underscores) -->
      </alt>
    </production>
    <production name="IdentifierPart" rr:inline="true" oc:lexer="true">
      <description>
        Based on the unicode identifier and pattern syntax
          (http://www.unicode.org/reports/tr31/)
        And extended with a few characters.
      </description>
      <alt>
        <character set="ID_Continue"/>
        <character set="Sc"/> <!-- Currency symbols -->
      </alt>
    </production>

Well, actually I'm working from the EBNF version that can be downloaded from the website (the legacy one).

How should I implement this? Searching for Unicode ID_Start turns up a jungle of documentation, but I only need the regex for it. Fortunately, Unicode provides a tool for this: https://unicode.org/cldr/utility/regex.jsp?a=%5B%3AID_Start%3A%5D&b=

That leaves the question: what about "And extended with a few characters"?

So, to clear up this section of the spec, some of the following could be done:

Specify the "few extended characters".
Give the regular expression in the comments.
Link to the Unicode tools.

I think the Unicode tool gives the regular expression in a standard regex format, but other engines such as PCRE define their own shorthand classes that could be used instead. Therefore it can be useful to include regular expressions in the comments.


Mats-SX · 2018-10-18T14:53:19Z

I believe "a few characters" refers to the Pc and Sc classes. So it's ID_Start plus Pc for IdentifierStart, and ID_Continue plus Sc for IdentifierPart.

You can check this code for the model's definition of these constructs (derived from Unicode):

openCypher/tools/grammar/src/main/java/org/opencypher/grammar/CharacterSet.java

Lines 310 to 314 in 4a2896e

    
    ID_Start( union( Ll.set, Lm.set, Lo.set, Lt.set, Lu.set, Nl.set, Other_ID_Start.set )
                      .except( Pattern_Syntax.set ).except( Pattern_White_Space.set ) ),
    /** Characters allowed in an identifier. http://unicode.org/reports/tr31/ */
    ID_Continue( union( ID_Start.set, Mn.set, Mc.set, Nd.set, Pc.set, Other_ID_Continue.set )
                         .except( Pattern_Syntax.set ).except( Pattern_White_Space.set ) ),;
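
For anyone who only wants a regex, the definitions above can be approximated with Unicode general categories, which `java.util.regex` supports directly. This is a rough sketch, not the spec's exact definition: it leaves out Other_ID_Start, Other_ID_Continue, and the Pattern_Syntax / Pattern_White_Space exclusions. The class name `IdentifierSketch`, its fields, and the assumption that an identifier is one IdentifierStart followed by zero or more IdentifierPart characters are illustrative only.

```java
import java.util.regex.Pattern;

class IdentifierSketch {
    // Approximates ID_Start: Ll, Lm, Lo, Lt, Lu (together \p{L}) plus Nl.
    // Assumption: Other_ID_Start and the Pattern_Syntax / Pattern_White_Space
    // exclusions are omitted for brevity.
    static final String ID_START = "\\p{L}\\p{Nl}";

    // Approximates ID_Continue: ID_Start plus Mn, Mc, Nd, Pc.
    // Assumption: Other_ID_Continue is omitted.
    static final String ID_CONTINUE = ID_START + "\\p{Mn}\\p{Mc}\\p{Nd}\\p{Pc}";

    // IdentifierStart = ID_Start | Pc, IdentifierPart = ID_Continue | Sc.
    // Assumption: an identifier is IdentifierStart followed by IdentifierPart*.
    static final Pattern IDENTIFIER = Pattern.compile(
            "[" + ID_START + "\\p{Pc}][" + ID_CONTINUE + "\\p{Sc}]*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("_price$")); // true: '_' is Pc, '$' is Sc
        System.out.println(isIdentifier("1abc"));    // false: a digit (Nd) cannot start
    }
}
```

Other engines that support `\p{...}` general-category classes (PCRE, for example) should accept much the same character classes, though the escaping will differ.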

flip111 · 2018-10-18T16:50:11Z

Would it be acceptable to change the comments in the XML?

Mats-SX · 2018-10-19T07:44:27Z

Sure, but please note that in order to accept contributions there is a CLA that must be signed.

flip111 · 2018-10-19T15:05:30Z

Ah OK, thank you for the notice. I won't sign anything, so I won't make PRs here. If this isn't important to anyone else, this issue can be closed. Hopefully other people will be able to find this issue and get the information that is not (yet) in the XML.

Mats-SX · 2018-10-19T15:11:03Z

I'll keep it open for a bit to see if someone (like me) decides to improve the comment, and will close at that point (or before if nothing happens).

jacobfriedman mentioned this issue Jul 8, 2022

Incomplete EBNF Grammar #534

Open
