Bijective Regex #57
Idea 1: BIRDs
This proposal seeks to define the BIjective Regex DSL (BIRD): a regex-like language that is bijective, allowing us to keep the unicity property of the KE language.

Syntax

KE integration
BIRD may be integrated into KEs to match exactly one chunk by surrounding them with

Why couldn't BIRD match multiple chunks?
In theory, it could, but I suspect the cost of being able to do so would be too great. We can expand its functionality later if we want to, but be prepared for heavy canonization interactions.

Why couldn't BIRD be a subchunk?
Mostly because this would impact canonization greatly, as

Special characters
Normal characters
They are always matched literally. A syntactic character may be matched literally by prefixing it with

Ranges
Ranges are written as a sorted list of inclusive intervals between Unicode code points. No two intervals may intersect, nor may any two be joinable into a larger interval. An interval matching a single character is allowed, as long as the range also includes other intervals.

Character classes
The following character classes are defined:
Quantifiers
Quantifiers allow repeating a quantifiable pattern
Only ranges are quantifiable. Literal matches are not quantifiable, so words can be matched literally without converting repeated letters to quantifier syntax.

Comments

Why can't BIRD have capture groups?
Because capture groups would have to be expressed in the language but wouldn't change the set of keys defined, breaking bijectivity.

Why can't BIRD have alternatives?
Because they create many ways to break bijectivity, some of which are very tricky to detect. On top of that, alternatives may be better as their own DSL, exposing simpler semantics for infrastructures to optimise upon. This could be done as the ADSL.
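To make the range rules from the "Ranges" section more concrete, here is a minimal Python sketch of a canonical-form check. It is not part of the proposal: the names `is_canonical_range` and `canonicalize` are hypothetical, intervals are modelled as `(low, high)` pairs of code points, and rejecting an empty range is an assumption, since the text above does not say whether one is allowed.

```python
from typing import List, Tuple

# An inclusive interval of Unicode code points: (low, high).
Interval = Tuple[int, int]


def is_canonical_range(intervals: List[Interval]) -> bool:
    """Check the range rules described above (hypothetical helper)."""
    if not intervals:
        return False  # assumption: an empty range is not allowed
    # A single-character interval is only allowed alongside other intervals.
    if len(intervals) == 1 and intervals[0][0] == intervals[0][1]:
        return False
    prev_high = None
    for low, high in intervals:
        if low > high:
            return False  # malformed interval
        # low <= prev_high     : unsorted or intersecting
        # low == prev_high + 1 : adjacent, so the two could be joined
        if prev_high is not None and low <= prev_high + 1:
            return False
        prev_high = high
    return True


def canonicalize(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that intersect or touch.

    Assumes every interval has low <= high. The result may still be a
    lone single-character interval, which the rules above reject.
    """
    merged: List[Interval] = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged
```

Under these rules there is only one valid way to write a given set of code points as a range, which is what keeps ranges compatible with bijectivity: `a` to `m` followed by `n` to `z` is rejected because it could be joined into `a` to `z`.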
Idea 2: Alternatives DSL (ADSL) + Range DSL (RDSL)
This idea needs further study before being turned into a proposal for this discussion.
Introduction
We've been pining for more precise wilds than * for quite some time. With the new key expression language, we can now introduce DSLs to do just that.