keysyms: Add sharp S upper case mapping exception

The case mapping `ssharp` ß ↔ `U1E9E` ẞ was added in 13b30f4 but was broken:

- For the lower case mapping it returned the keysym `0x10000df`, which is an invalid Unicode keysym.
- For the upper case mapping it returned the upper Unicode code point rather than the corresponding keysym.

It did accidentally enable the detection of alphabetic key type for the pair (ß, ẞ) though. However, this detection was accidentally removed in 5c7c799 (v1.7) in an attempt to fix the wrong keysym case mapping.

Finally both the *lower* case mapping and the key type detection were fixed for good when we implemented the complete Unicode simple case mappings and corresponding tests in e83d08d.

However, the *upper* case mapping `ssharp` → `U1E9E` remained disabled. Indeed, ẞ is a relatively recent addition to Unicode (2008) and had no official recommendation until recently. So while the lower mapping ẞ → ß exists in Unicode, its converse upper mapping does not. Yet since 2017 the Council for German Orthography (Rat für deutsche Rechtschreibung) recommends[^1] ẞ as the capitalization of ß.

Due to its stability policies, the Unicode Character Database (UCD) that we use to generate our keysym case mappings (via ICU) cannot update the simple case mapping of ß. Discussions are currently ongoing on the Unicode mailing list[^2] and in CLDR[^3] about how to deal with the new recommended case mapping. However, the discussions are focused on text processing and compatibility mappings, while libxkbcommon operates at a rather lower level.

It seems that the slow adoption of ẞ is partly due to the difficulty of typing it. Since ẞ is used only for ALL CAPS casing, the expectation is to type it using CapsLock. While our detection of alphabetic key types works well[^4] for the pair (ß, ẞ), the *internal capitalization* currently does not work and is fixed by this commit.

Added the ß → ẞ upper mapping:

- Added an exception in the generation script
- Fixed tests
- Added documentation of the exceptions in `xkbcommon.h`
- Added/updated log entries

[^1]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
[^2]: https://corp.unicode.org/pipermail/unicode/2024-November/011162.html
[^3]: https://unicode-org.atlassian.net/browse/CLDR-17624
[^4]: Except libxkbcommon 1.7, see the second paragraph.
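
The intended behaviour of the mapping can be sketched in a few lines of Python. This is a minimal, self-contained approximation and not the code from `scripts/update-unicode.py`: it assumes that keeping only the one-to-one results of `str.upper()` / `str.lower()` is close enough to the Unicode *simple* case mappings (the real script uses ICU, and the two differ for some characters), it ignores legacy non-Unicode keysyms, and all helper names (`cp_to_keysym`, `keysym_to_cp`, `keysym_to_upper`, `keysym_to_lower`) are hypothetical. The keysym encoding assumed is the usual one: Latin-1 code points are their own keysym, other code points are `0x01000000 + code point`.

```python
from typing import Optional

UNICODE_OFFSET = 0x01000000
# Assumption: a single exception to the Unicode mappings, as in this commit.
TO_UPPER_EXCEPTIONS = {"ß": "ẞ"}


def cp_to_keysym(cp: int) -> int:
    """Latin-1 code points are their own keysym; others use the Unicode offset."""
    if 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF:
        return cp
    return UNICODE_OFFSET + cp


def keysym_to_cp(keysym: int) -> Optional[int]:
    """Inverse of cp_to_keysym; legacy keysyms are not handled in this sketch."""
    if 0x20 <= keysym <= 0x7E or 0xA0 <= keysym <= 0xFF:
        return keysym
    if UNICODE_OFFSET + 0x100 <= keysym <= UNICODE_OFFSET + 0x10FFFF:
        return keysym - UNICODE_OFFSET
    return None


def simple_upper(char: str) -> str:
    # Exceptions take precedence over the Unicode-derived mapping.
    if (upper := TO_UPPER_EXCEPTIONS.get(char)) is not None:
        return upper
    upper = char.upper()
    # Approximation: drop one-to-many results such as "SS".
    return upper if len(upper) == 1 else char


def simple_lower(char: str) -> str:
    lower = char.lower()
    return lower if len(lower) == 1 else char


def keysym_to_upper(keysym: int) -> int:
    cp = keysym_to_cp(keysym)
    if cp is None:
        return keysym
    # Convert back to a keysym, not a bare code point (the original bug).
    return cp_to_keysym(ord(simple_upper(chr(cp))))


def keysym_to_lower(keysym: int) -> int:
    cp = keysym_to_cp(keysym)
    if cp is None:
        return keysym
    # Yields 0x00df for U1E9E, never the invalid keysym 0x10000df.
    return cp_to_keysym(ord(simple_lower(chr(cp))))


if __name__ == "__main__":
    print(hex(keysym_to_upper(0x00DF)))     # 0x1001e9e (ssharp -> U1E9E)
    print(hex(keysym_to_lower(0x1001E9E)))  # 0xdf      (U1E9E -> ssharp)
```

Both bugs described above show up as the two conversions at the keysym boundary: the result of a case mapping must always go back through the code point to keysym conversion, and Latin-1 results must not receive the Unicode offset.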
xkbcommon · Dec 15, 2024 · 54ff6e7
1 parent 0ebdc4d
commit 54ff6e7

Showing 11 changed files with 456 additions and 338 deletions.
diff --git a/changes/api/+großes-ẞ.breaking.md b/changes/api/+großes-ẞ.breaking.md
@@ -0,0 +1,2 @@
+Added the upper case mapping ß → ẞ (`ssharp` → `U1E9E`). This enables typing
+ẞ using CapsLock thanks to the internal capitalization rules.
diff --git a/changes/api/+großes-ẞ.bugfix.md b/changes/api/+großes-ẞ.bugfix.md
@@ -0,0 +1,2 @@
+Fixed the lower case mapping ẞ → ß (`U1E9E` → `ssharp`). This re-enables the detection
+of alphabetic key types for the pair (ß, ẞ).
diff --git a/changes/api/+unicode-16.breaking.md b/changes/api/+unicode-16.breaking.md
@@ -5,13 +5,16 @@ the following:
 - `xkb_keysym_to_lower()` and `xkb_keysym_to_upper()` give different output
   for keysyms not covered previously and handle *title*-cased keysyms.
 
-  Example of title-cased keysym: `0x10001f2` (`U+01F2` “ǲ”):
-  - `xkb_keysym_to_lower(0x10001f2) == 0x10001f3` (`U+01F3` “ǳ”)
-  - `xkb_keysym_to_upper(0x10001f2) == 0x10001f1` (`U+01F1` “Ǳ”)
+  Example of title-cased keysym: `U01F2` “ǲ”:
+  - `xkb_keysym_to_lower(U01F2) == U01F3` “ǲ” → “ǳ”
+  - `xkb_keysym_to_upper(U01F2) == U01F1` “ǲ” → “Ǳ”
 - *Implicit* alphabetic key types are better detected, because they use the
   latest Unicode case mappings and now handle the *title*-cased keysyms the
   same way as upper-case ones.
 
+Note: There is a single *exception* that does not follow the Unicode mappings:
+- `xkb_keysym_to_upper(ssharp) == U1E9E` “ß” → “ẞ”
+
 Note: As before, only *simple* case mappings (i.e. one-to-one) are supported.
 For example, the full upper case of `U+01F0` “ǰ” is “J̌” (2 characters: `U+004A`
 and `U+030C`), which would require 2 keysyms, which is not supported by the

diff --git a/data/keysyms.yaml b/data/keysyms.yaml
@@ -560,6 +560,7 @@
 0x00df:
   name: ssharp
   code point: 0x00DF
+  upper: 0x1001e9e # U1E9E
 0x00e0:
   name: agrave
   code point: 0x00E0

diff --git a/include/xkbcommon/xkbcommon.h b/include/xkbcommon/xkbcommon.h
@@ -552,9 +552,18 @@ xkb_utf32_to_keysym(uint32_t ucs);
  * If there is no such form, the keysym is returned unchanged.
  *
  * The conversion rules are the *simple* (i.e. one-to-one) Unicode case
- * mappings and do not depend on the locale. If you need the special
- * case mappings (i.e. not one-to-one or locale-dependent), prefer to
- * work with the Unicode representation instead, when possible.
+ * mappings (with some exceptions, see below) and do not depend
+ * on the locale. If you need the special case mappings (i.e. not
+ * one-to-one or locale-dependent), prefer to work with the Unicode
+ * representation instead, when possible.
+ *
+ * Exceptions to the Unicode mappings:
+ *
+ * | Lower keysym | Lower letter | Upper keysym | Upper letter | Comment |
+ * | ------------ | ------------ | ------------ | ------------ | ------- |
+ * | `ssharp`     | `U+00DF`: ß  | `U1E9E`      | `U+1E9E`: ẞ  | [Council for German Orthography] |
+ *
+ * [Council for German Orthography]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
  *
  * @since 0.8.0: Initial implementation, based on `libX11`.
  * @since 1.8.0: Use Unicode 16.0 mappings for complete Unicode coverage.

diff --git a/meson.build b/meson.build
@@ -742,8 +742,12 @@ test(
 )
 test(
     'keymap',
-    executable('test-keymap', 'test/keymap.c', 'test/keysym.h',
-               dependencies: test_dep),
+    executable(
+        'test-keymap',
+        'test/keymap.c',
+        'test/keysym.h',
+        'test/keysym-case-mapping.h',
+        dependencies: test_dep),
     env: test_env,
 )
 test(

diff --git a/scripts/update-unicode.py b/scripts/update-unicode.py
@@ -90,6 +90,7 @@
 from pathlib import Path
 from typing import (
     Any,
+    ClassVar,
     Generator,
     Generic,
     Iterable,
@@ -101,15 +102,18 @@
     TypeVar,
     cast,
 )
+import unicodedata
 
 import icu
+import jinja2
 import yaml
 
 assert sys.version_info >= (3, 12)
 
 c = icu.Locale.createFromName("C")
 icu.Locale.setDefault(c)
 
+SCRIPT = Path(__file__)
 CodePoint = NewType("CodePoint", int)
 Keysym = NewType("Keysym", int)
 KeysymName = NewType("KeysymName", str)
@@ -294,6 +298,9 @@ class Entry:
     upper: int
     is_lower: bool
     is_upper: bool
+    # [NOTE] Exceptions must be documented in `xkbcommon.h`.
+    to_upper_exceptions: ClassVar[dict[str, str]] = {"ß": "ẞ"}
+    "Upper mappings exceptions"
 
     @classmethod
     def zeros(cls) -> Self:
@@ -326,16 +333,20 @@ def lower_delta(cls, cp: CodePoint) -> int:
     def upper_delta(cls, cp: CodePoint) -> int:
         return cp - cls.to_upper_cp(cp)
 
-    @staticmethod
-    def to_upper_cp(cp: CodePoint) -> CodePoint:
+    @classmethod
+    def to_upper_cp(cls, cp: CodePoint) -> CodePoint:
+        if upper := cls.to_upper_exceptions.get(chr(cp)):
+            return ord(upper)
         return icu.Char.toupper(cp)
 
     @staticmethod
     def to_lower_cp(cp: CodePoint) -> CodePoint:
         return icu.Char.tolower(cp)
 
-    @staticmethod
-    def to_upper_char(char: str) -> str:
+    @classmethod
+    def to_upper_char(cls, char: str) -> str:
+        if upper := cls.to_upper_exceptions.get(char):
+            return upper
         return icu.Char.toupper(char)
 
     @staticmethod
@@ -1954,6 +1965,37 @@ def run(
         best_solution.test(config)
         if write:
             best_solution.write(root)
+            cls.write_tests(root)
+
+    @classmethod
+    def write_tests(cls, root: Path) -> None:
+        # Configure Jinja
+        template_loader = jinja2.FileSystemLoader(root, encoding="utf-8")
+        jinja_env = jinja2.Environment(
+            loader=template_loader,
+            keep_trailing_newline=True,
+            trim_blocks=True,
+            lstrip_blocks=True,
+        )
+
+        def code_point_name_constant(c: str, padding: int = 0) -> str:
+            if not (name := unicodedata.name(c, "")):
+                raise ValueError(f"No Unicode name for code point: U+{ord(c):0>4X}")
+            name = name.replace("-", "_").replace(" ", "_").upper()
+            return name.ljust(padding)
+
+        jinja_env.filters["code_point"] = lambda c: f"0x{ord(c):0>4x}"
+        jinja_env.filters["code_point_name_constant"] = code_point_name_constant
+        path = root / "test/keysym-case-mapping.h"
+        template_path = path.with_suffix(f"{path.suffix}.jinja")
+        template = jinja_env.get_template(str(template_path.relative_to(root)))
+        with path.open("wt", encoding="utf-8") as fd:
+            fd.writelines(
+                template.generate(
+                    upper_exceptions=Entry.to_upper_exceptions,
+                    script=SCRIPT.relative_to(root),
+                )
+            )
 
 
 ################################################################################
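
The two Jinja filters registered in `write_tests` can be tried in isolation. The following is a minimal sketch without Jinja, under the assumption that the template (`test/keysym-case-mapping.h.jinja`, not shown in this diff) uses them to emit a constant name and a hexadecimal code point per exception; the output line format below is purely illustrative. Passing a default of `""` to `unicodedata.name` is a choice made for this sketch so that the custom error message is reachable, since `unicodedata.name` raises its own `ValueError` when called without a default on an unnamed code point.

```python
import unicodedata

# Assumption: same single exception as `Entry.to_upper_exceptions`.
UPPER_EXCEPTIONS = {"ß": "ẞ"}


def code_point(c: str) -> str:
    """Format a character as a zero-padded lowercase hexadecimal code point."""
    return f"0x{ord(c):0>4x}"


def code_point_name_constant(c: str, padding: int = 0) -> str:
    """Turn the Unicode character name into an identifier-like constant."""
    name = unicodedata.name(c, "")
    if not name:
        raise ValueError(f"No Unicode name for code point: U+{ord(c):0>4X}")
    name = name.replace("-", "_").replace(" ", "_").upper()
    return name.ljust(padding)


if __name__ == "__main__":
    for lower, upper in UPPER_EXCEPTIONS.items():
        # Illustrative output only; the real template layout is not shown here.
        print(
            code_point_name_constant(lower), code_point(lower),
            "->",
            code_point_name_constant(upper), code_point(upper),
        )
```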