keysyms: Add sharp S upper case mapping exception
The lower case mapping `U1E9E` ẞ → `ssharp` ß was added in
13b30f4 and then confirmed
when we implemented the complete Unicode simple case mappings
in e83d08d.

However, the upper case mapping `ssharp` → `U1E9E` was not added
in either commit, because ẞ is a relatively recent addition to
Unicode (2008) and had no official recommendation until recently.
Since 2017 the Council for German Orthography (Rat für deutsche
Rechtschreibung) recommends[^1] ẞ as the capitalization of ß.

Due to its stability policies, the Unicode Character Database (UCD)
that we use to generate our keysym case mapping cannot update the
simple case mapping of ß. Discussions are currently ongoing in the
Unicode mailing list[^2] and CLDR[^3] about how to deal with the new
recommended case mapping. However, those discussions focus on
text processing and compatibility mappings, whereas libxkbcommon
operates at a lower level.
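
The asymmetry can be observed from Python, whose `str` methods follow the
Unicode case mappings. This is only an illustrative sketch, not part of the
commit: `str.upper()` applies the *full* case mapping (ß → SS), whereas the
keysym tables rely on the *simple* one-to-one mapping, under which ß has no
upper case form in the UCD.

```python
# Illustration only: Python's built-in case conversion, not libxkbcommon.
capital_sharp_s = "\u1e9e"  # ẞ
small_sharp_s = "\u00df"    # ß

# Lower-casing ẞ is a one-to-one mapping, so it round-trips to ß.
lowered = capital_sharp_s.lower()

# Upper-casing ß uses the full mapping and yields two letters, not ẞ.
uppered = small_sharp_s.upper()

assert lowered == small_sharp_s
assert uppered == "SS"
assert uppered != capital_sharp_s
```

Because the result is two characters, it cannot be expressed as a single
keysym, which is why an explicit exception is needed.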

It seems that the slow adoption of ẞ is partly due to the difficulty
of typing it. Since ẞ is used only for ALL CAPS casing, users expect
to type it using CapsLock. Our detection of alphabetic key types has
worked for the pair (ß, ẞ) since the implementation of the complete
Unicode case mappings, but the internal capitalization did not; this
commit fixes it.

Added the ß → ẞ upper mapping:
- Added an exception in the generation script
- Fixed tests
- Added documentation of the exceptions in `xkbcommon.h`
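
A minimal sketch of the intended effect at the keysym level, assuming the
usual convention that a Unicode keysym is `0x01000000` plus the code point.
The names `UPPER_EXCEPTIONS` and `apply_caps_lock` are hypothetical and do not
exist in libxkbcommon; only the exception itself is modelled.

```python
# Unicode keysyms encode a code point as 0x01000000 + code point.
UNICODE_KEYSYM_OFFSET = 0x01000000

KEYSYM_SSHARP = 0x00DF  # `ssharp`: Latin-1 keysym, equal to its code point
KEYSYM_U1E9E = UNICODE_KEYSYM_OFFSET + 0x1E9E  # `U1E9E`

# Hypothetical table of upper case exceptions, keyed by keysym.
UPPER_EXCEPTIONS = {KEYSYM_SSHARP: KEYSYM_U1E9E}


def apply_caps_lock(keysym: int, caps_lock: bool) -> int:
    """Return the keysym produced for `keysym` under the given CapsLock state.

    Only the exception table is modelled here; every other keysym is
    returned unchanged, which is a deliberate simplification.
    """
    if not caps_lock:
        return keysym
    return UPPER_EXCEPTIONS.get(keysym, keysym)


assert KEYSYM_U1E9E == 0x1001E9E
assert apply_caps_lock(KEYSYM_SSHARP, caps_lock=True) == KEYSYM_U1E9E
assert apply_caps_lock(KEYSYM_SSHARP, caps_lock=False) == KEYSYM_SSHARP
```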

[^1]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
[^2]: https://corp.unicode.org/pipermail/unicode/2024-November/011162.html
[^3]: https://unicode-org.atlassian.net/browse/CLDR-17624
wismill committed Dec 9, 2024
1 parent e0130e3 commit ef545c9
Showing 7 changed files with 355 additions and 324 deletions.
2 changes: 2 additions & 0 deletions changes/api/+großes-ẞ.breaking.md
@@ -0,0 +1,2 @@
+Added the upper case mapping ß → ẞ (`ssharp` → `U1E9E`). This enables typing
+ẞ using CapsLock thanks to the internal capitalization rules.
9 changes: 6 additions & 3 deletions changes/api/+unicode-16.breaking.md
@@ -5,13 +5,16 @@ the following:
 - `xkb_keysym_to_lower()` and `xkb_keysym_to_upper()` give different output
 for keysyms not covered previously and handle *title*-cased keysyms.

-Example of title-cased keysym: `0x10001f2` (`U+01F2` “Dz”):
-- `xkb_keysym_to_lower(0x10001f2) == 0x10001f3` (`U+01F3` “dz”)
-- `xkb_keysym_to_upper(0x10001f2) == 0x10001f1` (`U+01F1` “DZ”)
+Example of title-cased keysym: `U01F2` “Dz”:
+- `xkb_keysym_to_lower(U01F2) == U01F3` “Dz” → “dz”
+- `xkb_keysym_to_upper(U01F2) == U01F1` “Dz” → “DZ”
 - *Implicit* alphabetic key types are better detected, because they use the
 latest Unicode case mappings and now handle the *title*-cased keysyms the
 same way as upper-case ones.

+Note: There is a single *exception* that does not follow the Unicode mappings:
+- `xkb_keysym_to_upper(ssharp) == U1E9E` “ß” → “ẞ”
+
 Note: As before, only *simple* case mappings (i.e. one-to-one) are supported.
 For example, the full upper case of `U+01F0` “ǰ” is “J̌” (2 characters: `U+004A`
 and `U+030C`), which would require 2 keysyms, which is not supported by the
1 change: 1 addition & 0 deletions data/keysyms.yaml
@@ -560,6 +560,7 @@
 0x00df:
   name: ssharp
   code point: 0x00DF
+  upper: 0x1001e9e # U1E9E
 0x00e0:
   name: agrave
   code point: 0x00E0
15 changes: 12 additions & 3 deletions include/xkbcommon/xkbcommon.h
@@ -552,9 +552,18 @@ xkb_utf32_to_keysym(uint32_t ucs);
  * If there is no such form, the keysym is returned unchanged.
  *
  * The conversion rules are the *simple* (i.e. one-to-one) Unicode case
- * mappings and do not depend on the locale. If you need the special
- * case mappings (i.e. not one-to-one or locale-dependent), prefer to
- * work with the Unicode representation instead, when possible.
+ * mappings (with some exceptions, see hereinafter) and do not depend
+ * on the locale. If you need the special case mappings (i.e. not
+ * one-to-one or locale-dependent), prefer to work with the Unicode
+ * representation instead, when possible.
+ *
+ * Exceptions to the Unicode mappings:
+ *
+ * | Lower keysym | Lower letter | Upper keysym | Upper letter | Comment |
+ * | ------------ | ------------ | ------------ | ------------ | ------- |
+ * | `ssharp` | `U+00DF`: ß | `U1E9E` | `U+1E9E`: ẞ | [Council for German Orthography] |
+ *
+ * [Council for German Orthography]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
  *
  * @since 0.8.0: Initial implementation, based on `libX11`.
  * @since 1.8.0: Use Unicode 16.0 mappings for complete Unicode coverage.
16 changes: 12 additions & 4 deletions scripts/update-unicode.py
@@ -90,6 +90,7 @@
 from pathlib import Path
 from typing import (
     Any,
+    ClassVar,
     Generator,
     Generic,
     Iterable,
@@ -294,6 +295,9 @@ class Entry:
     upper: int
     is_lower: bool
     is_upper: bool
+    # [NOTE] Exceptions must be documented in `xkbcommon.h`.
+    to_upper_exceptions: ClassVar[dict[str, str]] = {"ß": "ẞ"}
+    "Upper mappings exceptions"

     @classmethod
     def zeros(cls) -> Self:
@@ -326,16 +330,20 @@ def lower_delta(cls, cp: CodePoint) -> int:
     def upper_delta(cls, cp: CodePoint) -> int:
         return cp - cls.to_upper_cp(cp)

-    @staticmethod
-    def to_upper_cp(cp: CodePoint) -> CodePoint:
+    @classmethod
+    def to_upper_cp(cls, cp: CodePoint) -> CodePoint:
+        if upper := cls.to_upper_exceptions.get(chr(cp)):
+            return ord(upper)
         return icu.Char.toupper(cp)

     @staticmethod
     def to_lower_cp(cp: CodePoint) -> CodePoint:
         return icu.Char.tolower(cp)

-    @staticmethod
-    def to_upper_char(char: str) -> str:
+    @classmethod
+    def to_upper_char(cls, char: str) -> str:
+        if upper := cls.to_upper_exceptions.get(char):
+            return upper
         return icu.Char.toupper(char)

     @staticmethod
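
The exception lookup in the script can be tried in isolation with the sketch
below. It is an approximation under stated assumptions: the real script calls
`icu.Char.toupper` (PyICU), which is replaced here by a hypothetical stand-in,
`simple_upper_fallback`, built on `str.upper()` and keeping only one-to-one
results. The module-level names are illustrative and not those of the script.

```python
# Same exception table as in the diff, held at module level for brevity.
TO_UPPER_EXCEPTIONS: dict[str, str] = {"ß": "ẞ"}


def simple_upper_fallback(char: str) -> str:
    """Hypothetical stand-in for `icu.Char.toupper`.

    `str.upper()` applies the full case mapping, so multi-character
    results are discarded to approximate the simple mapping.
    """
    upper = char.upper()
    return upper if len(upper) == 1 else char


def to_upper_char(char: str) -> str:
    # Exceptions take precedence over the Unicode mapping.
    if upper := TO_UPPER_EXCEPTIONS.get(char):
        return upper
    return simple_upper_fallback(char)


def to_upper_cp(cp: int) -> int:
    # Code point variant, expressed through the character variant.
    return ord(to_upper_char(chr(cp)))


assert to_upper_char("ß") == "ẞ"
assert to_upper_cp(0x00DF) == 0x1E9E
assert to_upper_char("a") == "A"
# U+01F0 has only a two-character full upper case, so it stays unchanged.
assert to_upper_char("\u01f0") == "\u01f0"
```

Checking the exception table first keeps the override independent of whichever
Unicode library provides the default mapping.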