How do I use different locale #20

Hultner · 2018-09-13T12:54:07Z

The introduction to the UCA standard presents the following example:

| Language | Order |
| --- | --- |
| Swedish | z < ö |
| German | ö < z |

@jtauber How would I achieve this with PyUCA?

With PyICU, which wraps the ICU C++ library, I can reproduce the example above like this:

>>> from icu import Collator, Locale
>>> def collate_compare(collator, a, b):
...     # True if a sorts after b under the given collator
...     return collator.getSortKey(a) > collator.getSortKey(b)
...
>>> col_de = Collator.createInstance(Locale("de_DE"))
>>> col_de.setStrength(Collator.PRIMARY)
>>> col_se = Collator.createInstance(Locale("sv_SE"))
>>> col_se.setStrength(Collator.PRIMARY)
>>> collate_compare(col_de, "ö", "z")  # German: ö does not sort after z
False
>>> collate_compare(col_se, "ö", "z")  # Swedish: ö sorts after z
True

However, I would much prefer a pure-Python solution with no external C++ dependencies, since these make installation and usage cumbersome. I could not find any guidance in the documentation on how to achieve this with PyUCA, so I would love any help or pointers.

jtauber · 2018-10-20T01:47:36Z

The way to achieve different sort orders is to use a different collation element table. PyICU must ship with different CETs for different locales.
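
As a rough illustration of why the table alone decides the order, here is a minimal sketch with toy primary weights. The numbers, the `GERMAN` and `SWEDISH` tables, and the `sort_key` helper are all invented for this example; they are not real DUCET or CLDR values and only mimic a single collation level.

```python
# Toy primary weights: NOT real DUCET or CLDR values. They are chosen only
# to reproduce the z/ö ordering from the UCA introduction.
GERMAN = {"o": 10, "ö": 10, "z": 20}   # ö shares the primary weight of o
SWEDISH = {"o": 10, "z": 20, "ö": 30}  # ö is a separate letter after z


def sort_key(table, word):
    """Map each character to its primary weight in the given table."""
    return [table[ch] for ch in word]


words = ["z", "ö"]
german_order = sorted(words, key=lambda w: sort_key(GERMAN, w))
swedish_order = sorted(words, key=lambda w: sort_key(SWEDISH, w))

print(german_order)   # ['ö', 'z']
print(swedish_order)  # ['z', 'ö']
```

The sorting code is identical in both cases; only the table passed in differs, which is the same idea as swapping the collation element table.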

dvorapa · 2019-05-19T16:17:41Z

Hm, so pyuca is configured only for uca-en, and any other locale requires downloading the raw uca-xy files? That's not convenient (e.g. when installing pyuca from PyPI).

Perhaps pyuca could have some sort of switch for selecting a different locale, and ship all the files instead of only the English one?

jtauber · 2019-05-20T15:16:40Z

It is not configured for English only. By default it uses the DUCET (Default Unicode Collation Element Table), which is suitable for many things beyond English. You can supply an alternative CET in the constructor.

If there are alternative CETs we could ship with, I'm happy to do that.
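
A minimal sketch of passing a CET to the constructor, assuming pyuca is installed. The helper names `load_collator` and `sort_words` are hypothetical wrappers for this example, and any file path you pass in is your own tailored table, not something pyuca ships.

```python
def load_collator(cet_path=None):
    """Return a pyuca Collator; with no path, pyuca uses its bundled DUCET."""
    # Imported lazily so this module still loads where pyuca is not installed.
    from pyuca import Collator

    return Collator(cet_path) if cet_path else Collator()


def sort_words(words, collator):
    """Sort words with any object that exposes a sort_key(str) method."""
    return sorted(words, key=collator.sort_key)
```

With a tailored table on disk, usage would look like `sort_words(["z", "ö"], load_collator("allkeys_sv.txt"))`, where `allkeys_sv.txt` is a hypothetical file name.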

dvorapa · 2019-05-21T11:09:03Z

Well, I specifically need uca-cs (based on CLDR: https://github.com/unicode-org/cldr/tree/master/common/collation; it can be tested here: http://demo.icu-project.org/icu-bin/collation.html)

I thought pyuca could help me with that.

jtauber · 2019-05-25T23:22:13Z

pyuca is just an implementation of the UCA. CLDR is an extension of UCA which I haven't (yet) implemented. CLDR support could be added to pyuca or a new library could be created.

jtauber · 2019-05-25T23:22:53Z

However, pyuca can still be used for locale-specific collation; you just need to manually create the appropriate CET for your locale.
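
One way to create such a table by hand is to copy an existing CET and override the entries that differ for the locale. The sketch below assumes the `allkeys.txt` line layout (`code points ; collation elements # comment`); the `tailor_cet` helper is hypothetical, and every weight in the sample is made up for illustration rather than taken from the real DUCET.

```python
def tailor_cet(lines, overrides):
    """Yield CET lines, replacing the collation elements of selected entries.

    overrides maps a code point field such as "00F6" to a replacement
    collation element string such as "[.0201.0020.0002]".
    """
    for line in lines:
        entry, hash_mark, comment = line.partition("#")
        key = entry.partition(";")[0].strip()
        if ";" in entry and key in overrides:
            # Rebuild the entry, keeping the original trailing comment.
            line = f"{key} ; {overrides[key]} {hash_mark}{comment}".rstrip()
        yield line


# Sample lines in allkeys.txt style; all weights here are invented.
sample = [
    "@version 0.0.0",
    "007A ; [.0200.0020.0002] # LATIN SMALL LETTER Z",
    "00F6 ; [.0100.0020.0002][.0000.002B.0002] # LATIN SMALL LETTER O WITH DIAERESIS",
]

# Give ö a primary weight just above the (invented) weight of z.
swedish = list(tailor_cet(sample, {"00F6": "[.0201.0020.0002]"}))
print(swedish[2])
```

The tailored lines could then be written to a file and its path passed to the `Collator` constructor. A real tailoring would likely need more entries than this, for example the uppercase letter and the decomposed sequence of o followed by a combining diaeresis.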

dvorapa · 2019-05-26T12:52:30Z

I see. Well, that approach is not convenient for automated testing.
