How do I use different locale #20

Open
Hultner opened this issue Sep 13, 2018 · 7 comments

Comments

@Hultner

Hultner commented Sep 13, 2018

The introduction to the UCA standard presents the following example:

Language    Swedish: z < ö
            German:  ö < z

@jtauber How would I achieve this with PyUCA?

With PyICU, which wraps the C++ ICU library, I can reproduce the example above like this:

>>> from icu import Collator, Locale
>>> def collate_compare(collator, a, b):
...     # True if a sorts after b under the given collator
...     return collator.getSortKey(a) > collator.getSortKey(b)
...
>>> col_de = Collator.createInstance(Locale("de_DE"))
>>> col_de.setStrength(Collator.PRIMARY)
>>> col_se = Collator.createInstance(Locale("sv_SE"))
>>> col_se.setStrength(Collator.PRIMARY)
>>> collate_compare(col_de, "ö", "z")  # German: ö does not sort after z
False
>>> collate_compare(col_se, "ö", "z")  # Swedish: ö sorts after z
True

However, I would much prefer a pure-Python solution without external C++ dependencies, as these make installation and usage cumbersome. I could not find any guidance in the documentation on how to achieve this with PyUCA, so I would love any help or pointers.

@jtauber
Owner

jtauber commented Oct 20, 2018

The way to achieve different sort orders is to use a different collation element table. PyICU must ship with different CETs for different locales.
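
To illustrate the idea, here is a toy sketch (not pyuca's or ICU's actual implementation). A collation element table maps each character to weights, and a sort key is built from those weights, so swapping the table changes the order. The names `GERMAN_LIKE`, `SWEDISH_LIKE`, and `sort_key`, and all the weights, are invented for illustration; real CETs carry multi-level weights.

```python
# Toy primary-weight tables; the weights are invented for illustration only.
BASE = {ch: i * 10 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)}

# German-like: ö shares the primary weight of o, so ö < z at primary strength.
GERMAN_LIKE = {**BASE, "ö": BASE["o"]}

# Swedish-like: ö is a separate letter placed after z.
SWEDISH_LIKE = {**BASE, "ö": BASE["z"] + 10}


def sort_key(table, text):
    """Return a tuple of primary weights for text (toy model, one level only)."""
    return tuple(table[ch] for ch in text)


words = ["zebra", "öl", "ost"]
german_order = sorted(words, key=lambda w: sort_key(GERMAN_LIKE, w))
swedish_order = sorted(words, key=lambda w: sort_key(SWEDISH_LIKE, w))

print(german_order)   # ['öl', 'ost', 'zebra']
print(swedish_order)  # ['ost', 'zebra', 'öl']
```

The same `sort_key` function produces both orders; only the table differs.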

@dvorapa

dvorapa commented May 19, 2019

Hm, so pyuca is only configured to uca-en and any other locale would need to download uca-xy raw files? That's not convenient (e.g. for installing pyuca from PyPI)

Perhaps pyuca could have some sort of a switch to use different localizations and get all the files instead of the English one only?

@jtauber
Owner

jtauber commented May 20, 2019

It is not configured for English only. By default it uses the DUCET (Default Unicode Collation Element Table), which is suitable for many things beyond English. You can supply an alternative CET in the constructor.

If there are alternative CETs we could ship with, I'm happy to do that.

@dvorapa

dvorapa commented May 21, 2019

Well, I specifically need uca-cs (based on CLDR: https://github.com/unicode-org/cldr/tree/master/common/collation; it can be tested here: http://demo.icu-project.org/icu-bin/collation.html)

I thought pyuca could help me with that.

@jtauber
Owner

jtauber commented May 25, 2019

pyuca is just an implementation of the UCA. CLDR is an extension of UCA which I haven't (yet) implemented. CLDR support could be added to pyuca or a new library could be created.

@jtauber
Owner

jtauber commented May 25, 2019

However, pyuca can still be used for locale-specific collation; you just need to manually create the appropriate CET for your locale.
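
One way to build such a table by hand is to start from a file in the `allkeys.txt` layout and rewrite the entries for the letters your locale orders differently. The sketch below is only an illustration: `tailor_cet` is a hypothetical helper, the sample lines and all weights are invented placeholders rather than real DUCET values, and it assumes each entry has the form `CODEPOINTS ; [collation elements] # comment`.

```python
def tailor_cet(lines, overrides):
    """Return CET lines with the collation elements of selected entries replaced.

    lines     -- iterable of lines (without trailing newlines) in the
                 allkeys.txt layout
    overrides -- dict mapping a code point field such as "00F6" to the new
                 collation-element string, e.g. "[.2001.0020.0002]"
    """
    result = []
    for line in lines:
        head, sep, rest = line.partition(";")
        key = head.strip()
        if sep and key in overrides:
            # Keep the trailing comment, if the original entry had one.
            _, hash_sep, comment = rest.partition("#")
            new_line = f"{key} ; {overrides[key]}"
            if hash_sep:
                new_line += f" # {comment.strip()}"
            result.append(new_line)
        else:
            # Directives, comments, and untouched entries pass through as-is.
            result.append(line)
    return result


# Placeholder entries: the weights are NOT real DUCET values.
sample = [
    "@version 0.0.0",
    "006F ; [.1000.0020.0002] # LATIN SMALL LETTER O",
    "007A ; [.2000.0020.0002] # LATIN SMALL LETTER Z",
    "00F6 ; [.1000.0020.0002][.0000.002B.0002] # LATIN SMALL LETTER O WITH DIAERESIS",
]

# Swedish-style tailoring: give ö its own primary weight above z.
tailored = tailor_cet(sample, {"00F6": "[.2001.0020.0002]"})
```

The resulting lines could be written to a file and its path passed to the `Collator` constructor. In a real table the new weight would also have to avoid colliding with existing entries, which this sketch does not check.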

@dvorapa

dvorapa commented May 26, 2019

I see. Well, this approach is not convenient for automated testing.

3 participants