[WIP] Update collation #168
base: master
This test file used by ICU might be a good alternative source to the old tailoring test data.
Hey @camertron, take a look at #52 and this gist. They explain why some of the tests were marked as pending. Essentially, the main reason was that the tests were already outdated even back then, so I only aimed to pass the tests that the ICU4J implementation passed. On top of that, there was a problem with denormalized Japanese code points that caused one test failure but, as mentioned in #52, might cause more incorrect ordering on real data.
OK, thanks @KL-7. I remember these from before. I understand why the tests were marked as pending, but I can't find any importer or other code that marks tests as pending when ICU doesn't sort them correctly. I attempted to do something like this with the […]. I'm currently discussing the future of the collation test data on the CLDR users mailing list and will report back here if I learn anything useful.
Oh, sorry, I thought you were asking about the why. As for the how, it was done semi-manually (I went through our failures and confirmed that ICU failed on them as well), and I don't think I have any useful code left.
I guess we can close this PR.
@tahsin352 Why?
It seems to be a very old PR, with no updates for a long time.
@tahsin352 That's true, but I'd like to keep it open. Hopefully I can use this work in the future as a springboard to finally modernize collation.
This PR is meant to address updating our current collation implementation to CLDR v26 and ICU 54.1.1. At the moment, there are several hurdles that need to be overcome:
With this branch checked out, if you run

bundle exec rake clean_vendored update:tailoring_data update:collation_tries update:tailoring_tests

and then run

FULL_SPEC=true bundle exec rspec spec/collation/tailoring_spec.rb

quite a few locales report failures. The most alarming of these is Japanese, which has 1007 failures out of 3339 active tests. I know we haven't yet addressed things like stroke order in our collation implementation, so that may be the reason. Other locales, like Spanish, have 1 failure out of 402, which I don't understand either. Anyway, I would really appreciate some help with this.