Allow specifying locale for string matching when walk through syntax tree #8

Open
cccccroge opened this issue Feb 12, 2024 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@cccccroge
Contributor

cccccroge commented Feb 12, 2024

Problem description

I came across this project while working on our product's i18n process. First of all, I want to thank @zealot128 for coming up with such a nice solution. This tool lets me plan a much quicker process for extracting all untranslated content in our Rails application.

For my project, though, there is a big pain point when using the tool: we are a Taiwanese company, so the untranslated content may be entirely in zh-TW. The current implementation basically ignores those characters, which results in weird output.

For example:
In one of our Ruby files there is the following content (please ignore the shop_translate method; it is a weird method that doesn't actually do its job, so this part might need to be adjusted manually):

    flash[:error] = shop_translate('密碼不正確,請重試.')

If I follow the prompt, I get the wrong result:

  • the replaced string is I18n.t("shop.shops_controller."), which is empty after the namespace
  • the translation file looks like
    zh-TW:
      shop:
        shops_controller: 密碼不正確,請重試.

The expected result would be something like I18n.t("shop.shops_controller.密碼不正確_請重試") and the following YAML:

  zh-TW:
    shop:
      shops_controller:
        密碼不正確_請重試: 密碼不正確,請重試.
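
To make the expected key concrete, here is a minimal sketch of how such a key could be derived. `locale_aware_key` is a hypothetical helper written for this issue, not the gem's existing `ExtractI18n.key`, and keeping only Han characters plus ASCII letters and digits is my assumption about what a reasonable key looks like:

```ruby
# Hypothetical helper for illustration only; NOT the gem's ExtractI18n.key.
# Assumption: a key keeps Han characters, ASCII letters and digits,
# and every other run of characters becomes a single underscore.
def locale_aware_key(text)
  text
    .gsub(/[^\p{Han}A-Za-z0-9]+/, "_") # punctuation, spaces etc. collapse to "_"
    .gsub(/\A_+|_+\z/, "")             # drop leading and trailing underscores
    .downcase                          # only affects the ASCII letters
end

puts locale_aware_key("密碼不正確,請重試.")
```

With a helper like this, the trailing period and the comma no longer leave the key empty; they only act as separators.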

Another issue is that the prompt asks about every possible string, which is not ideal when extracting non-English text. In our case, the strings that should be extracted to translation keys always contain zh-TW characters, so it makes more sense to trigger the prompt only for strings that contain at least one zh-TW character.

Proposed solution

  • Add a new configuration option called detect_locales that defaults to ['en']. It can be set through the CLI.
  • Each locale maps to a particular set of characters. For zh-TW it would be something like [\u4e00-\u9fff]+
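
A rough sketch of what the locale-to-character-set mapping could look like. Everything here is an assumption for discussion: `LOCALE_PATTERNS` and `matches_detect_locales?` are hypothetical names, and the `en` pattern in particular is just a placeholder I picked, since the proposal above only spells out the range for zh-TW:

```ruby
# Hypothetical mapping; names and the "en" pattern are assumptions,
# only the zh-TW range comes from the proposal above.
LOCALE_PATTERNS = {
  "en"    => /[A-Za-z]/,
  "zh-TW" => /[\u4e00-\u9fff]/
}.freeze

# True when the text contains at least one character belonging to
# any of the configured locales. Unknown locales never match.
def matches_detect_locales?(text, detect_locales = ["en"])
  detect_locales.any? do |locale|
    pattern = LOCALE_PATTERNS[locale]
    !pattern.nil? && text.match?(pattern)
  end
end

puts matches_detect_locales?("密碼不正確,請重試.", ["zh-TW"])
puts matches_detect_locales?("save", ["zh-TW"])
```

Keeping the default at `['en']` means existing users would see no change unless they opt in to another locale.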

Example for ruby_adapter.rb

  • Update the ExtractI18n.key implementation to accept those locales
  • Update the on_dstr and on_str implementations to first check whether the string contains any characters from the target locales before processing
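
To illustrate the pre-check idea without touching the real adapter, here is a small standalone sketch. It uses Ruby's stdlib `Ripper` lexer as a stand-in for the actual `on_str` / `on_dstr` callbacks, so it only shows the filtering step, not how ruby_adapter.rb is really structured. `TARGET_CHARS` and `candidate_strings` are hypothetical names:

```ruby
require "ripper"

# Assumed character range for zh-TW, taken from the proposal above.
TARGET_CHARS = /[\u4e00-\u9fff]/

# Collect string literal contents from Ruby source and keep only those
# containing at least one target-locale character. Ripper stands in for
# the adapter's on_str / on_dstr hooks here.
def candidate_strings(source, pattern = TARGET_CHARS)
  Ripper.lex(source)
        .select { |token| token[1] == :on_tstring_content } # string bodies only
        .map    { |token| token[2] }                        # the literal text
        .select { |text| text.match?(pattern) }             # locale pre-check
end

source = <<~RUBY
  flash[:error] = shop_translate('密碼不正確,請重試.')
  flash[:notice] = "ok"
RUBY

p candidate_strings(source)
```

In this sketch only the zh-TW string would reach the prompt, while "ok" is skipped before any processing happens.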

I haven't looked into the other formats (.slim, .erb) yet, but the idea is similar.


I think this API would also benefit other use cases that extract non-English text into translation files.
Any thoughts on this design? I can make a PR if the design is OK.
(It might take some time since I'm a React developer rather than a Ruby developer, lol, but I'm interested in solving this problem.)

@cccccroge
Contributor Author

There's also a tool in the JS ecosystem that does a similar job:
https://github.com/i18next/i18next-scanner

Maybe we could support some sort of config file that allows maximum customization if the character-set implementation is simply not enough. That would need further study and planning on the API, though.

@zealot128 zealot128 added the help wanted Extra attention is needed label Jun 24, 2024