Allow specifying locale for string matching when walking through the syntax tree #8

cccccroge · 2024-02-12T13:50:08Z

Problem description

I came across this project while working on the i18n process for our product. First of all, I want to thank @zealot128 for coming up with a nice solution. This tool lets me plan a quicker process for extracting all untranslated content in a Rails application.

In my project there is one big pain point when using the tool: we are a Taiwanese company, so the untranslated content may be entirely in zh-TW. The current implementation basically ignores those characters, which results in weird output.

For example, one of our Ruby files contains the following line. (Please ignore the shop_translate method; it is a strange method that does not actually do its job, so calls like this may need to be adjusted manually.)

    flash[:error] = shop_translate('密碼不正確，請重試.')

If I follow the prompt, I get the wrong result:

The replaced string is I18n.t("shop.shops_controller."), which is empty after the namespace.

The translation file looks like this:

  zh-TW:
    shop:
      shops_controller: 密碼不正確，請重試.

The expected result would be something like I18n.t("shop.shops_controller.密碼不正確_請重試") and the following YAML:

  zh-TW:
    shop:
      shops_controller:
        密碼不正確_請重試: 密碼不正確，請重試.
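
To make the expected key concrete, here is a minimal sketch of a key builder that keeps CJK characters instead of dropping them. The method name `locale_aware_key` and the exact normalization rules are assumptions for illustration only; this is not the current `ExtractI18n.key` implementation.

```ruby
# Hypothetical helper, not part of the gem: builds a translation key
# while keeping characters in the CJK Unified Ideographs block.
def locale_aware_key(text)
  text
    .gsub(/[^A-Za-z0-9\u4e00-\u9fff]+/, '_') # any other run of characters becomes "_"
    .gsub(/\A_+|_+\z/, '')                   # trim leading and trailing underscores
    .downcase                                # no effect on CJK characters
end

puts locale_aware_key('密碼不正確，請重試.')
# prints 密碼不正確_請重試
```

The full-width comma and the trailing period fall outside the allowed ranges, so they collapse into a single underscore separator.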

Another issue is that the prompt asks about every possible string, which is not ideal when extracting non-English text. In our case, every string that should be extracted into a translation key contains zh-TW characters, so it makes more sense to trigger the prompt only for strings that contain at least one zh-TW character.

Proposed solution

Add a new configuration option called detect_locales that defaults to ['en'] and can be set through the CLI.
Each locale maps to a particular set of characters. For zh-TW it would be something like [\u4e00-\u9fff]+
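
A rough sketch of what this could look like, using only the Ruby standard library. The `--detect-locales` flag, the `LOCALE_PATTERNS` table, and both method names are assumptions made up for this proposal; none of them exist in the tool today.

```ruby
require 'optparse'

# Hypothetical locale-to-pattern table. The 'en' entry is a guess at a
# sensible default; the zh-TW range is the one suggested above.
LOCALE_PATTERNS = {
  'en'    => /[A-Za-z]/,
  'zh-TW' => /[\u4e00-\u9fff]/
}.freeze

# Parses a hypothetical --detect-locales flag; falls back to ['en'].
def parse_detect_locales(argv)
  locales = ['en']
  OptionParser.new do |opts|
    opts.on('--detect-locales LIST', Array, 'Comma-separated locales') do |list|
      locales = list
    end
  end.parse(argv)
  locales
end

# True if the string contains at least one character from any given locale.
def matches_locales?(string, locales)
  locales.any? do |locale|
    pattern = LOCALE_PATTERNS[locale]
    !pattern.nil? && string.match?(pattern)
  end
end

locales = parse_detect_locales(['--detect-locales', 'zh-TW'])
puts matches_locales?('密碼不正確，請重試.', locales) # prints true
puts matches_locales?('Wrong password', locales)      # prints false
```

Unknown locales are silently skipped in this sketch; a real implementation would probably want to warn or raise instead.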

Example for ruby_adapter.rb

Update the ExtractI18n.key implementation to accept those locales
Update the on_dstr and on_str implementations to first check whether the string contains any characters from the target locales before processing
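
The early-return check could be sketched roughly as follows. To keep the example runnable on its own, `StrNode` is a stand-in for the real AST node and `LocaleFilteringRewriter` is a made-up class; the actual adapter in ruby_adapter.rb works on parser nodes and will differ.

```ruby
# Stand-in for an AST string node; an assumption for this sketch only.
StrNode = Struct.new(:value)

# Hypothetical rewriter that only processes strings matching a target pattern.
class LocaleFilteringRewriter
  attr_reader :processed

  def initialize(patterns)
    @patterns = patterns
    @processed = []
  end

  def on_str(node)
    # Skip strings with no character from any target locale,
    # so the prompt is never triggered for them.
    return unless @patterns.any? { |pattern| node.value.match?(pattern) }

    @processed << node.value # the real adapter would prompt and replace here
  end
end

rewriter = LocaleFilteringRewriter.new([/[\u4e00-\u9fff]/])
rewriter.on_str(StrNode.new('密碼不正確，請重試.'))
rewriter.on_str(StrNode.new('application/json'))
puts rewriter.processed.length # prints 1
```

on_dstr would need the same check applied to the literal parts of the interpolated string.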

I haven't looked into the other formats (.slim, .erb) yet, but the idea is similar.

I think this API would also benefit other use cases that extract non-English text into translation files.
Any thoughts on this design? I can make a PR if the design is OK.
(It might take some time since I'm a React developer rather than a Ruby developer, lol, but I'm interested in solving this problem.)

cccccroge · 2024-02-12T14:05:29Z

There is also a tool in the JS ecosystem that does a similar job:
https://github.com/i18next/i18next-scanner

Maybe we can support some sort of config file that allows maximum customization if the character-set implementation alone is not enough. That would need further study and planning of the API, though.

cccccroge mentioned this issue Feb 12, 2024

Add doc for contributing #9

Merged

zealot128 added the help wanted Extra attention is needed label Jun 24, 2024
