xgettext: multiple starting lines for a string are not well supported #13

Open
armijnhemel opened this issue Mar 15, 2024 · 3 comments

armijnhemel commented Mar 15, 2024

In the current xgettext implementation I can see at line 115 https://github.com/nexB/source-inspector/blob/9511f56b44ac7c5644b34d413146d58dd9fa7ea0/src/source_inpector/strings_xgettext.py#L115 the following:

_, _, start_line = line.rpartition(":")

This likely produces wrong results: a single reference line can contain several file:line entries, and rpartition(":") only captures the line number of the last one. As an example, I ran xgettext with the same parameters as you did on libbb/lineedit.c from BusyBox:

$ xgettext --omit-header --extract-all --no-wrap lineedit.c

Some of the result lines:

#: lineedit.c:834 lineedit.c:890 lineedit.c:893
msgid "."
msgstr ""

As you can see, there are multiple file/line number entries on the same line. It seems that at some point the authors of xgettext decided to combine these. Your code does not process these lines correctly, because it only keeps the last line number:

>>> line = '#: lineedit.c:834 lineedit.c:890 lineedit.c:893'
>>> _, _, start_line = line.rpartition(":")
>>> start_line
'893'

armijnhemel commented Mar 15, 2024

When fixing this, please keep in mind that : can also appear in a file name. An easy test case: I renamed lineedit.c to lineedit:834.c and then reran xgettext:

#: lineedit:834.c:834 lineedit:834.c:890 lineedit:834.c:893
msgid "."
msgstr ""

so just splitting on : might not be the right approach.
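
One possible way to handle both cases, as a minimal sketch rather than the project's actual fix: split the reference line on whitespace first, then apply rpartition(":") to each entry separately, so only the last colon of each entry is treated as the separator. The function name parse_reference_line is hypothetical, and the sketch assumes file names do not contain whitespace.

```python
def parse_reference_line(line):
    """Return [(file_name, line_number), ...] for one "#:" reference line.

    Hypothetical helper, not part of source-inspector.
    Assumes file names contain no whitespace.
    """
    if not line.startswith("#:"):
        return []

    references = []
    # Each whitespace-separated entry looks like "<file name>:<line number>".
    for entry in line[2:].split():
        # rpartition splits on the LAST colon, so colons inside the
        # file name stay part of the file name.
        file_name, separator, line_number = entry.rpartition(":")
        if separator and file_name and line_number.isdigit():
            references.append((file_name, int(line_number)))
    return references


if __name__ == "__main__":
    for sample in (
        "#: lineedit.c:834 lineedit.c:890 lineedit.c:893",
        "#: lineedit:834.c:834 lineedit:834.c:890 lineedit:834.c:893",
    ):
        print(parse_reference_line(sample))
```

With the two example lines from this issue, this returns all three line numbers (834, 890, 893) together with the file name lineedit.c or lineedit:834.c respectively.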

@armijnhemel

Another option would be to use the --strict option, but that would require a (slight) rewrite of the code, plus it is discouraged:

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

@armijnhemel

Note: if the goal is to report each string found in a source code file, and you don't necessarily need to report duplicates, then the current code is of course completely fine.

pombredanne added a commit that referenced this issue Mar 15, 2024
Reference: #13
Reference: #14
Reported-by: Armijn Hemel @armijnhemel
Signed-off-by: Philippe Ombredanne <[email protected]>