xgettext: multiple starting lines for a string are not well supported #13

armijnhemel · 2024-03-15T11:11:03Z

In the current xgettext implementation, line 115 (https://github.com/nexB/source-inspector/blob/9511f56b44ac7c5644b34d413146d58dd9fa7ea0/src/source_inpector/strings_xgettext.py#L115) reads:

_, _, start_line = line.rpartition(":")

This likely produces wrong results: a single reference line can carry several file/line entries, and this code keeps only the last line number. As an example, I ran xgettext with the same parameters as you did on libbb/lineedit.c from BusyBox:

$ xgettext --omit-header --extract-all --no-wrap lineedit.c

Some of the result lines:

#: lineedit.c:834 lineedit.c:890 lineedit.c:893
msgid "."
msgstr ""

As you can see there are multiple file/line number entries there. It seems that at some point the authors of xgettext decided to combine these. Your code does not correctly process these lines:

>>> line = '#: lineedit.c:834 lineedit.c:890 lineedit.c:893'
>>> _, _, start_line = line.rpartition(":")
>>> start_line
'893'


armijnhemel · 2024-03-15T11:20:20Z

When fixing this, please keep in mind that : can also appear in a file name. An easy test case: I renamed lineedit.c to lineedit:834.c and then reran xgettext:

#: lineedit:834.c:834 lineedit:834.c:890 lineedit:834.c:893
msgid "."
msgstr ""

so just splitting on : might not be the right approach.
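One possible way to handle both cases is sketched below. It is not the fix that was merged, only an illustration: the function name parse_reference_line is hypothetical, and the sketch assumes that file names contain no whitespace, so each whitespace-separated token is one file:line entry and only the last : in a token separates the line number.

```python
def parse_reference_line(line):
    """Return (path, line_number) tuples from an xgettext '#:' reference line.

    Assumption: paths contain no whitespace, so splitting on whitespace
    yields one 'path:line' token per reference.
    """
    prefix = "#:"
    if not line.startswith(prefix):
        return []

    references = []
    for token in line[len(prefix):].split():
        # Split on the LAST colon only, so colons inside the path survive.
        path, sep, number = token.rpartition(":")
        if sep and number.isdigit():
            references.append((path, int(number)))
    return references


print(parse_reference_line("#: lineedit.c:834 lineedit.c:890 lineedit.c:893"))
print(parse_reference_line("#: lineedit:834.c:834 lineedit:834.c:890"))
```

With this approach every starting line is kept, instead of only the last one, and a file named lineedit:834.c is still reported with its full name.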

armijnhemel · 2024-03-15T11:27:21Z

Another option would be to use the --strict option, but that would require a (slight) rewrite of the code, plus it is discouraged:

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

armijnhemel · 2024-03-15T18:29:53Z

Note: if the goal is to report each string found in a source code file, and you don't necessarily need to report duplicates, then the current code is of course completely fine.

pombredanne added a commit that referenced this issue Mar 15, 2024

Call xgettext with UTF-8 and parse lines

d691562

Reference: #13 Reference: #14 Reported-by: Armijn Hemel @armijnhemel Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne mentioned this issue Mar 15, 2024

Improve xgettext handlings #16

Merged
