Convenient-sized strings for translation with minimal markup #29
Trying to remove/minimise formatting would be great, but how would this work in practice? For example, I wouldn't really know how to translate "Extract translatable strings from a parsed <3> tree or translate it by giving the extracted strings to <6>." From context, <3> and <6> are probably nouns, but I don't have any information on their number or grammatical gender. What happens if the formatted strings need to be translated? Is it possible that <3> is a noun or phrase that needs translation?

BTW, for some reason, if I run your example I don't get the same output:

```r
tools::Rd_db('Rd2gettext')$'Rd_extract_strings.Rd' |>
  Rd2gettext::Rd_extract_strings() |>
  head(3)
#> [[1]]
#> [1] "Functions for Rd translation"
#> attr(,"pos")
#> [1] 1
#> attr(,"subpos")
#> [1] 1
#> attr(,"extra")
#> named character(0)
#>
#> [[2]]
#> [1] "Extract translatable strings from a parsed \\code{Rd} tree or translate it by giving the extracted strings to \\code{\\link[base]{gettext}}."
#> attr(,"pos")
#> [1] 6
#> attr(,"subpos")
#> [1] 1
#> attr(,"extra")
#> named character(0)
#>
#> [[3]]
#> [1] "A pre-parsed Rd tree loaded from \\code{tools::\\link[tools]{Rd_db}} or otherwise produced by \\code{tools::\\link[tools]{parse_Rd}}."
#> attr(,"pos")
#> [1] 8 3 2
#> attr(,"subpos")
#> [1] 1
#> attr(,"extra")
#> named character(0)
```

Although I do get the placeholders if I pass the

```r
tools::Rd_db('base')$'mean' |>
  Rd2gettext::Rd_extract_strings() |>
  _[[3]]
#> [1] "an <2> object. Currently there are methods for numeric/logical vectors and \\link[=Dates]{date}, \\link{date-time} and \\link{time interval} objects. Complex vectors are allowed for \\code{trim = 0}, only."
#> attr(,"pos")
#> [1] 8 3 2
#> attr(,"subpos")
#> [1] 1
#> attr(,"extra")
#> named character(0)
```
Thank you for giving this a try! You're absolutely right, I've translated slightly more than half of the
The core of the suggested approach is to walk the parse tree recursively. If <3> is a block with its own potentially translatable plain text inside, the algorithm visits it and extracts its contents into separate translatable strings. With no splitting, some of the strings extracted from
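The recursive walk described above can be sketched in a few lines of R. This is a minimal, hypothetical illustration, not the Rd2gettext implementation: the helper names (`extract_strings`, `is_simple_inline`, `txt`, `rd_node`) are made up, the tree is hand-built to roughly mimic the shape of `tools::parse_Rd()` output (nodes carrying an `Rd_tag` attribute), and paragraph splitting is left out. Plain `TEXT` is kept, `\emph{}` whose contents are all `TEXT` is assumed to stay inline, every other node becomes a numbered placeholder `<k>`, and a placeholder node with `TEXT` of its own is visited recursively to yield a separate string.

```r
# Hypothetical sketch, not the Rd2gettext implementation.
# Nodes mimic tools::parse_Rd() output: each carries an "Rd_tag" attribute.
rd_tag <- function(node) attr(node, "Rd_tag")
is_text <- function(node) identical(rd_tag(node), "TEXT")

# Assumption: \emph{...} whose children are all TEXT is simple enough to stay inline.
is_simple_inline <- function(node) {
  identical(rd_tag(node), "\\emph") && all(vapply(node, is_text, logical(1)))
}

# Turn a run of nodes into one translatable string (msgid) plus the nodes
# hidden behind its <k> placeholders; blocks with TEXT of their own are
# visited recursively and yield further, separate strings.
extract_strings <- function(nodes) {
  pieces <- character(0)
  placeholders <- list()
  nested <- list()
  for (node in nodes) {
    if (is_text(node)) {
      pieces <- c(pieces, node[[1]])
    } else if (is_simple_inline(node)) {
      inner <- paste(vapply(node, function(x) x[[1]], character(1)), collapse = "")
      pieces <- c(pieces, paste0("\\emph{", inner, "}"))
    } else {
      placeholders[[length(placeholders) + 1]] <- node
      pieces <- c(pieces, sprintf("<%d>", length(placeholders)))
      if (is.list(node) && any(vapply(node, is_text, logical(1)))) {
        nested <- c(nested, extract_strings(node))
      }
    }
  }
  # Collapse whitespace (including newlines) so each msgid is a single line.
  msgid <- trimws(gsub("\\s+", " ", paste(pieces, collapse = "")))
  c(list(list(msgid = msgid, placeholders = placeholders)), nested)
}

# Hypothetical example tree, hand-built for illustration.
txt <- function(x) structure(x, Rd_tag = "TEXT")
rd_node <- function(tag, ...) structure(list(...), Rd_tag = tag)

para <- list(
  txt("Methods exist for "),
  rd_node("\\emph", txt("numeric")),
  txt(" vectors and "),
  rd_node("\\link", txt("date-time")),
  txt(" objects; see "),
  rd_node("\\code", structure("mean", Rd_tag = "RCODE")),
  txt(".\n")
)

res <- extract_strings(para)
print(vapply(res, `[[`, character(1), "msgid"))
```

Here `\link{date-time}` becomes `<1>` in the main string and also yields `"date-time"` as a separate string, while `\code{mean}` has no `TEXT` inside and so is only a placeholder. Note that this still leaves the translator with the question raised above: nothing in `<1>` says whether it is a singular or plural noun.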
The gettext project has a number of recommendations for what translatable strings should look like to make them most convenient to translate. In particular, it recommends splitting text at paragraphs and minimising "unusual markup". While I don't think we can entirely avoid Rd markup inside translatable strings, I think it's possible to keep it to a minimum. I would like to suggest the following approach:
- … (`attr(., 'Rd_tag') == 'TEXT'`)
- … (`attr(., 'Rd_tag') == '\\emph'`) and whose contents are all `TEXT` …
- … `.pot` file for translation

When rendering a help file for translation, perform the same process but backwards:
The resulting translatable strings look very manageable:
If you're interested in an approach like this, I can try to integrate it into rhelpi18n. My original use case is `help(whittaker2, albatross)`, which is very unwieldy to translate as a flattened representation.
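The backwards step from the proposal (rendering a help file from translated strings) might look roughly like this. Again, this is a hypothetical sketch with a made-up name (`fill_placeholders`) and hand-built nodes: it assumes each translated string comes back with the same numbered placeholders `<k>` that extraction produced, and it does not re-parse inline markup such as `\emph{}` that translators may have kept in the text.

```r
# Hypothetical sketch of the backwards step, not the Rd2gettext implementation:
# split a translated string at its <k> placeholders and put the original
# markup nodes back. Inline markup left in the text (e.g. \emph{...}) stays
# inside plain TEXT nodes here; a real implementation would need to re-parse it.
fill_placeholders <- function(msgstr, placeholders) {
  m <- gregexpr("<[0-9]+>", msgstr)
  holes <- regmatches(msgstr, m)[[1]]                 # e.g. "<2>", "<1>"
  texts <- regmatches(msgstr, m, invert = TRUE)[[1]]  # text around them
  nodes <- list()
  for (i in seq_along(texts)) {
    if (nzchar(texts[[i]])) {
      nodes[[length(nodes) + 1]] <- structure(texts[[i]], Rd_tag = "TEXT")
    }
    if (i <= length(holes)) {
      # Placeholders are looked up by number, so translators may reorder them.
      k <- as.integer(gsub("[<>]", "", holes[[i]]))
      nodes[[length(nodes) + 1]] <- placeholders[[k]]
    }
  }
  nodes
}

# Hypothetical nodes that were hidden behind <1> and <2> during extraction.
placeholders <- list(
  structure(list(structure("Rd", Rd_tag = "RCODE")), Rd_tag = "\\code"),
  structure(list(structure("gettext", Rd_tag = "TEXT")), Rd_tag = "\\link")
)

# A made-up "translation" that swaps the order of the placeholders.
out <- fill_placeholders("With <2>, translate a parsed <1> tree.", placeholders)
print(vapply(out, function(n) attr(n, "Rd_tag"), character(1)))
```

Because placeholders are resolved by number rather than by position, a translation is free to reorder them, which matters for languages whose word order differs from English.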