Translations Overview
See The Mission for rationale and technical background. This document describes the basic workflow.
Contains common elements between all sites (headers, footers, etc)
Markup for individual sites. Tokens are wrapped in calls to Locale::Simple functions: l(), lp() and so on.
There also appears to be a DDG::Translate module that performs a similar role (possibly based on Locale::Simple), though it appears to be unused currently and is not in DuckPAN.
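As a rough illustration of what wrapping tokens in calls looks like, the sketch below defines stand-in l() and lp() functions backed by hard-coded hashes. It does not load Locale::Simple; the catalogue contents and the lookup logic are assumptions made for this example, and only the argument order (message first for l(), context then message for lp()) is meant to mirror the library's convention.

```perl
use strict;
use warnings;

# Hypothetical catalogue for one locale; real translations come from
# .po files, not from a hash like this.
my %catalogue = (
    'Hello %s' => 'Hallo %s',
    'Home'     => 'Startseite',
);

# Hypothetical context-specific entries, keyed by context, then message.
my %catalogue_ctxt = (
    button => { 'Home' => 'Start' },
);

# Stand-in for l(): look the token up, fall back to the source text,
# then fill in sprintf-style placeholders.
sub l {
    my ($msgid, @args) = @_;
    my $text = exists $catalogue{$msgid} ? $catalogue{$msgid} : $msgid;
    return sprintf($text, @args);
}

# Stand-in for lp(): the same lookup, with a context that separates
# identical tokens used in different places.
sub lp {
    my ($ctxt, $msgid, @args) = @_;
    my $text = $catalogue_ctxt{$ctxt}{$msgid};
    $text = $msgid unless defined $text;
    return sprintf($text, @args);
}

print l('Hello %s', 'world'), "\n";
print lp('button', 'Home'), "\n";
print l('Untranslated'), "\n";
```

A token with no translation falls back to its source text, which is why English strings serve as the token keys.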
Running duckpan poupload sends a named source (English) .po file containing the tokens for a domain to https://duck.co/translate/po for integration with the community platform translation system.
Import / merge uploaded tokens with an existing token domain.
Management of new token domains (sites from publisher repository)
Volunteer contributed translations, admin functions to manage .po files and add notes to tokens.
These contain translations for each of the web properties - modules named in the form DDGC::Locale::$sitename, e.g. DDGC::Locale::DuckduckgoDontbubbleus
Some functionality is included in these modules, e.g. for info on languages / number of translations included:
perl -MDDGC::Locale::DuckduckgoDontbubbleus -e 'use DDP; p DDGC::Locale::DuckduckgoDontbubbleus::locales'
These modules are generated by running ddgc_update_locale_dist.pl on the live Community Platform. This packages the translations and integrates them with DuckPAN for later installation on the server for a given site.
They can also be previewed anywhere using duckduckgo-publisher.
To update the DDGC::Locale::DuckduckgoWhatisdnt translation set, run from the production DDGC server as user ddgc:
$ script/ddgc_update_locale_dist.pl --domain=duckduckgo-whatisdnt
The token domain 'keys' you need to provide as an argument can be seen on the DDGC Translate index page or in the "Token domain management" admin page.
This is a manual process in which text elements of a site are split up into tokens. The source of the sites is then scraped and po files containing scraped tokens are uploaded to the community platform and added to the database.
Locale::Simple ships with locale_simple_scraper. DuckDuckGo Publisher includes xbin/site_scraper.sh, which wraps this to scrape the tokens for a given site.
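To make the scraping step concrete, here is a minimal sketch that pulls the first string argument out of l() calls in a template and prints empty gettext entries. It is not locale_simple_scraper and handles far fewer cases (no lp() or plural calls, no multi-line strings); the function names and the sample template are hypothetical.

```perl
use strict;
use warnings;

# Collect the first string argument of every l('...') or l("...") call,
# in order of first appearance and without duplicates.
sub scrape_tokens {
    my ($source) = @_;
    my (%seen, @tokens);
    while ($source =~ /\bl\(\s*(["'])(.*?)\1/g) {
        my $token = $2;
        push @tokens, $token unless $seen{$token}++;
    }
    return @tokens;
}

# Render tokens as untranslated gettext entries, escaping quotes and
# backslashes inside the msgid.
sub to_po {
    my @tokens = @_;
    my $po = '';
    for my $token (@tokens) {
        (my $msgid = $token) =~ s/(["\\])/\\$1/g;
        $po .= qq{msgid "$msgid"\nmsgstr ""\n\n};
    }
    return $po;
}

# Hypothetical template fragment, for illustration only.
my $template = q{<h1><: l('Privacy') :></h1><p><: l("Do Not Track") :></p><: l('Privacy') :>};
print to_po(scrape_tokens($template));
```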
Gettext (.po) files for a given site are uploaded to the community platform by the DuckPAN tool, e.g. duckpan poupload dontbubbleus.po. The Translation and Token Domain Admin interfaces are then used to create a new Token Domain and add the tokens from the new .po file to it.
The result of the query which generates the Translation front page is cached for up to an hour, so new domains could take some time to actually appear.
Tokens are made available from the translation interface and are translated by volunteers.
Packages are created from the community platform and integrated into DuckPAN in a single step. They are then available to testers/developers on publisher and for publishing on the live sites.
A very rough overview of the translation process (diagram asciio file):
.------------.
| New domain |
| and texts |
'------------'
|
Manual separation
of tokens
|
v
.----------------.
| Publisher Site |
.----------------------| tx files - l() |
| '----------------'
| |
| |
duckduckgo-publisher: | Publisher : xbin/site_scraper.sh
$ dzil release Scraper <----------Locale::Simple : locale_simple_scraper
$ bin/ddg_publisher(?) |
| v
| .-----------.
| | Tokens |
| | (po file) | - $ duckpan poupload newsite.po
| '-----------' - Community platform -> Token domain management ->
| | New token domain -> 'newsite'
| duckpan poupload<------ Translate -> po manager -> compare with token domain ->
| | 'newsite' -> Migrate this po to newsite
| v - Last step merges new translations with domain
| .-----------.
| | Community |
| | Platform |
| '-----------'
| \o/ | o
| | Cool /|\
| / \Volunteers/ \
| |
| v
| .--------------.
| | Translations |
| '--------------'
| | ddgc@ddgc$ ~/live/script/ddgc_update_locale_dist.pl \
| ddgc_update_locale_dist.pl <---- --domain=duckduckgo-whatisdnt
| |
| |
| |
| v
| duckpan .---------.
|---DDGC::Locale::$domain--| DuckPAN |
| '---------'
v
.-----------.
| DDG Sites |
'-----------'
To add a new language, you'll likely be armed with information mailed by a volunteer via the platform. For this example, we'll add "Latvian in Latvia".
Countries and Languages are added separately and are dependent on each other (country will have a primary language configured).
From the admin area, go to Language & Country Editor -> Country Editor -> New Country... and enter the requested details, in this case:
- Country name in English: Latvia
- Country name in local language: Latvija
- Virtual Country: No
- 2 letter Country Code: lv
- Primary Flag: https://upload.wikimedia.org/wikipedia/commons/8/84/Flag_of_Latvia.svg
When this information is saved, you may then add a language. Go to Language editor -> New language...
- Language name in English: Latvian in Latvia
- Right to left: No
- Language and country name in local language: Latviski Latvijā
- Language only in local language: Latviešu valoda
- Country: Latvia
- Locale: lv_LV
- nplurals: 2
- Plural: n != 1
You may now go back to the country editor and update Latvia to set its primary language to Latvian.
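The nplurals and Plural fields entered for the language correspond to the two parts of a gettext Plural-Forms header. The sketch below shows how such values could be combined into that header and which msgstr index the example rule selects; how the Community Platform itself consumes these fields is not shown here, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Values as entered in the language editor for the example language.
my %language = (locale => 'lv_LV', nplurals => 2, plural => 'n != 1');

# Build a gettext Plural-Forms header line from the two fields.
sub plural_forms_header {
    my ($lang) = @_;
    return sprintf('Plural-Forms: nplurals=%d; plural=(%s);',
        $lang->{nplurals}, $lang->{plural});
}

# Hand-written equivalent of the rule "n != 1": index 0 is the
# singular form, index 1 the plural form.
sub plural_index {
    my ($n) = @_;
    return $n != 1 ? 1 : 0;
}

print plural_forms_header(\%language), "\n";
printf("%d item(s) -> msgstr[%d]\n", $_, plural_index($_)) for 0, 1, 2;
```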
The next step is to generate a flag icon, which is done from the shell by:
$ script/ddgc_generate_flag_sprites.pl
The existing flag sprites may be cached (they all exist in one file), so it could take a short while for the flag to appear.
Then we need to add this language to the various token domains:
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-duckduckgo
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-donttrackus
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-whatisdnt
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-dontbubbleus
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-fixtracking
$ script/ddgc_add_language_to_domain.pl lv_LV test
We should now be able to collect translations in Latvian!
Invoked by script/ddgc_update_locale_dist.pl. The important part of this script invokes DDGC::DB::Result::Token::Domain::generate_pos, which calls DDGC::DB::Result::Token::Domain::Language::generate_po_for_locale_in_dir_as_with_fallback for each language configured in the Community Platform. This generates the actual localised .po and .json files to be packaged.
The decision on which specific individual translation to include is in DDGC::DB::Result::Token::Language::auto_use:
for (keys %first) {
    if ($best) {
        $best = $first{$_} if $votes{$_} > $votes{$best->key};
        $best = $first{$_} if $votes{$_} == $votes{$best->key}
            && $grade{$_} > $grade{$best->key};
    } else {
        $best = $first{$_};
    }
}
if ($best) {
    $self->set_translation($best);
    return $self->update;
}
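The selection rule in that excerpt (most votes wins, with grade as the tie-breaker) can be tried in isolation. The following is a standalone sketch using plain hashes rather than DDGC result objects; the candidate data and the pick_best name are hypothetical, and candidates are visited in sorted key order so the result is repeatable, which the original loop over keys %first does not guarantee.

```perl
use strict;
use warnings;

# Hypothetical candidate translations for a single token. "votes" and
# "grade" mirror the %votes and %grade hashes in the excerpt.
my @candidates = (
    { key => 'alice', text => 'Sveiki',  votes => 3, grade => 2 },
    { key => 'bob',   text => 'Labdien', votes => 5, grade => 1 },
    { key => 'carol', text => 'Cau',     votes => 5, grade => 4 },
);

# Pick the candidate with the most votes; on equal votes, prefer the
# higher grade. Returns undef when there are no candidates.
sub pick_best {
    my @list = @_;
    my $best;
    for my $cand (sort { $a->{key} cmp $b->{key} } @list) {
        if (!$best) {
            $best = $cand;
        } elsif ($cand->{votes} > $best->{votes}) {
            $best = $cand;
        } elsif ($cand->{votes} == $best->{votes}
            && $cand->{grade} > $best->{grade}) {
            $best = $cand;
        }
    }
    return $best;
}

my $best = pick_best(@candidates);
print "$best->{key}: $best->{text}\n";
```

Here bob and carol tie on votes, so carol's higher grade decides the outcome.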
The upload endpoint accepts new gettext files at duck.co/translate/po/upload. It simply stashes files in a cache, provided the login credentials and access level are sound (i.e. the supplied credentials are for a Translation Manager) and the file contains some translation entries. The destination filename is versioned with a timestamp. These files are then available for review and import from duck.co/translate/po. Clicking one of the uploaded files there lets you review and import new and modified tokens.
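The checks described for the upload endpoint could look roughly like the sketch below. It is not the platform's code: the is_translation_manager flag, the function names and the timestamp format in the filename are all assumptions, and the entry count only recognises single-line msgid strings.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Count non-empty msgid entries; the header entry (msgid "") is ignored.
sub count_entries {
    my ($po) = @_;
    my @ids = $po =~ /^msgid\s+"(.+)"\s*$/mg;
    return scalar @ids;
}

# Accept an upload only from a translation manager and only if the
# file holds at least one entry. The user hash is hypothetical.
sub may_upload {
    my ($user, $po) = @_;
    return ($user->{is_translation_manager} && count_entries($po) > 0) ? 1 : 0;
}

# Hypothetical naming scheme: base name plus a UTC timestamp.
sub versioned_name {
    my ($filename, $epoch) = @_;
    (my $base = $filename) =~ s/\.po$//;
    return sprintf('%s-%s.po', $base, strftime('%Y%m%d%H%M%S', gmtime($epoch)));
}

my $po = qq{msgid ""\nmsgstr ""\n\nmsgid "Hello"\nmsgstr ""\n};
print may_upload({ is_translation_manager => 1 }, $po), "\n";
print versioned_name('dontbubbleus.po', time), "\n";
```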
Also requires HTML::Packer and JavaScript::Value::Escape.
Generates static pages for each locale. Wraps DDG::App::Publisher.
Wraps Locale::Simple scraper to generate tags for core files and a given site. Takes a single parameter, the domain to generate a po file for.
$ xbin/site_scraper.sh dontbubbleus > dontbubbleus.po
$ file dontbubbleus.po
dontbubbleus.po: GNU gettext message catalogue, ASCII text, with very long lines
When duckpan publisher is invoked, an instance of App::DuckPAN::WebPublisher is created (via App::DuckPAN::Cmd::Publisher) to serve each of the sites present in the current Publisher repository.
This is a pre-defined list of sites which would need to be expanded to accommodate new web properties.
At the top of sites like whatisdnt.com, there is a language selection drop-down composed of links to /?kad=$locale, e.g.:
<li><a href="/?kad=ar_EG">
العربية - مصر
</a></li>
These are generated in the header.tx file for each site, for each locale with > 95% translation coverage. This is ascertained by querying an instance of DDGC::Locale::$site, in this case DDGC::Locale::DuckduckgoWhatisdnt.
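The coverage cut-off for the drop-down can be sketched as a simple filter. The per-locale counts below are made up, and the real DDGC::Locale::$site modules may expose this information in a different shape; only the "> 95%" threshold and the /?kad=$locale link format come from the description above.

```perl
use strict;
use warnings;

# Hypothetical per-locale counts, for illustration only.
my %locales = (
    ar_EG => { translated => 98,  total => 100 },
    de_DE => { translated => 100, total => 100 },
    lv_LV => { translated => 40,  total => 100 },
    fr_FR => { translated => 95,  total => 100 },
);

# Return, in sorted order, the locales whose coverage is strictly
# above the threshold (given in percent).
sub locales_for_dropdown {
    my ($locales, $threshold) = @_;
    my @keep = grep {
        my $l = $locales->{$_};
        $l->{total} > 0 && 100 * $l->{translated} / $l->{total} > $threshold;
    } keys %$locales;
    my @sorted = sort @keep;
    return @sorted;
}

print qq{<li><a href="/?kad=$_">$_</a></li>\n}
    for locales_for_dropdown(\%locales, 95);
```

With a strict comparison, a locale at exactly 95% (fr_FR in this sample) is left out.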
The Locale module is configured inside the site's config module, as in DDG::Publisher::Site::Whatisdnt:
sub locale_package { 'DDGC::Locale::DuckduckgoWhatisdnt' }
sub locale_dist { 'DDGC-Locale-DuckduckgoWhatisdnt' }
sub locale_domain { 'duckduckgo-whatisdnt' }
These are added to the instance in DDG::Publisher::SiteRole.
So, what happens when a link to a new locale is clicked?
Requests are fulfilled by App::DuckPAN::WebPublisher::request. First we set the default locale:
my $locale = defined $ENV{DDG_LOCALE} ? $ENV{DDG_LOCALE} : 'en_US';
Then the port the request came in on is used to decide which site to serve, from a pre-populated hashref:
my $site = $self->ports->{$request->port}->{site};
my $url = $self->ports->{$request->port}->{url};
Locale (current_language) is set if the kad parameter is set:
$self->current_language($request->param('kad')) if $request->param('kad');
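Put together, the locale choice reads as a default with an optional override. The sketch below assumes the kad parameter simply takes precedence over DDG_LOCALE, which is an interpretation of the excerpts rather than something they state; effective_locale is a hypothetical helper, with plain hashes standing in for %ENV and the request parameters.

```perl
use strict;
use warnings;

# $env stands in for %ENV and $params for the request's query parameters.
sub effective_locale {
    my ($env, $params) = @_;
    my $locale = defined $env->{DDG_LOCALE} ? $env->{DDG_LOCALE} : 'en_US';
    $locale = $params->{kad} if $params->{kad};
    return $locale;
}

print effective_locale({}, {}), "\n";
print effective_locale({ DDG_LOCALE => 'de_DE' }, {}), "\n";
print effective_locale({ DDG_LOCALE => 'de_DE' }, { kad => 'ar_EG' }), "\n";
```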
Normalise the URL: request '/index' for '/' and strip any trailing slash:
my $uri = $request->path_info eq '/' ? '/index' : $request->path_info;
$uri =~ s/\/$//;
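Wrapped in a function, the same two lines can be exercised against a few sample paths. The normalise_uri name is hypothetical; the body is taken from the excerpt.

```perl
use strict;
use warnings;

# Map '/' to '/index' and drop a single trailing slash from other paths.
sub normalise_uri {
    my ($path_info) = @_;
    my $uri = $path_info eq '/' ? '/index' : $path_info;
    $uri =~ s/\/$//;
    return $uri;
}

printf("%-8s -> %s\n", $_, normalise_uri($_)) for '/', '/about/', '/about';
```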
In DDG::Publisher::SiteRole, the fullpath_files call merges fullpath_files for all dirs with DirRole.
Dirs with DirRole are made up of instances of DDG::Publisher::File. This module takes care of rendering / retrieving file content.
So, for the requested language we build a complete URI to the file and request it from fullpath_files in SiteRole, setting the response to its content if it exists.
Failing that, a new request is generated for the requested resource (presumably an image, CSS file or similar) and the response is set to that if successful.
my $file = $uri.'/'.$self->current_language.'.html';
if (defined $site->fullpath_files->{$file}) {
    print 'Request '.$request->path_info.' uses '.$file.' from DDG::Publisher...'."\n";
    $body = $site->fullpath_files->{$file}->uncached_content;
    $response->code("200");
    $response->content_type('text/html');
} else {
    my $res = $self->app->http->request(HTTP::Request->new(GET => $url.$request->request_uri));
    if ($res->is_success) {
        $body = $res->decoded_content;
        $response->code($res->code);
        $response->content_type($res->content_type);
    } else {
        $body = "GET ".$url.$request->request_uri.": ".$res->status_line;
        warn $body, "\n";
    }
}
This response is then returned.
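The lookup-then-fallback flow can be modelled without a web server. In this sketch a plain hash stands in for $site->fullpath_files and a code ref stands in for the upstream HTTP request; both, along with the resolve name and the sample data, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $site->fullpath_files: rendered pages keyed
# by "<uri>/<locale>.html".
my %fullpath_files = (
    '/index/en_US.html' => '<html>English</html>',
    '/index/ar_EG.html' => '<html>Arabic</html>',
);

# Serve a localised page if one exists for the (already normalised)
# URI; otherwise hand the request to $fetch, which stands in for the
# HTTP request to the upstream site.
sub resolve {
    my ($uri, $language, $fetch) = @_;
    my $file = $uri . '/' . $language . '.html';
    if (defined $fullpath_files{$file}) {
        return { code => 200, type => 'text/html', body => $fullpath_files{$file} };
    }
    return $fetch->($uri);
}

# Fake upstream that answers every request with a CSS comment.
my $fetch = sub {
    my ($uri) = @_;
    return { code => 200, type => 'text/css', body => "/* $uri */" };
};

print resolve('/index', 'ar_EG', $fetch)->{body}, "\n";
print resolve('/style.css', 'ar_EG', $fetch)->{body}, "\n";
```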
There are a few components which come together to locate and generate a file for publishing. The basic hierarchy is (publisherfiles.asciio):
.----------------------.
| DDG::Publisher::File |
'----------------------'
^
|
Has Many
|
.--DDG::Publisher Roles--|--------.
| | |
| .----------. .----------. |
| | SiteRole | | DirRole |<------.
| '----------' '----------' | |
| ^ | Is A
'--------|------------------------' |
| |
Is A .-----------------------------------.
| | DDG::Publisher::Site::$SITE::Root |
| '-----------------------------------'
| ^
| |
.-----------------------------. Has A
| DDG::Publisher::Site::$SITE |-------------'
'-----------------------------'