This repository has been archived by the owner on Nov 19, 2021. It is now read-only.

Translations Overview

John Barrett edited this page Apr 8, 2014 · 43 revisions

See The Mission for rationale and technical background. This document describes the basic workflow.

Components

Publisher duckduckgo-publisher/share/core

Contains elements common to all sites (headers, footers, etc.).

Publisher duckduckgo-publisher/share/site

Markup for individual sites. Translatable strings (tokens) are wrapped in calls to Locale::Simple functions such as l() and lp().

There also appears to be a DDG::Translate module that serves a similar purpose (possibly based on Locale::Simple), though it appears to be unused at present and is not in DuckPAN.

Community platform duck.co/translate/po/upload

Running duckpan poupload sends a named source (English) .po file, holding the tokens for one token domain, to this endpoint. The file then appears at https://duck.co/translate/po for integration with the community platform translation system.

Community platform duck.co/translate/po

Import / merge uploaded tokens with an existing token domain.

Community platform duck.co/admin/token

Management of new token domains (sites from the Publisher repository).

Community platform duck.co/translate

Volunteer-contributed translations, plus admin functions to manage .po files and add notes to tokens.

Locale dists

These contain the translations for each web property, in modules named DDGC::Locale::$sitename, e.g. DDGC::Locale::DuckduckgoDontbubbleus.

These modules also include some functionality, e.g. to show the languages and number of translations included:

$ perl -MDDGC::Locale::DuckduckgoDontbubbleus -e 'use DDP; p DDGC::Locale::DuckduckgoDontbubbleus::locales'

These modules are generated by running ddgc_update_locale_dist.pl on the live Community Platform. This packages the translations and integrates them with DuckPAN for later installation on the server for a given site.

They can also be previewed anywhere using duckduckgo-publisher.

To update the DDGC::Locale::DuckduckgoWhatisdnt translation set, run from the production DDGC server as user ddgc:

$ script/ddgc_update_locale_dist.pl --domain=duckduckgo-whatisdnt

The token domain key to pass as the --domain argument is shown on the DDGC Translate index page and on the "Token domain management" admin page.

Phases

Tokenisation

This is a manual process in which the text of a site is split up into tokens. The site source is then scraped, and .po files containing the scraped tokens are uploaded to the community platform and added to the database.
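As a rough illustration of what tokenisation produces, the sketch below wraps each translatable string in a call. The l() and lp() defined here are hypothetical stand-ins that skip the translation lookup and only fill sprintf-style placeholders in the English text; the real Locale::Simple functions return the translation for the current locale, and on the sites these calls live in .tx templates rather than plain Perl. The example strings are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Locale::Simple's l() and lp(): they skip the
# catalogue lookup and only fill sprintf-style placeholders in the English
# msgid, i.e. roughly what is shown when no translation exists.
sub l  { my ($msgid, @args) = @_;       return sprintf($msgid, @args) }
sub lp { my ($ctx, $msgid, @args) = @_; return sprintf($msgid, @args) }

# Before tokenisation:  my $heading = 'Search without being tracked';
# After tokenisation each translatable string is wrapped in a call, so the
# scraper can find it and a translation can be substituted at runtime.
my $heading = l('Search without being tracked');
my $count   = l('%d sites checked', 42);

# lp() adds a context string, so identical English text can be translated
# differently depending on where it is used.
my $button  = lp('button', 'Search');

print "$heading\n$count\n$button\n";
```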

Scraping

Locale::Simple ships with locale_simple_scraper. DuckDuckGo Publisher includes xbin/site_scraper.sh which wraps this to scrape the tokens for a given site.
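To show the kind of output the scraping step hands to the upload, here is a deliberately simplified scraper. It is not how locale_simple_scraper works: it only picks up the first argument of l('...') or l("...") calls (no escaped quotes, no lp() contexts or plurals) and emits bare msgid/msgstr pairs, whereas a real .po file normally also carries a header entry. The function names are hypothetical.

```perl
use strict;
use warnings;

# Toy scraper: collect the first argument of l('...') / l("...") calls,
# de-duplicated in first-seen order. Escaped quotes, lp()/ln() and
# multi-line strings are deliberately not handled.
sub scrape_msgids {
    my ($text) = @_;
    my (%seen, @ids);
    while ($text =~ /\bl\(\s*(['"])(.*?)\1/g) {
        my $id = $2;
        push @ids, $id unless $seen{$id}++;
    }
    return @ids;
}

# Render msgids as minimal gettext entries with empty translations,
# escaping double quotes and backslashes.
sub to_po {
    my (@ids) = @_;
    my $po = '';
    for my $id (@ids) {
        (my $escaped = $id) =~ s/(["\\])/\\$1/g;
        $po .= qq{msgid "$escaped"\nmsgstr ""\n\n};
    }
    return $po;
}

# Template-like input; the <: ... :> delimiters are only illustrative.
my $template = <<'TX';
<h1><: l('Search without being tracked') :></h1>
<p><: l("Your results are not personalised") :></p>
<a href="/"><: l('Search without being tracked') :></a>
TX

print to_po(scrape_msgids($template));
```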

Uploading

Gettext (.po) files for a given site are uploaded to the community platform with the DuckPAN tool, e.g. duckpan poupload dontbubbleus.po. The Translation and Token Domain Admin interfaces are then used to create a new Token Domain and add the tokens from the new .po file to it.

The result of the query that generates the Translation front page is cached for up to an hour, so a new domain may take up to an hour to appear there.

Translation

Tokens are made available from the translation interface and are translated by volunteers.

Deployment

Packages are created from the community platform and integrated into DuckPAN in a single step. They are then available to testers/developers on publisher and for publishing on the live sites.

Data flow

A very rough overview of the translation process (diagram asciio file):

                                  .------------.
                                  | New domain |
                                  | and texts  |
                                  '------------'
                                         |
                                 Manual separation
                                     of tokens    
                                         |
                                         v
                                .----------------.
                                | Publisher Site |
         .----------------------| tx files - l() |
         |                      '----------------'
         |                               |
         |                               |
 duckduckgo-publisher:                   |               Publisher : xbin/site_scraper.sh      
 $ dzil release                       Scraper <----------Locale::Simple : locale_simple_scraper
 $ bin/ddg_publisher(?)                  |
         |                               v
         |                         .-----------.
         |                         |  Tokens   |
         |                         | (po file) |        - $ duckpan poupload newsite.po                          
         |                         '-----------'        - Community platform -> Token domain management ->       
         |                               |                      New token domain -> 'newsite'                    
         |                        duckpan poupload<------ Translate -> po manager -> compare with token domain ->
         |                               |                      'newsite' -> Migrate this po to newsite          
         |                               v              - Last step merges new translations with domain          
         |                         .-----------.
         |                         | Community |
         |                         | Platform  |
         |                         '-----------'
         |                        \o/    |      o 
         |                         |    Cool   /|\
         |                        / \Volunteers/ \
         |                               |
         |                               v
         |                       .--------------.
         |                       | Translations |
         |                       '--------------'
         |                               |                  ddgc@ddgc$ ~/live/script/ddgc_update_locale_dist.pl \
         |                  ddgc_update_locale_dist.pl <----           --domain=duckduckgo-whatisdnt             
         |                               |
         |                               |
         |                               |
         |                               v
         |          duckpan         .---------.
         |---DDGC::Locale::$domain--| DuckPAN |
         |                          '---------'
         v
   .-----------.
   | DDG Sites |
   '-----------'

New Languages

To add a new language, you'll likely be armed with information mailed by a volunteer via the platform. For this example, we'll add "Latvian in Latvia".

Countries and languages are added separately but depend on each other: a language is assigned to a country, and a country has a primary language configured.

From the admin area, go to Language & Country Editor -> Country Editor -> New Country... and enter the requested details, in this case for Latvia.

Once the country is saved, you can add the language. Go to Language editor -> New language... and enter:

  • Language name in English: Latvian in Latvia
  • Right to left: No
  • Language and country name in local language: Latviski Latvijā
  • Language only in local language: Latviešu valoda
  • Country: Latvia
  • Locale: lv_LV
  • nplurals: 2
  • Plural: n != 1
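The nplurals and Plural fields correspond to the gettext Plural-Forms header: the plural expression is evaluated for a count n, and the result selects which msgstr[i] is shown. The sketch below hard-codes the n != 1 rule entered above instead of parsing arbitrary expressions; the plural_index name and the English placeholder strings are hypothetical.

```perl
use strict;
use warnings;

# Header line a catalogue with these settings would carry.
my $plural_forms = 'Plural-Forms: nplurals=2; plural=(n != 1);';

# Evaluate the configured expression "n != 1" as an index into msgstr[]:
# 0 for exactly one item, 1 for any other count.
sub plural_index {
    my ($n) = @_;
    return $n != 1 ? 1 : 0;
}

# Hypothetical msgstr[0] and msgstr[1] for one token (English placeholders).
my @msgstr = ('%d site', '%d sites');

print "$plural_forms\n";
for my $n (0, 1, 2) {
    print sprintf($msgstr[plural_index($n)], $n), "\n";
}
```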

You may now go back to the country editor and update Latvia to set its primary language to Latvian.

The next step is to generate a flag icon, which is done from the shell by:

$ script/ddgc_generate_flag_sprites.pl

The existing flag sprites may be cached (they are all stored in one file), so it can take a short while for the new flag to appear.

Then we need to add this language to the various token domains:

$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-duckduckgo
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-donttrackus
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-whatisdnt
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-dontbubbleus
$ script/ddgc_add_language_to_domain.pl lv_LV duckduckgo-fixtracking
$ script/ddgc_add_language_to_domain.pl lv_LV test

We should now be able to collect translations in Latvian!
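Because the same script is run once per token domain, the commands can also be generated from a list. This is a convenience sketch only: the domain list is copied from the commands above and needs updating as domains are added, it assumes it is run from the directory containing script/, and it only prints the commands unless the RUN environment variable (an assumption of this sketch, not an existing convention) is set. The locale defaults to lv_LV and can be given as the first argument.

```perl
use strict;
use warnings;

# Token domains copied from the commands above; extend as new domains appear.
my @domains = qw(
    duckduckgo-duckduckgo
    duckduckgo-donttrackus
    duckduckgo-whatisdnt
    duckduckgo-dontbubbleus
    duckduckgo-fixtracking
    test
);

# Build one argument list per domain for the given locale.
sub add_language_commands {
    my ($locale, @doms) = @_;
    return map { ['script/ddgc_add_language_to_domain.pl', $locale, $_] } @doms;
}

my $locale = shift(@ARGV) // 'lv_LV';
for my $cmd (add_language_commands($locale, @domains)) {
    if ($ENV{RUN}) {
        system(@$cmd) == 0 or die "Failed: @$cmd\n";
    } else {
        print "\$ @$cmd\n";    # dry run: only show what would be executed
    }
}
```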

Code

Community platform

Locale dists are generated by script/ddgc_update_locale_dist.pl. Its key step calls DDGC::DB::Result::Token::Domain::generate_pos, which in turn calls DDGC::DB::Result::Token::Domain::Language::generate_po_for_locale_in_dir_as_with_fallback for each language configured in the Community Platform. This generates the localised .po and .json files that are packaged.

The decision on which individual translation to include is made in DDGC::DB::Result::Token::Language::auto_use: the candidate with the most votes wins, and a tie on votes goes to the higher grade.

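	for (keys %first) {
		if ($best) {
			# More votes wins outright...
			$best = $first{$_} if $votes{$_} > $votes{$best->key};
			# ...and on equal votes, the higher grade wins
			$best = $first{$_} if $votes{$_} == $votes{$best->key}
								&& $grade{$_} > $grade{$best->key};
		} else {
			# The first candidate seen becomes the initial best
			$best = $first{$_};
		}
	}
	if ($best) {
		# Store the chosen translation and save the record
		$self->set_translation($best);
		return $self->update;
	}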
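The same selection rule can be tried out in isolation on plain hashes of numbers. This is a standalone sketch, not the DDGC method: pick_best and the sample data are hypothetical, and it walks the keys in sorted order, whereas the loop above iterates keys %first in Perl's unordered hash order, so there a tie on both votes and grade may resolve either way.

```perl
use strict;
use warnings;

# Pick the candidate key with the most votes, breaking vote ties on grade.
# Mirrors the comparison in auto_use, but over plain hashes of numbers.
sub pick_best {
    my ($votes, $grade) = @_;
    my $best;
    for my $key (sort keys %$votes) {    # sorted for repeatable results
        if (!defined $best
            || $votes->{$key} > $votes->{$best}
            || ($votes->{$key} == $votes->{$best} && $grade->{$key} > $grade->{$best})) {
            $best = $key;
        }
    }
    return $best;    # undef when there are no candidates
}

my %votes = (a => 3, b => 5, c => 5);
my %grade = (a => 9, b => 1, c => 2);
print pick_best(\%votes, \%grade), "\n";    # c: ties b on votes, higher grade
```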

The upload endpoint at duck.co/translate/po/upload accepts new gettext files. It simply stashes each file in a cache, provided the supplied credentials are valid and belong to a Translation Manager, and the file contains at least one translation entry. The destination filename is versioned with a timestamp. Uploaded files are then listed for review and import at duck.co/translate/po; clicking one of them lets you review and import new and modified tokens.

Also requires HTML::Packer and JavaScript::Value::Escape
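The two upload checks that concern the file itself (it must contain translation entries, and the stored name is versioned with a timestamp) could look roughly like this. It is a hypothetical illustration: the function names, the definition of an entry as a non-empty msgid (which excludes the gettext header) and the <name>.<epoch>.po scheme are assumptions, and the credential check is not modelled.

```perl
use strict;
use warnings;

# True if the .po text has at least one entry with a non-empty msgid,
# either on one line (msgid "text") or continued (msgid "" then "text").
# The header entry (msgid "" followed by msgstr) does not count.
sub has_entries {
    my ($po_text) = @_;
    return $po_text =~ /^msgid[ \t]+"(?:[^"\n]|"[ \t]*\n")/m ? 1 : 0;
}

# Versioned destination name, e.g. dontbubbleus.1396915200.po.
sub versioned_name {
    my ($filename, $time) = @_;
    $time //= time;
    (my $base = $filename) =~ s/\.po\z//;
    return "$base.$time.po";
}

my $upload = qq{msgid ""\nmsgstr "Content-Type: text/plain; charset=UTF-8\\n"\n\n}
           . qq{msgid "Hello"\nmsgstr ""\n};

if (has_entries($upload)) {
    print 'stored as ', versioned_name('dontbubbleus.po'), "\n";
} else {
    print "rejected: no translation entries\n";
}
```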

bin/ddg_publisher

Generates static pages for each locale. Wraps DDG::App::Publisher.

xbin/site_scraper.sh

Wraps the Locale::Simple scraper to extract tokens from the core files and a given site. It takes a single parameter: the domain to generate a .po file for.

$ xbin/site_scraper.sh dontbubbleus > dontbubbleus.po
$ file dontbubbleus.po                               
dontbubbleus.po: GNU gettext message catalogue, ASCII text, with very long lines

When duckpan publisher is invoked, an instance of App::DuckPAN::WebPublisher is created (via App::DuckPAN::Cmd::Publisher) to serve each of the sites present in the current Publisher repository.

The set of sites served is a predefined list, which would need to be extended to accommodate new web properties.

At the top of sites like whatisdnt.com, there is a language selection drop-down composed of links to /?kad=$locale, e.g.:

	<li><a href="/?kad=ar_EG">
		العربية - مصر
	</a></li>

These links are generated in each site's header.tx file, one for each locale with more than 95% translation coverage. Coverage is determined by querying an instance of DDGC::Locale::$site, in this case DDGC::Locale::DuckduckgoWhatisdnt.
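The filtering step can be sketched as follows, assuming the coverage figures are already available as a hash of locale to percentage and the display names as a second hash. How DDGC::Locale::$site actually exposes coverage is not covered here, so both hashes, the render_language_links name and the figures are hypothetical; only the /?kad=$locale link format and the 95% threshold come from the text. Link text is HTML-escaped since it ends up in markup.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Minimal HTML escaping for link text.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# One <li> per locale whose coverage (percent) is strictly above the threshold.
sub render_language_links {
    my ($coverage, $names, $threshold) = @_;
    $threshold //= 95;
    my @locales = grep { $coverage->{$_} > $threshold } sort keys %$coverage;
    return join '', map {
        sprintf(qq{<li><a href="/?kad=%s">%s</a></li>\n}, $_, html_escape($names->{$_} // $_))
    } @locales;
}

# Illustrative figures only.
my %coverage = (ar_EG => 97.5, de_DE => 99.1, lv_LV => 40);
my %names = (
    ar_EG => 'العربية - مصر',
    de_DE => 'Deutsch - Deutschland',
    lv_LV => 'Latviski Latvijā',
);
print render_language_links(\%coverage, \%names);
```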

The Locale module is configured in the site's config module, as in DDG::Publisher::Site::Whatisdnt:

sub locale_package { 'DDGC::Locale::DuckduckgoWhatisdnt' }
sub locale_dist { 'DDGC-Locale-DuckduckgoWhatisdnt' }
sub locale_domain { 'duckduckgo-whatisdnt' }

These are added to the instance in DDG::Publisher::SiteRole.
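The three values follow a pattern visible in the examples on this page: duckduckgo-whatisdnt maps to DDGC::Locale::DuckduckgoWhatisdnt and DDGC-Locale-DuckduckgoWhatisdnt, and duckduckgo-dontbubbleus to DDGC::Locale::DuckduckgoDontbubbleus. The helper below derives the names from a token domain key on that assumption alone; it is not taken from the DDGC or Publisher code, and locale_names_for_domain is a hypothetical name.

```perl
use strict;
use warnings;

# Derive the locale package and dist names from a token domain key by
# capitalising each hyphen-separated part (pattern inferred, not official).
sub locale_names_for_domain {
    my ($domain) = @_;
    my $suffix = join '', map { ucfirst(lc($_)) } split /-/, $domain;
    return (
        locale_package => "DDGC::Locale::$suffix",
        locale_dist    => "DDGC-Locale-$suffix",
        locale_domain  => $domain,
    );
}

my %names = locale_names_for_domain('duckduckgo-whatisdnt');
print "$_ => $names{$_}\n" for sort keys %names;
```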

So, what happens when a link to a new locale is clicked?

Requests are handled by App::DuckPAN::WebPublisher::request. First, the default locale is set:

	my $locale = defined $ENV{DDG_LOCALE} ? $ENV{DDG_LOCALE} : 'en_US';

Next, the port the request arrived on determines which site to serve, looked up in a pre-populated hashref:

	my $site = $self->ports->{$request->port}->{site};
	my $url = $self->ports->{$request->port}->{url};

Locale (current_language) is set if the kad parameter is set:

	$self->current_language($request->param('kad')) if $request->param('kad');

The URL is then normalised: '/' is mapped to '/index' and any trailing slash is stripped:

	my $uri = $request->path_info eq '/' ? '/index' : $request->path_info;
	$uri =~ s/\/$//;
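The locale and URI handling so far can be exercised without a web server. In this sketch plain arguments stand in for the request object, and LanguageState is a hypothetical stand-in for the current_language attribute. Because that attribute is set on the publisher object rather than per request, the sketch assumes it keeps the last kad value until another one arrives (the excerpts do not show it being reset) and that it starts from DDG_LOCALE or en_US. The file name is built the same way as in the lookup code further down.

```perl
use strict;
use warnings;

package LanguageState;

# Hypothetical stand-in for the publisher's current_language attribute.
# Assumes it starts from DDG_LOCALE (or en_US) and keeps the last kad value.
sub new {
    my ($class, $env_locale) = @_;
    return bless { current => defined $env_locale ? $env_locale : 'en_US' }, $class;
}

sub for_request {
    my ($self, $kad) = @_;
    $self->{current} = $kad if $kad;    # only a true kad changes the language
    return $self->{current};
}

package main;

# Map '/' to '/index' and strip one trailing slash, as in the excerpt above.
sub normalise_uri {
    my ($path_info) = @_;
    my $uri = $path_info eq '/' ? '/index' : $path_info;
    $uri =~ s{/$}{};
    return $uri;
}

my $state = LanguageState->new($ENV{DDG_LOCALE});
for my $req (['/', undef], ['/faq/', 'ar_EG'], ['/about', undef]) {
    my ($path, $kad) = @$req;
    printf "%-6s -> %s/%s.html\n", $path, normalise_uri($path), $state->for_request($kad);
}
```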

In DDG::Publisher::SiteRole, fullpath_files merges the fullpath_files of all directories that consume DirRole.

Each DirRole directory is made up of DDG::Publisher::File instances; that module takes care of rendering and retrieving file content.
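How one lookup table can be assembled from several directories is sketched below in plain Perl. The layout (each directory as a URL prefix plus a hash of path to content) and merge_fullpath_files are assumptions for illustration; the real roles hold DDG::Publisher::File objects, and SiteRole's own merge may treat duplicate paths differently.

```perl
use strict;
use warnings;

# Merge per-directory file maps into one hash keyed by full path.
# Later directories win on duplicate paths (an arbitrary choice here).
sub merge_fullpath_files {
    my (@dirs) = @_;
    my %all;
    for my $dir (@dirs) {
        my ($prefix, $files) = @{$dir}{qw(prefix files)};
        for my $path (keys %$files) {
            $all{"$prefix$path"} = $files->{$path};
        }
    }
    return \%all;
}

my $merged = merge_fullpath_files(
    { prefix => '',     files => { '/index/en_US.html' => 'home (en)', '/index/de_DE.html' => 'home (de)' } },
    { prefix => '/faq', files => { '/en_US.html' => 'faq (en)' } },
);
print "$_\n" for sort keys %$merged;
```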

So, for the requested language we build a complete URI to the file and request it from fullpath_files in SiteRole, setting the response to its content if it exists.

Failing that, a new request for the resource (presumably an image, CSS file, etc.) is sent to the site's URL ($url), and if it succeeds its content becomes the response.

	my $file = $uri.'/'.$self->current_language.'.html';

	if (defined $site->fullpath_files->{$file}) {
		print 'Request '.$request->path_info.' uses '.$file.' from DDG::Publisher...'."\n";
		$body = $site->fullpath_files->{$file}->uncached_content;
		$response->code("200");
		$response->content_type('text/html');
	} else {
		my $res = $self->app->http->request(HTTP::Request->new(GET => $url.$request->request_uri));
		if ($res->is_success) {
			$body = $res->decoded_content;
			$response->code($res->code);
			$response->content_type($res->content_type);
		} else {
			$body = "GET ".$url.$request->request_uri.": ".$res->status_line;
			warn $body, "\n";
		}
	}

This response is then returned.
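The overall decision, a generated page when one exists and otherwise a forwarded request, is sketched below with the HTTP side replaced by a code reference so it runs standalone. The serve name, the pages hash and the fetch callback are hypothetical; the real handler goes through $self->app->http with an HTTP::Request. The excerpt above does not set a status code when the forwarded request fails, so the 502 returned here is a choice made for this example.

```perl
use strict;
use warnings;

# Return (status, content_type, body): a generated page for the locale if
# one exists, otherwise whatever the fetch callback returns.
sub serve {
    my ($pages, $fetch, $uri, $locale, $request_uri) = @_;
    my $file = $uri . '/' . $locale . '.html';
    if (defined $pages->{$file}) {
        return (200, 'text/html', $pages->{$file});
    }
    my ($ok, $code, $type, $body) = $fetch->($request_uri);
    return ($code, $type, $body) if $ok;
    return (502, 'text/plain', "GET $request_uri failed with $code");
}

my %pages = ('/index/en_US.html' => '<h1>Home</h1>');

# Stand-in for the HTTP fallback: only /logo.png "exists" upstream.
my $fetch = sub {
    my ($path) = @_;
    return $path eq '/logo.png' ? (1, 200, 'image/png', 'PNG...') : (0, 404, undef, undef);
};

for my $req (['/index', '/'], ['/logo.png', '/logo.png'], ['/missing', '/missing']) {
    my ($status, $type) = serve(\%pages, $fetch, $req->[0], 'en_US', $req->[1]);
    print "$req->[1]: $status $type\n";
}
```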

Files

There are a few components which come together to locate and generate a file for publishing. The basic hierarchy is (publisherfiles.asciio):

                        .----------------------.
                        | DDG::Publisher::File |
                        '----------------------'
                                    ^
                                    |
                                Has Many
                                    |
           .--DDG::Publisher Roles--|--------.
           |                        |        |
           |  .----------.    .----------.   |
           |  | SiteRole |    | DirRole  |<------.
           |  '----------'    '----------'   |   |
           |        ^                        |  Is A
           '--------|------------------------'   |
                    |                            |
                   Is A        .-----------------------------------.
                    |          | DDG::Publisher::Site::$SITE::Root |
                    |          '-----------------------------------'
                    |                            ^
                    |                            |
     .-----------------------------.           Has A
     | DDG::Publisher::Site::$SITE |-------------'
     '-----------------------------'