
TranslationFlow

Ulrich Germann edited this page Nov 5, 2020 · 21 revisions

Flow of a translation request through the translation service.

This document discusses the flow of a translation request through the translation service. It is currently a live discussion document and will change as the discussion progresses. Note that my description does not accurately reflect what currently happens in mts (at the moment, workers process MaxiBatches directly). To facilitate discussion, please respond inline (with Markdown >, >>, etc.) and 'sign' your contributions with your initials. - UG

Contributors to this document (please add yourself with your name and initials if you comment inline):

  • Ulrich Germann (UG)

Translation requests

Parameters for submitting a request

I don't think we need a struct for this (the following can just be function call parameters), except maybe for the translation options. - UG

  • input (string): The string to be translated
  • sourceLanguage (string): The source language (not to be used initially, but may be useful later when a single TranslationService API offers multiple translation directions.)
  • targetLanguage (string): The target language (not to be used initially, but may be useful later when a single TranslationService API offers multiple translation directions.)
  • search parameters:
    • withAlignment: include alignment info in the response?

      I suggest making this optional, because providing it costs extra time, which is wasted if the client has no need for it. - UG

    • withQualityEstimate: include quality estimates in the response? Details to be determined; not relevant at this point. - UG
    • Optionally, a callback function that is to be executed at the end of processing the request. This is reasonable, for example, in a message-passing scenario where one thread reads from an input channel and, instead of keeping track of the future, sends the response to an output channel. The default callback fulfils the promise corresponding to the future.
    • ...

Upon receiving a request, the server returns a TranslationRequest object (or a unique_ptr to one), which contains

  • the original parameters from above
  • some stats that reflect the current state of processing
  • a future to the result
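To make the discussion concrete, here is a minimal sketch of what such a request object could look like. All names (TranslationOptions, TranslationResult, Callback, makeRequest, the stats fields) are assumptions for illustration and not an agreed API. The sketch only shows the parameters, the future, and the default callback that fulfils the promise.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Hypothetical names throughout; nothing here is a settled interface.
struct TranslationOptions {
  bool withAlignment = false;        // off by default: alignment costs extra time
  bool withQualityEstimate = false;  // details to be determined
};

struct TranslationResult {
  std::string translation;
};

using Callback = std::function<void(TranslationResult)>;

struct TranslationRequest {
  // The original parameters.
  std::string input;
  std::string sourceLanguage;
  std::string targetLanguage;
  TranslationOptions options;
  // Some stats reflecting the current state of processing (assumed fields).
  std::size_t sentencesTotal = 0;
  std::size_t sentencesDone = 0;
  // Future to the result; only valid when the default callback is used.
  std::future<TranslationResult> result;
  // Executed by the service at the end of processing.
  Callback onDone;
};

std::unique_ptr<TranslationRequest> makeRequest(std::string input,
                                                std::string sourceLanguage,
                                                std::string targetLanguage,
                                                TranslationOptions options = {},
                                                Callback callback = nullptr) {
  auto request = std::make_unique<TranslationRequest>();
  request->input = std::move(input);
  request->sourceLanguage = std::move(sourceLanguage);
  request->targetLanguage = std::move(targetLanguage);
  request->options = options;
  if (callback) {
    // Client-supplied callback, e.g. forwarding to an output channel.
    request->onDone = std::move(callback);
  } else {
    // Default callback: fulfil the promise that backs the future.
    auto promise = std::make_shared<std::promise<TranslationResult>>();
    request->result = promise->get_future();
    request->onDone = [promise](TranslationResult r) {
      promise->set_value(std::move(r));
    };
  }
  return request;
}
```

With the default callback the client waits on request->result; with a custom callback the future stays invalid and the client is notified only through the callback.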

With respect to optional cancelling, my suggestion for implementation would be that the server returns a shared_ptr to the TranslationRequest, and we keep a weak_ptr within the service for processing. If the request goes away, the weak_ptr expires, so the service knows not to bother with it. - UG
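A minimal sketch of this cancellation idea, assuming a hypothetical Service class with submit and processPending members (neither exists in mts under these names). The client owns the request; the service only holds a weak_ptr and skips requests whose owner has let go.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct TranslationRequest {
  std::string input;
};

// Hypothetical service; only the ownership pattern matters here.
class Service {
 public:
  // The caller gets the only owning pointer; the service keeps a weak_ptr.
  std::shared_ptr<TranslationRequest> submit(std::string input) {
    auto request =
        std::make_shared<TranslationRequest>(TranslationRequest{std::move(input)});
    pending_.push_back(request);  // implicit conversion to weak_ptr
    return request;
  }

  // Processes everything that is still wanted; returns how many requests
  // were still alive. Expired (cancelled) requests are skipped.
  std::size_t processPending() {
    std::size_t processed = 0;
    for (const auto& weak : pending_) {
      if (auto request = weak.lock()) {
        // ... translate request->input here ...
        ++processed;
      }
    }
    pending_.clear();
    return processed;
  }

 private:
  std::vector<std::weak_ptr<TranslationRequest>> pending_;
};
```

Cancelling then needs no explicit API call: the client simply drops its shared_ptr.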

Request processing

After receiving a request the service performs the following preprocessing steps:

  • sentence splitting
  • tokenization

It then pushes each tokenized sentence onto the MaxiBatchQueue and gets back a future to the result for this sentence translation (or rather, a struct that contains this future, which allows us to monitor jobs while they are in progress). The MaxiBatchQueue lines up pending individual sentence translation jobs (Job) for processing.

As an interim summary: the client posts a paragraph-level request and gets back a struct that contains a future to the paragraph-level result. At the sentence level, we use the same mechanism, but that is internal to the service and not exposed to the outside.

The BatchGenerator monitors both the MaxiBatchQueue and the MiniBatchQueue. It reads at most MaxiBatchSize of tokenized input (we can be flexible whether that's measured in tokens or sentences) from the MaxiBatchQueue, sorts the respective sentences by length, creates batches of sentences of ideally similar length, and pushes those onto the MiniBatchQueue. It processes less than MaxiBatchSize input only when the MaxiBatchQueue holds no more input and the MiniBatchQueue is empty; then it just processes what's there to keep the MiniBatchQueue filled.
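The batching step could look roughly like the sketch below. It assumes sizes are measured in sentences (the text leaves tokens vs. sentences open), uses a plain std::deque in place of a real thread-safe queue, and the function names takeMaxiBatch and makeMiniBatches are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

using Sentence = std::vector<std::string>;  // one tokenized sentence
using Batch = std::vector<Sentence>;

// Takes at most maxiBatchSize sentences; fewer if the queue runs empty.
std::vector<Sentence> takeMaxiBatch(std::deque<Sentence>& queue,
                                    std::size_t maxiBatchSize) {
  std::vector<Sentence> taken;
  while (!queue.empty() && taken.size() < maxiBatchSize) {
    taken.push_back(std::move(queue.front()));
    queue.pop_front();
  }
  return taken;
}

// Sorts by sentence length, then cuts the sorted list into consecutive
// mini-batches so that each batch holds sentences of similar length.
std::vector<Batch> makeMiniBatches(std::vector<Sentence> maxiBatch,
                                   std::size_t miniBatchSize) {
  std::vector<Batch> batches;
  if (miniBatchSize == 0) return batches;
  std::stable_sort(maxiBatch.begin(), maxiBatch.end(),
                   [](const Sentence& a, const Sentence& b) {
                     return a.size() < b.size();
                   });
  for (std::size_t i = 0; i < maxiBatch.size(); i += miniBatchSize) {
    const std::size_t end = std::min(i + miniBatchSize, maxiBatch.size());
    auto first = maxiBatch.begin() + static_cast<std::ptrdiff_t>(i);
    auto last = maxiBatch.begin() + static_cast<std::ptrdiff_t>(end);
    batches.emplace_back(first, last);
  }
  return batches;
}
```

Sorting before cutting keeps padding low inside each mini-batch, which is the reason for reading a larger MaxiBatch first rather than batching sentences in arrival order.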

The translation service maintains a number of workers (one per 'device', which can be a CPU core or a GPU), each of which monitors the MiniBatchQueue and processes one batch after another in a run-until-Simon-says-stop loop. After batch processing, a callback function that fulfils the corresponding promise is called for each individual sentence. Once all promises within a multi-sentence request have been fulfilled, the promise for the request is fulfilled.
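The last step, fulfilling the request-level promise once every sentence is done, can be sketched as follows. This is a simplification under stated assumptions: instead of one promise per sentence, the hypothetical RequestState class counts outstanding sentences, and sentenceDone plays the role of the per-sentence callback. It assumes each sentence index is reported exactly once and that a request has at least one sentence.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bookkeeping object for one multi-sentence request.
class RequestState {
 public:
  explicit RequestState(std::size_t numSentences)
      : translations_(numSentences), remaining_(numSentences) {}

  // Handed to the client once, when the request is created.
  std::future<std::vector<std::string>> getFuture() {
    return promise_.get_future();
  }

  // Called by a worker after it has translated sentence `index`.
  // Workers may call this concurrently, hence the mutex.
  void sentenceDone(std::size_t index, std::string translation) {
    std::lock_guard<std::mutex> lock(mutex_);
    translations_[index] = std::move(translation);
    if (--remaining_ == 0) {
      // Last sentence arrived: fulfil the request-level promise.
      promise_.set_value(std::move(translations_));
    }
  }

 private:
  std::mutex mutex_;
  std::vector<std::string> translations_;
  std::size_t remaining_;
  std::promise<std::vector<std::string>> promise_;
};
```

Because sentences of one request can land in different mini-batches, they may finish out of order; storing each translation by index restores the original order in the final result.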

Self-cancelling requests

I suggest using shared and weak pointers for self-cancelling requests.

  • Service returns shared pointer to TranslationRequest object, stores only weak ptr.

  • When processing the TranslationRequest, the Service stores shared pointers to the sentence-level internal jobs on the TranslationRequest and keeps only weak pointers otherwise. Weak pointers to sentence-level jobs go onto the MaxiBatchQueue. When batching jobs for translation, the batch generator creates a shared_ptr<Batch> and stores this pointer in the sentence-level job object/struct. A weak_ptr goes onto the MiniBatchQueue. If all the jobs in a batch go away (because all the respective original translation requests vanished / were cancelled), the weak_ptr in the MiniBatchQueue becomes invalid, so the workers know not to bother with them.
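The ownership chain described above can be sketched as below. The struct layouts and the makeBatch function are assumptions for illustration. In particular, the sketch lets a Batch refer to its jobs only through weak pointers, so that Job and Batch do not keep each other alive; the text does not specify this detail.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Job;

struct Batch {
  // Weak back-references (an assumption): avoids a Job <-> Batch cycle.
  std::vector<std::weak_ptr<Job>> jobs;
};

struct Job {
  std::string sentence;
  std::shared_ptr<Batch> batch;  // the job keeps its batch alive
};

struct TranslationRequest {
  std::vector<std::shared_ptr<Job>> jobs;  // the request keeps its jobs alive
};

// Builds one batch from the jobs that are still alive and returns the
// weak handle that would go onto the MiniBatchQueue.
std::weak_ptr<Batch> makeBatch(
    const std::vector<std::weak_ptr<Job>>& maxiBatchQueue) {
  auto batch = std::make_shared<Batch>();
  for (const auto& weakJob : maxiBatchQueue) {
    if (auto job = weakJob.lock()) {
      batch->jobs.push_back(job);
      job->batch = batch;  // ownership: request -> job -> batch
    }
  }
  // If no job was alive, nothing owns the batch and the returned
  // weak_ptr is already expired once this function returns.
  return batch;
}
```

Dropping the last shared_ptr to a TranslationRequest destroys its jobs, which in turn releases the batch once no other request's jobs hold it, so a worker that finds an expired weak_ptr on the MiniBatchQueue can skip it.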

I realize that this involves quite a few memory allocations. I'm open to alternative suggestions.
