TranslationFlow
This document discusses the flow of a translation request through the translation service. It is currently a live discussion document and things will change as the discussion progresses. Note that my description does not accurately reflect what's currently going on in mts (currently, workers process MaxiBatches directly). To facilitate discussion, please respond inline (with Markdown >, >>, etc.) and 'sign' your contributions with your initials. - UG
The original document was posted by UG, so any text that's not quoted and signed should be assumed to come from UG.
Contributors to this document (please add yourself with your name and initials if you comment inline):
- Ulrich Germann (UG)
- Jerin Philip (JP)
- Kenneth Heafield (KH)
KH Sigh, are we redesigning the API again?
UG I have no strong feelings and am fairly flexible about this. That said, I think there are benefits to returning a struct right away with a future to the result as a member, rather than a future to the overall structure, as it allows the client to monitor progress if they want to. If the client gets a future to the completed request, it can't monitor progress. The exact use case for progress monitoring is TBD; it could just be a translation progress bar somewhere. The main point of this write-up is to discuss internal processing, not the API. This section just lays out what input I expect, regardless of how exactly I get it. I really don't care that much about the minutiae of the API design.
I don't think we need a struct for this (the following can just be function call parameters), except maybe for the translation options. - UG
- input (string): The string to be translated
(JP: Is this bounded in length?)
UG: At some point we'll also need to add errors and warnings to the response. It should be bounded in length, but ultimately it's the number of tokens and not the number of bytes that matters, so we'll need two checks: one for string length and one for number of tokens. If the sentence is too long, we have two options: chop it up and add a warning that it was chopped up, or refuse it with an appropriate error message.
KH This came up on the call. Chop. I don't think we need a warning yet. The cap should ideally be denominated in tokens.
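A minimal sketch of the "chop" option, assuming the input has already been tokenized and the cap is denominated in tokens. chopByTokenCap is a hypothetical helper, not existing mts code; it simply cuts at fixed token counts and makes no attempt to find a sensible break point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split one over-long tokenized sentence into
// consecutive chunks of at most maxTokens tokens each, so that every
// chunk can be translated as an ordinary sentence.
std::vector<std::vector<std::string>> chopByTokenCap(
    const std::vector<std::string>& tokens, std::size_t maxTokens) {
  std::vector<std::vector<std::string>> chunks;
  if (maxTokens == 0) return chunks;  // a zero cap is meaningless; return nothing
  for (std::size_t start = 0; start < tokens.size(); start += maxTokens) {
    std::size_t end = std::min(start + maxTokens, tokens.size());
    chunks.emplace_back(tokens.begin() + static_cast<std::ptrdiff_t>(start),
                        tokens.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return chunks;
}
```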
- sourceLanguage (string): The source language (not to be used initially, but may be useful later when a single TranslationService API offers multiple translation directions.)
- targetLanguage (string): The target language (not to be used initially, but may be useful later when a single TranslationService API offers multiple translation directions.)
- search parameters:
  - withAlignment: include alignment info in the response?
    I suggest making this optional, because providing it costs extra time, which is wasted if the client has no need for it. - UG
  - withQualityEstimate: include quality estimates. Details to be determined; not relevant at this point. - UG
- Optionally, a callback function to be executed at the end of processing the request (e.g., because the client doesn't want to keep track of the future). This is reasonable, for example, in a message-passing scenario where one thread reads from an input channel and, instead of keeping track of things, the response is sent to an output channel. The default callback fulfils the promise corresponding to the future.
- ...
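The callback option could look roughly like the sketch below. translate, Response and ResponseCallback are hypothetical names, and the "translation" is faked by echoing the input; the point is only that the default callback fulfils the promise behind the returned future.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Hypothetical response type; a real one would also carry alignments etc.
struct Response {
  std::string translation;
};

using ResponseCallback = std::function<void(Response)>;

// Sketch of the entry point. If the client passes a callback, it is invoked
// when processing finishes; otherwise the default callback fulfils the
// promise whose future is returned to the client.
std::future<Response> translate(const std::string& input,
                                ResponseCallback callback = nullptr) {
  auto promise = std::make_shared<std::promise<Response>>();
  std::future<Response> future = promise->get_future();
  if (!callback) {
    // Default callback: fulfil the promise corresponding to the future.
    callback = [promise](Response r) { promise->set_value(std::move(r)); };
  }
  // In the real service this would run later, on a worker thread.
  callback(Response{input});
  return future;
}
```

When the client supplies its own callback, nothing ever sets the promise, so calling get() on the returned future would throw std::future_error (broken promise). That fits the message-passing scenario, where the client ignores the future anyway.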
Upon receiving a request, the server returns a TranslationRequest object (or a unique_ptr to one), which contains
- the original parameters from above
- some stats that reflect the current state of processing
- a future to the result
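One possible shape for the returned object, as a sketch only: the field names and the sentence-count notion of progress are assumptions, not mts code. Holding the future as a std::shared_future lets several readers (e.g. a progress bar and the consumer of the result) wait on it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>

// Hypothetical TranslationRequest: original parameters, progress stats,
// and a future to the result as a member.
struct TranslationRequest {
  std::string input;
  std::string sourceLanguage;
  std::string targetLanguage;

  // Assumed to be set once, after sentence splitting and before any worker runs.
  std::size_t sentencesTotal = 0;
  std::atomic<std::size_t> sentencesDone{0};  // incremented by workers

  std::promise<std::string> promise;  // fulfilled by the service
  std::shared_future<std::string> result = promise.get_future().share();

  // Fraction of sentences translated so far, e.g. for a progress bar.
  double progress() const {
    return sentencesTotal == 0
               ? 0.0
               : static_cast<double>(sentencesDone.load()) / sentencesTotal;
  }
};
```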
With respect to optional cancelling, my suggestion for implementation would be that the server returns a shared_ptr to the TranslationRequest, and we keep a weak_ptr within the service for processing. If the client drops the request, the weak_ptr expires, so the service knows not to bother with it. - UG
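A minimal single-threaded sketch of the weak_ptr cancellation idea. Request and Service are hypothetical stand-ins; the only mechanism relied on is that std::weak_ptr::lock() returns an empty shared_ptr once the last owner is gone.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal request type for this sketch.
struct Request {
  std::string input;
};

// The client owns the shared_ptr; the service keeps only weak pointers.
struct Service {
  std::deque<std::weak_ptr<Request>> pending;

  std::shared_ptr<Request> submit(std::string input) {
    auto req = std::make_shared<Request>(Request{std::move(input)});
    pending.push_back(req);  // implicit shared_ptr -> weak_ptr conversion
    return req;
  }

  // Returns how many requests were actually worked on; requests whose
  // owner dropped them (expired weak_ptr) are skipped.
  int processAll() {
    int processed = 0;
    while (!pending.empty()) {
      std::shared_ptr<Request> req = pending.front().lock();
      pending.pop_front();
      if (!req) continue;  // cancelled: the client let go of it
      ++processed;         // ... translate req->input here ...
    }
    return processed;
  }
};
```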
After receiving a request the service performs the following preprocessing steps:
- sentence splitting
- tokenization
and pushes each tokenized sentence onto the MaxiBatchQueue. It gets back a future to the result for this sentence translation (or rather, a struct that contains this future; this allows us to monitor jobs while they are in progress). The MaxiBatchQueue lines up pending individual sentence translation jobs (Job) for processing.
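The push-and-get-a-future step can be sketched with a plain mutex/condition-variable queue. Job and MaxiBatchQueue here are hypothetical simplifications (the actual mts queue may be structured differently), and push returns the bare future rather than the wrapping struct described above, to keep the example short.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sentence-level job: tokenized input plus the promise that a
// worker fulfils once the sentence has been translated.
struct Job {
  std::vector<std::string> tokens;
  std::promise<std::string> promise;
};

// Minimal blocking queue standing in for the MaxiBatchQueue.
class MaxiBatchQueue {
 public:
  // Enqueue a tokenized sentence and hand back the future to its translation.
  std::future<std::string> push(std::vector<std::string> tokens) {
    Job job;
    job.tokens = std::move(tokens);
    std::future<std::string> result = job.promise.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      jobs_.push_back(std::move(job));
    }
    cv_.notify_one();
    return result;
  }

  // Block until a job is available, then remove and return it.
  Job pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !jobs_.empty(); });
    Job job = std::move(jobs_.front());
    jobs_.pop_front();
    return job;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<Job> jobs_;
};
```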
As an interim summary: Client posts paragraph-level request, gets back struct that contains future to paragraph-level result. At the sentence level, we use the same mechanism, but that's internal to the service and not exposed to the outside.
The BatchGenerator monitors both the MaxiBatchQueue and the MiniBatchQueue. It reads at most MaxiBatchSize worth of tokenized input (we can be flexible about whether that's measured in tokens or sentences) from the MaxiBatchQueue, sorts the respective sentences by length, creates batches of sentences of ideally similar length, and pushes those onto the MiniBatchQueue. If the MaxiBatchQueue runs empty before MaxiBatchSize input has been read and the MiniBatchQueue is also empty, it does not wait for more input but processes what it has, to keep the MiniBatchQueue filled.
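A sketch of the sort-then-cut step, under the assumption that a mini-batch's cost is its padded size (number of sentences times the longest sentence in it); the cost model used in practice may differ. Sentence and makeMiniBatches are hypothetical, and sentences are reduced to an id and a token length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical: a sentence is represented only by its id and token length.
struct Sentence {
  std::size_t id;
  std::size_t length;
};

// Sort a maxi-batch by length and cut it into mini-batches whose padded size
// (sentences * longest sentence) stays within maxTokens. A single sentence
// longer than maxTokens still gets a mini-batch of its own.
std::vector<std::vector<Sentence>> makeMiniBatches(std::vector<Sentence> maxi,
                                                   std::size_t maxTokens) {
  std::stable_sort(maxi.begin(), maxi.end(),
                   [](const Sentence& a, const Sentence& b) {
                     return a.length < b.length;
                   });
  std::vector<std::vector<Sentence>> minis;
  std::vector<Sentence> current;
  for (const Sentence& s : maxi) {
    // Sorted ascending, so s is the longest sentence in current + s.
    if (!current.empty() && (current.size() + 1) * s.length > maxTokens) {
      minis.push_back(std::move(current));
      current.clear();
    }
    current.push_back(s);
  }
  if (!current.empty()) minis.push_back(std::move(current));
  return minis;
}
```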
The translation service maintains a number of workers (one per 'device', which can be a CPU core or a GPU), each of which monitors the MiniBatchQueue and processes one batch after another in a run-until-Simon-says-stop loop. After a batch has been processed, a callback function is called for each individual sentence in it, which fulfils that sentence's promise. Once all sentence-level promises within a multi-sentence request have been fulfilled, the promise for the request is fulfilled.
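One way to fulfil the request-level promise once every sentence is done is an atomic countdown, sketched below. ParagraphResult is a hypothetical name; each sentence writes to its own slot, and whichever thread finishes the last outstanding sentence sets the paragraph-level value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical paragraph-level bookkeeping: one slot per sentence and a
// countdown of sentences still outstanding.
class ParagraphResult {
 public:
  explicit ParagraphResult(std::size_t numSentences)
      : translations_(numSentences), remaining_(numSentences) {
    if (numSentences == 0) promise_.set_value(std::vector<std::string>{});
  }

  // May be called only once (std::promise::get_future semantics).
  std::future<std::vector<std::string>> getFuture() {
    return promise_.get_future();
  }

  // Called by a worker's per-sentence callback. fetch_sub returns the old
  // value, so the caller that sees 1 finished the last sentence.
  void sentenceDone(std::size_t index, std::string translation) {
    translations_[index] = std::move(translation);  // distinct slot per sentence
    if (remaining_.fetch_sub(1) == 1) {
      promise_.set_value(std::move(translations_));
    }
  }

 private:
  std::vector<std::string> translations_;
  std::atomic<std::size_t> remaining_;
  std::promise<std::vector<std::string>> promise_;
};
```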
I suggest using shared and weak pointers for self-cancelling requests.
- The service returns a shared pointer to the TranslationRequest object and stores only a weak pointer.
- When processing the TranslationRequest, the service stores shared pointers to the sentence-level internal jobs on the TranslationRequest and keeps only weak pointers otherwise. Weak pointers to sentence-level jobs go onto the MaxiBatchQueue. When batching jobs for translation, the batch generator generates a shared_ptr<Batch> and stores this pointer in the sentence-level job object/struct. A weak_ptr goes onto the MiniBatchQueue. If all the jobs in a batch go away (because all the respective original translation requests vanished or were cancelled), the weak_ptr in the MiniBatchQueue expires, so the workers know not to bother with that batch.
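The ownership chain above (request owns jobs, jobs co-own their batch, the MiniBatchQueue holds only weak pointers to batches) can be sketched as follows. The types are hypothetical simplifications; the test shows a batch expiring once the only request contributing to it is dropped, while a batch shared with a live request survives.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <vector>

struct Batch;  // forward declaration

// Hypothetical sentence-level job; it co-owns the batch it was put into.
struct Job {
  std::shared_ptr<Batch> batch;
};

// A batch refers back to its jobs only weakly, so it does not keep them alive.
struct Batch {
  std::vector<std::weak_ptr<Job>> jobs;
};

// The request owns its jobs; nothing else holds strong references to them.
struct TranslationRequest {
  std::vector<std::shared_ptr<Job>> jobs;
};

// Worker side: skip MiniBatchQueue entries whose batch has expired because
// every request contributing a job to it was dropped.
int countLiveBatches(std::deque<std::weak_ptr<Batch>>& miniBatchQueue) {
  int live = 0;
  while (!miniBatchQueue.empty()) {
    std::shared_ptr<Batch> batch = miniBatchQueue.front().lock();
    miniBatchQueue.pop_front();
    if (batch) ++live;  // ... translate the batch here ...
  }
  return live;
}
```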
I realize that this involves quite a few memory allocations. I'm open to alternative suggestions.
JP: What is this now? Same payload as REST with JSON response? How do ranges come in here?
UG: I've been trying to get a response to the latter question for a while, especially with respect to the fact that JSON is usually encoded in UTF-8 and JavaScript uses UTF-16 internally, so byte ranges may not be as useful as code point ranges.
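To make the byte-range problem concrete: JavaScript string indices count UTF-16 code units, so a character outside the Basic Multilingual Plane (e.g. most emoji) is one code point but two index positions, and anywhere from one to four UTF-8 bytes. The hypothetical helper below converts a UTF-8 byte offset into both counts, assuming valid UTF-8 and an offset that falls on a character boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts of code points and UTF-16 code units preceding a byte offset.
struct Offsets {
  std::size_t codePoints = 0;
  std::size_t utf16Units = 0;
};

// Hypothetical helper; assumes valid UTF-8 and a byte offset on a
// character boundary.
Offsets fromUtf8ByteOffset(const std::string& utf8, std::size_t byteOffset) {
  Offsets out;
  for (std::size_t i = 0; i < byteOffset && i < utf8.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(utf8[i]);
    if ((c & 0xC0) == 0x80) continue;  // continuation byte: not a new code point
    ++out.codePoints;
    // Lead bytes 0xF0 and above start 4-byte sequences (code points above
    // U+FFFF), which UTF-16 stores as a surrogate pair, i.e. two units.
    out.utf16Units += (c >= 0xF0) ? 2 : 1;
  }
  return out;
}
```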
By the way, if you want to take over alignment handling (map token alignments back to StringPiece for the time being; we can scratch our heads as to how to convert that to JSON later), that would be great, as nothing has been done in that respect yet beyond reporting token alignments in the JSON handler. As for the first two questions: the REST server takes a JSON blob in (via POST) and sends a JSON blob out. I'm using RapidJSON for JSON handling; the code is in the src/service/api directory and its subdirectories in mts, specifically
- job2json and hyp2json here: https://github.com/browsermt/mts/blob/master/src/service/api/rapidjson_utils.cpp
- https://github.com/browsermt/mts/blob/master/src/service/api/node_translation.cpp
RapidJSON is a bit unwieldy because it is designed to be fast and avoids memory allocation wherever possible, so Kenneth will love it.
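For the alignment task mentioned above, one possible approach is sketched here, assuming tokens are views into the original strings (std::string_view standing in for StringPiece) and alignments arrive as (source token index, target token index) pairs; both are assumptions, and ByteRange, rangeOf and alignmentsToRanges are hypothetical names. Since each token points into its string, its byte range is plain pointer arithmetic; converting to code point or UTF-16 ranges would be a separate step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Half-open byte range [begin, end) into the original string.
struct ByteRange {
  std::size_t begin;
  std::size_t end;
};

// Assumes token is a view into source.
ByteRange rangeOf(std::string_view source, std::string_view token) {
  std::size_t begin = static_cast<std::size_t>(token.data() - source.data());
  return {begin, begin + token.size()};
}

// Map (sourceToken, targetToken) index pairs to (source bytes, target bytes).
std::vector<std::pair<ByteRange, ByteRange>> alignmentsToRanges(
    std::string_view source, const std::vector<std::string_view>& srcTokens,
    std::string_view target, const std::vector<std::string_view>& tgtTokens,
    const std::vector<std::pair<std::size_t, std::size_t>>& alignment) {
  std::vector<std::pair<ByteRange, ByteRange>> out;
  for (const auto& [s, t] : alignment) {
    out.emplace_back(rangeOf(source, srcTokens.at(s)),
                     rangeOf(target, tgtTokens.at(t)));
  }
  return out;
}
```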