Style paraphrasers work best in a two-stage pipeline, can re-use HuggingFace `generate(...)` APIs #298

martiansideofthemoon · 2021-09-21T10:51:05Z

Hi everyone, I'm the original author of the STRAP paraphrasers (paper link), which were recently accepted to NL-Augmenter (#227), an effort led by @Filco306. Excited to see these models in NL-Augmenter!

After discussing with @Filco306 and looking at the PR, I noticed that six variants of the paraphraser have been provided: a "Basic" style-agnostic paraphraser and five style-specific paraphrasers (link). While the "Basic" paraphraser is implemented correctly, I recommend using the style-specific paraphrasers in a two-step pipeline:

(1) normalize the text using the "Basic" paraphraser;
(2) pass the output from (1) through the style-specific paraphraser.

This is important since all style-specific paraphrasers were trained on the outputs of "Basic", so any other text is technically out-of-distribution. In an ablation study (-Inf PP. in Table 3 of the paper) we saw a significant drop in style transfer performance without this step. Moreover, the two-step process helps boost output diversity since the "Basic" paraphraser strips input style. This should be fairly simple to implement.
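The two-step process is just a composition of the two models. Below is a minimal sketch, assuming each paraphraser is exposed as a callable that maps a batch of sentences to a batch of paraphrases; `two_step_paraphrase`, `Paraphraser` and the toy models are hypothetical names for illustration, not NL-Augmenter or STRAP APIs.

```python
from typing import Callable, List

# Hypothetical interface: a paraphraser maps a batch of sentences to a batch of outputs.
Paraphraser = Callable[[List[str]], List[str]]


def two_step_paraphrase(
    sentences: List[str], basic: Paraphraser, style: Paraphraser
) -> List[str]:
    """Run the "Basic" paraphraser first, then a style-specific paraphraser."""
    # Step (1): normalize the input so it resembles what the style models were trained on.
    normalized = basic(sentences)
    if len(normalized) != len(sentences):
        raise ValueError("the Basic paraphraser must return one output per input")
    # Step (2): rewrite the normalized text in the target style.
    return style(normalized)


if __name__ == "__main__":
    # Toy stand-ins for the real models, only to show the call order.
    def toy_basic(xs: List[str]) -> List[str]:
        return [x.lower() for x in xs]

    def toy_style(xs: List[str]) -> List[str]:
        return [x + " (styled)" for x in xs]

    print(two_step_paraphrase(["Hello THERE"], toy_basic, toy_style))  # ['hello there (styled)']
```

With the real models, `basic` and `style` would wrap the "Basic" and a style-specific STRAP checkpoint respectively; the point is only that the style model never sees raw input text.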

Another minor point: the models are fully compatible with the new HuggingFace `generate(...)` APIs, which offer more functionality than the generation code originally implemented in my repository (in other words, this import can be avoided). Here's an example:

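```python
# gpt2, gpt2_sentences, init_context_size, segments, eos_token_id, top_k, top_p,
# temperature and beam_size are defined by the surrounding STRAP inference code.
out = gpt2.generate(
    # Prompt: the first init_context_size tokens of each row
    input_ids=gpt2_sentences[:, 0:init_context_size],
    max_length=gpt2_sentences.shape[1],
    return_dict_in_generate=True,  # return a ModelOutput instead of a bare tensor
    eos_token_id=eos_token_id,
    output_scores=True,  # also return per-step scores in out.scores
    # Sample if top-k or nucleus sampling is requested; otherwise greedy/beam search
    do_sample=top_k > 0 or top_p > 0.0,
    top_k=top_k,
    top_p=top_p,
    temperature=temperature,
    num_beams=beam_size,
    # Segment (token type) ids for the prompt tokens
    token_type_ids=segments[:, 0:init_context_size]
)
```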

Also CCing the NL-Augmenter reviewers for the style paraphraser to keep them in the loop: @sebastianGehrmann @Nickeilf @juand-r @kaustubhdhole


Filco306 · 2021-09-21T11:11:17Z

I fully agree and will open a separate PR for this. I see two options:

(1) add a boolean argument to StyleParaphraser, such as use_twostep_pipeline; or
(2) add a separate class/wrapper for the two-step pipeline version.
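The boolean-flag option could look roughly like this. It is a minimal sketch, assuming each paraphraser is a callable over a batch of sentences; `StyleParaphraserSketch`, its `paraphrase` method and the `basic`/`style` arguments are hypothetical, and only the `use_twostep_pipeline` name comes from the proposal.

```python
from typing import Callable, List, Optional

# Hypothetical interface: a paraphraser maps a batch of sentences to a batch of outputs.
Paraphraser = Callable[[List[str]], List[str]]


class StyleParaphraserSketch:
    """Illustrative style paraphraser with a two-step flag (not NL-Augmenter's class)."""

    def __init__(
        self,
        style: Paraphraser,
        basic: Optional[Paraphraser] = None,
        use_twostep_pipeline: bool = True,
    ) -> None:
        if use_twostep_pipeline and basic is None:
            raise ValueError("use_twostep_pipeline=True requires a Basic paraphraser")
        self.style = style
        self.basic = basic
        self.use_twostep_pipeline = use_twostep_pipeline

    def paraphrase(self, sentence: str) -> List[str]:
        inputs = [sentence]
        if self.use_twostep_pipeline and self.basic is not None:
            # Normalize with "Basic" first so the style model sees in-distribution text.
            inputs = self.basic(inputs)
        return self.style(inputs)
```

Defaulting the flag to `True` makes the recommended pipeline the default behaviour; a separate wrapper class would instead leave the single-model class untouched and compose it with "Basic" from the outside.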

WDYT?

Filco306 · 2021-09-21T18:07:29Z

@martiansideofthemoon have a look at this branch; it is not done yet, but it is on the way. Let me know what you think!

martiansideofthemoon mentioned this issue Sep 21, 2021: Issues report on NL-Augmenter repository martiansideofthemoon/style-transfer-paraphrase#27 (Open)

AbinayaM02 added the enhancement (New feature or request) label Sep 22, 2021

Filco306 mentioned this issue Oct 28, 2021: Change batch size and number of visible devices for text-style-transfer #353 (Open)

martiansideofthemoon mentioned this issue Jan 17, 2022: HuggingFace Outputs martiansideofthemoon/style-transfer-paraphrase#37 (Open)
