ASR > LibriSpeech > ESPnet

We include the 100-best LibriSpeech decoding outputs from "Effective Sentence Scoring Method using Bidirectional Language Model for Speech Recognition" (Shin et al., 2019) with the authors' permission; please cite their work if reusing their lists. The outputs come from a 5-layer encoder, 1-layer decoder BLSTMP model implemented in ESPnet. The files are data/*.am.json.
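To sanity-check what you downloaded, you can peek at one file's structure; a minimal sketch using jq (an extra dependency, and it assumes the top level is a JSON object keyed by utterance ID, which is worth verifying against your checkout):

# Number of utterances, then one utterance's n-best entry:
jq 'length' data/dev-clean.am.json
jq 'to_entries[0]' data/dev-clean.am.json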

The --split-size values below are sized for 16 GB Tesla V100 GPUs; in our case (a p3.8xlarge) there are four. Scale them to match your per-GPU memory.
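If your GPUs differ, a rough heuristic (our assumption, not a tuned rule) is to scale the split size linearly with per-GPU memory:

# 2000 is the baseline --split-size used below on a 16 GB V100.
mem_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
echo "suggested --split-size: $(( 2000 * mem_mib / 16000 ))"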

TODO: Model artifacts.

Scoring

# Stock BERT/RoBERTa base
for set in dev-clean dev-other test-clean test-other ; do
    for model in bert-base-en-uncased bert-base-en-cased roberta-base-en-cased ; do
        mkdir -p output/${model}/
        echo ${set} ${model}
        mlm score \
            --mode hyp \
            --model ${model} \
            --gpus 0,1,2,3 \
            --split-size 2000 \
            --eos \
            data/${set}.am.json \
            > output/${model}/${set}.lm.json
    done
done
# Trained BERT base
for set in dev-clean dev-other test-clean test-other ; do
    for model in bert-base-en-uncased ; do
        mkdir -p output/${model}-380k/
        echo ${set} ${model}-380k
        mlm score \
            --mode hyp \
            --model ${model} \
            --gpus 0,1,2,3 \
            --split-size 2000 \
            --weights params/bert-base-en-uncased-380k.params \
            --eos \
            data/${set}.am.json \
            > output/${model}-380k/${set}.lm.json
    done
done
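Each run writes output/${model}/${set}.lm.json. As an optional sanity check, confirm none of the outputs came out empty (e.g., from an out-of-memory failure mid-run):

# Should print nothing; an empty .lm.json indicates a failed scoring run.
find output -name '*.lm.json' -size 0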

Reranking

# Stock BERT/RoBERTa on development set
for set in dev-clean ; do
    for model in bert-base-en-uncased bert-base-en-cased roberta-base-en-cased ; do
        for weight in $(seq 0 0.05 1.0) ; do
            echo ${set} ${model} ${weight}; \
            mlm rescore \
                --model ${model} \
                --weight ${weight} \
                data/${set}.am.json \
                output/${model}/${set}.lm.json \
                > output/${model}/${set}.lambda-${weight}.json
        done
    done
done
# Once you have the best hyperparameter, evaluate test
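# Each tuple is model,suffix,weight: suffix is empty for stock models
# (e.g. "-380k" would select output/${model}-380k/), weight is the best dev lambda.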
for set in test-clean ; do
    for tup in bert-base-en-uncased,,0.35 bert-base-en-cased,,0.35 ; do
        IFS="," read model suffix weight <<< "${tup}"
        echo ${set} ${model}${suffix} ${weight}
        mlm rescore \
            --model ${model} \
            --weight ${weight} \
            data/${set}.am.json \
            output/${model}${suffix}/${set}.lm.json \
            > output/${model}${suffix}/${set}.lambda-${weight}.json
    done
done

Maskless finetuning

Note: Paper results are from a domain-adapted BERT.

We first download the normalized text corpus:

scripts/librispeech-download-text.sh data-distill/

We then score the corpus with a masked LM. The loop below prints 8 commands, one per GPU:

model=bert-base-en-uncased
split_size=12
for tup in 0,00,09 1,10,19 2,20,29 3,30,39 4,40,49 5,50,59 6,60,69 7,70,79 ; do
    IFS="," read gpu start end <<< ${tup}
    echo "scripts/librispeech-score.sh data-distill/ output-distill/${model} ${start} ${end} ${gpu} ${split_size} ${model}"
done

Modify the GPU splits as desired, then run the printed commands, e.g., in separate screen sessions.
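For example, a minimal sketch that launches each command in a detached GNU screen session (our choice of tooling; tmux or nohup work equally well):

for tup in 0,00,09 1,10,19 2,20,29 3,30,39 4,40,49 5,50,59 6,60,69 7,70,79 ; do
    IFS="," read gpu start end <<< ${tup}
    # One detached session per GPU; reattach with `screen -r score-gpu<N>`.
    screen -dmS score-gpu${gpu} bash -c \
        "scripts/librispeech-score.sh data-distill/ output-distill/${model} ${start} ${end} ${gpu} ${split_size} ${model}"
done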

For now, one must concatenate the parts used and their scores into single files:

model=bert-base-en-uncased
cat data-distill/part.* > output-distill/part.all
cat output-distill/${model}/part.*.ref.scores > output-distill/${model}/part.all.ref.scores
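Assuming one sentence per line in part.all and one score per line in the .ref.scores file (our reading of the file naming, worth verifying), the line counts of the two files should match:

wc -l output-distill/part.all output-distill/${model}/part.all.ref.scores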

We then finetune BERT towards these sentence scores:

# `--corpus-dir output-distill` corresponds to reading from `output-distill/part.all`
model=bert-base-en-uncased
mkdir -p output-distill/${model}/params-1e-5_8gpu_384/ 
mlm finetune \
    --model ${model} \
    --gpus 0,1,2,3,4,5,6,7 \
    --eos \
    --corpus-dir output-distill \
    --score-dir output-distill/${model} \
    --output-dir output-distill/${model}/params-1e-5_8gpu_384/ \
    --split-size 30

Parameters will be saved to output-distill/${model}/params-1e-5_8gpu_384/.
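You can list that directory to see which epoch checkpoints were written; the scoring command below assumes epoch-10.params exists:

ls output-distill/${model}/params-1e-5_8gpu_384/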

We then score using these weights; note the flags --weights and --no-mask. Maskless scoring takes a single pass per sentence instead of one pass per masked position, so it runs much faster than masked scoring:

model=bert-base-en-uncased
for set in dev-clean ; do
    echo ${set} ${model}
    mlm score \
        --mode hyp \
        --model ${model} \
        --gpus 0 \
        --weights output-distill/${model}/params-1e-5_8gpu_384/epoch-10.params \
        --eos \
        --no-mask \
        --split-size 500 \
        data/${set}.am.json \
        > output-distill/${model}/params-1e-5_8gpu_384/${set}.lm.json
done

Finally, rerank:

for set in dev-clean ; do
    for weight in $(seq 0 0.05 1.0) ; do
        echo ${set} ${model} ${weight} 
        mlm rescore \
            --model ${model} \
            --weight ${weight} \
            data/${set}.am.json \
            output-distill/${model}/params-1e-5_8gpu_384/${set}.lm.json \
            > output-distill/${model}/params-1e-5_8gpu_384/${set}.lambda-${weight}.json
    done
done

Binning

TODO: To compute cross-entropy statistics:

mlm bin