Release Notes
- Ability to install through `pip`.
- Advanced layers are now organized into subfolders.
- New basic layers: Convolution over sequence, MaxMargin.
- New attention layers: Co-attention, multi-head attention, hierarchical attention.
- New encoders: Arbitrary sequence-of-vectors encoder, BiLSTMp speech feature encoder.
- New decoders: Multi-source decoder, switching decoder, vector decoder.
- New datasets: Kaldi dataset (.ark/.scp reader), Shelve dataset, Numpy sequence dataset.
- Added learning rate annealing: see the `lr_decay*` options in `config.py` (a hedged sketch follows).
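A sketch of what an annealing block might look like; the `lr_decay*` option names and values below are hypothetical, and `config.py` remains the authoritative reference:

```ini
[train]
# hypothetical annealing options; consult config.py for the real names/defaults
lr_decay: plateau
lr_decay_factor: 0.5
lr_decay_patience: 2
```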
- Removed subword-nmt and METEOR files from the repository. We now depend on the pip package for subword-nmt. For METEOR, `nmtpy-install-extra` should be launched after installation.
- More multi-task and multi-input/output `translate` and training regimes.
- New early-stopping metrics: character and word error rate (`cer`, `wer`) and ROUGE (`rouge`).
- Curriculum learning option for the `BucketBatchSampler`, i.e. length-ordered batches.
- New models:
  - `ASR`: Listen-attend-and-spell-like automatic speech recognition.
  - `Multitask*`: Experimental multi-tasking & scheduling between many inputs/outputs.
- Add `environment.yml` for easy installation using `conda`. You can now create a ready-to-use `conda` environment by calling `conda env create -f environment.yml`.
- Make `NumpyDataset` memory efficient by keeping `float16` arrays as they are until batch creation time.
- Rename `Multi30kRawDataset` to `Multi30kDataset`, which now supports both raw image files and pre-extracted visual feature files stored as `.npy`.
- Add CNN feature extraction script under `scripts/`.
- Add doubly stochastic attention to `ShowAttendAndTell` and multimodal NMT.
- New model `MNMTDecinit` to initialize the decoder with auxiliary features.
- New model `AMNMTFeatures`, which is the attentive MMT but with a features file instead of the memory-hungry end-to-end feature extraction.
- Updates to the `ShowAttendAndTell` model.
- Removed old `Multi30kDataset`.
- Sort batches by source sequence length instead of target.
- Fix `ShowAttendAndTell` model; it should now work.
- Added `Multi30kRawDataset` for training end-to-end systems from raw images as input.
- Added `NumpyDataset` to read `.npy/.npz` tensor files as input features.
- You can now pass `-S` to `nmtpy train` to produce shorter experiment file names that do not include every hyperparameter.
- New post-processing filter option `de-spm` for Google SentencePiece (SPM) processed files.
- `sacrebleu` is now a dependency, as it is now accepted as an early-stopping metric. It only makes sense to use it with SPM-processed files, since they are detokenized once post-processed.
- Added `sklearn` as a dependency for some metrics.
- Added `momentum` and `nesterov` parameters to the `[train]` section for SGD.
- `ImageEncoder` layer is improved in many ways; please see the code for further details.
- Added unmerged upstream PR for `ModuleDict()` support.
- `METEOR` will now fall back to English if the language cannot be detected from file suffixes.
- `-f` now produces a separate numpy file for token frequencies when building vocabulary files with `nmtpy-build-vocab`.
- Added new command `nmtpy test` for non-beam-search inference modes.
- Removed `nmtpy resume` command and added `pretrained_file` option for `[train]` to initialize model weights from a checkpoint.
- Added `freeze_layers` option for `[train]` to give a comma-separated list of layer-name prefixes to freeze (see the sketch below).
- Improved seeding: the seed is now printed so that results can be reproduced.
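A minimal `[train]` sketch combining the `pretrained_file` and `freeze_layers` options above (the checkpoint path and layer-name prefixes are hypothetical):

```ini
[train]
# initialize model weights from an existing checkpoint (hypothetical path)
pretrained_file: /path/to/run1.best.ckpt
# freeze all parameters whose layer names start with these prefixes (hypothetical)
freeze_layers: enc,dec.emb
```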
- Added IPython notebook for attention visualization.
- Layers
  - New shallow `SimpleGRUDecoder` layer.
  - `TextEncoder`: ability to set `maxnorm` and `gradscale` of embeddings, and to work with or without sorted-length batches.
  - `ConditionalDecoder`: make it work with GRU/LSTM, allow setting `maxnorm/gradscale` for embeddings.
  - `ConditionalMMDecoder`: same as above.
- `nmtpy translate`
  - `--avoid-double` and `--avoid-unk` removed for now.
  - Added Google's length penalty normalization switch `--lp-alpha`.
  - Added ensembling, which is enabled automatically if you give more than one model checkpoint.
- New machine learning metric wrappers in `utils/ml_metrics.py` (see the sketch after this list):
  - Label-ranking average precision (`lrap`)
  - Coverage error
  - Mean reciprocal rank
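An illustrative computation of these three metrics on top of the `sklearn` dependency noted earlier; the actual wrapper API in `utils/ml_metrics.py` may differ:

```python
import numpy as np
from sklearn.metrics import coverage_error, label_ranking_average_precision_score

y_true = np.array([[1, 0, 0], [0, 0, 1]])                # binary relevance labels
y_score = np.array([[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])  # model scores

lrap = label_ranking_average_precision_score(y_true, y_score)
cov = coverage_error(y_true, y_score)

# Mean reciprocal rank: mean of 1/rank of the best-ranked relevant label
ranks = (-y_score).argsort(axis=1).argsort(axis=1) + 1   # 1-based ranks per label
best_relevant = np.where(y_true == 1, ranks, np.inf).min(axis=1)
mrr = float((1.0 / best_relevant).mean())
```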
- You can now use `$HOME` and `$USER` in your configuration files.
- Fixed an overflow error that would cause NMT with more than 255 tokens to fail.
- METEOR worker process is now correctly killed after validations.
- Multiple runs of an experiment are now suffixed with a unique random string instead of incremental integers, to avoid race conditions in cluster setups.
- Replaced `utils.nn.get_network_topology()` with a new `Topology` class that parses the `direction` string of the model in a smarter way.
- If `CUDA_VISIBLE_DEVICES` is set, the `GPUManager` will always honor it.
- Dropped creation of temporary/advisory lock files under `/tmp` for GPU reservation.
- Time measurements during training are now structured into batch overhead, training and evaluation timings.
- Datasets
  - Added `TextDataset` for standalone text file reading.
  - Added `OneHotDataset`, a variant of `TextDataset` where the sequences are not prefixed/suffixed with `<bos>` and `<eos>` respectively.
  - Added experimental `MultiParallelDataset`, which merges an arbitrary number of parallel datasets together.
- `nmtpy translate`
  - `.nodbl` and `.nounk` suffixes are now added to output files for the `--avoid-double` and `--avoid-unk` arguments respectively.
  - A reasonably model-agnostic `beam_search()` is now separated out into its own file, `nmtpytorch/search.py`.
  - `max_len` default is increased to 200.
- New experimental `Multi30kDataset` and `ImageFolderDataset` classes.
- `torchvision` dependency added for CNN support.
- `nmtpy-coco-metrics` now computes one METEOR without `norm=True`.
- Mainloop mechanism is completely refactored with backward-incompatible configuration option changes for the `[train]` section (a sample sketch follows this list):
  - `patience_delta` option is removed.
  - Added `eval_batch_size` to define the batch size for GPU beam-search during training.
  - `eval_freq` default is now `3000`, meaning every `3000` minibatches.
  - `eval_metrics` now defaults to `loss`. As before, you can provide a list of metrics like `bleu,meteor,loss` to compute all of them and early-stop based on the first.
  - Added `eval_zero` (default: `False`), which evaluates the model on the dev set once right before training starts. Useful for sanity checking when fine-tuning a model initialized with pre-trained weights.
  - Removed `save_best_n`: we no longer save the best `N` models on the dev set w.r.t. the early-stopping metric.
  - Added `save_best_metrics` (default: `True`), which will save the best models on the dev set w.r.t. each metric provided in `eval_metrics`. This somewhat remedies the removal of `save_best_n`.
  - `checkpoint_freq` now defaults to `5000`, meaning every `5000` minibatches.
  - Added `n_checkpoints` (default: `5`) to define the number of last checkpoints that will be kept if `checkpoint_freq > 0`, i.e. checkpointing enabled.
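A sample `[train]` block using the options above; every value is the stated default except `eval_batch_size`, whose value is an invented example, and `eval_metrics`, which reuses the sample list from the item above:

```ini
[train]
# evaluate on the dev set every 3000 minibatches
eval_freq: 3000
# early-stop on the first metric in the list
eval_metrics: bleu,meteor,loss
# batch size for GPU beam-search during training (hypothetical value)
eval_batch_size: 16
# do not run an evaluation before training starts (the default)
eval_zero: False
# save the best model per metric listed in eval_metrics
save_best_metrics: True
# checkpoint every 5000 minibatches, keeping the last 5
checkpoint_freq: 5000
n_checkpoints: 5
```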
- Added `ExtendedInterpolation` support to configuration files (see the sketch below):
  - You can now define intermediate variables in `.conf` files to avoid typing the same paths again and again. A variable can be referenced from within its own section using the `tensorboard_dir: ${save_path}/tb` notation. Cross-section references are also possible: `${data:root}` will be replaced by the value of the `root` variable defined in the `[data]` section.
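A minimal sketch of the interpolation syntax; the section layout and paths are hypothetical, while the two `${...}` forms are exactly those described above:

```ini
[data]
# hypothetical base path reused below
root: /data/multi30k

[train]
# cross-section reference: resolves to /data/multi30k/experiments
save_path: ${data:root}/experiments
# same-section reference: resolves to /data/multi30k/experiments/tb
tensorboard_dir: ${save_path}/tb
```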
- Added `-p/--pretrained` to `nmtpy train` to initialize the weights of the model using another `.ckpt` checkpoint (example below).
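For example (the checkpoint name is hypothetical; `...` stands for the usual training arguments such as the experiment's `.conf` file):

```sh
nmtpy train ... -p old-run.best.meteor.ckpt
```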
- Improved input/output handling for `nmtpy translate` (see the example below):
  - `-s` accepts a comma-separated list of test sets defined in the configuration file of the experiment, to translate them all at once. Example: `-s val,newstest2016,newstest2017`
  - The mutually exclusive counterpart of `-s` is `-S`, which receives a single input file of source sentences.
  - In both cases, an output prefix should now be provided with `-o`. In the case of multiple test sets, the name of the test set and the beam size are appended to the output prefix. If you just provide a single file with `-S`, the final output name will only reflect the beam size information.
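An illustrative invocation (file names are hypothetical; `...` stands for the remaining arguments such as the model checkpoint):

```sh
# Translate three test sets defined in the experiment's configuration;
# each output gets the test set name and beam size appended to the "hyps" prefix.
nmtpy translate -s val,newstest2016,newstest2017 -o hyps ...

# Translate a single file of source sentences instead:
nmtpy translate -S newstest2018.tok.de -o hyps ...
```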
- Two new arguments for `nmtpy-build-vocab` (see the example below):
  - `-f`: stores frequency counts as well inside the final `json` vocabulary
  - `-x`: does not add the special markers `<eos>,<bos>,<unk>,<pad>` into the vocabulary
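For example (the corpus file name is hypothetical):

```sh
# keep token frequencies in the vocabulary (-f) and skip special markers (-x)
nmtpy-build-vocab -f -x train.tok.en
```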
- Added `Fusion()` layer to `concat,sum,mul` an arbitrary number of inputs (an illustrative sketch follows this list).
- Added experimental `ImageEncoder()` layer to seamlessly plug a VGG or ResNet CNN using `torchvision` pretrained models.
- `Attention` layer arguments improved. You can now select the bottleneck dimensionality for MLP attention with `att_bottleneck`. The `dot` attention is still not tested and probably broken.
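The repository's `Fusion()` is the authoritative implementation; the following is only a minimal re-implementation sketch of the idea, and the constructor argument name is an assumption:

```python
import torch
from torch import nn


class Fusion(nn.Module):
    """Sketch: fuse an arbitrary number of same-shaped tensors by
    concatenation, summation or element-wise multiplication."""

    def __init__(self, fusion_type='concat'):  # argument name is assumed
        super().__init__()
        assert fusion_type in ('concat', 'sum', 'mul')
        self.fusion_type = fusion_type

    def forward(self, *inputs):
        if self.fusion_type == 'concat':
            # concatenate along the feature dimension
            return torch.cat(inputs, dim=-1)
        if self.fusion_type == 'sum':
            return torch.stack(inputs, dim=0).sum(dim=0)
        # 'mul': element-wise product of all inputs
        out = inputs[0]
        for x in inputs[1:]:
            out = out * x
        return out


# Fuse three (batch, dim) tensors by summation
fuse = Fusion('sum')
x, y, z = (torch.randn(8, 256) for _ in range(3))
assert fuse(x, y, z).shape == (8, 256)
```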
New layers/architectures:
- Added `AttentiveMNMT`, which implements modality-specific multimodal attention from the paper "Multimodal Attention for Neural Machine Translation"
- Added `ShowAttendAndTell` model
Changes in NMT:
- `dec_init` defaults to `mean_ctx`, i.e. the decoder will be initialized with the mean context computed from the source encoder (see the sketch after this list)
- `enc_lnorm`, which was just a placeholder, is now removed since we do not provide layer normalization for now
- Beam search is completely moved to GPU
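A minimal sketch of what `mean_ctx` initialization computes; the tensor shapes and the `tanh` projection are illustrative assumptions rather than nmtpytorch's exact code:

```python
import torch
from torch import nn

src_len, batch, ctx_dim, dec_dim = 20, 8, 512, 256
ctx = torch.randn(src_len, batch, ctx_dim)  # source encoder states

proj = nn.Linear(ctx_dim, dec_dim)          # assumed learned projection
h0 = torch.tanh(proj(ctx.mean(dim=0)))      # (batch, dec_dim) initial decoder state
```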
nmtpytorch is developed at the Informatics Lab of Le Mans University, France.