Releases: NVIDIA/sentiment-discovery
v0.3.large_batch_stable: Code necessary to reproduce results from our large batch training paper
This release is used to reproduce results from our Large Scale LM paper.
v0.3 Release: Speed & Memory Usage improvements + PyTorch 0.5 updates
We've switched our mLSTM model to internally use PyTorch's fused LSTM cell, which significantly reduces GPU memory usage (allowing training with larger batch sizes) and slightly improves speed compared to the unfused version included in earlier releases.
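As an illustration of how an mLSTM can reuse a fused LSTM cell, here is a minimal sketch; it is not the repo's implementation, and the class name `MLSTMCell` is hypothetical. It relies on the standard mLSTM formulation, in which the gates are computed from the input and a multiplicative state `m = (W_mx x) * (W_mh h_prev)` instead of `h_prev`. Because `torch.nn.LSTMCell` computes its gates as `W_ih x + W_hh h + biases`, passing `m` where it expects `h` yields the mLSTM gates in a single cell call.

```python
import torch
import torch.nn as nn


class MLSTMCell(nn.Module):
    """Hypothetical multiplicative-LSTM cell built on torch.nn.LSTMCell.

    nn.LSTMCell computes its gates as W_ih @ x + W_hh @ h + biases. Feeding it
    the multiplicative state m = (W_mx @ x) * (W_mh @ h_prev) in place of h_prev
    yields the mLSTM gates while reusing the LSTM cell (fused on GPU).
    """

    def __init__(self, input_size: int, hidden_size: int) -> None:
        super().__init__()
        self.wmx = nn.Linear(input_size, hidden_size, bias=False)
        self.wmh = nn.Linear(hidden_size, hidden_size, bias=False)
        self.lstm = nn.LSTMCell(input_size, hidden_size)

    def forward(self, x: torch.Tensor, state: tuple) -> tuple:
        h_prev, c_prev = state
        # Multiplicative intermediate state replaces h_prev in the gate computation
        m = self.wmx(x) * self.wmh(h_prev)
        # One call into the LSTM cell computes all four gates plus the new h and c
        h, c = self.lstm(x, (m, c_prev))
        return h, c


if __name__ == "__main__":
    cell = MLSTMCell(input_size=8, hidden_size=16)
    x = torch.randn(4, 8)  # batch of 4 inputs
    state = (torch.zeros(4, 16), torch.zeros(4, 16))
    h, c = cell(x, state)
    print(h.shape)  # torch.Size([4, 16])
```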
To convert models trained with earlier versions so that they work with this version, please see this issue.
We've also updated our distributed code to address the recent April 3rd changes made to PyTorch's Tensors and Variables.
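For context, a minimal sketch of gradient all-reduce in current PyTorch, where `Variable` and `Tensor` are merged and `p.grad` is a plain tensor that can be passed straight to `torch.distributed.all_reduce`. The function name `all_reduce_gradients` is hypothetical, this is not the repo's distributed code, and it assumes `torch.distributed.init_process_group` has already been called on every worker.

```python
import torch.distributed as dist
import torch.nn as nn


def all_reduce_gradients(model: nn.Module) -> None:
    """Average every parameter's gradient across the default process group.

    Assumes dist.init_process_group() has already been called on each worker.
    """
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is None:
            continue
        # p.grad is a plain tensor, so it can be summed in place across workers
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        # Divide the sum by the number of workers to get the mean gradient
        p.grad.div_(world_size)
```

Each worker would call it after `loss.backward()` and before `optimizer.step()`, so all replicas apply the same averaged update.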
v0.2 Release: FP16, Distributed, and Usability updates
This release has two main goals:
- Address usability concerns
- Update the repo with new code for FP16 and distributed training
Usability
- Brought our training/generation code more in line with the PyTorch word language model example
- Added a PyTorch classifier module/function that classifies sentiment from an input text tensor
- Added pretrained classifiers/language models for this module
- Added a simple standalone classifier script/example that classifies an input CSV/JSON file and writes the results to another CSV/JSON file
- Flattened our directory structure to make code easier to find
- Reusable PyTorch functionality (the new RNN API, weight norm functionality, and eventually all FP16 functionality) is being moved into its own standalone Python module, to be published at a later date
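The standalone classification flow described above (read a CSV, classify each text, write the results) can be sketched with the standard `csv` module. This is not the repo's script: `classify_csv` and `toy_classifier` are hypothetical, `toy_classifier` is a keyword-matching placeholder for a real pretrained model, and the `text`/`label` column names are assumptions.

```python
import csv
from typing import Callable, List


def toy_classifier(texts: List[str]) -> List[float]:
    """Placeholder stand-in for a pretrained sentiment model (hypothetical)."""
    return [1.0 if "good" in t.lower() else 0.0 for t in texts]


def classify_csv(in_path: str, out_path: str,
                 classify: Callable[[List[str]], List[float]],
                 text_key: str = "text", label_key: str = "label") -> None:
    """Classify the text column of in_path and write rows plus predictions to out_path."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [])
    # Classify all texts in one call so a real model could batch them
    preds = classify([row[text_key] for row in rows])
    if label_key not in fieldnames:
        fieldnames.append(label_key)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row, pred in zip(rows, preds):
            row[label_key] = pred
            writer.writerow(row)
```

Original columns are preserved and the prediction is appended as a new column, which keeps the output easy to join back to the input.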
FP16 + Distributed
- FP16 optimizer wrapper for optimizing FP16 models according to our [best practices](https://github.com/NVIDIA/sentiment-discovery/blob/master/analysis/reproduction.md#fp16-training)
  - available in `fp16/fp16.py`
- Lightweight distributed wrapper for all-reducing gradients across multiple GPUs with either the NCCL or Gloo backend
  - available in `model/distributed.py`
- Distributed worker launch script
  - available in `multiproc.py`
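Loss scaling is one technique commonly used when training in FP16; it is assumed here, not quoted from the linked page, to be among the practices the wrapper follows. The pure-Python sketch below uses `struct`'s half-precision `e` format to show why it matters: a small gradient underflows to zero in FP16, while the same value multiplied by a loss scale survives and can be recovered by dividing in full precision. The scale of 1024 and the helper names are illustrative assumptions.

```python
import struct

LOSS_SCALE = 1024.0  # illustrative static loss scale (assumption)


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_roundtrip(grad: float, scale: float = LOSS_SCALE) -> float:
    """Store grad * scale in FP16, then unscale in full precision."""
    return to_fp16(grad * scale) / scale


grad = 1e-8  # a small gradient, below FP16's smallest subnormal (~6e-8)
print(to_fp16(grad))           # 0.0: the gradient is lost without scaling
print(scaled_roundtrip(grad))  # close to 1e-08: the scaled value survives
```

Too large a scale causes the opposite problem: FP16 tops out at 65504, and `struct.pack` raises `OverflowError` beyond that. This is why dynamic loss scaling lowers the scale whenever overflow is detected.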
Main v0 release
Module updates
- Fused LSTM kernels in the mLSTM module, enabled with `fuse_lstm` flags
Model updates: improved model serialization size and options
- Gradients are no longer saved
- Saving the optimizer is optional
- Reloading weights trained with weight norm is more stable
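A minimal sketch of the serialization options listed above, assuming a checkpoint laid out as a dict of `state_dict`s. The function names and the `"model"`/`"optimizer"` keys are hypothetical, not the repo's format. `Module.state_dict()` contains parameters and buffers but no `.grad` tensors, so gradients are never written, and optimizer state is saved only when an optimizer is passed.

```python
import torch
import torch.nn as nn


def save_checkpoint(f, model: nn.Module, optimizer=None) -> None:
    """Write model weights, plus optimizer state only if an optimizer is given.

    Module.state_dict() holds parameters and buffers, never .grad tensors,
    so gradients are not serialized.
    """
    ckpt = {"model": model.state_dict()}
    if optimizer is not None:
        ckpt["optimizer"] = optimizer.state_dict()
    torch.save(ckpt, f)


def load_checkpoint(f, model: nn.Module, optimizer=None) -> None:
    """Restore weights, and optimizer state when both are available."""
    ckpt = torch.load(f, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    if optimizer is not None and "optimizer" in ckpt:
        optimizer.load_state_dict(ckpt["optimizer"])
```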
Weight Norm/Reparameterization update
- Modified hooks to work with the fused LSTM kernel
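For reference, PyTorch's own hook-based `torch.nn.utils.weight_norm` shows the general pattern with a fused cell. It replaces a weight with `*_g` (magnitude) and `*_v` (direction) parameters and registers a forward pre-hook that recomputes the weight before each call, so the fused kernel receives an ordinary tensor. This sketch uses PyTorch's utility rather than the repo's own weight norm module, the helper name `weight_normed_lstm_cell` is hypothetical, and newer PyTorch releases deprecate this function in favor of `torch.nn.utils.parametrizations.weight_norm`.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm


def weight_normed_lstm_cell(input_size: int, hidden_size: int) -> nn.LSTMCell:
    """Hypothetical helper: an LSTMCell with weight norm on both weight matrices."""
    cell = nn.LSTMCell(input_size, hidden_size)
    # Each call replaces `name` with `name_g` and `name_v` parameters and
    # registers a forward pre-hook that recomputes name = g * v / ||v||
    # before every forward pass.
    for name in ("weight_ih", "weight_hh"):
        weight_norm(cell, name=name)
    return cell


if __name__ == "__main__":
    cell = weight_normed_lstm_cell(8, 16)
    print(sorted(n for n, _ in cell.named_parameters()))
    h, c = cell(torch.randn(4, 8))  # hidden state defaults to zeros
```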
Data updates
- Dataset types (csv, json, etc.) are parsed automatically; only supervised vs. unsupervised needs to be specified
- Added loose JSON functionality
- Tested CSV datasets more thoroughly
- Fixed the save names of processed results so that the original file's name is preserved
- Fixed DataParallel/DistributedDP batching of evaluation datasets
- Made it easier to specify validation/test datasets
- Made it easier to specify dataset shards
- Added negative sequence lengths for datasets.
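Automatic dataset-type detection and loose JSON support can be sketched as a dispatch on the file extension. This assumes that "loose JSON" means one JSON object per line and that the type is inferred from the extension; the reader functions and the `READERS` table are hypothetical, not the repo's API.

```python
import csv
import json
import os
from typing import Dict, Iterator


def read_loose_json(path: str) -> Iterator[Dict]:
    """Loose JSON (assumed): one JSON object per line, blank lines skipped."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def read_csv(path: str) -> Iterator[Dict]:
    """CSV with a header row; each row becomes a dict of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


# Hypothetical mapping from file extension to reader
READERS = {".csv": read_csv, ".json": read_loose_json, ".jsonl": read_loose_json}


def read_dataset(path: str) -> Iterator[Dict]:
    """Pick a reader from the file extension; fail fast on unknown types."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in READERS:
        raise ValueError(f"unsupported dataset type: {ext!r}")
    return READERS[ext](path)
```

Dispatching before the generator is created means an unsupported file type raises immediately rather than on first iteration.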