Releases: DeepRec-AI/DeepRec
r1.15.5-deeprec2206
Major Features and Improvements
Embedding
- Multi-tier EmbeddingVariable storage, add SSD_HashKV, which performs better than LevelDB.
- Support GPU EmbeddingVariable, whose gather/apply ops are placed on the GPU.
- Add user API to record frequency and version for EmbeddingVariable.
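
To illustrate the idea behind multi-tier storage, here is a minimal pure-Python sketch, not DeepRec code. It assumes a bounded hot tier with least-recently-used demotion into a cold tier; the class `TwoTierEmbeddingStore` and its parameters are hypothetical, and a plain dict stands in for an SSD-backed store such as SSD_HashKV.

```python
from collections import OrderedDict


class TwoTierEmbeddingStore:
    """Toy two-tier key-value store: a bounded hot tier backed by a cold tier."""

    def __init__(self, hot_capacity, dim):
        self.hot_capacity = hot_capacity
        self.dim = dim
        self.hot = OrderedDict()  # stands in for DRAM, ordered by recency
        self.cold = {}            # stands in for an SSD-backed KV store

    def lookup(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        # Promote from the cold tier, or create a zero vector for a new key.
        value = self.cold.pop(key, None)
        if value is None:
            value = [0.0] * self.dim
        self.hot[key] = value
        self._demote_if_needed()
        return value

    def _demote_if_needed(self):
        # Move least recently used entries to the cold tier when over capacity.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value


store = TwoTierEmbeddingStore(hot_capacity=2, dim=4)
for feature_id in [10, 11, 12, 10]:
    store.lookup(feature_id)
```

After these four lookups the hot tier holds keys 12 and 10, and key 11 has been demoted to the cold tier. The actual storage policy in DeepRec may differ from this LRU assumption.
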
Graph Optimization
- Add Embedding Fusion ops for CPU/GPU.
- Optimize SmartStage performance on GPU.
Runtime Optimization
- Executor, support cost-based and critical-path-first op scheduling.
- GPUAllocator, support CUDA malloc async allocator (requires CUDA >= 11.2).
- CPUAllocator, automatically generate the memory allocation policy.
- PMEMAllocator, optimize allocator and add statistics.
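
The critical-path-first idea mentioned for the Executor can be sketched as follows. This is an illustrative pure-Python scheduler, not the DeepRec executor: the function `critical_path_order`, the op names, and the cost values are all hypothetical. Each ready op is prioritized by its own estimated cost plus the cost of the most expensive path that follows it.

```python
import heapq


def critical_path_order(costs, deps):
    """Return an execution order that prefers ops on the longest remaining path.

    costs: {op: estimated cost}
    deps:  {op: [ops it depends on]}
    """
    consumers = {op: [] for op in costs}
    pending = {op: len(deps.get(op, [])) for op in costs}
    for op, inputs in deps.items():
        for src in inputs:
            consumers[src].append(op)

    # Priority of an op = its own cost plus the most expensive downstream path.
    priority = {}

    def downstream_cost(op):
        if op not in priority:
            priority[op] = costs[op] + max(
                (downstream_cost(c) for c in consumers[op]), default=0)
        return priority[op]

    # Max-heap via negated priorities; only ops with no pending inputs are ready.
    ready = [(-downstream_cost(op), op) for op in costs if pending[op] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, op = heapq.heappop(ready)
        order.append(op)
        for c in consumers[op]:
            pending[c] -= 1
            if pending[c] == 0:
                heapq.heappush(ready, (-downstream_cost(c), c))
    return order


costs = {"read": 1, "embed": 5, "dense": 2, "concat": 2, "log": 1}
deps = {"embed": ["read"], "dense": ["read"], "log": ["read"],
        "concat": ["embed", "dense"]}
order = critical_path_order(costs, deps)
```

With these made-up costs, `embed` is scheduled before `dense` and `log` because it sits on the most expensive path to the end of the graph.
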
Ops & Hardware Acceleration
- Implement SparseReshape, SparseApplyAdam, SparseApplyAdagrad, SparseApplyFtrl, ApplyAdamAsync, SparseApplyAdamAsync, KvSparseApplyAdamAsync GPU kernels.
- Optimize UnSortedSegment on CPU.
- Upgrade OneDNN to v2.6.
IO & Dataset
- ParquetDataset, add a Parquet dataset, which can reduce storage usage and improve performance.
Model Save/Restore
- Asynchronously restore EmbeddingVariable from checkpoint.
Serving
- SessionGroup, significantly improve QPS and response time (RT) in inference.
ModelZoo
- Add models SimpleMultiTask, ESMM, DBMTL, MMoE, BST.
Profiler
- Support mapping operators to real thread IDs in the timeline.
BugFix
- Fix EmbeddingVariable core dump when EmbeddingVariable has only a primary embedding value.
- Fix abnormal behavior in L2-norm calculation.
- Fix checkpoint saving issue when using LevelDB in EmbeddingVariable.
- Fix failure to delete old checkpoints when using incremental checkpoint.
- Fix build failure with CUDA 11.6.
More details of features: https://deeprec.readthedocs.io/zh/latest/
Release Images
CPU Image
alideeprec/deeprec-release:deeprec2206-cpu-py36-ubuntu18.04
GPU Image
alideeprec/deeprec-release:deeprec2206-gpu-py36-cu110-ubuntu18.04
r1.15.5-deeprec2204u1
Major Features and Improvements
BugFix
- Fix checkpoint saving issue when using EmbeddingVariable. (#167)
- Fix "inputs from different frames" issue when using auto graph fusion. (#144)
- Fix embedding_lookup_sparse graph issue.
Release Images
CPU Image
alideeprec/deeprec-release:deeprec2204u1-cpu-py36-ubuntu18.04
GPU Image
alideeprec/deeprec-release:deeprec2204u1-gpu-py36-cu110-ubuntu18.04
r1.15.5-deeprec2204
Major Features and Improvements
Embedding
- Support hybrid storage of EmbeddingVariable (DRAM, PMEM, LevelDB).
- Support memory-continuous storage of multi-slot EmbeddingVariable.
- Optimize beta1_power and beta2_power slots of EmbeddingVariable.
- Support restoring the frequency of features in EmbeddingVariable.
Distributed Training
- Integrate SOK (SparseOperationKit) into DeepRec.
Graph Optimization
- Auto Graph Fusion, support float32/int32/int64 type for select fusion.
- SmartStage, fix a bug where the graph contains a cycle when SmartStage optimization is enabled.
Runtime Optimization
- GPUTensorPoolAllocator, which reduces GPU memory usage and improves performance.
- PMEMAllocator, support allocation in persistent memory.
Optimizer
- Optimize AdamOptimizer performance.
Op & Hardware Acceleration
- Change the fused MatMul layout type and number of threads for small inputs.
IO & Dataset
- KafkaGroupIODataset, support consumer rebalance.
Model Save/Restore
- Support dumping incremental graph info.
Serving
- Add serving module (ODL processor), which supports Online Deep Learning (ODL).
More details of features: https://deeprec.readthedocs.io/zh/latest/
Release Images
CPU Image
registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2204-cpu-py36-ubuntu18.04
GPU Image
registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2204-gpu-py36-cu110-ubuntu18.04
Known Issue
Some users reported issues when using EmbeddingVariable, such as #167. The bug is fixed in r1.15.5-deeprec2204u1.
r1.15.5-deeprec2201
This is the first release of DeepRec. DeepRec supports very large-scale distributed training, including model training on trillions of samples and processing of 100 billion embeddings. For sparse model scenarios, in-depth performance optimization has been carried out on both CPU and GPU platforms.
Major Features and Improvements
Embedding
- Embedding Variable (including feature eviction and feature filter)
- Dynamic Dimension Embedding Variable
- Adaptive Embedding
- Multi-Hash Variable
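
As a rough illustration of feature filtering and feature eviction, the following pure-Python sketch admits a feature only after it has been seen a minimum number of times and evicts features that have not been seen for a number of steps. It is not the DeepRec API: the class `FilteredEmbeddingTable`, its parameters `min_count` and `max_idle_steps`, and the specific admission and eviction rules are assumptions made for this example.

```python
class FilteredEmbeddingTable:
    """Toy embedding table with count-based admission and step-based eviction."""

    def __init__(self, dim, min_count, max_idle_steps):
        self.dim = dim
        self.min_count = min_count            # lookups needed before admission
        self.max_idle_steps = max_idle_steps  # idle steps allowed before eviction
        self.rows = {}       # admitted feature id -> embedding vector
        self.counts = {}     # feature id -> number of times seen
        self.last_seen = {}  # feature id -> step of the latest lookup

    def lookup(self, key, step):
        self.counts[key] = self.counts.get(key, 0) + 1
        self.last_seen[key] = step
        if self.counts[key] < self.min_count:
            # Not admitted yet: return a default vector without storing a row.
            return [0.0] * self.dim
        return self.rows.setdefault(key, [0.0] * self.dim)

    def evict(self, step):
        # Drop every feature that has been idle for longer than max_idle_steps.
        stale = [k for k, s in self.last_seen.items()
                 if step - s > self.max_idle_steps]
        for k in stale:
            self.rows.pop(k, None)
            self.counts.pop(k, None)
            self.last_seen.pop(k, None)
        return stale


table = FilteredEmbeddingTable(dim=2, min_count=2, max_idle_steps=3)
table.lookup(7, step=0)   # first sighting of feature 7: not admitted
table.lookup(7, step=1)   # second sighting: feature 7 gets its own row
table.lookup(8, step=1)   # first sighting of feature 8: not admitted
table.lookup(7, step=4)   # feature 7 stays fresh
evicted = table.evict(step=5)
```

Here feature 8 is evicted because it has been idle for four steps, while feature 7 keeps its row.
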
Distributed Training
- GRPC++
- StarServer
Graph Optimization
- Auto Micro Batch
- Auto Graph Fusion
- Embedding Fusion
- Smart Stage
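
These notes do not define Auto Micro Batch. Assuming it refers to splitting a batch into micro-batches and accumulating their gradients, the sketch below shows why that can match full-batch training for a mean loss with equally sized micro-batches. The functions `grad_mse` and `accumulated_grad` and the scalar linear model are hypothetical and exist only for this example.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, num_micro_batches):
    """Average the gradients of equally sized micro-batches."""
    size = len(xs) // num_micro_batches  # assumes the batch divides evenly
    total = 0.0
    for i in range(num_micro_batches):
        lo, hi = i * size, (i + 1) * size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi])
    return total / num_micro_batches


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)
micro = accumulated_grad(0.5, xs, ys, num_micro_batches=2)
```

Both `full` and `micro` evaluate to the same gradient, so each micro-batch needs only a fraction of the activation memory while the update is unchanged.
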
Runtime Optimization
- CPU Memory Optimization
- GPU Memory Optimization
- GPU Virtual Memory
Optimizer
- AdamAsync Optimizer
- AdagradDecay Optimizer
Op & Hardware Acceleration
- CPU/GPU optimization of tens of ops, including Unique, Gather, DynamicStitch, BiasAdd, Select, Transpose, SparseSegmentReduction, Where, DynamicPartition, and SparseConcat.
- Support oneDNN-2.3.2 & bf16
- Support TF32
IO & Dataset
- WorkQueue
- KafkaDataset
More details of features: https://deeprec.readthedocs.io/zh/latest/
Release Images
CPU Image
registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2201-cpu-py36-ubuntu18.04
GPU Image
registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-training:deeprec2201-gpu-py36-cu110-ubuntu18.04