- [09/02/2023] Added AMD GPU support; released Docker images for ROCm 5.3–5.6
- [08/16/2023] Added StarCoder model support
- [08/14/2023] Released Docker image for different CUDA versions
- OS: Linux
- GPU backend: Hip-ROCm or CUDA
- CUDA version: 10.2 – 12.0
- NVIDIA compute capability: 6.0 or higher
- Python: 3.6 or higher
- Package dependencies: see here
You can install FlexFlow using pip:
pip install flexflow
If you run into any issues during installation, or if you would like to use the C++ API without installing from source, you can also use our pre-built Docker images for different CUDA versions and the hip_rocm backend. To download and run a pre-built Docker container:
docker run --gpus all -it --rm --shm-size=8g ghcr.io/flexflow/flexflow-cuda-12.0:latest
To download a Docker container for a backend other than CUDA v12.0, replace the cuda-12.0 suffix with any of the following: cuda-11.1, cuda-11.2, cuda-11.3, cuda-11.4, cuda-11.5, cuda-11.6, cuda-11.7, cuda-11.8, hip_rocm-5.3, hip_rocm-5.4, hip_rocm-5.5, or hip_rocm-5.6. More info on the Docker images, with instructions to build a new image from source or run with additional configurations, can be found here.
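As a quick sketch, the image tag for another backend can be assembled by swapping in the desired suffix (the tag pattern follows the CUDA 12.0 example above; the specific backend chosen here is just an illustration):

```shell
# Pick a backend suffix from the list above, e.g. the ROCm 5.6 build.
BACKEND=hip_rocm-5.6
IMAGE="ghcr.io/flexflow/flexflow-${BACKEND}:latest"
echo "$IMAGE"

# Then run it exactly as in the CUDA 12.0 example:
# docker run --gpus all -it --rm --shm-size=8g "$IMAGE"
```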
You can install FlexFlow Serve from source code by building the inference branch of FlexFlow. Please follow these instructions.
To get started, check out the quickstart guides below for the FlexFlow training and serving libraries.
Please let us know if you encounter any bugs or have any suggestions by submitting an issue.
We welcome all contributions to FlexFlow from bug fixes to new features and extensions.
FlexFlow Serve:
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia. SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification. arXiv preprint, May 2023.
FlexFlow Train:
- Colin Unger, Zhihao Jia, Wei Wu, Sina Lin, Mandeep Baines, Carlos Efrain Quintero Narvaez, Vinay Ramakrishnaiah, Nirmal Prajapati, Pat McCormick, Jamaludin Mohd-Yusof, Xi Luo, Dheevatsa Mudigere, Jongsoo Park, Misha Smelyanskiy, and Alex Aiken. Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization. In Proceedings of the Symposium on Operating Systems Design and Implementation (OSDI), July 2022.
- Zhihao Jia, Matei Zaharia, and Alex Aiken. Beyond Data and Model Parallelism for Deep Neural Networks. In Proceedings of the 2nd Conference on Machine Learning and Systems (MLSys), April 2019.
- Zhihao Jia, Sina Lin, Charles R. Qi, and Alex Aiken. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), July 2018.
FlexFlow is developed and maintained by teams at CMU, Facebook, Los Alamos National Lab, MIT, and Stanford (alphabetically).
FlexFlow is licensed under the Apache License 2.0.