[CI] Update CI image and unit tests (#289)
CI images:
Using nvcr.io/nvidia/pytorch:24.05-py3 as the base image; the built image is named 'flagscale_cicd:v2.0-pytorch-2.5.1-cuda-12.4.131-ngc-24.05'.
- v2.0 marks a major image update
- pytorch-2.5.1-cuda-12.4.131-ngc-24.05 records the software versions and the base image version

Unit tests:
Run all unit tests and fix or skip the failures so that the suite passes overall.
- Megatron unit tests: fix where the cause of the failure is known, otherwise skip
- FlagScale unit tests: fix

Functional tests:
- Add training tests for LLaVA-OneVision (temporarily disabled due to data updates)

Bug fix:
- Adjust the temporary file path for coverage so that coverage data is not lost when the container is destroyed

TODO:
- Add vLLM inference testing
- Add Dockerfile.ci (using conda)
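The image naming scheme described in the commit message can be sketched as a small shell helper. This is only an illustration: the function name `build_image_tag` and its argument order are hypothetical and do not appear in the commit; the component values are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): assemble the CI image tag
# from the image version, PyTorch version, CUDA version and NGC base release.
build_image_tag() {
    # $1 = image version, $2 = PyTorch, $3 = CUDA, $4 = NGC release
    printf 'flagscale_cicd:%s-pytorch-%s-cuda-%s-ngc-%s\n' "$1" "$2" "$3" "$4"
}

# Reproduces the tag named in the commit message.
build_image_tag v2.0 2.5.1 12.4.131 24.05
# prints: flagscale_cicd:v2.0-pytorch-2.5.1-cuda-12.4.131-ngc-24.05
```

Keeping the components as separate arguments makes it clear which part changes on a major image update (the first) and which parts track the software stack.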
1 parent 6df55e3, commit 2c880b3
Showing 33 changed files with 561 additions and 59 deletions.
@@ -0,0 +1,32 @@
name: Clean Old Report

on:
  workflow_call:
    inputs:
      backend:
        required: true
        type: string

jobs:
  clean-report:
    runs-on: self-hosted
    container:
      image: localhost:5000/flagscale_cicd:v2.0-pytorch-2.5.1-cuda-12.4.131-ngc-24.05
      ports:
        - 80
      volumes:
        - /home/flagscale_cicd/flask/static:/workspace/report
        - /home/flagscale_cicd/flask/config:/workspace/config
      options: --hostname flagscale_cicd

    steps:
      - name: Clean Old Report
        run: |
          REPORT_ADDR=$(cat "/workspace/config/report_address")
          echo "Cleaning old report at http://${REPORT_ADDR}/${{ github.sha }}/cov-report-${{ inputs.backend }}/diff-cover-report-${{ inputs.backend }}.html"
          if [ -d "/workspace/report/${{ github.sha }}/cov-report-${{ inputs.backend }}" ]; then
            rm -r "/workspace/report/${{ github.sha }}/cov-report-${{ inputs.backend }}"
          fi
          if [ -d "/workspace/report/${{ github.sha }}/cov-temp-${{ inputs.backend }}" ]; then
            rm -r "/workspace/report/${{ github.sha }}/cov-temp-${{ inputs.backend }}"
          fi
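The cleanup step above removes two per-commit, per-backend directories. A minimal sketch of the same logic as a standalone shell function follows, assuming the directory layout used by the workflow (`<root>/<sha>/cov-report-<backend>` and `<root>/<sha>/cov-temp-<backend>`). The function name `clean_old_report`, its arguments, and the demo values are hypothetical; the empty-argument guard is an addition that the workflow itself does not have.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): remove the old coverage
# report and temporary coverage directories for one commit and backend.
# Plain POSIX sh has no `local`, so the variables below are global.
clean_old_report() {
    report_root="$1"
    sha="$2"
    backend="$3"

    # Added safeguard: an empty component would widen the rm target.
    if [ -z "$report_root" ] || [ -z "$sha" ] || [ -z "$backend" ]; then
        echo "clean_old_report: missing argument" >&2
        return 1
    fi

    for kind in cov-report cov-temp; do
        target="${report_root}/${sha}/${kind}-${backend}"
        if [ -d "$target" ]; then
            rm -r -- "$target"
        fi
    done
}

# Demo in a throwaway directory; "abc123" and the backend names are made up.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/abc123/cov-report-megatron" \
         "$demo_root/abc123/cov-temp-megatron" \
         "$demo_root/abc123/cov-report-flagscale"
clean_old_report "$demo_root" abc123 megatron
ls "$demo_root/abc123"   # only cov-report-flagscale remains
rm -rf "$demo_root"
```

Looping over the two directory prefixes avoids repeating the `if`/`rm` pair, and quoting the target keeps the removal confined to a single path even if a value contains spaces.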
@@ -0,0 +1,100 @@
FROM nvcr.io/nvidia/pytorch:24.05-py3

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Asia/Shanghai


##############################################################################
# To avoid "curl 92 HTTP/2 stream 0 was not closed cleanly: CANCEL (err 8)" or "fetch-pack: unexpected disconnect while reading sideband packet".
##############################################################################
# lowSpeedTime=300s lowSpeedLimit=100B
RUN git config --global http.lowSpeedTime 300 \
    && git config --global http.lowSpeedLimit 100 \
    && git config --global http.postBuffer 524288000


##############################################################################
# Change apt source to Ksyun
##############################################################################
RUN sed -i "s#\S\+#http://apt.ksyun.cn/ubuntu/#2" /etc/apt/sources.list && \
    > /etc/apt/apt.conf.d/docker-clean && \
    > /etc/dpkg/dpkg.cfg.d/pkg-config-hook-config


##############################################################################
# Install basic utilities
##############################################################################
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    software-properties-common build-essential autotools-dev \
    nfs-common pdsh \
    curl wget vim tmux less unzip \
    htop iftop iotop ca-certificates openssh-client openssh-server \
    rsync iputils-ping net-tools sudo \
    tzdata psmisc screen libx11-dev llvm-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*


##############################################################################
# Uninstall unnecessary packages and their dependencies
##############################################################################
RUN pip install --upgrade pip && pip install pip-autoremove && \
    pip-autoremove torch torchvision torchaudio torch-tensorrt transformer_engine \
    pytorch-quantization pytorch-triton \
    flash-attn tensorboard apex cudf dask-cudf \
    cugraph cugraph-dgl cugraph-pyg cugraph-service-server -y


##############################################################################
# Install PyTorch
##############################################################################
RUN pip install --upgrade pip \
    && pip install --no-cache-dir torch==2.5.1 torchvision torchaudio \
    -f https://download.pytorch.org/whl/cu124/torch_stable.html -v \
    || { echo 'PyTorch installation failed'; exit 1; }


##############################################################################
# Install the dependencies and data needed to run and test
##############################################################################
RUN pip install pytest pytest-cov pytest_mock pytest-random-order \
    pre-commit black isort diff-cover \
    zarr tensorstore==0.1.45 wrapt tiktoken omegaconf setuptools_scm hydra-core Ray==2.40.0 numpy==1.26.4 pillow==10.4.0 \
    git+https://github.com/fanshiqing/[email protected] nltk==3.8.1 \
    && python -m nltk.downloader -d /root/nltk_data punkt


# apex
RUN cd /workspace \
    && git clone https://github.com/NVIDIA/apex \
    && cd apex \
    && pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./


# flash-attention
# Supported flash-attn versions are >= 2.1.1, <= 2.6.3.
# flash-attn==2.6.3
RUN cd /workspace \
    && git clone https://github.com/Dao-AILab/flash-attention.git \
    && cd flash-attention \
    && git checkout c1d146c \
    && git submodule update --init --recursive \
    && MAX_JOBS=96 python setup.py install


# TransformerEngine
RUN cd /workspace \
    && git clone -b stable https://github.com/NVIDIA/TransformerEngine.git \
    && cd TransformerEngine \
    && git submodule update --init --recursive \
    && pip install .


# xformers
RUN cd /workspace \
    && git clone https://github.com/facebookresearch/xformers.git \
    && cd xformers \
    && git submodule update --init --recursive \
    && pip install -v -U .
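The "Change apt source to Ksyun" step relies on the numeric flag of sed's `s` command: `s#\S\+#...#2` replaces only the second run of non-whitespace characters on each line, which in a one-line-style `sources.list` entry is the mirror URL. The sketch below applies the same expression to sample input rather than to `/etc/apt/sources.list`. The wrapper name `rewrite_apt_mirror` and the sample lines are illustrative assumptions, and `\S` and `\+` are GNU sed extensions, so this assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the commit): apply the Dockerfile's sed
# expression to stdin instead of editing /etc/apt/sources.list in place.
rewrite_apt_mirror() {
    # Replace the 2nd whitespace-delimited field on every line (GNU sed).
    sed 's#\S\+#http://apt.ksyun.cn/ubuntu/#2'
}

# Sample entry, assumed for illustration only.
printf '%s\n' 'deb http://archive.ubuntu.com/ubuntu/ jammy main restricted' \
    | rewrite_apt_mirror
# prints: deb http://apt.ksyun.cn/ubuntu/ jammy main restricted

# Caveat: the expression is positional, so comment lines are rewritten too.
printf '%s\n' '# See http://help.ubuntu.com' | rewrite_apt_mirror
# prints: # http://apt.ksyun.cn/ubuntu/ http://help.ubuntu.com
```

Because the match depends only on field position, the rewrite also touches comment lines (harmless, since they stay comments) and would hit the wrong field on an entry that carries an option block such as `deb [arch=amd64] ...`.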