[CI] Update CI image and unit tests #289

Merged on Dec 18, 2024 (41 commits)

@phoenixdong (Collaborator) commented on Dec 11, 2024

CI images:

Built on the base image nvcr.io/nvidia/pytorch:24.05-py3. The resulting image is named 'flagscale_cicd:v2.0-pytorch-2.5.1-cuda-12.4.131-ngc-24.05'.

  • v2.0 marks a major update of the CI image

  • pytorch-2.5.1-cuda-12.4.131-ngc-24.05 records the software versions and the base image release
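The tag layout described above can be read back mechanically. The following is a minimal sketch, assuming the field order seen in the one tag quoted in this PR; the regular expression and the `parse_ci_image` helper are hypothetical and not part of FlagScale.

```python
import re

# Pattern inferred from the single tag quoted in this PR (an assumption,
# not an official FlagScale naming schema).
TAG_PATTERN = re.compile(
    r"^(?P<image>v\d+\.\d+)"
    r"-pytorch-(?P<pytorch>[\d.]+)"
    r"-cuda-(?P<cuda>[\d.]+)"
    r"-ngc-(?P<ngc>[\d.]+)$"
)


def parse_ci_image(reference: str) -> dict:
    """Split 'name:tag' and return the version fields encoded in the tag."""
    name, _, tag = reference.partition(":")
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized CI image tag: {tag!r}")
    return {"name": name, **match.groupdict()}


info = parse_ci_image(
    "flagscale_cicd:v2.0-pytorch-2.5.1-cuda-12.4.131-ngc-24.05"
)
print(info)
```

A check like this could let a CI script confirm that the image it pulled carries the expected PyTorch and CUDA versions before running tests.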

Unit tests:

Run all unit tests, fixing or skipping the failing ones so that the suite passes overall.

  • Megatron unit tests: fix failures where the cause is understood, otherwise skip

  • FlagScale unit tests: fix all failures
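As an illustration of the fix-or-skip policy, here is a minimal sketch using the standard library's `unittest`. The test class, test names, and skip reason are hypothetical; the repository's own tests may use pytest instead, where `pytest.mark.skip(reason=...)` plays the same role.

```python
import unittest


class ExampleMegatronTest(unittest.TestCase):  # hypothetical test case
    # Skipped test: the reason string records why it is not run yet.
    @unittest.skip("hypothetical reason: fails on the new CI image, cause not yet known")
    def test_known_failure(self):
        self.fail("would fail if it ran")

    # Fixed test: runs normally and must pass.
    def test_fixed(self):
        self.assertEqual(1 + 1, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleMegatronTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), result.wasSuccessful())
```

Recording a reason with every skip keeps the skipped tests visible in the report, so they can be revisited once the cause is known.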

Functional tests:

  • Add training tests for LLaVA-OneVision (temporarily disabled due to data updates)

Bug fix:

  • Move the temporary coverage file path so that coverage data is not lost when the container is destroyed
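The PR does not show how the path was changed. One way to get this effect, sketched here as an assumption, is to point coverage.py's `COVERAGE_FILE` environment variable at a directory that outlives the container (for example a host-mounted volume). The `coverage_env` helper and the job name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def coverage_env(persistent_dir: str, job_name: str) -> dict:
    """Return an environment in which coverage.py writes under persistent_dir."""
    data_dir = Path(persistent_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    # COVERAGE_FILE is the variable coverage.py reads for its data file path.
    env["COVERAGE_FILE"] = str(data_dir / f".coverage.{job_name}")
    return env


# Demo with a throwaway directory; in CI this would be a path mounted
# from the host so it survives container teardown (an assumption).
with tempfile.TemporaryDirectory() as tmp:
    env = coverage_env(os.path.join(tmp, "coverage"), "unit-megatron")
    data_file = env["COVERAGE_FILE"]
    dir_created = os.path.isdir(os.path.join(tmp, "coverage"))
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the test command, so each job writes its own data file.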

TODO:

  • Add vLLM inference testing

  • Add Dockerfile.ci (using conda)

@aoyulong (Contributor) left a comment

LGTM

@aoyulong merged commit 2c880b3 into FlagOpen:main on Dec 18, 2024
24 of 30 checks passed