[CI] Update CI image and unit tests #283

Closed
wants to merge 25 commits into from

phoenixdong
Collaborator

@phoenixdong phoenixdong commented Nov 28, 2024

CI image:

The CI image is built on nvcr.io/nvidia/pytorch:24.05-py3, and the resulting image is tagged 'flagscale_cicd:v2.0-pytorch-2.5.1-cuda-12.4.131-ngc-24.05'.

  • v2.0 marks a major image update

  • pytorch-2.5.1-cuda-12.4.131-ngc-24.05 records the software and base image versions: PyTorch 2.5.1, CUDA 12.4.131, and NGC release 24.05
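A minimal sketch of the naming scheme described above, assuming the tag is always assembled as major image version followed by the PyTorch, CUDA and NGC versions. The helper names `cicd_image_tag` and `ngc_base_image` are hypothetical and not part of the FlagScale CI scripts.

```python
# Hypothetical helpers that rebuild the CI image names from their components,
# following the scheme described in the PR (major version first, then the
# software and base-image versions).

def ngc_base_image(ngc_release: str) -> str:
    """Return the NGC PyTorch base image reference, e.g. for release '24.05'."""
    return f"nvcr.io/nvidia/pytorch:{ngc_release}-py3"


def cicd_image_tag(major: str, torch_version: str, cuda_version: str,
                   ngc_release: str, repo: str = "flagscale_cicd") -> str:
    """Return '<repo>:<major>-pytorch-<torch>-cuda-<cuda>-ngc-<ngc>'."""
    return (f"{repo}:{major}-pytorch-{torch_version}"
            f"-cuda-{cuda_version}-ngc-{ngc_release}")


if __name__ == "__main__":
    print(ngc_base_image("24.05"))
    print(cicd_image_tag("v2.0", "2.5.1", "12.4.131", "24.05"))
```

Keeping the base-image release inside the tag means a reader can tell from the tag alone which NGC container an image was derived from.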

Unit tests:

Ran the full unit test suite and fixed or skipped the failing tests so that the suite passes overall:

  • Megatron unit tests: fixed the failures that could be repaired, and skipped those that could not be repaired and that also fail in upstream Megatron itself

  • FlagScale unit tests: fixes in progress
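The triage rule for failing Megatron tests can be written as a small decision function. This is an illustrative sketch only: `Action` and `triage` are hypothetical names, and the `INVESTIGATE` outcome is an assumption, since the PR does not say how unrepairable tests that pass upstream are handled.

```python
from enum import Enum


class Action(Enum):
    FIX = "fix"
    SKIP = "skip"
    # Assumption: the PR does not specify this case.
    INVESTIGATE = "investigate"


def triage(repairable: bool, fails_upstream: bool) -> Action:
    """Decide how to handle one failing Megatron unit test.

    Repair whatever can be repaired; skip only tests that cannot be
    repaired AND also fail in upstream Megatron.
    """
    if repairable:
        return Action.FIX
    if fails_upstream:
        return Action.SKIP
    # Unrepairable but passing upstream: a FlagScale-specific regression.
    return Action.INVESTIGATE


if __name__ == "__main__":
    print(triage(repairable=False, fails_upstream=True))
```

Checking "repairable" first means a test is never skipped when a fix exists, even if it also fails upstream.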

Bug fix:

  • Changed the temporary file path used for coverage data so that coverage results are no longer lost when the container is destroyed
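One way to keep coverage data from disappearing with the container is to point coverage.py's data file at a directory that outlives it. The sketch below assumes that approach: it sets the `COVERAGE_FILE` environment variable, which coverage.py reads to locate its data file. The function name and the idea of a host-mounted `persist_dir` are hypothetical and are not taken from this PR's diff.

```python
import os
import tempfile
from pathlib import Path


def configure_coverage_file(persist_dir: str, name: str = ".coverage") -> Path:
    """Direct coverage.py's data file into `persist_dir`.

    `persist_dir` is assumed to be a host-mounted volume, so the data file
    survives container teardown. coverage.py reads COVERAGE_FILE when it
    starts, so this must run before the coverage process is launched.
    """
    target = Path(persist_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    os.environ["COVERAGE_FILE"] = str(target)
    return target


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for the mounted volume.
    with tempfile.TemporaryDirectory() as tmp:
        print(configure_coverage_file(os.path.join(tmp, "coverage")))
```

In a CI job the same effect can be had by exporting `COVERAGE_FILE` in the container's environment before invoking the test runner.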

TODO:

  • Add vLLM inference tests

  • Add training tests for LLaVA-OneVision

@phoenixdong phoenixdong requested a review from a team as a code owner November 28, 2024 07:16
@phoenixdong phoenixdong changed the title [CI] Add export testing and adjust the testing status verification method [CI] Add export testing and adjust some implementations in CI Nov 28, 2024
@phoenixdong phoenixdong changed the title [CI] Add export testing and adjust some implementations in CI [CI] Update CI image and unit testing Dec 8, 2024
@phoenixdong phoenixdong changed the title [CI] Update CI image and unit testing [CI] Update CI image and unit tests Dec 8, 2024
@@ -280,6 +281,8 @@ def test_load_distribution(self, parallelization_along_dp, tmp_path_dist_ckpt):

assert loaded_state_dict.keys() == state_dict.keys()

# The mock function running internally was not called
@pytest.mark.skipif(os.getenv('flagscale_skip') == '1', reason="flagscale_skip is enabled, skipping test.")
Contributor

The environment variable should be in uppercase letters.
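Following the review comment, the skip condition could read an uppercase variable instead. This is a sketch under the assumption that the variable is renamed to `FLAGSCALE_SKIP`; the diff above still reads the lowercase `flagscale_skip`, and the helper name `flagscale_skip_enabled` is hypothetical.

```python
import os
from typing import Mapping, Optional


def flagscale_skip_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when FLAGSCALE_SKIP is set to '1'.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    return env.get("FLAGSCALE_SKIP") == "1"


if __name__ == "__main__":
    print(flagscale_skip_enabled({"FLAGSCALE_SKIP": "1"}))
```

A test would then use `@pytest.mark.skipif(flagscale_skip_enabled(), reason="FLAGSCALE_SKIP is enabled, skipping test.")`, and CI would export `FLAGSCALE_SKIP=1` before running pytest. Uppercase names follow the usual convention for environment variables and avoid surprises on platforms where variable names are case-insensitive.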

@phoenixdong phoenixdong deleted the update_CI branch December 11, 2024 02:08
2 participants