Merge pull request #497 from dtrudg/496
fix: gpu: ensure MIGs available with --nvccli and no --contain (release-3.9)
dtrudg authored Dec 29, 2021
2 parents afd5bbc + e0ce9e8 commit f60a1df
Showing 2 changed files with 20 additions and 5 deletions.
CHANGELOG.md: 8 additions, 0 deletions
@@ -1,5 +1,13 @@
 # SingularityCE Changelog
 
+## Changes Since Last Release
+
+### Bug fixes
+
+- Ensure MIGs are visible with `--nvccli` in non-contained mode, to match the
+legacy GPU binding behaviour.
+- Avoid fd leak in loop device transient error path.
+
 ## v3.9.2 \[2021-12-10\]
 
 ### Bug fixes
cmd/internal/cli/actions_linux.go: 12 additions, 5 deletions
@@ -800,11 +800,18 @@ func setNvCCLIConfig(engineConfig *singularityConfig.EngineConfig) (err error) {
 	sylog.Debugf("Using nvidia-container-cli for GPU setup")
 	engineConfig.SetNvCCLI(true)
 
-	// When we use --contain we don't mount the NV devices by default in the nvidia-container-cli flow,
-	// they must be mounted via specifying with`NVIDIA_VISIBLE_DEVICES`. This differs from the legacy
-	// flow which mounts all GPU devices, always.
-	if (IsContained || IsContainAll) && os.Getenv("NVIDIA_VISIBLE_DEVICES") == "" {
-		sylog.Warningf("When using nvidia-container-cli with --contain NVIDIA_VISIBLE_DEVICES must be set or no GPUs will be available in container.")
+	if os.Getenv("NVIDIA_VISIBLE_DEVICES") == "" {
+		if IsContained || IsContainAll {
+			// When we use --contain we don't mount the NV devices by default in the nvidia-container-cli flow,
+			// they must be mounted via specifying with`NVIDIA_VISIBLE_DEVICES`. This differs from the legacy
+			// flow which mounts all GPU devices, always... so warn the user.
+			sylog.Warningf("When using nvidia-container-cli with --contain NVIDIA_VISIBLE_DEVICES must be set or no GPUs will be available in container.")
+		} else {
+			// In non-contained mode set NVIDIA_VISIBLE_DEVICES="all" by default, so MIGs are available.
+			// Otherwise there is a difference vs legacy GPU binding. See Issue #471.
+			sylog.Infof("Setting 'NVIDIA_VISIBLE_DEVICES=all' to emulate legacy GPU binding.")
+			os.Setenv("NVIDIA_VISIBLE_DEVICES", "all")
+		}
 	}
 
 	// Pass NVIDIA_ env vars that will be converted to nvidia-container-cli options
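The branching added to setNvCCLIConfig can be sketched as a small pure function, which makes its three outcomes easy to check in isolation: an explicit NVIDIA_VISIBLE_DEVICES is left alone, an unset value under --contain only triggers a warning, and an unset value in non-contained mode defaults to "all". This is a hypothetical restatement, not SingularityCE code: decideGPUEnv, gpuEnvAction and its constants are illustrative names, and the contained parameter stands in for IsContained || IsContainAll.

```go
package main

import (
	"fmt"
	"os"
)

// gpuEnvAction is a hypothetical type naming what the launcher does about
// NVIDIA_VISIBLE_DEVICES before invoking nvidia-container-cli.
type gpuEnvAction int

const (
	actionNone   gpuEnvAction = iota // variable already set: pass it through unchanged
	actionWarn                       // contained and unset: no GPUs will be visible, so warn
	actionSetAll                     // non-contained and unset: default to "all" like legacy binding
)

// decideGPUEnv is a hypothetical helper mirroring the branch structure of the
// patched setNvCCLIConfig. contained stands in for IsContained || IsContainAll.
func decideGPUEnv(visibleDevices string, contained bool) gpuEnvAction {
	if visibleDevices != "" {
		return actionNone
	}
	if contained {
		return actionWarn
	}
	return actionSetAll
}

func main() {
	contained := false // assumption for this demo: running without --contain/--containall
	switch decideGPUEnv(os.Getenv("NVIDIA_VISIBLE_DEVICES"), contained) {
	case actionWarn:
		fmt.Println("warning: NVIDIA_VISIBLE_DEVICES must be set with --contain or no GPUs will be available")
	case actionSetAll:
		// Side effect kept outside the decision function in this sketch.
		os.Setenv("NVIDIA_VISIBLE_DEVICES", "all")
		fmt.Println("set NVIDIA_VISIBLE_DEVICES=all")
	}
}
```

Separating the decision from the os.Setenv side effect is a choice made only for this sketch; the patch itself branches inline inside setNvCCLIConfig.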