Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(mpi): remove global launcherRunsWorkload flag. #198

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

SimonCqk
Copy link
Collaborator

Signed-off-by: SimonCqk [email protected]

Ⅰ. Describe what this PR does

remove launcherRunsWorkload global startup flag, which can be inferred by whether has Launcher role in job spec.

II. Does this pull request fix one issue?

fix #194

III. Special notes for reviewers if any.

@SimonCqk SimonCqk requested a review from jian-he October 21, 2021 12:53
@codecov-commenter
Copy link

Codecov Report

Merging #198 (d1cbc96) into master (0eb96cd) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #198   +/-   ##
=======================================
  Coverage   23.18%   23.18%           
=======================================
  Files          75       75           
  Lines        4502     4502           
=======================================
  Hits         1044     1044           
  Misses       3319     3319           
  Partials      139      139           
Flag Coverage Δ
unittests 23.18% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0eb96cd...d1cbc96. Read the comment docs.

@jian-he
Copy link
Collaborator

jian-he commented Oct 22, 2021

does this mean the launcher pod always run the workload ? is this expected ?

@SimonCqk
Copy link
Collaborator Author

does this mean the launcher pod always run the workload ? is this expected ?

if launcher role is in job spec, then it is, otherwise workers should tigger workload running.

@jian-he
Copy link
Collaborator

jian-he commented Oct 26, 2021

does this mean the launcher pod always run the workload ? is this expected ?

if launcher role is in job spec, then it is, otherwise workers should tigger workload running.

then, why do we need the launcher pod in the first place, if it runs the same as worker..

@jian-he
Copy link
Collaborator

jian-he commented Oct 26, 2021

anyway, the patch looks ok

@SimonCqk
Copy link
Collaborator Author

then, why do we need the launcher pod in the first place, if it runs the same as worker..

Launcher pod execution scripts are generated by controller and it triggers training by hooking scripts in OpenMPI framework extension points, if only Worker roles in job spec, user has to handle it manually.

@HeGaoYuan
Copy link
Contributor

does this mean the launcher pod always run the workload ? is this expected ?

if launcher role is in job spec, then it is, otherwise workers should tigger workload running.

I don't think it is ok. Because for the mpijob, we always have the launcher role in job spec.

@SimonCqk
Copy link
Collaborator Author

does this mean the launcher pod always run the workload ? is this expected ?

if launcher role is in job spec, then it is, otherwise workers should tigger workload running.

I don't think it is ok. Because for the mpijob, we always have the launcher role in job spec.

yes, that's what I mean, normally mpijob will be driven by launcher commands. In that case, should launcherRunsWorkload always be true?

@HeGaoYuan
Copy link
Contributor

does this mean the launcher pod always run the workload ? is this expected ?

if launcher role is in job spec, then it is, otherwise workers should tigger workload running.

I don't think it is ok. Because for the mpijob, we always have the launcher role in job spec.

yes, that's what I mean, normally mpijob will be driven by launcher commands. In that case, should launcherRunsWorkload always be true?

In my opinion, launcherRunsWorkload should not always be true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] get rid of launcherRunsWorkload global flag
4 participants