WIP: Introduce TrialRunner Abstraction #720

Draft
wants to merge 162 commits into
base: main
616f44e
do not pass the optimizer into _run()
motus Feb 21, 2024
33e332a
mypy fixes
motus Feb 21, 2024
0247259
start splitting the optimization loop into two
motus Feb 22, 2024
483e378
first complete version of the optimization loop (not tested yet)
motus Feb 23, 2024
addd5a4
Merge branch 'main' into sergiym/run/2loops
motus Feb 23, 2024
e97266f
allow running mlos_bench.run._main directly from unit tests + add a u…
motus Feb 23, 2024
64771fd
move in-process launch to a separate unit test file
motus Feb 23, 2024
bd7c55e
add is_warm_up flag to the optimization step
motus Feb 23, 2024
387722a
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/2loops
motus Feb 23, 2024
9f15aee
in-process optimizaiton loop invocation works!
motus Feb 23, 2024
65cd072
add multi-iteration optimization to in-process test; fix the mlos_cor…
motus Feb 24, 2024
c010d95
make in-process launcerh tests pass
motus Feb 24, 2024
7cfef3a
remove unnecessary local variables to make pylint happy
motus Feb 24, 2024
7233180
move trial_config_repeat_count checks to the launcher
motus Feb 24, 2024
be7dcec
make experiment.load() return trial_ids and use them in the optimizat…
motus Feb 24, 2024
3c52e03
use proper last_trial_id in the main loop; fix the unit tests
motus Feb 24, 2024
0d9dc97
update launcher tests with the new output patterns
motus Feb 24, 2024
4e171e0
remove unused variable
motus Feb 24, 2024
ab69fa0
Merge branch 'main' into sergiym/run/2loops
motus Feb 26, 2024
52adab8
better naming for functions in the optimization loop
motus Feb 26, 2024
df893d9
start implementing the scheduler class
motus Feb 27, 2024
5aca764
change the default value for is_warm_up parameter to False
motus Feb 27, 2024
4d183df
Merge branch 'sergiym/run/2loops' into sergiym/run/scheduler
motus Feb 27, 2024
309e10c
started to implement teh start() method of the sync scheduler
motus Feb 27, 2024
9a72b40
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/sch…
motus Feb 27, 2024
ffe23e1
implement proper Scheduler constructor
motus Feb 27, 2024
cb863e0
more clean-ups to the base scheduler
motus Feb 27, 2024
990b019
minor pylint fixes
motus Feb 28, 2024
2ac0520
add _add_trial_to_queue() method
motus Feb 28, 2024
b95100a
better handling of warm-up phase (no redundant code)
motus Feb 28, 2024
e15033d
split the sccheduler implementation into the base class and the sync …
motus Feb 28, 2024
6eab1b0
use the new scheduler in _main()
motus Feb 28, 2024
9c7f2cc
add scheduler config parameters that can be overridden from global co…
motus Feb 28, 2024
479a5ed
add todo comments
motus Feb 28, 2024
50dad9f
update the scores for launcher unit tests + fix teh regexps
motus Feb 28, 2024
220ece1
add logging to the sync optimization loop
motus Feb 28, 2024
29cec19
add more logging to the scheduler class
motus Feb 28, 2024
6f8bb2c
move (sync) implementation of the run_trial() to SyncScheduler; other…
motus Feb 29, 2024
6adb2d0
wip
bpkroth Mar 4, 2024
41a0c37
start tracking which trial runner a trial is assigned to
bpkroth Mar 5, 2024
2453427
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/sch…
motus Mar 5, 2024
d8d8dfb
Merge branch 'sergiym/run/scheduler' of github.com:motus/MLOS into se…
motus Mar 5, 2024
8a32e5a
wip: adding trial runner
bpkroth Mar 5, 2024
57dc4c3
Merge remote-tracking branch 'sergiy/sergiym/run/scheduler' into para…
bpkroth Mar 6, 2024
e55f33e
wip: integrating trial runner to merged branch
bpkroth Mar 6, 2024
7df0770
Roll back forceful assignment of PATH when invoking a local process
motus Mar 8, 2024
da55c5e
instantiate Scheduler from JSON config in the launcher (no JSON schem…
motus Mar 8, 2024
f6eb5ef
fix unit tests
motus Mar 8, 2024
97438e7
add test for Launcher scheduler load in test_load_cli_config_examples…
motus Mar 8, 2024
715fab9
Merge branch 'sergiym/local_exec/env' into sergiym/run/scheduler_load
motus Mar 8, 2024
034aef9
fix the way launcher handles trial_config_repeat_count
motus Mar 9, 2024
629236f
minor type fixes
motus Mar 9, 2024
049fdb6
add required_keys for base Scheduler
motus Mar 9, 2024
094155c
remove unnecessary type annotation
motus Mar 9, 2024
a6a7283
typo in pylint exception
motus Mar 9, 2024
0a94a37
make all unit tests run
motus Mar 9, 2024
cf42730
add a missing import
motus Mar 11, 2024
6f31a2d
add ConfigSchema.SCHEDULER (not defined yet)
motus Mar 11, 2024
e6ceb5c
fix the teardown property propagation issue
motus Mar 11, 2024
3121fb0
proper ordering of launcher properties initialization
motus Mar 11, 2024
5951544
fix last unit tests
motus Mar 11, 2024
e3f515c
more unit test fixes
motus Mar 11, 2024
86f155e
add Scheduler JSON config schema
motus Mar 11, 2024
928ceff
validate scheduler JSON schema
motus Mar 11, 2024
1511c6e
add an example config for sync scheduler
motus Mar 11, 2024
38ab457
fix the instantiation of scheduler config from JSON file
motus Mar 11, 2024
9323a1c
minor logging improvements in the Scheduler
motus Mar 11, 2024
6b35444
fix the trial_config_repeat_count default values for CLI
motus Mar 11, 2024
b242f23
roll back some unnecessary test fixes
motus Mar 11, 2024
208c393
temporarily rollback the --max_iterations 9 setting in unit test
motus Mar 11, 2024
303c25f
roll back another small fix to minimize the diff
motus Mar 11, 2024
16ea2cb
undo a fix to LocalExecService that is in a separate PR
motus Mar 11, 2024
5ad4b74
keep minimizing the diff
motus Mar 11, 2024
e0845ea
minimize diff
motus Mar 11, 2024
ed95295
Merge branch 'main' into sergiym/run/scheduler_load
motus Mar 13, 2024
45a9293
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/sch…
motus Mar 13, 2024
be0106a
Merge branch 'sergiym/run/scheduler_load' of github.com:motus/MLOS in…
motus Mar 13, 2024
71e3ced
Merge branch 'main' into sergiym/run/scheduler_load
motus Mar 13, 2024
ca9b3a1
Merge branch 'main' into sergiym/run/scheduler_load
motus Mar 14, 2024
52352dc
Merge remote-tracking branch 'upstream/main' into parallel-async-tria…
bpkroth Mar 15, 2024
1aa0e4b
Merge branch 'main' of github.com:motus/MLOS into sergiym/run/schedul…
motus Mar 15, 2024
bbf7922
Merge branch 'sergiym/run/scheduler_load' of github.com:motus/MLOS in…
motus Mar 15, 2024
b204ebc
Fix some storage schema related tests
bpkroth Mar 15, 2024
63da0e0
make local edits scheduler schema aware
bpkroth Mar 15, 2024
ba59035
include the scheduler schema in the global config
bpkroth Mar 15, 2024
2ca34cd
fixup relative paths
bpkroth Mar 15, 2024
946b0c4
basic schema testing
bpkroth Mar 15, 2024
58e8609
Merge branch 'main' into sergiym/run/scheduler_load
bpkroth Mar 15, 2024
7985a3e
add another test case
bpkroth Mar 15, 2024
8070c30
Update mlos_bench/mlos_bench/launcher.py
motus Mar 15, 2024
f395531
pylint
bpkroth Mar 15, 2024
678f4c5
Merge remote-tracking branch 'sergiy/sergiym/run/scheduler_load' into…
bpkroth Mar 15, 2024
969e496
remove async status changes for now - future PR
bpkroth Mar 15, 2024
6f4928f
wip
bpkroth Mar 15, 2024
92382dc
wip: refactor running of a trial to a separate class so we can do the…
bpkroth Mar 19, 2024
f00c975
Merge branch 'main' into trial-runner-abstraction
bpkroth Mar 19, 2024
e91b744
comments
bpkroth Mar 19, 2024
64e7575
consistency
bpkroth Mar 19, 2024
8a9e29e
Merge branch 'main' into trial-runner-abstraction
motus Mar 19, 2024
32c01c0
fixup
bpkroth Mar 20, 2024
0e89e25
schema tests
bpkroth Mar 20, 2024
5549925
spelling
bpkroth Mar 20, 2024
7feba3a
make sure trial_runner_id shows up by default
bpkroth Mar 20, 2024
cc7ed4d
wip: fixups
bpkroth Mar 20, 2024
8d794f1
fixme comments
bpkroth Mar 20, 2024
967b6e2
Launcher args fixups
bpkroth Mar 21, 2024
f4b1348
Fixups and testing for cli config file parsing
bpkroth Mar 21, 2024
a08bf72
more tests
bpkroth Mar 21, 2024
0a22b78
comments
bpkroth Mar 21, 2024
c64e0dc
Merge branch 'main' into launcher-test-args-fixups
bpkroth Apr 29, 2024
c09b427
wip
bpkroth Mar 21, 2024
329bd18
Merge branch 'main' into launcher-test-args-fixups
bpkroth May 10, 2024
680d9a3
Merge remote-tracking branch 'upstream/main' into launcher-test-args-…
bpkroth May 13, 2024
bcf05f9
fixups
bpkroth May 13, 2024
745fa4b
Merge commit 'fd9c8f9935ed41009963d67c100428dfe465dbe9' into trial-ru…
bpkroth Jul 15, 2024
021db59
cherry picking some files from main
bpkroth Jul 15, 2024
1b96ca2
selected reformats
bpkroth Jul 15, 2024
dc47ee5
Merge branch 'main' into trial-runner-abstraction
bpkroth Jul 17, 2024
9023eb7
slurp some files from main
bpkroth Jul 22, 2024
1f9f9c7
apply formatters selectively
bpkroth Jul 22, 2024
e7cda08
Merge branch 'main' into launcher-test-args-fixups
bpkroth Jul 22, 2024
2dee79f
fixups
bpkroth Jul 22, 2024
0e30b8e
Merge branch 'launcher-test-args-fixups' into trial-runner-abstraction
bpkroth Jul 22, 2024
9907455
apply comments
bpkroth Jul 23, 2024
f4e9c3f
Ignore negative config_id from the scheduler schema validation
bpkroth Jul 23, 2024
70647dc
whitespace
bpkroth Jul 23, 2024
4220aac
revert unnecessary lineswap
bpkroth Jul 23, 2024
47a3d20
Merge branch 'launcher-test-args-fixups' into trial-runner-abstraction
bpkroth Jul 23, 2024
2b4aaba
Merge branch 'main' into trial-runner-abstraction
bpkroth Jul 23, 2024
5a327dd
apply suggestion
bpkroth Jul 23, 2024
49087c4
formatting
bpkroth Jul 23, 2024
0ac79aa
wip: assigning new trial runner ids to old trials
bpkroth Jul 23, 2024
5c772cd
Merge branch 'main' into trial-runner-abstraction
bpkroth Aug 12, 2024
0657ece
Merge branch 'main' into trial-runner-abstraction
bpkroth Aug 20, 2024
18536e6
Merge branch 'main' into trial-runner-abstraction
bpkroth Sep 23, 2024
094cbf6
format
bpkroth Sep 23, 2024
6f54b21
refactor to allow easier scheduling overrides
bpkroth Sep 23, 2024
0ac19ae
reformat
bpkroth Sep 23, 2024
eab3c6e
comments
bpkroth Sep 23, 2024
b1fa8c2
tweaks
bpkroth Sep 23, 2024
a48e831
wip
bpkroth Sep 23, 2024
6557291
Adding status() output to MockEnv
bpkroth Sep 24, 2024
2877df1
expose root_env_config property
bpkroth Sep 25, 2024
e3005a0
refactor to use scheduler
bpkroth Sep 25, 2024
14cd3fe
Merge branch 'main' into use-scheduler-in-dummy-runs
bpkroth Sep 27, 2024
376461a
wip
bpkroth Sep 27, 2024
a61678c
tweaks to metadata checks
bpkroth Sep 27, 2024
fbdf3a1
comments
bpkroth Sep 27, 2024
192dd80
Merge branch 'use-scheduler-in-dummy-runs' into trial-runner-abstraction
bpkroth Sep 27, 2024
cfbe440
comments
bpkroth Sep 27, 2024
a5edafc
Merge branch 'use-scheduler-in-dummy-runs' into trial-runner-abstraction
bpkroth Sep 27, 2024
9e433b7
separate run vs status random
bpkroth Sep 30, 2024
6f50ed6
Merge branch 'use-scheduler-in-dummy-runs' into trial-runner-abstraction
bpkroth Sep 30, 2024
1b1ad69
adjustments
bpkroth Oct 1, 2024
8949624
Merge branch 'main' into trial-runner-abstraction
bpkroth Oct 2, 2024
aa158ba
Merge branch 'trial-runner-abstraction' of github.com:bpkroth/MLOS in…
bpkroth Oct 2, 2024
5fcfee6
Merge branch 'main' into trial-runner-abstraction
bpkroth Oct 3, 2024
65de078
move save_params to common for reuse by trial
bpkroth Oct 3, 2024
67dedf2
add more tests
bpkroth Oct 3, 2024
273ed9e
docstings
bpkroth Oct 3, 2024
143b7f8
Merge branch 'main' into trial-runner-abstraction
bpkroth Oct 8, 2024
110cb79
comments
bpkroth Oct 14, 2024
7 changes: 7 additions & 0 deletions mlos_bench/mlos_bench/config/schemas/cli/cli-schema.json
@@ -79,6 +79,13 @@
"examples": [3, 5]
},

"num_trial_runners": {
"description": "Number of trial runner instances to use to execute benchmark environments. Individual TrialRunners can be identified in configs with $trial_runner_id and optionally run in parallel.",
"type": "integer",
"minimum": 1,
"examples": [1, 3, 5, 10]
},

"storage": {
"description": "Path to the json config describing the storage backend to use.",
"$ref": "#/$defs/json_config_path"
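
The constraints the schema places on `num_trial_runners` (an integer, at least 1) can be checked without a schema library. The following is a minimal sketch of that check, not the validator mlos_bench itself uses; the helper name `check_num_trial_runners` is hypothetical.

```python
def check_num_trial_runners(value: object) -> int:
    """Return `value` if it is an integer >= 1, otherwise raise."""
    # JSON Schema "integer" does not accept booleans, but in Python
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"num_trial_runners must be an integer, got {value!r}")
    # Mirrors "minimum": 1 from the schema entry.
    if value < 1:
        raise ValueError(f"num_trial_runners must be >= 1, got {value}")
    return value
```

All four values listed under `examples` in the schema (1, 3, 5, 10) pass this check.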
32 changes: 30 additions & 2 deletions mlos_bench/mlos_bench/environments/base_environment.py
@@ -43,6 +43,15 @@ class Environment(metaclass=abc.ABCMeta):
# pylint: disable=too-many-instance-attributes
"""An abstract base of all benchmark environments."""

# Should be provided by the runtime.
_COMMON_CONST_ARGS = {
"trial_runner_id",
}
_COMMON_REQ_ARGS = {
"experiment_id",
"trial_id",
}

@classmethod
def new( # pylint: disable=too-many-arguments
cls,
@@ -123,6 +132,12 @@ def __init__( # pylint: disable=too-many-arguments
An optional service object (e.g., providing methods to
deploy or reboot a VM/Host, etc.).
"""
global_config = global_config or {}
# Make some usual runtime arguments available for tests.
for arg in self._COMMON_CONST_ARGS:
global_config.setdefault(arg, None)
for arg in self._COMMON_REQ_ARGS:
global_config.setdefault(arg, None)
self._validate_json_config(config, name)
self.name = name
self.config = config
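
The hunk above fills in `None` for the common runtime arguments so that tests can build an Environment without supplying them. Because `setdefault` is applied to the dict that was passed in, a caller-supplied `global_config` gains those keys too. The sketch below shows the same defaulting done on a copy instead; this is an illustrative variation, not what the PR does, and the names `COMMON_ARGS` and `with_runtime_defaults` are hypothetical.

```python
from typing import Any, Dict, Optional

# Same keys as _COMMON_CONST_ARGS and _COMMON_REQ_ARGS in the diff.
COMMON_ARGS = ("trial_runner_id", "experiment_id", "trial_id")


def with_runtime_defaults(global_config: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a new dict with the common runtime arguments defaulted to None."""
    # Copy first so the caller's dict is left untouched (a choice made for
    # this sketch; the diff calls setdefault on the dict it is given).
    result = dict(global_config or {})
    for arg in COMMON_ARGS:
        # setdefault only adds the key when it is missing,
        # so values supplied by the caller are preserved.
        result.setdefault(arg, None)
    return result
```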
@@ -161,8 +176,9 @@ def __init__( # pylint: disable=too-many-arguments
req_args = set(config.get("required_args", [])) - set(
self._tunable_params.get_param_values().keys()
)
req_args.update(self._COMMON_CONST_ARGS)
merge_parameters(dest=self._const_args, source=global_config, required_keys=req_args)
- self._const_args = self._expand_vars(self._const_args, global_config or {})
+ self._const_args = self._expand_vars(self._const_args, global_config)

self._params = self._combine_tunables(self._tunable_params)
_LOG.debug("Parameters for '%s' :: %s", name, self._params)
@@ -332,6 +348,18 @@ def tunable_params(self) -> TunableGroups:
"""
return self._tunable_params

@property
def const_args(self) -> Dict[str, TunableValue]:
"""
Get the constant arguments for this Environment.

Returns
-------
parameters : Dict[str, TunableValue]
Key/value pairs of all environment const_args parameters.
"""
return self._const_args.copy()

@property
def parameters(self) -> Dict[str, TunableValue]:
"""
@@ -345,7 +373,7 @@ def parameters(self) -> Dict[str, TunableValue]:
Key/value pairs of all environment parameters
(i.e., `const_args` and `tunable_params`).
"""
- return self._params
+ return self._params.copy()

def setup(self, tunables: TunableGroups, global_config: Optional[dict] = None) -> bool:
"""
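
Both the new `const_args` property and the revised `parameters` property return `.copy()` rather than the internal dict. A minimal sketch of the effect, using a hypothetical stand-in class `EnvSketch` rather than the real `Environment`:

```python
from typing import Dict


class EnvSketch:
    """Hypothetical stand-in for an Environment holding a parameter dict."""

    def __init__(self, params: Dict[str, int]) -> None:
        self._params = dict(params)

    @property
    def parameters(self) -> Dict[str, int]:
        # Shallow copy: adding, removing, or reassigning top-level keys
        # on the returned dict does not touch the internal state.
        return self._params.copy()


env = EnvSketch({"vm_size": 2})
snapshot = env.parameters
snapshot["vm_size"] = 99    # modifies only the caller's copy
snapshot["extra"] = 1       # likewise
```

After these assignments `env.parameters` still holds only the original `vm_size` value. The copy is shallow, so any nested mutable values (lists or dicts stored as parameter values) would still be shared with the caller.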
52 changes: 45 additions & 7 deletions mlos_bench/mlos_bench/launcher.py
@@ -22,6 +22,7 @@
from mlos_bench.optimizers.mock_optimizer import MockOptimizer
from mlos_bench.optimizers.one_shot_optimizer import OneShotOptimizer
from mlos_bench.schedulers.base_scheduler import Scheduler
from mlos_bench.schedulers.trial_runner import TrialRunner
from mlos_bench.services.base_service import Service
from mlos_bench.services.config_persistence import ConfigPersistenceService
from mlos_bench.services.local.local_exec import LocalExecService
@@ -44,6 +45,7 @@ class Launcher:

def __init__(self, description: str, long_text: str = "", argv: Optional[List[str]] = None):
# pylint: disable=too-many-statements
# pylint: disable=too-complex
# pylint: disable=too-many-locals
_LOG.info("Launch: %s", description)
epilog = """
@@ -108,6 +110,7 @@ def __init__(self, description: str, long_text: str = "", argv: Optional[List[st
args_rest=args_rest,
global_config=cli_config_args,
)
# TODO: Can we generalize these two rules using excluded_cli_args?
# experiment_id is generally taken from --globals files, but we also allow
# overriding it on the CLI.
# It's useful to keep it there explicitly mostly for the --help output.
@@ -117,6 +120,13 @@ def __init__(self, description: str, long_text: str = "", argv: Optional[List[st
# set it via command line
if args.trial_config_repeat_count:
self.global_config["trial_config_repeat_count"] = args.trial_config_repeat_count
self.global_config.setdefault("num_trial_runners", 1)
if args.num_trial_runners:
self.global_config["num_trial_runners"] = args.num_trial_runners
if self.global_config["num_trial_runners"] <= 0:
raise ValueError(
f"Invalid num_trial_runners: {self.global_config['num_trial_runners']}"
)
# Ensure that the trial_id is present since it gets used by some other
# configs but is typically controlled by the run optimize loop.
self.global_config.setdefault("trial_id", 1)
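
The precedence applied to `num_trial_runners` in the hunk above (default of 1, then the globals value, then a CLI override, then a range check) can be isolated as a small function. This is a sketch for illustration only; the function name `resolve_num_trial_runners` is hypothetical and not part of the PR.

```python
from typing import Any, Dict, Optional


def resolve_num_trial_runners(global_config: Dict[str, Any], cli_value: Optional[int]) -> int:
    """Apply the default, the CLI override, and the range check, in that order."""
    # A value already present in the globals wins over the default of 1.
    global_config.setdefault("num_trial_runners", 1)
    # Truthiness test, as in the diff: None and 0 both mean
    # "not given on the CLI", so the config value is kept.
    if cli_value:
        global_config["num_trial_runners"] = cli_value
    if global_config["num_trial_runners"] <= 0:
        raise ValueError(f"Invalid num_trial_runners: {global_config['num_trial_runners']}")
    return global_config["num_trial_runners"]
```

One consequence of the truthiness test is that `--num_trial_runners 0` on the command line is silently ignored rather than rejected, while a zero or negative value coming from the globals file raises `ValueError`.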
@@ -142,13 +152,28 @@ def __init__(self, description: str, long_text: str = "", argv: Optional[List[st
)
self.root_env_config = self._config_loader.resolve_path(env_path)

- self.environment: Environment = self._config_loader.load_environment(
- self.root_env_config, TunableGroups(), self.global_config, service=self._parent_service
self.trial_runners: List[TrialRunner] = []
for trial_runner_id in range(self.global_config["num_trial_runners"]):
# Create a new global config for each Environment with a unique trial_runner_id for it.
env_global_config = self.global_config.copy()
env_global_config["trial_runner_id"] = trial_runner_id
env = self._config_loader.load_environment(
self.root_env_config,
TunableGroups(),
env_global_config,
service=self._parent_service,
)
self.trial_runners.append(TrialRunner(trial_runner_id, env))
_LOG.info(
"Init %d trial runners for environments: %s",
len(self.trial_runners),
list(trial_runner.environment for trial_runner in self.trial_runners),
)
- _LOG.info("Init environment: %s", self.environment)

- # NOTE: Init tunable values *after* the Environment, but *before* the Optimizer
+ # NOTE: Init tunable values *after* the Environment(s), but *before* the Optimizer
# TODO: should we assign the same or different tunables for all TrialRunner Environments?
self.tunables = self._init_tunable_values(
self.trial_runners[0].environment,
args.random_init or config.get("random_init", False),
config.get("random_seed") if args.random_seed is None else args.random_seed,
config.get("tunable_values", []) + (args.tunable_values or []),
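
The loop in this hunk gives every runner its own copy of the global config with a distinct `trial_runner_id`. The sketch below reproduces just that pattern with a hypothetical `RunnerSketch` dataclass in place of the real `TrialRunner` and `Environment` classes, whose constructors are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RunnerSketch:
    """Hypothetical stand-in for a TrialRunner and its Environment config."""

    trial_runner_id: int
    config: Dict[str, Any]


def make_runners(global_config: Dict[str, Any]) -> List[RunnerSketch]:
    runners: List[RunnerSketch] = []
    for trial_runner_id in range(global_config["num_trial_runners"]):
        # Shallow copy per runner, so setting trial_runner_id on one
        # runner's config affects neither the shared globals nor the others.
        env_global_config = global_config.copy()
        env_global_config["trial_runner_id"] = trial_runner_id
        runners.append(RunnerSketch(trial_runner_id, env_global_config))
    return runners


shared = {"num_trial_runners": 3, "experiment_id": "exp-1"}
runners = make_runners(shared)
```

Runner IDs start at 0, so with three runners `$trial_runner_id` takes the values 0, 1, and 2. Because `dict.copy()` is shallow, nested values in the global config are still shared between runners.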
@@ -278,6 +303,18 @@ def add_argument(self, *args: Any, **kwargs: Any) -> None:
),
)

parser.add_argument(
"--num_trial_runners",
"--num-trial-runners",
required=False,
type=int,
help=(
"Number of TrialRunners to use for executing benchmark Environments. "
"Individual TrialRunners can be identified in configs with $trial_runner_id "
"and optionally run in parallel."
),
)

path_args_tracker.add_argument(
"--scheduler",
required=False,
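
The new option is registered under two spellings. With `argparse`, both option strings map to one destination, which is derived from the first long option string. A self-contained sketch using a plain `ArgumentParser` (the PR registers the option on its own parser inside the Launcher, which is not reproduced here):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--num_trial_runners",
    "--num-trial-runners",
    required=False,
    type=int,
    help="Number of TrialRunners to use for executing benchmark Environments.",
)

# Either spelling stores the value under args.num_trial_runners.
args_underscore = parser.parse_args(["--num_trial_runners", "3"])
args_dash = parser.parse_args(["--num-trial-runners", "5"])
# When the option is omitted, the attribute defaults to None.
args_missing = parser.parse_args([])
```

Since the omitted case yields `None`, the `if args.num_trial_runners:` check in the Launcher falls through to the value from the globals or the default of 1.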
@@ -428,14 +465,15 @@ def _load_config(

def _init_tunable_values(
self,
env: Environment,
random_init: bool,
seed: Optional[int],
args_tunables: Optional[str],
) -> TunableGroups:
"""Initialize the tunables and load key/value pairs of the tunable values from
given JSON files, if specified.
"""
- tunables = self.environment.tunable_params
+ tunables = env.tunable_params
_LOG.debug("Init tunables: default = %s", tunables)

if random_init:
@@ -534,7 +572,7 @@ def _load_scheduler(self, args_scheduler: Optional[str]) -> Scheduler:
"teardown": self.teardown,
},
global_config=self.global_config,
- environment=self.environment,
+ trial_runners=self.trial_runners,
optimizer=self.optimizer,
storage=self.storage,
root_env_config=self.root_env_config,
@@ -544,7 +582,7 @@ def _load_scheduler(self, args_scheduler: Optional[str]) -> Scheduler:
return self._config_loader.build_scheduler(
config=class_config,
global_config=self.global_config,
- environment=self.environment,
+ trial_runners=self.trial_runners,
optimizer=self.optimizer,
storage=self.storage,
root_env_config=self.root_env_config,