Skip to content
This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Config file devel #41

Open
wants to merge 5 commits into
base: branch-1.6
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 92 additions & 2 deletions spark_ec2.py
Original file line number Diff line number Diff line change
Expand Up @@ -144,15 +144,36 @@ def setup_external_libs(libs):
tar.close()
os.remove(tgz_file_path)
print(" - Finished downloading {lib}.".format(lib=lib["name"]))
sys.path.insert(1, lib_dir)

if lib["add-path"]:
lib["add-path"](lib_dir)
else:
sys.path.insert(1, lib_dir)


def add_pyaml_path(lib_dir):
lib_py2 = os.path.join(lib_dir, "lib")
if os.path.exists(lib_py2) and sys.version < "3":
sys.path.insert(1, lib_py2)

lib_py3 = os.path.join(lib_dir, "lib3")
if os.path.exists(lib_py3) and sys.version >= "3":
sys.path.insert(1, lib_py3)


# Only PyPI libraries are supported.
external_libs = [
{
"name": "boto",
"version": "2.34.0",
"md5": "5556223d2d0cc4d06dd4829e671dcecd"
"md5": "5556223d2d0cc4d06dd4829e671dcecd",
"add-path": None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of add-path?

Copy link
Author

@ar-ms ar-ms Jul 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose add-path is to precise a custom function to execute, because some libraries need to be added to the system path differently than just doing sys.path.insert(1, libs/LIBRARY_NAME).

},
{
"name": "PyYAML",
"version": "3.11",
"md5": "f50e08ef0fe55178479d3a618efe21db",
"add-path": add_pyaml_path
}
]

Expand All @@ -162,11 +183,74 @@ def setup_external_libs(libs):
from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType, EBSBlockDeviceType
from boto import ec2

import yaml

class UsageError(Exception):
pass


def process_conf_file(file_path, parser):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the spark_ec2.py doesn't have a lot of unit tests, but it looks to me that this is a function that can be unit tested relatively easily ? Would be great if we can add a tests/test_yaml_config.py to exercise various code paths

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is a good time to start creating unit tests.

You're right, this function is a little bit hard to test because it needs a parser object with all the options already filled. Do we want to try only when the paths fails ? Do you have some other tests in mind ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will also be good to test cases where say the mapping fails (i.e. say invalid config option name) or the mapping has multiple values etc.

"""
Load configuration file and extract arguments.
"""
# Loading the configuration file.
configuration = {}
try:
with open(file_path) as configuration_file:
configuration = yaml.safe_load(configuration_file)
except Exception as e:
print("[!] An error occured when loading the config file:\n{}".format(e), file=stderr)
sys.exit(1)

unneeded_opts = ("version", "help")

# mapppin_conf() aims to avoid maintaining a manual mapping dictionary.
# Transforms --option-like-this to option_like_this
def mapping_conf(opt):
normal_opt = str(opt).split("/")[-1]
trans_opt = normal_opt.strip("-")
trans_opt = trans_opt.replace("-", "_")
return {trans_opt: normal_opt}

map_conf = {}
[map_conf.update(mapping_conf(opt)) for opt in parser.option_list]
[map_conf.pop(unneeded_opt, None) for unneeded_opt in unneeded_opts]

# Generating path to identity file (located in ~/.ssh/)
home_dir = os.path.expanduser('~')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need to special case this ? Can't we just have the same semantics as passing --identity-file here as well ? i.e if the user wants to use ~/.ssh/mykey.pem cant they write identity_file: ~/.ssh/mykey.pem ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed about this with @nchammas, as ~/.ssh/ is the default location of keys files, we did this to automatically set the path. But yes, let the users precise the path directly is also good, we hesitated between letting the users precise the path and search the key in ~/.ssh/. I will change this to let the user precise directly the path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that keeping the same semantics as command line options in the configuration file is better ?

Like this:

--configuration-file: path
--slave: 10
...
...
...
--region: us-west-1
--copy-aws-credentials: true

By keeping the same semantic, we don't need the mapping function anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By same semantics I just meant for the path. I think having the mapping function is better as it makes the config file more natural to use

key_pair_path = os.path.join(home_dir, ".ssh", configuration["identity_file"])
configuration["identity_file"] = key_pair_path

# Setting credentials to the environment to access AWS.
if "credentials" in configuration:
try:
key_id = configuration["credentials"]["aws_access_key_id"]
access_key = configuration["credentials"]["aws_secret_access_key"]
except KeyError:
pass
else:
os.environ["AWS_ACCESS_KEY_ID"] = key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = access_key

# Creating the args from the values present in the configuration file.
args = []
options = set(map_conf).intersection(configuration)
for op in options:
takes_value = parser.get_option(map_conf[op]).takes_value()
# Extends args with options that takes an value, example: slaves.
if takes_value:
option_value = "{opt} {val}".format(opt=map_conf[op],
val=configuration[op])
args.extend(option_value.split())
elif configuration[op]:
# Option that doesn't takes value, like --copy-aws-credentials
# Verifying that those options are setted(True).
args.append(map_conf[op])

new_opts, _ = parser.parse_args(args)
return new_opts


# Configure and parse our command-line arguments
def parse_args():
parser = OptionParser(
Expand Down Expand Up @@ -327,13 +411,19 @@ def parse_args():
parser.add_option(
"--instance-profile-name", default=None,
help="IAM profile name to launch instances under")
parser.add_option("-c", "--conf-file", default=None,
help="Specify config file", metavar="FILE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some more description of what the config file format is here ? A pointer to some other readme / markdown file will be fine if we dont want to make this too long. Also it'll be good to clarify precedence -- i.e. if a config option is present both on command line and in the file which one is used

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we need to add some more description in the help section. It will be great to document this feature, can we add a pointer to the README.md file ?

If -c/--conf-file is used, only the arguments present in the configuration file are used and the options present in the command line, will be ignored. It'll nice to add this information somewhere everyone can spot it directly. Can we print a message to the console when -c/--conf-file is used, to indicate that only the arguments in the configuration file are used ? Or write it in the help section of add_option ?

When we will switch from optparse to argparse, we can easily manage config file and the other command line arguments at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we haven't merged #34 yet - Will it be fine if I merge that first ?


(opts, args) = parser.parse_args()

if len(args) != 2:
parser.print_help()
sys.exit(1)
(action, cluster_name) = args

if opts.conf_file:
opts = process_conf_file(opts.conf_file, parser)

# Boto config check
# http://boto.cloudhackers.com/en/latest/boto_config_tut.html
home_dir = os.getenv('HOME')
Expand Down