This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Config file devel #41

Open
wants to merge 5 commits into base: branch-1.6
Changes from 4 commits
1 change: 1 addition & 0 deletions keys/README.md
@@ -0,0 +1 @@
+This folder contains the identity files.
Contributor

So users have to copy/move their SSH keys to this folder for spark-ec2 to be able to use them?

Author

Yes, spark-ec2 will search in the keys/ directory.

Contributor

@nchammas nchammas Jul 27, 2016

Can't/shouldn't we have spark-ec2 search in the default ~/.ssh/ (which is my preference), or in a user-specifiable directory?

Seems a bit off that people should have to copy or move identity files around just for spark-ec2.

Author

You're right. We could let the user specify an absolute path, and if the file doesn't exist, fall back to looking in ~/.ssh/?

Contributor

Honestly, my preference is to just use the default of ~/.ssh/ and remove this option, but what you proposed works too.

Author

Maybe searching for key pairs in ~/.ssh/ is the best solution to avoid path conflicts.
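The fallback order proposed in this thread could be sketched like so; resolve_identity_file is a hypothetical helper for illustration, not code from the PR:

```python
import os


def resolve_identity_file(identity_file):
    # Hypothetical helper sketching the lookup order discussed above:
    # use the path as given if it exists, otherwise fall back to ~/.ssh/.
    candidate = os.path.expanduser(identity_file)
    if os.path.isfile(candidate):
        return os.path.abspath(candidate)
    fallback = os.path.join(os.path.expanduser("~"), ".ssh",
                            os.path.basename(identity_file))
    if os.path.isfile(fallback):
        return fallback
    raise IOError("identity file not found: {0}".format(identity_file))
```

Checking the explicit path first keeps absolute and relative paths working, while the ~/.ssh/ fallback covers the default case preferred above.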

94 changes: 92 additions & 2 deletions spark_ec2.py
@@ -144,15 +144,36 @@ def setup_external_libs(libs):
             tar.close()
             os.remove(tgz_file_path)
             print(" - Finished downloading {lib}.".format(lib=lib["name"]))
-        sys.path.insert(1, lib_dir)
+
+        if lib["add-path"]:
+            lib["add-path"](lib_dir)
+        else:
+            sys.path.insert(1, lib_dir)
+
+
+def add_pyaml_path(lib_dir):
+    lib_py2 = os.path.join(lib_dir, "lib")
+    if os.path.exists(lib_py2) and sys.version < "3":
+        sys.path.insert(1, lib_py2)
+
+    lib_py3 = os.path.join(lib_dir, "lib3")
+    if os.path.exists(lib_py3) and sys.version >= "3":
+        sys.path.insert(1, lib_py3)
+
+
 # Only PyPI libraries are supported.
 external_libs = [
     {
         "name": "boto",
         "version": "2.34.0",
-        "md5": "5556223d2d0cc4d06dd4829e671dcecd"
+        "md5": "5556223d2d0cc4d06dd4829e671dcecd",
+        "add-path": None
Contributor

What's the purpose of add-path?

Author

@ar-ms ar-ms Jul 27, 2016

The purpose of add-path is to specify a custom function to execute, because some libraries need to be added to the system path differently than just doing sys.path.insert(1, libs/LIBRARY_NAME).

     },
+    {
+        "name": "PyYAML",
+        "version": "3.11",
+        "md5": "f50e08ef0fe55178479d3a618efe21db",
+        "add-path": add_pyaml_path
+    }
 ]
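The per-library dispatch described in the thread above can be sketched in isolation like this; setup_paths, default_add_path, pyyaml_add_path, and the toy directory names are illustrative, not the PR's actual code — only the "add-path" key mirrors the diff:

```python
import sys


def default_add_path(lib_dir):
    # Plain behavior: put the library directory near the front of sys.path.
    sys.path.insert(1, lib_dir)


def pyyaml_add_path(lib_dir):
    # PyYAML's source tarball keeps Python 2 code in lib/ and Python 3
    # code in lib3/, so the right subdirectory must be chosen at runtime.
    sub = "lib3" if sys.version_info[0] >= 3 else "lib"
    sys.path.insert(1, lib_dir + "/" + sub)


def setup_paths(libs):
    # Dispatch: use the per-library "add-path" hook when one is provided,
    # otherwise fall back to the plain insert.
    for lib in libs:
        hook = lib["add-path"] or default_add_path
        hook(lib["dir"])


# Toy example: boto uses the default, PyYAML gets the custom hook.
setup_paths([
    {"name": "boto", "dir": "lib/boto-2.34.0", "add-path": None},
    {"name": "PyYAML", "dir": "lib/PyYAML-3.11", "add-path": pyyaml_add_path},
])
```

A callable hook keeps the download loop generic while letting one library deviate from the standard layout.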

@@ -162,11 +183,74 @@ def setup_external_libs(libs):
 from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType, EBSBlockDeviceType
 from boto import ec2
 
+import yaml
+
 class UsageError(Exception):
     pass
 
 
+def process_conf_file(file_path, parser):
Contributor

I know that spark_ec2.py doesn't have a lot of unit tests, but it looks to me like this is a function that can be unit tested relatively easily? It would be great if we could add a tests/test_yaml_config.py to exercise the various code paths.

Author

Yes, I think this is a good time to start creating unit tests.

You're right, though: this function is a little hard to test because it needs a parser object with all the options already filled in. Do we want to test only the cases where the paths fail? Do you have some other tests in mind?

Contributor

It would also be good to test cases where, say, the mapping fails (i.e. an invalid config option name) or the mapping has multiple values, etc.
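A runnable sketch of one such test, exercising the option-name mapping; mapping_conf is restated inline from the diff so the snippet doesn't need to import spark_ec2.py, and test_option_mapping is a hypothetical name:

```python
from optparse import OptionParser


def mapping_conf(opt):
    # Restates the helper from the diff: "-s/--option-like-this"
    # becomes {"option_like_this": "--option-like-this"}.
    normal_opt = str(opt).split("/")[-1]
    trans_opt = normal_opt.strip("-").replace("-", "_")
    return {trans_opt: normal_opt}


def test_option_mapping():
    parser = OptionParser()
    parser.add_option("-s", "--slaves", type="int", default=1)
    parser.add_option("--copy-aws-credentials", action="store_true", default=False)

    map_conf = {}
    for opt in parser.option_list:
        map_conf.update(mapping_conf(opt))

    # YAML keys use underscores; the map recovers the CLI flag spelling.
    assert map_conf["slaves"] == "--slaves"
    assert map_conf["copy_aws_credentials"] == "--copy-aws-credentials"
    # An invalid config option name never appears in the map, so it would
    # be silently dropped by the intersection step in process_conf_file.
    assert "not_a_real_option" not in map_conf


test_option_mapping()
```

Building a throwaway OptionParser inside the test sidesteps the "parser with all options already filled" problem mentioned above.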

"""
Load configuration file and extract arguments.
"""
# Loading the configuration file.
configuration = {}
try:
with open(file_path) as configuration_file:
configuration = yaml.safe_load(configuration_file)
except Exception as e:
print("[!] An error occured when loading the config file:\n{}".format(e), file=stderr)
sys.exit(1)

unneeded_opts = ("version", "help")

# mapppin_conf() aims to avoid maintaining a manual mapping dictionary.
# Transforms --option-like-this to option_like_this
def mapping_conf(opt):
normal_opt = str(opt).split("/")[-1]
trans_opt = normal_opt.strip("-")
trans_opt = trans_opt.replace("-", "_")
return {trans_opt: normal_opt}

map_conf = {}
[map_conf.update(mapping_conf(opt)) for opt in parser.option_list]
[map_conf.pop(unneeded_opt, None) for unneeded_opt in unneeded_opts]

# Path to identity file (located in keys folder under current location of spark-ec2 file)
configuration["identity_file"] = os.path.join(os.path.dirname(os.path.realpath(__file__)),
"keys",
configuration["identity_file"])

# Setting credentials to the environment to access AWS.
if "credentials" in configuration:
try:
key_id = configuration["credentials"]["aws_access_key_id"]
access_key = configuration["credentials"]["aws_secret_access_key"]
except KeyError:
pass
else:
os.environ["AWS_ACCESS_KEY_ID"] = key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = access_key

# Creating the args from the values present in the configuration file.
args = []
options = set(map_conf).intersection(configuration)
for op in options:
takes_value = parser.get_option(map_conf[op]).takes_value()
# Extends args with options that takes an value, example: slaves.
if takes_value:
option_value = "{opt} {val}".format(opt=map_conf[op],
val=configuration[op])
args.extend(option_value.split())
elif configuration[op]:
# Option that doesn't takes value, like --copy-aws-credentials
# Verifying that those options are setted(True).
args.append(map_conf[op])

new_opts, _ = parser.parse_args(args)
return new_opts


# Configure and parse our command-line arguments
def parse_args():
parser = OptionParser(
@@ -327,13 +411,19 @@ def parse_args():
     parser.add_option(
         "--instance-profile-name", default=None,
         help="IAM profile name to launch instances under")
+    parser.add_option("-c", "--conf-file", default=None,
+                      help="Specify config file", metavar="FILE")
Contributor

Can we add some more description of what the config file format is here? A pointer to some other readme/markdown file would be fine if we don't want to make this too long. Also, it would be good to clarify precedence, i.e. if a config option is present both on the command line and in the file, which one is used.

Author

Yep, we need to add some more description in the help section. It would be great to document this feature; can we add a pointer to the README.md file?

If -c/--conf-file is used, only the arguments present in the configuration file are used, and the options present on the command line are ignored. It would be nice to put this information somewhere everyone can spot it directly. Can we print a message to the console when -c/--conf-file is used, to indicate that only the arguments in the configuration file are used? Or write it in the help section of add_option?

When we switch from optparse to argparse, we will be able to manage the config file and the other command-line arguments at the same time more easily.
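For reference, the format process_conf_file expects is plain YAML whose top-level keys are the CLI option names with dashes replaced by underscores, plus an optional credentials block. The file below is an illustrative example (all values are made up), not one shipped with the PR:

```yaml
# Illustrative spark-ec2 config file; key names are the CLI option names
# with dashes replaced by underscores.
identity_file: my-key.pem        # resolved relative to the keys/ folder
slaves: 2                        # maps to --slaves 2
copy_aws_credentials: true       # flag options are booleans

# Optional: exported as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
credentials:
  aws_access_key_id: YOUR_ACCESS_KEY_ID
  aws_secret_access_key: YOUR_SECRET_ACCESS_KEY
```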

Contributor

I just realized we haven't merged #34 yet. Will it be fine if I merge that first?


     (opts, args) = parser.parse_args()
 
     if len(args) != 2:
         parser.print_help()
         sys.exit(1)
     (action, cluster_name) = args
 
+    if opts.conf_file:
+        opts = process_conf_file(opts.conf_file, parser)
+
     # Boto config check
     # http://boto.cloudhackers.com/en/latest/boto_config_tut.html
     home_dir = os.getenv('HOME')