From 820bea1f9980beba5e464be3a46d41619b6576fe Mon Sep 17 00:00:00 2001
From: GitHub Action
Date: Tue, 21 Jul 2020 16:55:19 +0000
Subject: [PATCH] Update documentation

---
 docs/common.md     | 43 ++++++++++++---------
 docs/hadoop.md     | 32 ++++++++++------
 docs/hive.md       | 77 +++++++++++++++++++------------------
 docs/misc.md       | 95 +++++++++++++++++++++++++++++++++++-----------
 docs/prometheus.md | 19 ++--------
 docs/spark.md      | 23 ++++++-----
 6 files changed, 175 insertions(+), 114 deletions(-)

diff --git a/docs/common.md b/docs/common.md
index 74fbd64..edc353d 100644
--- a/docs/common.md
+++ b/docs/common.md
@@ -2,6 +2,14 @@
 
 Provides common utility functions
 
+## Overview
+
+Function to populate nodeinfo
+
+Please call this method at start of node bootstrap
+
+## Index
+
 * [populate_nodeinfo()](#populatenodeinfo)
 * [is_hadoop2_cluster()](#ishadoop2cluster)
 * [is_hs2_enabled()](#ishs2enabled)
 * [is_hs2_cluster()](#ishs2cluster)
@@ -9,14 +17,13 @@ Provides common utility functions
 * [is_master_node()](#ismasternode)
 * [is_worker_node()](#isworkernode)
 
-
-## populate_nodeinfo()
+### populate_nodeinfo()
 
 Function to populate nodeinfo
 
 Please call this method at start of node bootstrap
 
-### Example
+#### Example
 
 ```bash
 populate_nodeinfo
 ```
@@ -24,11 +31,11 @@ populate_nodeinfo
 _Function has no arguments._
 
-## is_hadoop2_cluster()
+### is_hadoop2_cluster()
 
 Function to check if the node belongs to a Hadoop2 cluster
 
-### Example
+#### Example
 
 ```bash
 if is_hadoop2_cluster; then
 fi
@@ -38,16 +45,16 @@ fi
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: If the cluster runs hadoop2
 * **1**: Otherwise
 
-## is_hs2_enabled()
+### is_hs2_enabled()
 
 Function to check if a HiveServer2 is configured to run on a master node
 
-### Example
+#### Example
 
 ```bash
 if is_hs2_enabled; then
 fi
@@ -57,16 +64,16 @@ fi
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: When HiveServer2 is configured on a master node
 * **1**: Otherwise
 
-## is_hs2_cluster()
+### is_hs2_cluster()
 
 Function to check if a node belongs to a HiveServer2 cluster
 
-### Example
+#### Example
 
 ```bash
 if is_hs2_cluster; then
 fi
@@ -76,16 +83,16 @@ fi
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: When node belongs to a HiveServer2 cluster
 * **1**: Otherwise
 
-## is_master_node()
+### is_master_node()
 
 Function to check if a node is a cluster master node
 
-### Example
+#### Example
 
 ```bash
 if is_master_node; then
 fi
@@ -95,16 +102,16 @@ fi
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: When node is a cluster master node
 * **1**: Otherwise
 
-## is_worker_node()
+### is_worker_node()
 
 Function to check if a node is a cluster worker node
 
-### Example
+#### Example
 
 ```bash
 if is_worker_node; then
 fi
@@ -114,7 +121,7 @@ fi
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: When node is a cluster worker node
 * **1**: Otherwise

diff --git a/docs/hadoop.md b/docs/hadoop.md
index 507584a..19c5a5f 100644
--- a/docs/hadoop.md
+++ b/docs/hadoop.md
@@ -2,21 +2,29 @@
 
 Provides Hadoop2 utility functions
 
+## Overview
+
+Function to restart hadoop services on the cluster master
+
+This may be used if you're using a different version
+of Java, for example
+
+## Index
+
 * [restart_master_services()](#restartmasterservices)
 * [restart_worker_services()](#restartworkerservices)
 * [restart_hadoop_services()](#restarthadoopservices)
 * [use_java8()](#usejava8)
 * [wait_until_namenode_running()](#waituntilnamenoderunning)
 
-
-## restart_master_services()
+### restart_master_services()
 
 Function to restart hadoop services on the cluster master
 
 This may be used if you're using a different version
 of Java, for example
 
-### Example
+#### Example
 
 ```bash
 restart_master_services
 ```
@@ -24,14 +32,14 @@ restart_master_services
 _Function has no arguments._
 
-## restart_worker_services()
+### restart_worker_services()
 
 Function to restart hadoop services on the cluster workers
 
 This only restarts the datanode service since the
 nodemanager is started after the bootstrap is run
 
-### Example
+#### Example
 
 ```bash
 restart_worker_services
 ```
@@ -39,11 +47,11 @@ restart_worker_services
 _Function has no arguments._
 
-## restart_hadoop_services()
+### restart_hadoop_services()
 
 Generic function to restart hadoop services
 
-### Example
+#### Example
 
 ```bash
 restart_hadoop_services
 ```
@@ -51,7 +59,7 @@ restart_hadoop_services
 _Function has no arguments._
 
-## use_java8()
+### use_java8()
 
 Use Java 8 for hadoop daemons and jobs
 
 Hadoop2 AMIs are configured to use Java 7. Use this function if you want
@@ -61,7 +69,7 @@ to use Java 8. This is only required if your cluster:
 1. is in AWS, and
 2. is running Hive or Spark < 2.2
 
-### Example
+#### Example
 
 ```bash
 use_java8
 ```
@@ -69,17 +77,17 @@ use_java8
 _Function has no arguments._
 
-## wait_until_namenode_running()
+### wait_until_namenode_running()
 
 Wait until namenode is out of safe mode
 
-### Example
+#### Example
 
 ```bash
 wait_until_namenode_running 25 5
 ```
 
-### Arguments
+#### Arguments
 
 * **$1** (int): Number of attempts function will make to get namenode out of safemode. Defaults to 50
 * **$2** (int): Number of seconds each attempt will sleep for, waiting for namenode to come out of sleep mode. Defaults to 5

diff --git a/docs/hive.md b/docs/hive.md
index 3dedb9d..c6f4c54 100644
--- a/docs/hive.md
+++ b/docs/hive.md
@@ -2,23 +2,31 @@
 
 Provides function to install Hive Glue Catalog Sync Agent
 
-* [install_glue_sync()](#installgluesync)
+## Overview
+
+Installs Hive Glue Catalog Sync Agent
+
+Requires Hive 2.x
+Currently supported only on AWS
 
+## Index
 
-## install_glue_sync()
+* [install_glue_sync()](#installgluesync)
+
+### install_glue_sync()
 
 Installs Hive Glue Catalog Sync Agent
 
 Requires Hive 2.x
 Currently supported only on AWS
 
-### Example
+#### Example
 
 ```bash
 install_glue_sync us-east-1
 ```
 
-### Arguments
+#### Arguments
 
 * **$1** (string): Region for AWS Athena. Defaults to `us-east-1`
@@ -26,16 +34,21 @@ install_glue_sync us-east-1
 # hive/thrift-metastore.sh
 
 Provides functions to start/stop/restart thrift metastore server
 
+## Overview
+
+Function to start thrift metastore server
+
+## Index
+
 * [start_thrift_metastore()](#startthriftmetastore)
 * [stop_thrift_metastore()](#stopthriftmetastore)
 * [restart_thrift_metastore()](#restartthriftmetastore)
 
-
-## start_thrift_metastore()
+### start_thrift_metastore()
 
 Function to start thrift metastore server
 
-### Example
+#### Example
 
 ```bash
 start_thrift_metastore
 ```
@@ -43,11 +56,11 @@ start_thrift_metastore
 _Function has no arguments._
 
-## stop_thrift_metastore()
+### stop_thrift_metastore()
 
 Function to stop thrift metastore server
 
-### Example
+#### Example
 
 ```bash
 stop_thrift_metastore
 ```
@@ -55,11 +68,11 @@ stop_thrift_metastore
 _Function has no arguments._
 
-## restart_thrift_metastore()
+### restart_thrift_metastore()
 
 Function to restart thrift metastore server
 
-### Example
+#### Example
 
 ```bash
 restart_thrift_metastore
 ```
@@ -71,45 +84,35 @@ _Function has no arguments._
 # hive/ranger-client.sh
 
 Provides function to install Apache Ranger client for Hive
 
-* [install_ranger()](#installranger)
-
-
-## install_ranger()
+## Overview
 
 Install Apache Ranger client for Hive
 
 Currently supported only on AWS
 Requires HiveServer2
 
-### Example
-
-```bash
-install_ranger -h example.host -p 6080 -r examplerepo
-```
-
-### Arguments
-* -h string Hostname of Ranger admin. Defaults to `localhost`
-* -p int Port where Ranger admin is running. Defaults to `6080`
-* -r string Name of Ranger repository. Defaults to `hivedev`
-* -S string Hostname of Solr admin. Defaults to `""`
-* -P int Port where Solr admin is running. Defaults to `6083`
 
 # hive/hiveserver2.sh
 
 Provides functions to start/stop/restart HiveServer2
 
+## Overview
+
+Function to check if HiveServer2 is configured
+
+## Index
+
 * [is_hs2_configured()](#ishs2configured)
 * [stop_hs2()](#stophs2)
 * [start_hs2()](#starths2)
 * [restart_hs2()](#restarths2)
 
-
-## is_hs2_configured()
+### is_hs2_configured()
 
 Function to check if HiveServer2 is configured
 
-### Example
+#### Example
 
 ```bash
 if [[ is_hs2_configured ]]; then
@@ -119,18 +122,18 @@ fi
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: If HiveServer2 is configured
 * **1**: Otherwise
 
-## stop_hs2()
+### stop_hs2()
 
 Function to stop HiveServer2 JVM
 
 Works on both Hadoop2 and HiveServer2 clusters
 
-### Example
+#### Example
 
 ```bash
 stop_hs2
 ```
@@ -138,13 +141,13 @@ stop_hs2
 _Function has no arguments._
 
-## start_hs2()
+### start_hs2()
 
 Function to start HiveServer2 JVM
 
 Works on both Hadoop2 and HiveServer2 clusters
 
-### Example
+#### Example
 
 ```bash
 start_hs2
 ```
@@ -152,13 +155,13 @@ start_hs2
 _Function has no arguments._
 
-## restart_hs2()
+### restart_hs2()
 
 Function to restart HiveServer2 JVM
 
 Works on both Hadoop2 and HiveServer2 clusters
 
-### Example
+#### Example
 
 ```bash
 restart_hs2
 ```

diff --git a/docs/misc.md b/docs/misc.md
index ca0ab3e..172bba6 100644
--- a/docs/misc.md
+++ b/docs/misc.md
@@ -2,11 +2,22 @@
 
 Provides miscellaneous utility functions
 
+## Overview
+
+Set the timezone
+
+This function sets the timezone on the cluster node.
+The timezone to set is a mandatory parameter and must be present in /usr/share/zoneinfo
+Eg: "US/Mountain", "America/Los_Angeles" etc.
+
+After setting the timezone, it is advised to restart engine daemons on the master and worker nodes
+
+## Index
+
 * [set_timezone()](#settimezone)
 * [add_to_authorized_keys()](#addtoauthorizedkeys)
 
-
-## set_timezone()
+### set_timezone()
 
 Set the timezone
@@ -16,27 +27,27 @@ Set the timezone
 This function sets the timezone on the cluster node.
 The timezone to set is a mandatory parameter and must be present in /usr/share/zoneinfo
 Eg: "US/Mountain", "America/Los_Angeles" etc.
 
 After setting the timezone, it is advised to restart engine daemons on the master and worker nodes
 
-### Example
+#### Example
 
 ```bash
 set_timezone "America/Los_Angeles"
 ```
 
-### Arguments
+#### Arguments
 
 * **$1** (string): Timezone to set
 
-## add_to_authorized_keys()
+### add_to_authorized_keys()
 
 Add a public key to authorized_keys
 
-### Example
+#### Example
 
 ```bash
 add_to_authorized_keys "ssh-rsa xyzxyzxyzxyz...xyzxyz user@example.com" ec2-user
 ```
 
-### Arguments
+#### Arguments
 
 * **$1** (string): Public key to add to authorized_keys file
 * **$2** (string): User for which the public key is added. Defaults to `ec2-user`
@@ -45,10 +56,24 @@ add_to_authorized_keys "ssh-rsa xyzxyzxyzxyz...xyzxyz user@example.com" ec2-user
 # misc/python-venv.sh
 
 Provides function to install Python virtualenv
 
-* [install_python_venv()](#installpythonvenv)
+## Overview
 
+Install and activate a Python virtualenv
+
+This function activates the new virtualenv, so install
+any libraries you want after calling this with "pip install"
+
+Alternatively you can also use a requirements file. For example
+to use a requirements file stored in S3 or Azure Blob Store, run
+
+/usr/lib/hadoop2/bin/hadoop dfs -get {s3|wasb}://path/to/requirements/file /tmp/requirements.txt
+pip install -r /tmp/requirements.txt
+
+## Index
 
-## install_python_venv()
+* [install_python_venv()](#installpythonvenv)
+
+### install_python_venv()
 
 Install and activate a Python virtualenv
@@ -58,21 +83,21 @@ any libraries you want after calling this with "pip install"
 
 Alternatively you can also use a requirements file. For example
 to use a requirements file stored in S3 or Azure Blob Store, run
 
- /usr/lib/hadoop2/bin/hadoop dfs -get {s3|wasb}://path/to/requirements/file /tmp/requirements.txt
- pip install -r /tmp/requirements.txt
+/usr/lib/hadoop2/bin/hadoop dfs -get {s3|wasb}://path/to/requirements/file /tmp/requirements.txt
+pip install -r /tmp/requirements.txt
 
-### Example
+#### Example
 
 ```bash
 install_python_env 3.6 /path/to/virtualenv/py36
 ```
 
-### Arguments
+#### Arguments
 
 * **$1** (float): Version of Python to use. Defaults to 3.6
 * **$2** (string): Location to create virtualenv in. Defaults to /usr/lib/virtualenv/py36
 
-### Exit codes
+#### Exit codes
 
 * **0**: Python virtualenv was created and activated
 * **1**: Python executable for virtualenv couldn't be found or installed
@@ -81,10 +106,26 @@ install_python_env 3.6 /path/to/virtualenv/py36
 # misc/mount-nfs-volume.sh
 
 Provides function to mount a NFS volume
 
-* [mount_nfs_volume()](#mountnfsvolume)
+## Overview
 
+Mounts an NFS volume on master and worker nodes
+
+Instructions for AWS EFS mount:
+1. After creating the EFS file system, create a security group
+2. Create an inbound traffic rule for this security group that allows traffic on
+port 2049 (NFS) from this security group as described here:
+https://docs.aws.amazon.com/efs/latest/ug/accessing-fs-create-security-groups.html
+3. Add this security group as a persistent security group for the cluster from which
+you want to mount the EFS store, as described here:
+http://docs.qubole.com/en/latest/admin-guide/how-to-topics/persistent-security-group.html
+
+TODO: add instructions for Azure file share
+
+## Index
 
-## mount_nfs_volume()
+* [mount_nfs_volume()](#mountnfsvolume)
+
+### mount_nfs_volume()
 
 Mounts an NFS volume on master and worker nodes
@@ -99,13 +140,13 @@ http://docs.qubole.com/en/latest/admin-guide/how-to-topics/persistent-security-g
 
 TODO: add instructions for Azure file share
 
-### Example
+#### Example
 
 ```bash
 mount_nfs_volume "example.nfs.share:/" /mnt/efs
 ```
 
-### Arguments
+#### Arguments
 
 * **$1** (string): Path to NFS share
 * **$2** (string): Mount point to use
@@ -114,29 +155,37 @@ mount_nfs_volume "example.nfs.share:/" /mnt/efs
 # misc/awscli.sh
 
 Provides function to configure AWS CLI
 
-* [configure_awscli()](#configureawscli)
+## Overview
+
+Configure AWS CLI
+A credentials file containing the AWS Access Key and the AWS Secret Key
+separated by a space, comma, tab or newline must be provided
+
+## Index
+
+* [configure_awscli()](#configureawscli)
 
-
-## configure_awscli()
+### configure_awscli()
 
 Configure AWS CLI
 
 A credentials file containing the AWS Access Key and the AWS Secret Key
 separated by a space, comma, tab or newline must be provided
 
-### Example
+#### Example
 
 ```bash
 configure_awscli -p exampleprofile -r us-east-1 -c /path/to/credentials/file
 ```
 
-### Arguments
+#### Arguments
 
 * -p string Name of the profile. Defaults to `default`
 * -r string AWS region. Defaults to `us-east-1`
 * -c string Path to credentials file
 
-### Exit codes
+#### Exit codes
 
 * **0**: AWS CLI is configured
 * **1**: AWS CLI or credentials file not found

diff --git a/docs/prometheus.md b/docs/prometheus.md
index 8320b7b..9b065ed 100644
--- a/docs/prometheus.md
+++ b/docs/prometheus.md
@@ -2,24 +2,13 @@
 
 Provides functions to configure Prometheus
 
-* [configure_prometheus_ram_on_master()](#configureprometheusramonmaster)
-
-
-## configure_prometheus_ram_on_master()
+## Overview
 
 Ability to override the memory usage of prometheus daemon on master. Example : 500M
 
- function requires one argument to be passed.
- Argument must specify the ram to be allocated to the prometheus service from master node ram.
- Input should be an integer. All the values are assumed in MB.
+function requires one argument to be passed.
+Argument must specify the ram to be allocated to the prometheus service from master node ram.
+Input should be an integer. All the values are assumed in MB.
-
-### Example
-
-```bash
- configure_prometheus_ram_on_master 600
-```
-
-### Arguments
-* **$1** (integer): Prometheus ram to be substituted in MB.
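As an aside on the `configure_awscli` hunk above: the docs state that the credentials file holds the AWS Access Key and Secret Key separated by a space, comma, tab or newline. That format can be exercised with plain shell; the following is a minimal sketch, not part of the patch, and the file path and the obviously fake key values are assumptions for illustration only:

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the credentials-file format configure_awscli
# accepts: the two keys may be separated by a space, comma, tab or newline.
creds=$(mktemp)
printf 'AKIAFAKEACCESSKEY,fakeSecretKey123' > "$creds"

# Normalize commas, tabs and newlines to spaces, then split into two fields
read -r access_key secret_key < <(tr ',\t\n' '   ' < "$creds")

echo "access: $access_key"
echo "secret: $secret_key"
rm -f "$creds"
```

Any of the four documented separators yields the same two fields, which is why the same parsing can consume a hand-written or a generated credentials file.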
diff --git a/docs/spark.md b/docs/spark.md
index 65934b8..988ac1b 100644
--- a/docs/spark.md
+++ b/docs/spark.md
@@ -2,16 +2,21 @@
 
 Provides functions to start/stop/restart Spark History Server
 
+## Overview
+
+Function to start Spark History Server
+
+## Index
+
 * [start_history_server()](#starthistoryserver)
 * [stop_history_server()](#stophistoryserver)
 * [restart_history_server()](#restarthistoryserver)
 
-
-## start_history_server()
+### start_history_server()
 
 Function to start Spark History Server
 
-### Example
+#### Example
 
 ```bash
 start_history_server
 ```
@@ -19,16 +24,16 @@ start_history_server
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: When Spark History Server is started
 * **1**: Otherwise
 
-## stop_history_server()
+### stop_history_server()
 
 Function to stop Spark History Server
 
-### Example
+#### Example
 
 ```bash
 stop_history_server
 ```
@@ -36,16 +41,16 @@ stop_history_server
 _Function has no arguments._
 
-### Exit codes
+#### Exit codes
 
 * **0**: When Spark History Server is stopped
 * **1**: Otherwise
 
-## restart_history_server()
+### restart_history_server()
 
 Function to restart Spark History Server
 
-### Example
+#### Example
 
 ```bash
 restart_history_server