Update documentation
actions-user committed Mar 18, 2020
1 parent 2a22a2f commit ed11bd0
Showing 7 changed files with 705 additions and 17 deletions.
25 changes: 8 additions & 17 deletions README.md
@@ -1,5 +1,5 @@
# bootstrap-functions
This repository holds common functions that can be used in qubole node bootstraps
This repository holds common functions that can be used in Qubole node bootstraps

## How to use

@@ -12,22 +12,13 @@ mount_nfs fs-7abd2444.efs.us-east-1.amazonaws.com:/ /mnt/efs
```

## Available functions
The following functions are available at present
* [configure_awscli](misc/awscli.sh#L11) - Configure AWS CLI
* [install_python_venv](misc/python_venv.sh#L17) - Install and activate a Python virtualenv
* [install_ranger](hive/ranger-client.sh#L13) - Install Apache Ranger client for Hive
* [mount_nfs_volume](misc/mount_nfs.sh#L21) - Mounts an NFS volume on master and worker nodes
* [restart_hs2](hive/hiveserver2.sh#L30) - Restart HiveServer2 JVM - works on both Hadoop2 and HiveServer2 cluster
* [set_timezone](misc/util.sh#L14) - Set the timezone
* [add_to_authorized_keys](misc/util.sh#L38) - Add public key to authorized_keys
* [install_glue_sync](hive/glue-sync.sh#L11) - Installs Hive Glue Catalog Sync Agent
* [start_history_server](spark/util.sh#L8) - Start Spark History Server
* [stop_history_server](spark/util.sh#L20) - Stop Spark History Server
* [restart_history_server](spark/util.sh#L32) - Restart Spark History Server
* [restart_master_services](hadoop/util.sh#L13) - Restart hadoop services on the cluster master
* [restart_worker_services](hadoop/util.sh#L43) - Restart hadoop services on cluster workers
* [use_java8](hadoop/util.sh#L61) - Use Java 8 for hadoop daemons and jobs
* [wait_until_namenode_running](hadoop/util.sh#L82) - Wait until namenode is out of safe mode.
The following sets of functions are available at present:
* [spark](docs/spark.md)
* [shdoc](docs/shdoc.md)
* [misc](docs/misc.md)
* [hive](docs/hive.md)
* [hadoop](docs/hadoop.md)
* [common](docs/common.md)

## Contributing
Please raise a pull request for any modifications or additions you would like to make. There may be a delay between when you want to start using a method and when it might be available via Qubole's AMI. To work around this, it is recommended to put a placeholder `source` line in your bootstrap script. For example
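The example itself is collapsed in this diff. As a minimal sketch only (the raw URL and branch below are illustrative assumptions, not taken from this commit), such a placeholder might look like:

```bash
# Illustrative placeholder: fetch and source the function library directly
# from the repository until it is available in Qubole's AMI.
# The URL and branch are assumptions, not taken from this commit.
source <(curl -fsSL "https://raw.githubusercontent.com/qubole/bootstrap-functions/master/common/utils.sh")

populate_nodeinfo
```
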
121 changes: 121 additions & 0 deletions docs/common.md
@@ -0,0 +1,121 @@
# common/utils.sh

Provides common utility functions

* [populate_nodeinfo()](#populatenodeinfo)
* [is_hadoop2_cluster()](#ishadoop2cluster)
* [is_hs2_enabled()](#ishs2enabled)
* [is_hs2_cluster()](#ishs2cluster)
* [is_master_node()](#ismasternode)
* [is_worker_node()](#isworkernode)


## populate_nodeinfo()

Function to populate nodeinfo

Please call this method at the start of the node bootstrap

### Example

```bash
populate_nodeinfo
```

_Function has no arguments._

## is_hadoop2_cluster()

Function to check if the node belongs to a Hadoop2 cluster

### Example

```bash
if is_hadoop2_cluster; then
# do something here
fi
```

_Function has no arguments._

### Exit codes

* **0**: If the cluster runs Hadoop2
* **1**: Otherwise

## is_hs2_enabled()

Function to check if HiveServer2 is configured to run on a master node

### Example

```bash
if is_hs2_enabled; then
# do something here
fi
```

_Function has no arguments._

### Exit codes

* **0**: When HiveServer2 is configured on a master node
* **1**: Otherwise

## is_hs2_cluster()

Function to check if a node belongs to a HiveServer2 cluster

### Example

```bash
if is_hs2_cluster; then
# do something here
fi
```

_Function has no arguments._

### Exit codes

* **0**: When the node belongs to a HiveServer2 cluster
* **1**: Otherwise

## is_master_node()

Function to check if a node is a cluster master node

### Example

```bash
if is_master_node; then
# do something here
fi
```

_Function has no arguments._

### Exit codes

* **0**: When the node is a cluster master node
* **1**: Otherwise

## is_worker_node()

Function to check if a node is a cluster worker node

### Example

```bash
if is_worker_node; then
# do something here
fi
```

_Function has no arguments._

### Exit codes

* **0**: When the node is a cluster worker node
* **1**: Otherwise
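
Taken together, a node bootstrap will typically source `common/utils.sh`, call `populate_nodeinfo` once, and then branch on the node's role. The sketch below is illustrative only; the install path of the library is an assumption, not taken from these docs.

```bash
#!/bin/bash
# Illustrative skeleton combining the checks documented above.
# The path below is an assumption; adjust it to wherever the
# bootstrap-functions repository is available on the node.
source /path/to/bootstrap-functions/common/utils.sh

populate_nodeinfo            # call this first, as noted above

if is_master_node; then
  echo "running master-only bootstrap steps"
elif is_worker_node; then
  echo "running worker-only bootstrap steps"
fi
```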

86 changes: 86 additions & 0 deletions docs/hadoop.md
@@ -0,0 +1,86 @@
# hadoop/util.sh

Provides Hadoop2 utility functions

* [restart_master_services()](#restartmasterservices)
* [restart_worker_services()](#restartworkerservices)
* [restart_hadoop_services()](#restarthadoopservices)
* [use_java8()](#usejava8)
* [wait_until_namenode_running()](#waituntilnamenoderunning)


## restart_master_services()

Function to restart Hadoop services on the cluster master

This may be used, for example, if you're using a different
version of Java

### Example

```bash
restart_master_services
```

_Function has no arguments._

## restart_worker_services()

Function to restart Hadoop services on the cluster workers

This only restarts the DataNode service, since the
NodeManager is started after the bootstrap is run

### Example

```bash
restart_worker_services
```

_Function has no arguments._

## restart_hadoop_services()

Generic function to restart Hadoop services

### Example

```bash
restart_hadoop_services
```

_Function has no arguments._

## use_java8()

Use Java 8 for Hadoop daemons and jobs

By default, the Hadoop daemons and jobs on Qubole
clusters run on Java 7. Use this function if you would like
to use Java 8. This is only required if your cluster:
1. is in AWS, and
2. is running Hive or Spark < 2.2

### Example

```bash
use_java8
```

_Function has no arguments._

## wait_until_namenode_running()

Wait until the NameNode is out of safe mode

### Example

```bash
wait_until_namenode_running 25 5
```

### Arguments

* **$1** (int): Number of attempts the function will make while waiting for the NameNode to come out of safe mode. Defaults to 50
* **$2** (int): Number of seconds each attempt will sleep while waiting for the NameNode to come out of safe mode. Defaults to 5
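
A hedged sketch of how this might fit into a master-node bootstrap together with the other functions documented above (`is_master_node` comes from `common/utils.sh`; the argument values are illustrative, not recommended settings):

```bash
# Illustrative only: restart the master daemons on Java 8, then block until
# HDFS has left safe mode before doing anything that writes to it.
if is_master_node; then
  use_java8
  restart_master_services
  wait_until_namenode_running 25 5   # 25 attempts, sleeping 5 seconds each
fi
```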
