Services update (#60)
* Updated services to use Managed Prometheus. I made slight changes to install file to remove conflicts and unused code

* Update README.md

* working on deployment script

* removing deploy script, I will add it later

* Update README.md

saving

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* more descriptive naming

* fix bug not copying config files

* fix Alma bugs not allowing service to start

---------

Co-authored-by: Ubuntu <rafsalas@ub22h3e8f000003.dqjt2vfkzmou3mpiz4ibgqqi1c.ax.internal.cloudapp.net>
Co-authored-by: almalinux Cloud User <hpcuser@hbv22ec6c000000.ypfoet0fe2mefochwlryckov1h.dxbx.internal.cloudapp.net>
3 people authored May 31, 2023
1 parent ea3427b commit 26f88e2
Showing 8 changed files with 142 additions and 146 deletions.
131 changes: 69 additions & 62 deletions linux_service/README.md
@@ -3,74 +3,81 @@ Moneo as a Linux Service
Description
-----
Setting up Moneo exporters as Linux service will allow for easy management and deployment of exporters.
This guide will walk you through how to set up Linux services for Moneo exporters.

Prerequisites
-----
If using [Azure's Ubuntu HPC AI VM image](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-dsvm.ubuntu-hpc?tab=overview) all dependencies will already be installed. Dependencies can be installed on workers using this script [Install Script](../src/worker/install/install.sh).

Below are the dependencies needed (installed by the install script):
1. Python Packages:
- prometheus-client==0.16.0
- psutil==5.9.4
- filelock==3.10.0
2. DCGM 3.1.6

Instructions without Publisher service
-----
1. Install dependencies using install script (not needed if dependencies already installed)
- ```sudo ../src/worker/install/install.sh```

2. Run the [configure_service.sh](./configure_service.sh) with the full Moneo path as an argument
- ```sudo ./configure_service.sh <Moneo_PATH>```
- If an argument isn't provided, it will use the default directory, i.e. /opt/azurehpc/tools/Moneo

Note: The configure script will modify the moneo@.service file to point to the exporter scripts.
Three launch methods provided:
1. The basic launch method launches the exporters on the compute node. It is up to the user to either:
- Use Moneo CLI to launch the manager Grafana and Prometheus containers on a head node.
- Or use your own method to scrape from the exporter ports ("nvidia_exporter": 8000, "net_exporter": 8001, "node_exporter": 8002).
2. Launch exporters and an [Azure Monitor](../docs/AzureMonitorAgent.md) publisher.
- Before launch you must modify the "azure_monitor_agent_config" section of [publisher_config](../src/worker/publisher/config/publisher_config.json) file with the Azure Monitor workspace connection string.
3. Azure Managed Grafana/Prometheus.
- This will require you to set up Managed Prometheus and Managed Grafana
- See prereqs for [Managed Prometheus](../docs/ManagedPrometheusAgent.md)
- Once Managed Prometheus is set up you can link it to a Grafana Dashboard.
- See [Azure Managed Grafana overview](https://learn.microsoft.com/en-us/azure/managed-grafana/overview) for info on setting up Grafana.

3. To start the services run the following commands:
- With start script:
``` sudo ./start_moneo_services.sh```
- Manually:
```
sudo systemctl start moneo@node_exporter.service
sudo systemctl start moneo@net_exporter.service
sudo systemctl start moneo@nvidia_exporter.service
```
4. To stop the services run:
- With stop script:
``` sudo ./stop_moneo_services.sh ```
- Manually:
```
sudo systemctl stop moneo@node_exporter.service
sudo systemctl stop moneo@net_exporter.service
sudo systemctl stop moneo@nvidia_exporter.service
```
5. To run these commands on multiple VMs in parallel you can use a tool like parallel-ssh:
- ```parallel-ssh -i -t 0 -h hostfile "<command>"```
This guide will walk you through how to set up Linux services for Moneo exporters.

Instructions for Moneo services with Publisher service
Prerequisites
-----
The publisher service is experimental and requires additional setup to use.
1. Modify publisher config files
- Moneo/src/worker/install/config/geneva_config.json
- Moneo/src/worker/publisher/config/publisher_config.json
If using [Azure's Ubuntu HPC AI VM image](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-dsvm.ubuntu-hpc?tab=overview) all dependencies will already be installed. Additional dependencies are installed as part of this guide. Please see [Install Script](../src/worker/install/install.sh) for details on what Python and Ubuntu packages are installed. DCGM 3.1.6 or higher is required for GPU nodes. The install script also checks for DCGM and installs it if needed as part of this guide.

2. Install dependencies using install script (not needed if dependencies already installed)
- Include Geneva agent install: ```sudo ../src/worker/install/install.sh geneva```
- Include Azure monitor install: ```sudo ../src/worker/install/install.sh azure_monitor```

3. Run the [configure_service.sh](./configure_service.sh) with the full Moneo path as an argument
- ```sudo ./configure_service.sh <Moneo_PATH> <publisher type>```
- Publisher types: "geneva" and "azure_monitor"

4. To start the services run the following commands based on the publisher type:
- ```sudo ./start_moneo_services.sh geneva <moneo path>```
- ```sudo ./start_moneo_services.sh azure_monitor```
5. To stop the services run:
- ```sudo ./stop_moneo_services.sh ```
6. To run these commands on multiple VMs in parallel you can use a tool like parallel-ssh:
- ```parallel-ssh -i -t 0 -h hostfile "<command>"```
Below are the prereqs needed:
- PSSH (This can be interchanged with other tools that can do distributed commands. The instructions will use PSSH for Ubuntu)
- AlmaLinux 8.7
- Ubuntu 20.04/22.04
- Moneo cloned/installed in the same directory on all compute nodes.
- A host file with the target compute nodes.


Instructions for Configuring, Installing and Launching Moneo services
-----
### Configuration and Installation ###
Configuration/Installation is only required once. After that is complete, the Linux services can be started and stopped as desired.
1. Configuration and installation of the Linux service is done with the following command:
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/configure_service.sh <Full Path to Moneo>"```
- If you will only be launching the exporters, without Azure Monitor or Managed Prometheus, skip to the Launch Services section. Otherwise, continue with step 2.
2. For the Azure Monitor or Managed Prometheus methods, if you have not yet modified the configuration files, refer to the following:
- For Azure Managed Prometheus:
- Modify [prom_sidecar_config.json](../src/worker/publisher/config) and copy the file to the compute nodes.
- ```parallel-scp -h hostfile <Full Path to Moneo>/src/worker/publisher/config/prom_sidecar_config.json <Full Path to Moneo>/src/worker/publisher/config```
- Lastly, check that the managed user identity used to set up Managed Prometheus (Azure role assignments) is assigned to your VMSS.
- For Azure Monitor:
- Modify the connection string in the "azure_monitor_agent_config" section of publisher_config.json and copy the file to the compute nodes.
- ```parallel-scp -h hostfile <Full Path to Moneo>/src/worker/publisher/config/publisher_config.json <Full Path to Moneo>/src/worker/publisher/config```
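
Because the edited config files are JSON, a quick syntax check before copying them to the compute nodes can catch a stray comma or quote early. The following is a minimal sketch, not part of Moneo: the `check_json` function name is hypothetical, and it assumes `python3` is available on the machine where the file is edited.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of Moneo): confirm that an edited
# config file is still well-formed JSON before distributing it.
# Assumes python3 is on PATH; json.tool is part of the Python standard library.
check_json() {
    if python3 -m json.tool "$1" > /dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "Invalid JSON: $1" >&2
        return 1
    fi
}
```

For example, `check_json <Full Path to Moneo>/src/worker/publisher/config/publisher_config.json` could be run before the `parallel-scp` step. This only checks syntax; it does not verify that the connection string or other values are correct.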
### Launch Services ###
The [start_moneo_services.sh](./start_moneo_services.sh) script is used to start the Linux services once configuration/installation is complete.
The script takes 3 arguments:
1. Full directory path of Moneo
2. Start with Managed Prometheus (true/false)
3. Start with Azure Monitor (true/false)
An example command would look like (Exporters only): /home/<user>/Moneo/linux_service/start_moneo_services.sh /home/<user>/Moneo false false
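
The two true/false flags are positional, so it is easy to swap them by accident. As a rough sketch, a small wrapper could map a launch mode name to the argument string described above. The function name `moneo_start_args` and the mode names are hypothetical and not part of Moneo; only the argument order (path, Managed Prometheus, Azure Monitor) comes from this guide.

```shell
#!/bin/bash
# Hypothetical helper (not part of Moneo): translate a launch mode name into
# the three positional arguments expected by start_moneo_services.sh:
#   <Moneo path> <Managed Prometheus true/false> <Azure Monitor true/false>
moneo_start_args() {
    local moneo_path="$1" mode="$2"
    case "$mode" in
        exporters)     echo "$moneo_path false false" ;;  # exporters only
        azure_monitor) echo "$moneo_path false true" ;;   # exporters + Azure Monitor
        managed_prom)  echo "$moneo_path true false" ;;   # exporters + Managed Prometheus
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

With this in place, `moneo_start_args /home/<user>/Moneo managed_prom` would print the argument string to append to the start script path.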

#### Exporters only Launch ####
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/start_moneo_services.sh <Full Path to Moneo> false false"```
#### Exporters with Azure Monitor ####
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/start_moneo_services.sh <Full Path to Moneo> false true"```
#### Exporters with Managed Prometheus ####
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/start_moneo_services.sh <Full Path to Moneo> true false"```

### Stop Services ###
The same command stops the services for all launch methods.
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/stop_moneo_services.sh"```

### Recap ###
Assuming the configuration files have been updated and, if necessary, the user managed identity has been applied (Managed Prometheus), use these commands for the workflow:
- Configuration/Install:
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/configure_service.sh <Full Path to Moneo>"```
- Extra Configure step for AZ Monitor and/or Managed Prometheus
```parallel-scp -h hostfile <Full Path to Moneo>/src/worker/publisher/config/<Respective config file> <Full Path to Moneo>/src/worker/publisher/config```
- Start
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/start_moneo_services.sh <Full Path to Moneo> <Managed Prom true/false> <Az Monitor true/false>"```
- Stop
```parallel-ssh -i -t 0 -h hostfile "sudo <Full Path to Moneo>/linux_service/stop_moneo_services.sh"```

Note: This guide uses PSSH to distribute the commands. Any similar tool, such as PDSH, can be used instead. The scripts can also be called from job schedulers or individually.
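
If neither PSSH nor PDSH is available, the note above can also be satisfied with a plain `ssh` loop over the host file. This is a minimal sketch under stated assumptions: the `run_on_hosts` function and the `DRY_RUN` variable are hypothetical, hosts are listed one per line, and passwordless SSH to each node is already set up. Unlike PSSH, it runs the hosts one after another rather than in parallel.

```shell
#!/bin/bash
# Hypothetical fallback (not part of Moneo): run one command on every host
# listed in a host file using plain ssh, sequentially.
# Set DRY_RUN=1 to print the ssh invocations instead of executing them.
run_on_hosts() {
    local hostfile="$1" cmd="$2" host
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines in the host file.
        case "$host" in
            ''|\#*) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'ssh %s %s\n' "$host" "$cmd"
        else
            # -n keeps ssh from consuming the rest of the host file on stdin.
            ssh -n "$host" "$cmd"
        fi
    done < "$hostfile"
}
```

For example, `DRY_RUN=1 run_on_hosts hostfile "sudo <Full Path to Moneo>/linux_service/stop_moneo_services.sh"` prints the commands that would be run, which is a cheap way to check the host file before touching the cluster.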


Updating job ID
-----
@@ -80,5 +87,5 @@ To update job name/ID we can use the [job ID update script](../src/worker/jobIdU

or see [Update Job Id With Moneo CLI](../docs/JobFiltering.md)

Note: use parallel-ssh to distribute this command to a cluster (i.e. step 5 of the instructions)
Note: use parallel-ssh to distribute this command to a cluster

27 changes: 5 additions & 22 deletions linux_service/configure_service.sh
@@ -1,37 +1,20 @@
#!/bin/bash

MONEO_PATH=$1
PUBLISHER=$2

if [[ -z "$MONEO_PATH" ]];
then
MONEO_PATH=/opt/azurehpc/tools/Moneo
echo 'default Moneo path used'
fi

if [[ ! -d "$MONEO_PATH" ]];
then
echo "Error: Moneo path $MONEO_PATH does not exist. Ensure you are using the correct arguments
(i.e. ./configure_service.sh <Moneo_path>, or ./configure_service.sh <Moneo_path> <publisher-type>). Exiting."
(i.e. ./configure_service.sh <Moneo Full Path>, or ./configure_service.sh <Moneo Full Path> true/false). Exiting."
exit 1
fi

if [[ -n $PUBLISHER ]];
then
if [ "$PUBLISHER" != "geneva" ] && [ "$PUBLISHER" != "azure_monitor" ];
then
echo "Error: $PUBLISHER is not an acceptable value for publisher type. Options are 'geneva' or 'azure_monitor'. Exiting."
exit 1

fi
fi

# replace the Moneo path placeholder with the actual Moneo path and write the service file to the systemd directory
sed "s#<Moneo_Path>#$MONEO_PATH#g" $MONEO_PATH/linux_service/moneo@.service > /etc/systemd/system/moneo@.service

if [[ -n $PUBLISHER ]];
then
sed "s#<Moneo_Path>#$MONEO_PATH#g; s#<pub-type>#$PUBLISHER#g;" $MONEO_PATH/linux_service/moneo_publisher.service > /etc/systemd/system/moneo_publisher.service
fi
echo "configuring publisher service"
sed "s#<Moneo_Path>#$MONEO_PATH#g;" $MONEO_PATH/linux_service/moneo_publisher.service > /etc/systemd/system/moneo_publisher.service

$MONEO_PATH/src/worker/install/install.sh azure_monitor

systemctl daemon-reload
1 change: 0 additions & 1 deletion linux_service/moneo@.service
@@ -5,7 +5,6 @@ After=network.target
[Service]
Type=simple
Restart=no
ExecStartPre=<Moneo_Path>/linux_service/moneo_prestart.sh <Moneo_Path> %i.py
ExecStart=/usr/bin/python3 /tmp/moneo-worker/exporters/%i.py
User=root

34 changes: 2 additions & 32 deletions linux_service/moneo_prestart.sh
@@ -1,7 +1,6 @@
#!/bin/bash

MONEO_PATH=$1
EXE_TYPE=$2

# check MONEO_PATH variable is set if not set to default
if [[ -z "$MONEO_PATH" ]];
@@ -11,12 +10,6 @@ then
fi
echo "Moneo path=$MONEO_PATH"

if [[ -z "$EXE_TYPE" ]];
then
echo 'Error: No executable passed in. Exiting prestart script.'
exit 1
fi

# check that the path provided exists
if [[ ! -d "$MONEO_PATH" ]];
then
@@ -25,29 +18,6 @@ then
fi

# create the working directory if it does not already exist
mkdir -p /tmp/moneo-worker/exporters
mkdir -p /tmp/moneo-worker/publisher
mkdir -p /tmp/moneo-worker

# copy exporters or publisher
if [[ "metrics_publisher.py" == "$EXE_TYPE" ]];
then
if [[ ! -e "$MONEO_PATH/src/worker/publisher/$EXE_TYPE" ]];
then
echo "$MONEO_PATH/src/worker/publisher/$EXE_TYPE Does not exist"
exit 1
fi
cp -rf $MONEO_PATH/src/worker/publisher/* /tmp/moneo-worker/publisher/
else
if [[ ! -e "$MONEO_PATH/src/worker/exporters/$EXE_TYPE" ]];
then
echo "Error: $MONEO_PATH/src/worker/exporters/$EXE_TYPE Does not exist. Exiting prestart script"
exit 1
fi
cp $MONEO_PATH/src/worker/exporters/$EXE_TYPE /tmp/moneo-worker/exporters/
fi

# needed for node exporter
if [[ "node_exporter.py" == "$EXE_TYPE" ]];
then
cp $MONEO_PATH/src/worker/exporters/base_exporter.py /tmp/moneo-worker/exporters/
fi
cp -rf $MONEO_PATH/src/worker/* /tmp/moneo-worker/
3 changes: 1 addition & 2 deletions linux_service/moneo_publisher.service
@@ -5,8 +5,7 @@ After=network.target
[Service]
Type=simple
Restart=no
ExecStartPre=<Moneo_Path>/linux_service/moneo_prestart.sh <Moneo_Path> metrics_publisher.py
ExecStart=/usr/bin/python3 /tmp/moneo-worker/publisher/metrics_publisher.py <pub-type>
ExecStart=/usr/bin/python3 /tmp/moneo-worker/publisher/metrics_publisher.py azure_monitor
User=root


77 changes: 57 additions & 20 deletions linux_service/start_moneo_services.sh
@@ -1,6 +1,51 @@
#!/bin/bash
PUBLISHER=$1
MONEO_PATH=$2
WITH_PUBLISHER=$3
WITH_MANAGED_PROM=$2
MONEO_PATH=$1

if [[ ! -d "$MONEO_PATH" ]];
then
echo "Error: Moneo path does not exist. Please install Moneo and/or provide the full path to this script. Exiting start script"
exit 1
fi



procs=("net_exporter" "node_exporter")

if lspci | grep -iq NVIDIA ; then
procs+=("nvidia_exporter")
fi

if [[ -n $WITH_PUBLISHER && $WITH_PUBLISHER = true ]]; then
procs+=("metrics_publisher")
fi

function proc_check(){
CHECK=`ps -eaf | grep /tmp/moneo-worker/`
for substring in "${procs[@]}"; do
if [[ $CHECK == *"$substring"* ]]; then
echo "$substring service started as expected."
else
echo "Some services failed to start"
exit 1
fi
done
if [[ -n $WITH_MANAGED_PROM && $WITH_MANAGED_PROM = true ]];
then
if [[ $(docker ps -a | grep prometheus_sidecar) && $(docker ps -a | grep prometheus) ]] ; then
echo "Prometheus and Prometheus_side_car docker containers running."
else
echo "Prometheus and/or Prometheus_side_car failed to start. Please ensure you have the proper user managed identity assigned to your VMSS/VM. (moneo-umi)"
exit 1
fi
fi
echo "All Services Running"
exit 0
}

$MONEO_PATH/linux_service/moneo_prestart.sh $MONEO_PATH


systemctl enable moneo@node_exporter.service
systemctl enable moneo@net_exporter.service
@@ -10,22 +55,14 @@ systemctl start moneo@node_exporter.service
systemctl start moneo@net_exporter.service
systemctl start moneo@nvidia_exporter.service

if [[ ! -z "$PUBLISHER" ]];
then
if [ "$PUBLISHER" = "geneva" ] && [ -d $MONEO_PATH ];
then
#starts Geneva agent
$MONEO_PATH/src/worker/start_geneva.sh cert $MONEO_PATH/src/worker/publisher/config
sleep 5 # wait a bit for the exporters to start
systemctl enable moneo_publisher.service
systemctl start moneo_publisher.service
elif [ "$PUBLISHER" = "azure_monitor" ];
then
sleep 5 # wait a bit for the exporters to start
systemctl enable moneo_publisher.service
systemctl start moneo_publisher.service
else
echo "Either PUBLISHER OR MONEO_PATH unrecognized. PUBLISHER can be geneva or azure_monitor. If publisher is geneva MONEO_PATH must be defined."
echo "Some services may have started use the stop_moneo_services script to perform a clean stop"
fi
if [[ -n $WITH_MANAGED_PROM && $WITH_MANAGED_PROM = true ]]; then
$MONEO_PATH/src/worker/start_managed_prometheus.sh
fi

if [[ -n $WITH_PUBLISHER && $WITH_PUBLISHER = true ]]; then
sleep 5 # wait a bit for the exporters to start
systemctl enable moneo_publisher.service
systemctl start moneo_publisher.service
fi

proc_check
12 changes: 7 additions & 5 deletions linux_service/stop_moneo_services.sh
@@ -10,8 +10,10 @@ systemctl disable moneo@net_exporter.service
systemctl disable moneo@nvidia_exporter.service
systemctl disable moneo_publisher.service

if [[ $(docker ps -a | grep genevamdmagent) ]]; then
echo "Stopping Geneva Metrics Extension(MA) container"
docker stop genevamdmagent
docker rm genevamdmagent
fi
if [[ $(docker ps -a | grep prometheus_sidecar) || $(docker ps -a | grep prometheus) ]]; then
echo "Stopping Prometheus containers"
docker stop prometheus_sidecar
docker rm prometheus_sidecar
docker stop prometheus
docker rm prometheus
fi
3 changes: 1 addition & 2 deletions src/worker/install/install.sh
@@ -1,5 +1,4 @@
#!/bin/bash
arch="nvidia"

PUBLISHER_INSTALL=$1
MDM_DOCKER_VERSION=2.2023.316.006-5d91fa-20230316t1622
@@ -14,7 +13,7 @@ else
source $(dirname "${BASH_SOURCE[0]}")/common.sh
fi

python3 -m pip uninstall opentelemetry-sdk azure-monitor-opentelemetry opentelemetry-exporter-otlp -y
python3 -m pip uninstall opentelemetry-sdk azure-monitor-opentelemetry opentelemetry-exporter-otlp opentelemetry-exporter-otlp-proto-grpc opentelemetry-exporter-otlp-proto-http -y

if [ -n "$PUBLISHER_INSTALL" ];
then