Improving Jobscripts uploader (#7)
* Slurm spool directory path is now configurable
* Adding logging and a verbose mode
* Configuration script path is now a CLI parameter
guilbaults authored Nov 7, 2022
1 parent 2cb6f43 commit 3a7df70
Showing 6 changed files with 88 additions and 52 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -140,7 +140,7 @@ cython_debug/
*.swp
userportal/settings/*-local.py
userportal/local.py
-slurm_prolog/api_config.ini
+slurm_jobscripts/slurm_jobscripts.ini
private.key
public.cert
idp_metadata.xml
2 changes: 1 addition & 1 deletion docs/data.md
@@ -91,4 +91,4 @@ groups:
The information in this database is used to show the current utilization per user within a group.

## Slurm jobscript
-The script `slurm_jobscripts/slurm_jobscripts_userportal.py` can be used to add the submitted script to the database of the portal. It should run on the Slurm server, where it collects the scripts from `/var/spool/slurmctld`. This script uses the REST API of Django to push the job script. A user with a token needs to be created; check the [installation documentation](install.md) on how to create this API token.
+The script `slurm_jobscripts/slurm_jobscripts.py` can be used to add the submitted script to the database of the portal. It should run on the Slurm server, where it collects the scripts from the `spool` directory of Slurm. This script uses the REST API of Django to push the job script. A user with a token needs to be created; check the [installation documentation](install.md) on how to create this API token.
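The spool layout that the new script relies on can be sketched as follows. This is a minimal illustration based on the path format used in `slurm_jobscripts.py` in this commit (`{spool}/hash.{jobid % 10}/job.{jobid}/script`); the helper name `job_script_path` is hypothetical and not part of the commit.

```python
import posixpath


def job_script_path(spool, jobid):
    """Build the path of a job's script file inside the Slurm spool directory.

    Hypothetical helper: it mirrors the format string used by the commit's
    send_job(), where jobs are bucketed into hash.0 .. hash.9 by jobid % 10.
    """
    return posixpath.join(
        spool,
        'hash.{}'.format(jobid % 10),
        'job.{}'.format(jobid),
        'script',
    )


# Job 12345 falls into bucket 5 because 12345 % 10 == 5
print(job_script_path('/var/spool/slurmctld', 12345))
```

Keeping the path construction in one place makes it easy to see which bucket a given job ID maps to when debugging a missing script.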
4 changes: 2 additions & 2 deletions docs/jobstats.md
@@ -1,5 +1,5 @@
# Jobstats
-Each user can see their current usage on the cluster and their usage over the past few hours. The stats for each job are also available. Information about CPU, GPU, memory, filesystem, InfiniBand, power, etc. is also available per job. The submitted job script can also be collected and displayed on this page. Some automatic recommendations are also given to the user, based on the content of their job script and the stats of their job.
+Each user can see their current usage on the cluster and their usage over the past few hours. The stats for each job are also available. Information about CPU, GPU, memory, filesystem, InfiniBand, power, etc. is also available per job. The submitted job script can also be collected from the Slurm server and then stored and displayed in the portal. Some automatic recommendations are also given to the user, based on the content of their job script and the stats of their job.

<a href="user.png"><img src="user.png" alt="Stats per user" width="100"/></a>
<a href="job.png"><img src="job.png" alt="Stats per job" width="100"/></a>
@@ -13,4 +13,4 @@ Optional:
* node\_exporter (show node information)
* redfish\_exporter (show power information)
* lustre\_exporter and lustre\_exporter\_slurm (show Lustre information)
-* jobscript collector (show the submitted jobscript)
+* slurm_jobscripts.py (show the submitted jobscript)
@@ -2,3 +2,6 @@
token = changeme
host = http://localhost:8000
script_length = 100000

[slurm]
spool = /var/spool/slurmctld
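The configuration above can be read with `configparser`, as the new script does. The sketch below is illustrative only: the `[api]` section name is taken from the `config['api'][...]` lookups in `slurm_jobscripts.py`, while the `load_settings` helper and the fallback value for `spool` are assumptions added here (the commit's script has no fallback and fails if the `[slurm]` section is missing).

```python
import configparser

# Sample configuration; the [api] header is inferred from the script's lookups
SAMPLE_CONFIG = """
[api]
token = changeme
host = http://localhost:8000
script_length = 100000

[slurm]
spool = /var/spool/slurmctld
"""


def load_settings(text):
    """Parse the INI text and return the four values the uploader needs."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return {
        'token': config['api']['token'],
        'host': config['api']['host'],
        # getint() converts the string value to an integer
        'script_length': config.getint('api', 'script_length'),
        # Assumption: fall back to the previous hard-coded path if unset
        'spool': config.get('slurm', 'spool', fallback='/var/spool/slurmctld'),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings['spool'], settings['script_length'])
```

A fallback like this would keep older configuration files working after the `[slurm]` section was introduced, at the cost of hiding a missing setting.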
81 changes: 81 additions & 0 deletions slurm_jobscripts/slurm_jobscripts.py
@@ -0,0 +1,81 @@
import requests
import configparser
import os
import time
import argparse
import logging

# This script takes the submitted script on the slurmctld server
# and sends it to the userportal so it can be stored in a database


def send_job(jobid):
    try:
        with open('{spool}/hash.{mod}/job.{jobid}/script'.format(
                spool=spool,
                mod=jobid % 10,
                jobid=jobid), 'r') as f:
            content = f.read()[:script_length].strip('\x00')
            logging.debug('Job script {}: {}'.format(jobid, content[:100]))  # Only log first 100 characters into DEBUG log
            r = requests.post('{}/api/jobscripts/'.format(host),
                              json={'id_job': int(jobid), 'submit_script': content},
                              headers={'Authorization': 'Token ' + token})
            if r.status_code != 201:
                if r.status_code == 401:
                    logging.error('Token is invalid')
                elif 'job script with this id job already exists' in r.text:
                    logging.debug('Job script already exists')
                else:
                    logging.error('Job script {} not saved: {}'.format(jobid, r.text))

    except UnicodeDecodeError:
        # Ignore problems with wrong file encoding
        pass
    except FileNotFoundError:
        # The script disappeared before we could read it
        pass


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--config',
        help='Path to the config file (default: %(default)s)',
        type=str,
        default='/etc/slurm/slurm_jobscripts.ini')
    parser.add_argument('--verbose', help='Verbose output', action='store_true')
    args = parser.parse_args()

    if args.verbose:
        logging.basicConfig(level=logging.DEBUG)
    else:
        logging.basicConfig(level=logging.INFO)

    config = configparser.ConfigParser()
    logging.debug('Reading config file: {}'.format(args.config))
    config.read(args.config)
    token = config['api']['token']
    host = config['api']['host']
    script_length = int(config['api']['script_length'])
    spool = config['slurm']['spool']

    jobs = set()

    while True:
        updated_jobs = set()
        for mod in range(10):
            try:
                listing = os.listdir('{spool}/hash.{mod}'.format(spool=spool, mod=mod))
            except FileNotFoundError:
                logging.debug('hash.{mod} does not exist yet'.format(mod=mod))
                continue
            for job in filter(lambda x: 'job' in x, listing):
                jobid = int(job[4:])  # parse the jobid (job.12345 -> 12345)
                updated_jobs.add(jobid)

                if jobid not in jobs:
                    logging.debug('New job: {}'.format(jobid))
                    send_job(jobid)

        jobs = updated_jobs
        time.sleep(5)
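The polling loop above detects new jobs by comparing the current spool listing with the set seen on the previous pass. A minimal sketch of that idea as a standalone function follows; `scan_spool` is a hypothetical name, and it deliberately uses a stricter filter than the commit (`startswith('job.')` plus a digit check instead of `'job' in x`), so it is a variation on the commit's logic, not a copy of it.

```python
import os
import tempfile


def scan_spool(spool, seen):
    """Return (current, new): job IDs found now, and those not in `seen`."""
    current = set()
    for mod in range(10):
        try:
            listing = os.listdir(os.path.join(spool, 'hash.{}'.format(mod)))
        except FileNotFoundError:
            # This bucket has not been created yet
            continue
        for name in listing:
            # Assumption: job directories are named job.<numeric id>
            if name.startswith('job.') and name[4:].isdigit():
                current.add(int(name[4:]))
    return current, current - seen


# Demonstration against a temporary directory that imitates the spool layout
with tempfile.TemporaryDirectory() as demo_spool:
    os.makedirs(os.path.join(demo_spool, 'hash.5', 'job.12345'))
    os.makedirs(os.path.join(demo_spool, 'hash.1', 'job.11'))
    first, new_first = scan_spool(demo_spool, set())
    second, new_second = scan_spool(demo_spool, first)

print(sorted(new_first), sorted(new_second))
```

Separating the scan from the upload makes the detection step testable without a running Slurm controller or portal.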
48 changes: 0 additions & 48 deletions slurm_jobscripts/slurm_jobscripts_userportal.py

This file was deleted.
