
Tutorial 4: Putting it all together

Suchandra Thapa edited this page Jun 13, 2014 · 1 revision

Introduction

This page combines some of the concepts introduced in previous tutorials to allow the user to create applications that remotely access data and software at the same time. After completing this, users should be able to fully utilize the capabilities of Parrot and SkeletonKey.

Prerequisites

The following items are needed in order to complete this tutorial:

  1. Webserver where the user can place files to access using the web
  2. HTCondor Cluster (optional)
  3. A working SkeletonKey install
  4. A squid proxy for Parrot to use
  5. A running Chirp server
  6. Familiarity with using SkeletonKey for remote data and software access (the second tutorial and third tutorial are sufficient)

Conventions

In the examples given in this tutorial, text in red denotes strings that should be replaced with user-specific values, e.g. the URL of the user's webserver. In addition, this tutorial assumes that files can be made available through the webserver by copying them to ~/public_html on the machine where SkeletonKey is being installed.

Combined data and software access example

The next example will guide the user through creating a job that reads and writes a filesystem exported by Chirp, using software that's available through CVMFS. Before you start, please make sure that Chirp is installed and exporting a directory (this tutorial assumes that Chirp is exporting /tmp).

Creating the application tarball

Since we'll be running an application from a CVMFS repository, we'll create an application tarball that does some initial setup and then runs the actual application.

  1. Create a directory for the script

    [user@hostname ~]$ mkdir /tmp/combined_access

  2. Create a shell script, /tmp/combined_access/myapp.sh with the following lines:

    #!/bin/bash
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cvmfs/uc3.uchicago.edu/sw/lib
    # arguments to test.R: GRIB file, number of layers to scan (2 here as an example), output file
    /cvmfs/uc3.uchicago.edu/sw/bin/Rscript ./combined_access/test.R ./combined_access/data.grb 2 $CHIRP_MOUNT/output/$1
    echo "Finishing script at: "
    date

  3. Create an R script /tmp/combined_access/test.R with the following lines:

    #!/usr/bin/Rscript --vanilla

    # read the command line arguments: GRIB file, number of layers, output file
    library(raster)
    args <- commandArgs(TRUE)
    grbFile <- args[1]
    scanHowMany <- as.numeric(args[2])
    output <- args[3]
    grb <- brick(grbFile)

    # append the sum of each layer to the output file
    for (n in 1:scanHowMany) {
      r <- subset(grb, n)
      cat(paste(names(r), cellStats(r, "sum"), sep=" "), "\n", file=output, append=TRUE)
    }

  4. Next, make sure the myapp.sh script is executable and create a tarball:

    [user@hostname ~]$ chmod 755 /tmp/combined_access/myapp.sh
    [user@hostname ~]$ cd /tmp
    [user@hostname ~]$ tar cvzf combined_access.tar.gz combined_access

  5. Then copy the tarball to your webserver

    [user@hostname ~]$ cd /tmp
    [user@hostname ~]$ cp combined_access.tar.gz ~/public_html
    [user@hostname ~]$ chmod 644 ~/public_html/combined_access.tar.gz

  6. Finally, download the CVMFS repository key from http://uc3-data.uchicago.edu/uc3.key and make it available on your webserver

One thing to note here is that Parrot makes mounted CVMFS repositories available under /cvmfs/repository_name where repository_name is replaced by the name that the repository is published under.
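As a quick illustration of this naming scheme, the path that myapp.sh uses can be assembled from the repository name alone (the `sw/bin` suffix matches where this tutorial's repository keeps its binaries):

```shell
# Parrot mounts each CVMFS repository at /cvmfs/<repository_name>;
# for the uc3.uchicago.edu repository used in this tutorial, the
# R binaries therefore appear under:
repo="uc3.uchicago.edu"
sw_bin="/cvmfs/${repo}/sw/bin"
echo "$sw_bin"   # prints: /cvmfs/uc3.uchicago.edu/sw/bin
```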

Creating a job wrapper

You'll need to do the following on the machine where you installed SkeletonKey

  1. Open a file called combined-access.ini and add the following lines:

    [CVMFS]
    repo1 = uc3.uchicago.edu
    repo1_options = url=http://uc3-cvmfs.uchicago.edu/opt/uc3/,pubkey=http://repository_key_url,quota_limit=1000,proxies=squid-proxy:3128
    repo1_key = http://repository_key_url

    [Directories]
    export_base = /tmp/user
    read = /, data
    write = /, output

    [Parrot]
    location = http://your.host/parrot.tar.gz

    [Application]
    location = http://your.host/combined_access.tar.gz
    script = ./combined_access/myapp.sh

  2. In combined-access.ini, change the URL http://your.host/parrot.tar.gz to point to the URL of the Parrot tarball that you copied previously. Similarly, replace http://repository_key_url with the URL of the repository key on your webserver, and http://your.host/combined_access.tar.gz with the URL of the application tarball.

  3. Run SkeletonKey on combined-access.ini:

    [user@hostname ~]$ skeleton_key -c combined-access.ini

  4. Run the job wrapper to verify that it's working correctly

    [user@hostname ~]$ sh ./job_script.sh test.output
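The job wrapper built from the `[Directories]` section above exposes the Chirp export inside the job through the `$CHIRP_MOUNT` variable, the same variable that myapp.sh writes to. A minimal sketch of how the `read` and `write` entries line up with job-side paths; the `/chirp/example` fallback is an assumption for illustration only, not the real mount point:

```shell
# $CHIRP_MOUNT is set by the SkeletonKey job wrapper at run time; the
# fallback value here is illustrative, so this sketch runs standalone.
CHIRP_MOUNT=${CHIRP_MOUNT:-/chirp/example}

echo "readable: $CHIRP_MOUNT/data"     # from: read  = /, data
echo "writable: $CHIRP_MOUNT/output"   # from: write = /, output
```

myapp.sh relies on exactly this layout when it writes its results to $CHIRP_MOUNT/output/$1.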

Using the job wrapper

Standalone

Once the job wrapper has been verified to work, it can be copied to another system and run:

[user@hostname ~]$ scp job_script.sh another_host:~/
[user@hostname ~]$ ssh another_host
[user@another_host ~]$ sh ./job_script.sh test.output

Submitting to HTCondor (Optional)

The following part of the tutorial is optional and will cover using a generated job wrapper in an HTCondor submit file.

  1. On your HTCondor submit node, create a file called sk.submit with the following contents

    universe = vanilla
    notification = never
    executable = ./job_script.sh
    arguments = test.output.$(Process)
    output = /tmp/sk/test_$(Cluster).$(Process).out
    error = /tmp/sk/test_$(Cluster).$(Process).err
    log = /tmp/sk/test.log
    ShouldTransferFiles = YES
    when_to_transfer_output = ON_EXIT
    queue 5

  2. Next, create /tmp/sk to hold the log and output files for HTCondor

    [user@condor-submit-node ~]$ mkdir /tmp/sk

  3. Then copy the job wrapper to the HTCondor submit node

    [user@hostname ~]$ scp job_script.sh condor-submit-node:~/

  4. Finally submit the job to HTCondor and verify that the jobs ran successfully

    [user@hostname ~]$ ssh condor-submit-node
    [user@condor-submit-node ~]$ condor_submit sk.submit

Something to note in the HTCondor submit file is that we pass the name of the output file using the arguments setting, with the $(Process) variable ensuring that each queued job writes to a different file. HTCondor passes this argument to job_script.sh, which in turn appends it to the arguments given to the myapp.sh script.
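This argument flow can be sketched in plain bash; `wrapper` and `myapp` below are illustrative stand-ins for the generated job_script.sh and this tutorial's myapp.sh, and `test.output.3` is what `test.output.$(Process)` expands to when $(Process) is 3:

```shell
#!/bin/bash
# stand-in for myapp.sh: it writes results to the file named by $1
myapp() {
    echo "would write to: \$CHIRP_MOUNT/output/$1"
}

# stand-in for the generated job_script.sh: it forwards the argument
# HTCondor supplied on the command line straight to the application
wrapper() {
    myapp "$@"
}

wrapper "test.output.3"   # prints: would write to: $CHIRP_MOUNT/output/test.output.3
```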