Chap 3 - Completed translations of exercises
plstonge committed Feb 27, 2024
1 parent e06c1ad commit e129ee6
Showing 1 changed file with 65 additions and 71 deletions.
136 changes: 65 additions & 71 deletions 3-task-arrays.ipynb
@@ -66,12 +66,12 @@
"metadata": {},
"source": [
"### GNU Parallel Command Syntax\n",
"The basic elements of a `parallel` command are:\n",
"```Bash\n",
"parallel options command_template ::: list of values\n",
"```\n",
"\n",
"See the manual page:\n",
"```Bash\n",
"man parallel # Press Q to quit\n",
"```"
@@ -83,17 +83,16 @@
"source": [
"### Use Cases\n",
"#### One Sequence of Parameter Values\n",
"The default placeholder for the changing parameter is `{}`:\n",
"```Bash\n",
"parallel echo file{}.txt ::: 1 2 3 4\n",
"# parallel --citation # Commit to citing developers\n",
"```\n",
"\n",
"The same command can be rewritten with Bash brace expansion `{a..b}`:\n",
"```Bash\n",
"parallel echo file{}.txt ::: {1..4}\n",
"parallel echo file{}.txt ::: {01..10}\n",
"```"
]
},
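Because Bash expands `{1..4}` and `{01..10}` before `parallel` even starts, the value lists can be previewed with plain Bash. A minimal sketch (zero-padded ranges such as `{01..10}` need Bash 4 or later); the `for` loop is only a serial stand-in that prints the same lines as the `parallel` command, not how GNU Parallel schedules its jobs:

```bash
#!/usr/bin/env bash
# Bash expands the ranges before parallel receives them.
short=( {1..4} )      # 1 2 3 4
padded=( {01..10} )   # 01 02 ... 10 (zero padding kept, Bash >= 4)
echo "${short[*]}"
echo "${padded[*]}"

# Serial stand-in for: parallel echo file{}.txt ::: {01..10}
# (prints the same lines, in order, one command at a time)
for value in "${padded[@]}"; do
    echo "file${value}.txt"
done
```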
@@ -102,37 +101,33 @@
"metadata": {},
"source": [
"#### Multiple Combinations of Parameter Values\n",
"**a)** When there are **multiple sequences of parameters to combine**,\n",
"use numbered placeholders such as `{1}`, `{2}`, etc.:\n",
"```Bash\n",
"parallel echo file{1}{2}.txt ::: {01..10} ::: a b\n",
"```"
]
},
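With two `:::` sources, `parallel` runs one command per combination, here 10 × 2 = 20. Bash brace expansion produces the same set of names, which is a quick way to preview them. This is a sketch only; the order in which `parallel` actually runs (and prints) the jobs may differ:

```bash
#!/usr/bin/env bash
# Same 20 file names as: parallel echo file{1}{2}.txt ::: {01..10} ::: a b
# The rightmost brace varies fastest: file01a, file01b, file02a, ...
names=( file{01..10}{a,b}.txt )
printf '%s\n' "${names[@]}"
echo "Total: ${#names[@]} combinations"   # 10 x 2 = 20
```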
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**b)** When **all parameter combinations\n",
"are listed in a text file**:\n",
"```Bash\n",
"# parallel echo {} ::: 3 5 7 ::: 4 6 8 > param.txt\n",
"cat scripts/prll-creer-param.sh\n",
"bash scripts/prll-creer-param.sh\n",
"cat scripts/param.txt\n",
"```"
]
},
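The commented `parallel` command writes one space-separated pair per line for every combination of `3 5 7` and `4 6 8`. A plain-Bash sketch that writes the same 9 lines to `param.txt` in the current directory (GNU Parallel may emit them in a different order unless `--keep-order` is given):

```bash
#!/usr/bin/env bash
# Plain-Bash equivalent of: parallel echo {} ::: 3 5 7 ::: 4 6 8 > param.txt
# Each line holds one combination "first second", separated by a space.
for first in 3 5 7; do
    for second in 4 6 8; do
        echo "$first $second"
    done
done > param.txt

cat param.txt   # 9 lines: "3 4", "3 6", ..., "7 8"
```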
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `parallel` command uses `-C ' '` to specify the separator\n",
"between parameters in `scripts/param.txt`, and the `::::`\n",
"argument to read the parameters from that file:\n",
"```Bash\n",
"# parallel -C ' ' echo '$(({1}*{2}))' :::: scripts/param.txt\n",
"cat scripts/prll-exec-param.sh\n",
"sbatch scripts/prll-exec-param.sh\n",
"```"
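Here `::::` reads the argument list from a file, and `-C ' '` splits each line into the columns `{1}` and `{2}`. A serial plain-Bash sketch of the same per-line work, where `read` does the column splitting; it writes its own small, hypothetical `param.txt` so it runs on its own:

```bash
#!/usr/bin/env bash
# Hypothetical sample input: one "a b" pair per line.
printf '%s\n' "3 4" "3 6" "5 8" > param.txt

# Serial stand-in for: parallel -C ' ' echo '$(({1}*{2}))' :::: param.txt
# `read` splits each line on whitespace, much like -C ' ' yields {1} and {2}.
while read -r col1 col2; do
    echo "$(( col1 * col2 ))"
done < param.txt > products.txt

cat products.txt   # 12, 18, 40
```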
@@ -142,24 +137,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**c)** If you prefer to validate the generated commands before\n",
"submitting a Slurm job, you can first write a **list of commands\n",
"to a text file**:\n",
"```Bash\n",
"# parallel -C ' ' echo 'echo {1}\"*\"{2} \"|\" bc \">\" prod_{1}x{2}' :::: param.txt > commandes.txt\n",
"cat scripts/prll-creer-cmd.sh\n",
"bash scripts/prll-creer-cmd.sh\n",
"cat scripts/cmd.txt\n",
"```"
]
},
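A plain-Bash sketch of building such a list, one complete command per line, from the parameter pairs; nothing is executed, so the file can be reviewed first. The file names `param.txt` and `cmd.txt` are illustrative, and the `*` is quoted (unlike in the commented command) so the shell cannot treat it as a glob:

```bash
#!/usr/bin/env bash
# Hypothetical sample input: one "a b" pair per line.
printf '%s\n' "3 4" "3 6" "5 8" > param.txt

# Build one command per line, e.g.: echo '3*4' | bc > prod_3x4
while read -r a b; do
    printf "echo '%s*%s' | bc > prod_%sx%s\n" "$a" "$b" "$a" "$b"
done < param.txt > cmd.txt

cat cmd.txt   # review before submitting
```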
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The job script then needs only a much simpler `parallel` command:\n",
"```Bash\n",
"# parallel < scripts/cmd.txt\n",
"cat scripts/prll-exec-cmd.sh\n",
"sbatch scripts/prll-exec-cmd.sh\n",
"```"
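Since every line of the list is a self-contained command, the file can also be run serially with `bash` for a quick check before submitting the job. A sketch with a hypothetical two-line `cmd_test.txt` that uses shell arithmetic instead of `bc`:

```bash
#!/usr/bin/env bash
# Two hypothetical commands in the same one-command-per-line format.
cat > cmd_test.txt <<'EOF'
echo $((3*4)) > prod_3x4
echo $((5*8)) > prod_5x8
EOF

# `parallel < cmd_test.txt` would run these lines concurrently;
# `bash` runs them one after the other, which is enough to validate them.
bash cmd_test.txt
cat prod_3x4 prod_5x8   # 12 then 40
```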
@@ -170,9 +161,9 @@
"metadata": {},
"source": [
"#### Limiting the Number of Simultaneous Processes\n",
"The `--jobs` option limits the number of processes\n",
"running at the same time. For example, 8 tasks\n",
"with at most 2 simultaneous processes:\n",
"```Bash\n",
"parallel --jobs 2 'echo {} && sleep 3' ::: {1..8}\n",
"```"
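With `--jobs 2`, the 8 tasks run in 4 waves of 2, so the command takes roughly 4 × 3 s = 12 s instead of 8 × 3 s = 24 s serially. For comparison, `xargs -P` applies the same kind of cap on concurrent processes; the sketch below uses that different tool (not GNU Parallel) and shortens the pause to 1 s:

```bash
#!/usr/bin/env bash
# Roughly what `parallel --jobs 2 'echo {} && sleep 3' ::: {1..8}` does, using xargs:
# -n 1 passes one value per command, -P 2 keeps at most 2 commands running.
start=$SECONDS
printf '%s\n' {1..8} \
    | xargs -n 1 -P 2 sh -c 'echo "$1" && sleep 1' _ > jobs_out.txt
elapsed=$(( SECONDS - start ))

sort -n jobs_out.txt
echo "Elapsed: ~${elapsed} s (8 tasks, 2 at a time, 1 s each: about 4 s)"
```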
@@ -183,44 +174,44 @@
"metadata": {},
"source": [
"### **Exercise** - Aligning DNA Sequences\n",
"In the `donnees/` directory, you should already have several\n",
"Fasta files (`*.fa`) that were created in the first\n",
"chapter by the job script `scripts/blastn-gen-seq.sh`.\n",
"**If this is not the case**, run the following:\n",
"```Bash\n",
"sbatch scripts/blastn-gen-seq.sh\n",
"```\n",
"\n",
"You should find:\n",
"* for each fictitious species A, B, C and D,\n",
" a file of \"known\" DNA sequences\n",
" * These files are converted into Blast databases\n",
"* for 16 \"unknown\" species K through Z, DNA sequences\n",
" to align against the \"known\" sequences of species A through D"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We therefore want to compute the alignment of all\n",
"combinations `{A,B,C,D}` x `{K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z}`,\n",
"which gives 4 x 16 = 64 combinations.\n",
"The following job script uses GNU Parallel to compute\n",
"all combinations with 4 CPU cores:\n",
"```Bash\n",
"cat scripts/blastn-parallel.sh\n",
"sbatch scripts/blastn-parallel.sh\n",
"```\n",
"\n",
"To monitor the status of the job:\n",
"```Bash\n",
"squeue -u $USER\n",
"ls donnees/res_prll/\n",
"```\n",
"Once the job has finished, it is good practice to check the resources it used:\n",
"```Bash\n",
"seff <job_id>\n",
"```"
]
},
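The 64 known/unknown pairs can be listed with Bash brace expansion, which is handy for checking how many result files to expect. A sketch only; the `A-K` naming is illustrative and is not assumed to match the file names used by `scripts/blastn-parallel.sh`:

```bash
#!/usr/bin/env bash
# Every known species (A-D) paired with every unknown species (K-Z).
pairs=( {A..D}-{K..Z} )
echo "${#pairs[@]} combinations"   # 4 x 16 = 64
printf '%s\n' "${pairs[@]:0:3}"    # A-K, A-L, A-M
```

With 4 CPU cores, these 64 alignments run in about 16 rounds of 4 processes, assuming similar run times.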
@@ -292,18 +283,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The variable `$SLURM_ARRAY_TASK_ID` can be used in many ways.\n",
"The examples below use `$N`, but `$SLURM_ARRAY_TASK_ID`\n",
"works the same way in a job script:\n",
"```Bash\n",
"export N=71 # Only required for this example\n",
"\n",
"echo file.$N\n",
"echo directory-$N\n",
"```\n",
"\n",
"![From One Integer to 2D Coordinates](images/n2r_c.svg)\n",
"```Bash\n",
"PARAM_R=$((N / 12)) # Integer division\n",
"PARAM_C=$((N % 12)) # Modulo (division remainder)\n",
"echo $PARAM_R $PARAM_C\n",
"\n",
"head -n $((PARAM_R + 1)) param.txt | tail -1\n",
@@ -315,24 +308,24 @@
"metadata": {},
"source": [
"### **Exercise** - Job Arrays\n",
"Submit the job array:\n",
"```Bash\n",
"cat scripts/blastn-array.sh\n",
"sbatch scripts/blastn-array.sh\n",
"\n",
"squeue -u $USER\n",
"```\n",
"\n",
"Once all four array tasks are done, inspect the results:\n",
"```Bash\n",
"ls slurm-*_*.out\n",
"ls -l donnees/res_array/\n",
"```\n",
"\n",
"* **Fix the job script** so that all 16 unknowns K through Z\n",
" are processed, with at most **four** array tasks running at a time\n",
"* Resubmit the job array to test your fix\n",
"* Use the `seff` command to examine one of the 16 tasks"
]
},
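In Slurm, a `%` suffix on the array range caps how many array tasks run at once: `--array=0-15%4` creates 16 tasks with at most 4 running simultaneously. Inside each task, the index can be mapped to a species letter with a Bash array. A sketch assuming the unknown species are indexed from 0; `UNKNOWNS` and `SPECIES` are hypothetical names, and the loop only emulates the 16 values Slurm would set:

```bash
#!/usr/bin/env bash
# In a real job script header (Slurm syntax): 16 tasks, at most 4 at a time.
#SBATCH --array=0-15%4

UNKNOWNS=( {K..Z} )   # 16 unknown species, indices 0..15

# Outside Slurm, emulate the 16 values of SLURM_ARRAY_TASK_ID.
for SLURM_ARRAY_TASK_ID in {0..15}; do
    SPECIES=${UNKNOWNS[$SLURM_ARRAY_TASK_ID]}
    echo "task $SLURM_ARRAY_TASK_ID -> species $SPECIES"
done
```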
{
@@ -341,16 +334,17 @@
"source": [
"## Key Points\n",
"* **GNU Parallel**\n",
" [to run multiple combinations of parameters](https://docs.alliancecan.ca/wiki/GNU_Parallel)\n",
"\n",
"```Bash\n",
"parallel 'command_template({1})' ::: values1\n",
"parallel 'command_template({1}, {2})' ::: values1 ::: values2\n",
"parallel -C <sep> 'command_template({1}, {2})' :::: param_pairs.txt\n",
"parallel --jobs 'N_processes_per_node' < command_list.txt\n",
"```\n",
"\n",
"* **Job Arrays**\n",
" [to submit a series of similar long or large jobs](https://docs.alliancecan.ca/wiki/Job_arrays)\n",
"\n",
"```Bash\n",
"# $SLURM_ARRAY_TASK_ID will have only one value of ...\n",