Skip to content

Latest commit

 

History

History
68 lines (44 loc) · 2.91 KB

File metadata and controls

68 lines (44 loc) · 2.91 KB

Sambanova

Connection to Sambanova

Connection to a SambaNova node is a two-step process. The first step is to ssh to the login node. The second step is to log in to a SambaNova node from the login node.

Sambanova connection diagram

Login to the Sambanova login node from your local machine. This uses the MobilePASS+ token generated every time you log in to the system.

In the examples below, replace ALCFUserID with your ALCF user id.

ssh [email protected]
Password: < MobilePASS+ code >

Note: Use the ssh "-v" option in order to debug any ssh problems.

Once you are on the login node, ssh to one of the sambanova compute node.

ssh sn30-r1-h1       

It is also recommended to ssh to other compute nodes namely, sn30-r1-h1, sn30-r1-h2, sn30-r2-h1, sn30-r2-h2, sn30-r3-h1, sn30-r3-h2, sn30-r4-h1, sn30-r4-h2. Note: This avoids all your jobs being queued up on the same node.

Job Queuing and Submission

SambaNova uses Slurm for job submission and queueing. Below are some of the important commands for using Slurm.

  • srun : can be used to run individual Python scripts in parallel with other scripts on a cluster managed by Slurm.
  • sbatch :jobs can be submitted to the Slurm workload manager through a batch script by using the sbatch command.
  • squeue : command provides information about jobs located in the Slurm scheduling queue.
  • sinfo : is used to view partition and node information for a system running Slurm.
  • scancel : is used to signal or cancel jobs, job arrays, or job steps.

Hands-on Example

Additonal Examples

Copy Applications to $HOME directory

Sambanova software stack and associated environmental variables are automatically setup at login for a SN30 node.

Each of the samples or application examples provided by SambaNova has its own pre-built virtual environment which can be readily used. They are present in the /opt/sambaflow/apps/ directory tree within each of the applications. This directory contains all the different models currently supported with the Sambanova software stack.

Homework

For BERT example, understand flags used in the script. Change values for flag --ntasks and measure its effect on performance. Submit proof (contents printed out to your terminal, path to a logfile or screenshot) that you were able to successfully follow the instructions and execute.

Additional Resources