SRUN
An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application. The srun command is used to submit an interactive job to Slurm. When the job starts, a command line prompt will appear on one of the compute nodes assigned to the job. From here commands can be executed using the resources allocated on the local node.
[user@discovery ~]$ srun --account=rc --pty /bin/bash
[user@p04 ~]$ hostname
p04.hpcc.dartmouth.edu
[user@p04 ~]$
Jobs submitted with srun --pty /bin/bash will be assigned the cluster default values of 1 CPU and 1024MB of memory. The account must also be specified; the job will not run otherwise. If additional resources are required, they can be requested as options to the srun command. The following example job is assigned 2 nodes, each with 4 CPUs (4 tasks per node, 1 CPU per task) and 4GB of memory (1GB per CPU):
srun --nodes=2 --ntasks-per-node=4 --mem-per-cpu=1GB --cpus-per-task=1 --account=rc --pty /bin/bash
[user@q06 ~]$
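Inside the interactive shell, Slurm describes the allocation through its standard environment variables (SLURM_JOB_NUM_NODES, SLURM_NTASKS_PER_NODE, SLURM_CPUS_PER_TASK, SLURM_MEM_PER_CPU). A minimal sketch of adding up what the request above provides; the values are hard-coded here for illustration, since these variables only exist inside a running job:

```shell
# Values Slurm would export inside the job for the request above
# (hard-coded here as an assumption, for illustration outside a job):
SLURM_JOB_NUM_NODES=2
SLURM_NTASKS_PER_NODE=4
SLURM_CPUS_PER_TASK=1
SLURM_MEM_PER_CPU=1024   # in MB

# Total CPUs and memory across the whole allocation:
TOTAL_CPUS=$((SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE * SLURM_CPUS_PER_TASK))
TOTAL_MEM_MB=$((TOTAL_CPUS * SLURM_MEM_PER_CPU))
echo "$TOTAL_CPUS CPUs, $TOTAL_MEM_MB MB"   # 8 CPUs, 8192 MB
```

Inside a real job, echoing the SLURM_* variables directly is a quick way to confirm that the scheduler granted what was requested.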
Occasionally you may need to run an interactive job on a GPU node to test your code on GPU-capable hardware. You can query which GPU resources are available with sinfo -O gres -p <name of queue>:
$ sinfo -O gres
GRES
gpu:k80:4(S:1)
gpu:v100:4(S:0-1)
From this output we can see that K80 and V100 GPUs are available, four of each per node type (the S: suffix indicates which CPU socket(s) the GPUs are attached to). Now we can submit an interactive job requesting those specific resources. For example, to request a single K80 GPU:
srun -p gpuq --gres=gpu:k80:1 --pty /bin/bash
Requesting two K80 GPUs instead, the output of nvidia-smi confirms that both have been assigned:
$ srun -p gpuq --gres=gpu:k80:2 --pty /bin/bash
[user@g03 ~]$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:8A:00.0 Off | 0 |
| N/A 34C P8 26W / 149W | 0MiB / 11441MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:8B:00.0 Off | 0 |
| N/A 30C P8 31W / 149W | 0MiB / 11441MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
[user@g03 ~]$ echo $CUDA_VISIBLE_DEVICES
0,1
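CUDA_VISIBLE_DEVICES is how Slurm scopes the job to its assigned GPUs: CUDA-aware applications enumerate only the devices listed there. A small sketch of reading the variable from a job script; the value is hard-coded here to mirror the session above, since the variable is only set inside a GPU job:

```shell
# CUDA_VISIBLE_DEVICES holds a comma-separated list of GPU indices
# assigned to the job (hard-coded here to match the session above):
CUDA_VISIBLE_DEVICES="0,1"

# Count the assigned GPUs by splitting the list on commas:
NGPUS=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "$NGPUS"   # 2
```

When the interactive session is finished, exit the shell to end the job and release the GPUs back to the scheduler.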