Discovery Job Submission
Sample Slurm Script to Submit a Single Processor Job.
Create a script file that includes the details of the job that you want to run.
It can include the name of the program to run, the memory, wall-time, and processor requirements of the job, which queue (partition) it should run in, and how to notify you of the results of the job.
Here is an example submit script.
#!/bin/bash
# Name of the job
#SBATCH --job-name=single-core-test-job
# Number of compute nodes
#SBATCH --nodes=1
# Number of cores, in this case one
#SBATCH --ntasks-per-node=1
# Walltime (job duration)
#SBATCH --time=00:15:00
# Email notifications
#SBATCH --mail-type=BEGIN,END,FAIL
hostname
date
sleep 60
All of the lines that begin with #SBATCH are directives to Slurm. The meaning of each directive in the sample script is explained in the comment line that precedes it.
The full list of available directives is explained in the man page for the sbatch command, which is available on Discovery.
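For example, to read it from a login node:
[user@discovery ~]$ man sbatch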
sbatch will copy the current shell environment, and the scheduler will recreate that environment on the allocated compute node when the job starts. The job script does NOT run .bashrc or .bash_profile, and so may not have the same environment as a fresh login shell. This is important if you use aliases, or the conda system to set up your own custom version of Python and sets of Python packages. Since conda defines shell functions, it must be configured before you can call, e.g., conda activate my-env. The simplest way to do this is for the first line of your script to be:
#!/bin/bash -l
which explicitly starts bash as a login shell.
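For example, a minimal sketch of a job script that activates a conda environment (assuming an environment named my-env already exists):
#!/bin/bash -l
# Name of the job
#SBATCH --job-name=conda-test-job
# Number of compute nodes
#SBATCH --nodes=1
# Number of cores
#SBATCH --ntasks-per-node=1
# Walltime (job duration)
#SBATCH --time=00:15:00
# The -l above starts bash as a login shell, so conda's shell functions are
# defined before the activate call below. Replace my-env with your own environment.
conda activate my-env
python --version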
Now submit the job and check its status:
[user@discovery slurm]$ sbatch my_first_slurm.sh
Submitted batch job 4056
[user@discovery slurm]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4056 standard single-c john R 0:01 1 p04
[user@discovery slurm]$ scontrol show job 4056
JobId=4056 JobName=single-core-test-job
UserId=user(48374) GroupId=rc-users(480987) MCS_label=rc
Priority=4294901747 Nice=0 Account=rc QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:09 TimeLimit=00:15:00 TimeMin=N/A
SubmitTime=2021-05-14T12:25:53 EligibleTime=2021-05-14T12:25:53
AccrueTime=2021-05-14T12:25:53
StartTime=2021-05-14T12:25:54 EndTime=2021-05-14T12:40:54 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-05-14T12:25:54
Partition=standard AllocNode:Sid=discovery7:21489
ReqNodeList=(null) ExcNodeList=(null)
NodeList=p04
BatchHost=p04
NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=2,node=1,billing=2
Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/dartfs-hpc/rc/home/8/dz99918/xnode_tests/slurm/my_first_slurm.sh
WorkDir=/dartfs-hpc/rc/home/8/dz99918/xnode_tests/slurm
StdErr=/dartfs-hpc/rc/home/8/dz99918/xnode_tests/slurm/slurm-4056.out
StdIn=/dev/null
StdOut=/dartfs-hpc/rc/home/8/dz99918/xnode_tests/slurm/slurm-4056.out
Power=
MailUser=<email> MailType=BEGIN,END,FAIL
NtasksPerTRES:0
JOBID is the unique ID of the job; in this case it is 4056. In the above example I am issuing scontrol to view information related to my job.
The output file, slurm-4056.out, consists of three sections:
- A header section, the Prologue, which gives information such as JOBID, user name and node list.
- A body section, which includes user output to STDOUT.
- A footer section, the Epilogue, which is similar to the header; a useful difference is the report of wallclock time towards the end.
Typically your job will create one file that joins STDOUT and STDERR. To have your job create separate files for STDOUT and STDERR, be sure to pass --output and --error. Here is an example:
--output=My_first_job-%x.%j.out
--error=My_first_job-%x.%j.err
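The same options can also be given as directives inside the job script. A brief sketch (the file names are just placeholders; %x expands to the job name and %j to the job ID, so each job writes its own pair of files):
# Write STDOUT and STDERR to separate, per-job files
#SBATCH --output=My_first_job-%x.%j.out
#SBATCH --error=My_first_job-%x.%j.err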
File Management in a Batch Queue System
Sometimes you may be running the same program in multiple jobs and you will need to be sure to keep your input and output files separate for each job.
One way to manage your data files is to have a separate directory for each job.
Copy the required input files to the directory and then edit the batch script file to include a line where you change to the directory that contains the input files.
cd /path/to/where/your/input/files/are
Place this line before the line where you issue the command to be run. By default your job files will be created in the directory that you submit from.
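For example, a sketch of a job script that runs in its own directory (the path, program name and input file name are placeholders):
#!/bin/bash -l
#SBATCH --job-name=job1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:15:00
# Change to the directory that holds this job's input files,
# so the program's output files are created alongside them.
cd /path/to/job1/input/files
./program_name input.dat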
Below is an example script which will submit for 4 cores on a single compute node. Feel free to copy and paste it as a job template.
#!/bin/bash
#SBATCH --job-name=multicore_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:15:00
#SBATCH --mail-type=BEGIN,END,FAIL
mpirun -n 4 ./program_name <optional args>
When you are ready to submit the job, you can do so by issuing the sbatch command:
sbatch <job script>
For more information about job parameters, please take a look at the Slurm Workload Manager documentation (external link).
Sample Slurm Script to Submit a GPU Job
The example below requests a single task and two GPUs on the gpuq partition:
#!/bin/bash
# Name of the job
#SBATCH --job-name=gpu_job
# Number of compute nodes
#SBATCH --nodes=1
# Number of cores, in this case one
#SBATCH --ntasks-per-node=1
# Request the GPU partition
#SBATCH --partition=gpuq
# Request the GPU resources
#SBATCH --gres=gpu:2
# Walltime (job duration)
#SBATCH --time=00:15:00
# Email notifications
#SBATCH --mail-type=BEGIN,END,FAIL
nvidia-smi
echo $CUDA_VISIBLE_DEVICES
hostname
After submitting the job via sbatch, the output file shows the requested resources, as reported by the nvidia-smi command and the value of $CUDA_VISIBLE_DEVICES:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:18:00.0 Off | 0 |
| N/A 33C P0 39W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:3B:00.0 Off | 0 |
| N/A 32C P0 40W / 300W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
0,1
p04.hpcc.dartmouth.edu
Your program needs to know which GPU it has been assigned. The submission template above uses $CUDA_VISIBLE_DEVICES to show which GPU numbers the job has been assigned; you should pass the GPU number to your program as a command line argument and then set the default GPU in your code.
./program_name $CUDA_VISIBLE_DEVICES
Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both batch and interactive jobs.
$ sinfo -O gres -p gpuq
GRES
gpu:nvidia_a100_80gb
$ sinfo -O gres -p a5500
GRES
gpu:nvidia_rtx_a5500
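If a particular GPU model is required, the type reported by sinfo can be included in the --gres request. A minimal sketch, assuming the a100 type listed above:
# Request one GPU of a specific type on the gpuq partition
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:nvidia_a100_80gb:1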
Job Arrays
Job arrays are multiple jobs to be executed with identical or related parameters. Job arrays are submitted with -a or --array=<indices>. The indices specification identifies what array index values should be used. Multiple values may be specified using a comma-separated list and/or a range of values with a "-" separator: --array=0-15 or --array=0,6,16-32.
A step function can also be specified with a suffix containing a colon and number. For example, --array=0-15:4 is equivalent to --array=0,4,8,12. A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example, --array=0-15%4 will limit the number of simultaneously running tasks from this job array to 4. The minimum index value is 0. The maximum value is 499999.
To receive mail alerts for each individual array task, --mail-type=ARRAY_TASKS should be added to the Slurm job script. Unless this option is specified, mail notifications on job BEGIN, END and FAIL apply to the job array as a whole rather than generating individual email messages for each task in the job array.
Below is an example submit script for submitting job arrays:
#!/bin/bash -l
# sbatch stops parsing at the first line which isn't a comment or whitespace
# SBATCH directives must be at the start of the line -- no indentation
# Name of the job
#SBATCH --job-name=sample_array_job
# Number of cores
#SBATCH --ntasks-per-node=1
# Array jobs. This example will create 25 jobs, but only allow at most 4 to run concurrently
#SBATCH --array=1-25%4
# Walltime (job duration)
#SBATCH --time=00:15:00
# Email notifications
#SBATCH --mail-type=BEGIN,END,FAIL
# Your commands go here. Each of the jobs is identical apart from environment variable
# $SLURM_ARRAY_TASK_ID, which will take values in the range 1-25
# They are all independent, and may run on different nodes at different times.
# The $SLURM_ARRAY_TASK_ID variable can be used to construct parameters to programs, select data files etc.
#
# The default output files will contain both the Job ID and the array task ID, and so will be distinct. If setting
# custom output files, you must be sure that array tasks don't all overwrite the same files.
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
sleep 300
hostname -s
Each job in the array will be allocated its own resources, possibly on different nodes.
The variable $SLURM_ARRAY_TASK_ID will be different for each task, with values (in this example) 1-25, and can be used to construct arguments to programs to be run as part of the job. One way to use such an array is to create a file with 25 sets of arguments in it, then use a sed construct that returns a single line from the file, e.g.:
arguments=/path/to/file/with/program/arguments # 25-line file
myprogram $(sed -n -e "${SLURM_ARRAY_TASK_ID}p" $arguments)
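The arguments file itself is just plain text with one line per task. A hypothetical way to generate it (the option names and data file names are placeholders):
# Create a 25-line arguments file; line N is used by array task N
for i in $(seq 1 25); do
    echo "--input data_${i}.txt --seed ${i}"
done > /path/to/file/with/program/arguments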
The Batch System
There are four primary partitions on the cluster.
Users specify the amount of time and the number of processors required by their jobs. Several additional preemptable partitions exist for the newer GPU nodes.
Managing and Monitoring your jobs
Some useful commands:
Command | Usage | Description
---|---|---
sbatch | sbatch <job script> | Submit a batch job to the queue
squeue | squeue | Show status of Slurm batch jobs
scancel | scancel JOBID | Cancel job
sinfo | sinfo | Show information about partitions
scontrol | scontrol show job JOBID | Check the status of a running or idle job
The default length of any job submitted to the queue is currently set at one hour and the default maximum number of processors per user is set to a value based on their user status.
Information on Submitting Jobs to the Queue
A job submitted with sbatch will have the environment which exists when you run the sbatch command, unless #!/bin/bash -l is used, in which case the job has the environment of a fresh login to Discovery (recommended). For example, the following directives request two nodes with four tasks per node and four CPUs per task:
# Number of compute nodes
#SBATCH --nodes=2
# Number of tasks per node
#SBATCH --ntasks-per-node=4
# Number of CPU cores per task
#SBATCH --cpus-per-task=4
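As a sketch of how those resources might be used, a hybrid MPI/OpenMP program could be launched with 2 x 4 = 8 MPI ranks and 4 threads per rank (program_name is a placeholder for your own executable):
# Give each MPI rank as many OpenMP threads as CPUs allocated per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -n 8 ./program_name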