The Batch System

  • The batch system used on Discovery is Cluster Resources’ Torque Resource Manager along with their Moab Workload Manager.
  • Users log in via ssh to one of the head nodes (i.e., altair) and submit jobs to be run on the compute nodes by writing a script file that describes their job.
  • They submit the job to the cluster using the qsub command.

There are four queues on the cluster:

  • default – This is the main queue for the cluster. It does not need to be specified as it is used by default.
  • largeq – This is for users that need to queue up more than 500 jobs. It is a routing queue: it acts as a funnel, delivering jobs to the default queue as space becomes available there for the user.
    • There is a per-user limit of 20,000 jobs and an overall limit of 50,000 jobs in the largeq.
  • testq – This is a queue set up to run up to 16 jobs for up to 60 minutes on 16 processors on each of the three test nodes. A user can queue up no more than 16 jobs at a time.
    • You can use the tnodeload command to see how busy the test nodes are before submitting a job to them.
  • gpuq – This is a queue set up to run GPU-related jobs on the two production GPU nodes.

Users specify the amount of time and the number of processors required by their jobs.
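
For example, a minimal job script might look like the following sketch (the script name myjob.pbs, the program my_program, and the resource values are placeholders only):

    #!/bin/bash
    # Request 1 node with 1 processor for 4 hours
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=4:00:00
    # Give the job a name (optional)
    #PBS -N myjob
    # Run from the directory the job was submitted from
    cd $PBS_O_WORKDIR
    ./my_program

Submit it with qsub myjob.pbs, or add -q <queue> on the command line to send it to one of the other queues described above (subject to that queue's limits).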

Managing and Monitoring Your Jobs

See the How To on monitoring your jobs after they are submitted.

Some useful commands:

Command    Example                  Description
qsub       qsub pbs_script          submit a batch job to the queue
qstat      qstat                    show the status of Torque batch jobs
qdel       qdel JOBID               delete a job from the queue; JOBID is a job number obtained from qstat or showq
checkjob   checkjob JOBID           check the status of a running or idle job; JOBID is a job number obtained from qstat or showq
pbsmon     pbsmon                   show information about node usage on the cluster
qr         qr (or qr -h)            show your available cluster resources along with the amount of resources you are using
showq      showq                    show all of the jobs in the batch queue
           showq -r -b -i           show running (-r), blocked (-b), or eligible (-i) jobs in the queue
qshow      qshow                    show a consolidated view of showq
           qshow -r                 include the routing queue (largeq) in the output
myjobs     myjobs                   show all of your jobs in the queue
           myjobs -r -b -i          show just your running (-r), blocked (-b), or eligible (-i) jobs in the queue
qnotify    qnotify job-id hours     set email notification for a number of hours before a job is scheduled to end
           qnotify -l               list the notifications you have set up
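
A typical monitoring sequence might look like this sketch (myjob.pbs and JOBID are placeholders):

    qsub myjob.pbs      # prints the JOBID assigned to the new job
    qstat               # list jobs with their state (e.g. Q = queued, R = running)
    checkjob JOBID      # detailed scheduler status for one job
    qdel JOBID          # remove the job if it is no longer needed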

Information on Submitting Jobs to the Queue

  • The default length of any job submitted to the queue is currently set at one hour, and the default maximum number of processors per user is set to a value based on their user status.
  • Jobs that run longer than twenty days will be terminated by the scheduler.
    • These parameters are subject to change as we become more familiar with users’ needs.
  • It is important for users to specify the resources required by their jobs.
  • In the current configuration, the walltime and the number of nodes are the two parameters that matter.
  • If you don’t specify the walltime, the system default of one hour will be assumed and your job may be terminated before it completes.
  • See the Single Processor Job Example for further details.
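
The walltime can also be given on the qsub command line instead of in the script; for example, the following sketch (the script name and values are illustrative only) requests a single processor for 24 hours:

    qsub -l nodes=1:ppn=1 -l walltime=24:00:00 my_job.pbs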

Information for Multiprocessor Jobs

  • For multiprocessor jobs, it is important to specify the number of nodes and processors required and to select nodes that are of the same feature and on the same switch.
  • The nodes are divided into cells. The nodes in each cell are homogeneous, with similar chip vendor and speed as well as disk and memory size.
  • Feature Assignments:

Features         Nodes     CPU (cores)        Memory   Scratch Space   Comment
cella,amd,ib1    a01-a33   AMD (8) 2.7GHz     32GB     143GB           Infiniband
cellb,intel      b01-b16   Intel (8) 2.3GHz   32GB     143GB
cellc,amd        c01-c27   AMD (16) 2.4GHz    64GB     820GB
celld,amd,ib2    d01-d39   AMD (16) 3.0GHz    64GB     820GB           Infiniband
celle,amd        e01-e23   AMD (16) 3.1GHz    64GB     820GB
cellf,amd        f01-f08   AMD (48) 2.8GHz    192GB    849GB
  • Parallel programs that need to communicate between processes will run more efficiently if all of the processes are in the same group.
  • You can specify which group of nodes to use by adding a feature directive alongside the PBS directive where you specify the number of nodes and processors.
  • For example:
    • #PBS -l nodes=3:ppn=8
      #PBS -l feature='cellb'
    • This example specifies that the job will run in cellb, on 3 nodes, using 8 processors per node.
    • Before you submit your job, use the features -a command to see which nodes are currently running jobs so you can select a cell that has free nodes.
    • The nodes in the ib1 and ib2 groups also have an Infiniband interconnect, which is a faster network connection than the standard 1Gb ethernet.
    • Use this group of nodes for MPI programs that have a lot of inter-process communication.
    • You will need to rebuild your program with an infiniband version of the MPI library as explained in the MPI section.

See the Sample parallel job scripts for examples of how to submit parallel jobs.
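
As a rough sketch of how the pieces fit together (the resource values, the program name my_mpi_program, and the mpirun invocation are placeholders that depend on which MPI library you load), a parallel job script might look like:

    #!/bin/bash
    # 3 cellb nodes, 8 processes per node, 6 hour walltime
    #PBS -l nodes=3:ppn=8
    #PBS -l feature='cellb'
    #PBS -l walltime=6:00:00
    cd $PBS_O_WORKDIR
    # Start 24 MPI processes (3 nodes x 8 ppn) on the nodes Torque assigned
    mpirun -np 24 -machinefile $PBS_NODEFILE ./my_mpi_program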

Using Node Features

  • Here are the node features that can be specified on the PBS node directive line:

Node Directive   Meaning                              Example
ib1              run job on Infiniband nodes          #PBS -l feature='ib1'
ib2              run job on Infiniband nodes          #PBS -l feature='ib2'
cell# (a-f)      run job on designated cell           #PBS -l feature='cella'
amd              run job on AMD Opteron processors    #PBS -l feature='amd'
intel            run job on Intel processors          #PBS -l feature='intel'
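
A feature can also be specified on the qsub command line rather than in the script; for example (the script name and values are illustrative only):

    qsub -l nodes=2:ppn=16 -l feature='celld' my_job.pbs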

Interactive Batch Jobs

  • You can run an interactive batch job by using the -I option to qsub.
  • The job will be queued and scheduled like any non-interactive batch job.
  • When executed, the standard input, output, and error of the job are connected through qsub to the terminal where qsub was run.
  • You can use the -l qsub option to specify the number of nodes on which the job should run as well as the walltime.
  • Here is an example of starting an interactive job that will run on two nodes for a walltime of 3 hours with X11 forwarding enabled:
    • qsub -I -X -l nodes=2:ppn=8 -l walltime=3:00:00
  • An interactive job is useful for debugging your program or ensuring that you have reserved the node(s) for your job.
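
A session might look roughly like the following sketch (the resource values and the commands run inside the session are only illustrative):

    qsub -I -l nodes=1:ppn=1 -l walltime=1:00:00
    # ...wait for the scheduler to start the job; a shell prompt then
    # opens on the assigned compute node...
    cd $PBS_O_WORKDIR
    ./my_program
    exit    # end the interactive job and release the node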

Policy on Number of Jobs

  • The scheduler can handle a total of about 4000 jobs in the running, idle, and blocked portions of the queue.
  • The total number of running jobs can reach over 2000 (depending on the number of online CPUs) when the cluster is fully utilized with single-CPU jobs.
  • We attempt to limit the number of jobs in the idle and blocked portions of the queue to about 4000.
  • To help maintain this, we limit each user to a maximum of 600 jobs in the queue.
    • This limit has proved to work well to keep the total number of jobs safely below 4000.
    • If you need to queue up more than 600 jobs, submit your jobs to the queue named largeq, as shown in the example below.
    • Jobs will be routed to the default queue over time.
  • For more information, see the Scheduler Policies page.
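
For example, submitting a job to the largeq only requires naming the queue at submission time (my_job.pbs is a placeholder):

    qsub -q largeq my_job.pbs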