The Batch System
- The batch system used on Discovery is Cluster Resources' Torque Resource Manager along with their Moab Workload Manager.
- Users log in, via ssh, to one of the head nodes (e.g., altair) and submit jobs to be run on the compute nodes by writing a script file that describes the job.
- They submit the job to the cluster using the qsub (for legacy Discovery users) or mksub (for DartFS users) command.
There are four queues on the cluster:
- default – This is the main queue for the cluster. It does not need to be specified as it is used by default.
- largeq – This queue is for users who need to queue up more than 500 jobs. It is a routing queue: it acts as a funnel, delivering jobs to the default queue as space becomes available there for the user.
- There is a user limit of 20,000 jobs and overall limit of 50,000 jobs in the largeq.
- testq – This queue is set up to run up to 16 jobs, for up to 60 minutes each, on 16 processors on each of the three test nodes. A user can queue up no more than 16 jobs at a time.
- You can use the tnodeload command to see how busy the test nodes are before submitting a job to them.
- gpuq – This is a queue set up to run GPU-related jobs on the two production GPU nodes.
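As a sketch (queue names are from the list above; the script name myjob.pbs is illustrative), a job can be directed to a specific queue either on the command line or with a directive inside the script:

```shell
# Submit an illustrative script (myjob.pbs) to the test queue.
# DartFS users would use mksub in place of qsub.
qsub -q testq myjob.pbs

# Equivalently, inside the script itself:
#PBS -q testq
```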
Users specify the amount of time and the number of processors required by their jobs.
Managing and Monitoring Your Jobs
See the How To on monitoring your jobs after they are submitted.
Some useful commands:
|qsub (for legacy Discovery users) / mksub (for DartFS users)||submit a batch job to the queue|
|qstat||show status of Torque batch jobs|
|qdel JOBID||delete a job from the queue, where JOBID is a job number obtained by running qstat or showq|
|checkjob JOBID||check the status of a running or idle job, where JOBID is a job number obtained by running qstat or showq|
|||show information about node usage on the cluster|
|||show your available cluster resources along with the amount of resources you are using|
|showq||show all of the jobs in the batch queue; use -r, -b, or -i to show only running, blocked, or eligible jobs|
|||show a consolidated view of showq; include the routing queue (largeq) in the output|
|||show all of your jobs in the queue; use -r, -b, or -i to show just your running, blocked, or eligible jobs|
|||set email notification for a number of hours before a job is scheduled to end; also list notifications already set up|
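As an illustration of a typical monitor-and-cancel sequence (the job ID 123456 is hypothetical):

```shell
qstat -u $USER    # list your own jobs and their current states
checkjob 123456   # detailed scheduler view of job 123456
qdel 123456       # remove job 123456 from the queue
```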
Information on Submitting Jobs to the Queue
- The default length of any job submitted to the queue is currently set at one hour and the default maximum number of processors per user is set to a value based on their user status.
- Jobs that run longer than twenty days will be terminated by the scheduler.
- These parameters are subject to change as we become more familiar with users' needs.
- It is important for users to specify the resources required by their jobs.
- In the current configuration, the walltime and the number of nodes are the two parameters that matter.
- If you don’t specify the walltime, the system default of one hour will be assumed and your job may end early.
- See the Single Processor Job Example for further details
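As a minimal sketch of such a script (the job name, program, and file names are illustrative), requesting one processor and an explicit walltime so the one-hour default does not cut the job short:

```shell
#!/bin/bash
# Illustrative single-processor job script (myjob.pbs).
#PBS -N myjob                 # job name (hypothetical)
#PBS -l nodes=1:ppn=1         # one processor on one node
#PBS -l walltime=4:00:00      # request 4 hours; the default is only 1 hour
#PBS -j oe                    # merge stdout and stderr into one file

cd "$PBS_O_WORKDIR"           # run from the directory the job was submitted in
./my_program input.dat        # hypothetical program and input file
```

Submit it with `qsub myjob.pbs` (or `mksub myjob.pbs` for DartFS users).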
Information for Multiprocessor Jobs
- For multiprocessor jobs, it is important to specify the number of nodes and processors required and to select nodes that are of the same feature and on the same switch.
- The nodes are divided into cells. The nodes in each cell are homogeneous with similar chip vendors and speed, as well as disk and memory size.
- Feature Assignments:
|Feature||Nodes||Processors||Memory||Disk||Interconnect|
|cella,amd,ib1||a01-a33||AMD (8) 2.7GHz||32GB||143GB||InfiniBand|
|cellb,intel||b01-b16||Intel (8) 2.3GHz||32GB||143GB|
|cellc,amd||c01-c27||AMD (16) 2.4GHz||64GB||820GB|
|celld,amd,ib2||d01-d39||AMD (16) 3.0GHz||64GB||820GB||InfiniBand|
|celle,amd||e01-e23||AMD (16) 3.1GHz||64GB||820GB|
|cellf,amd||f01-f08||AMD (48) 2.8GHz||192GB||849GB|
- Parallel programs that need to communicate between processes will run more efficiently if all of the processes are in the same group.
- You can specify which group of nodes to use by adding the group to the PBS directive where you specify the number of nodes and processors.
- For example:
#PBS -l nodes=3:ppn=8
#PBS -l feature='cellb'
- This example specifies that the job will run in cellb, on 3 nodes, using 8 processors per node.
- Before you submit your job, use the features -a command to see which nodes are currently running jobs so you can select a cell that has free nodes.
- The nodes in the ib1 and ib2 groups also have an InfiniBand interconnect, which is a faster network connection than the standard 1Gb Ethernet.
- Use these groups of nodes for MPI programs that do a lot of inter-process communication.
- You will need to rebuild your program with an InfiniBand version of the MPI library as explained in the MPI section.
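A sketch of a corresponding multiprocessor job script (the program name is illustrative, and the mpirun invocation may differ depending on which MPI library is loaded):

```shell
#!/bin/bash
# Illustrative MPI job script targeting the InfiniBand nodes in ib1.
#PBS -l nodes=4:ppn=8         # 4 nodes with 8 processors each = 32 MPI ranks
#PBS -l feature='ib1'         # keep all ranks in the same InfiniBand group
#PBS -l walltime=12:00:00     # request 12 hours

cd "$PBS_O_WORKDIR"
mpirun -np 32 ./my_mpi_program   # hypothetical InfiniBand-enabled binary
```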
See the Sample parallel job scripts for examples of how to submit parallel jobs.
Using Node Features
- Here are the node features that can be specified on the PBS node directive line:
|ib1||run job on InfiniBand nodes||#PBS -l feature='ib1'|
|ib2||run job on InfiniBand nodes||#PBS -l feature='ib2'|
|cell# (a-f)||run job on designated cell||#PBS -l feature='cella'|
|amd||run job on AMD Opteron processors||#PBS -l feature='amd'|
|intel||run job on Intel processors||#PBS -l feature='intel'|
Interactive Batch Jobs
- You can run an interactive batch job by using the -I option to qsub (mksub for DartFS users).
- The job will be queued and scheduled like a non-interactive job.
- When the job starts, its standard input, output, and error are connected through qsub/mksub to the terminal where qsub/mksub was run.
- You can use the -l option to qsub/mksub to specify the number of nodes the job should run on as well as the walltime.
- Here is an example of starting an interactive job that will run on two nodes for a walltime of 3 hours with X11 forwarding enabled:
For Legacy Discovery Users
qsub -I -X -l nodes=2:ppn=8 -l walltime=3:00:00
For DartFS Discovery Users
mksub -I -X -l nodes=2:ppn=8 -l walltime=3:00:00
Policy on Number of Jobs
- The scheduler can handle a total of about 4000 jobs in the running, idle, plus blocked portions of the queue.
- The total number of running jobs can reach over 2000 (depending on the number of online CPUs) when the cluster is fully utilized with single-CPU jobs.
- We attempt to limit the number of jobs in the idle and blocked portions of the queue to about 4000.
- To help maintain this, we limit each user to a maximum of 600 jobs in the queue.
- This limit has proved to work well to keep the total number of jobs safely below 4000.
- If you have a need to queue up more than 600 jobs, submit your jobs to the queue named largeq.
- Jobs will be routed to the default queue over time.
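For example (the script name and TASK_ID variable are hypothetical), a large batch can be submitted directly to the routing queue in a loop:

```shell
# Submit each job in a large batch to largeq; Moab routes them into the
# default queue over time. Legacy Discovery users would use qsub instead.
for i in $(seq 1 1000); do
    mksub -q largeq -v TASK_ID=$i myjob.pbs   # -v passes TASK_ID to the job's environment
done
```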
- For more information, see the Scheduler Policies page.