Purpose

This document describes, for members and users of Discovery, the job scheduler and the limits imposed on accounts and users so that the cluster is used both fully and efficiently.

Fairshare

In the context of this document, fairshare is simply the number of processors owned by the PI that are assigned to their cluster account in the scheduler’s configuration. It is used to determine the resources available to all users assigned to that account.

Job Scheduler

The job scheduler software determines when (time) and where (node/processor) each job is sent for processing. It monitors the entire job queue, prioritizing waiting jobs (queued, not yet running) based on requested versus available resources and on current usage versus fairshare.

Assignments of fairshare allocations for the cluster are made at the level of a Discovery scheduler account.

Multiple user accounts may exist under a stakeholder’s account.

Moab Scheduler

The scheduler schedules the use of compute processors for batch jobs in the Discovery cluster environment. It updates the status of the queue every 2 minutes. Parameters and policy settings can be tuned to handle a wide range of system workloads efficiently.

Moab Scheduler Limits

The scheduler enforces limits on the number of jobs in the queue.

The largest factors in determining limits on the number of jobs are the Maximum Processor Seconds (MaxPS) and the Maximum Processors (MaxPROC) for each account.

The MaxPS is the number of processor-core seconds available to each account: the account’s fairshare multiplied by 864,000 seconds (one processor core running 24/7 for 10 days).

This is best explained by an example; a short sketch of the arithmetic follows the list.

  • An account with a fairshare of eight processor cores has a MaxPS of 6,912,000 seconds (1,920 hours, or 80 days).
  • This account could start eight 10-day one-processor-core jobs and use its entire MaxPS.
  • Alternatively, it could start sixteen 5-day one-processor-core jobs, four 10-day two-processor-core jobs, or any other combination adding up to the MaxPS for that account.
  • The shorter the jobs, the more jobs are allowed.
  • Once jobs are started, the processor seconds counted against MaxPS decrease as the remaining time for each job to finish shortens.
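
The following is a minimal sketch of this arithmetic in Python; the helper names are illustrative, not part of the scheduler, and the constants follow the 864,000-second rule stated above.

    SECONDS_PER_DAY = 86_400
    MAXPS_WINDOW_DAYS = 10

    def max_ps(fairshare_cores):
        # MaxPS = fairshare cores x 864,000 s (one core running 24/7 for 10 days)
        return fairshare_cores * MAXPS_WINDOW_DAYS * SECONDS_PER_DAY

    def fits_in_maxps(jobs, fairshare_cores):
        # jobs is a list of (cores, walltime_days) requests
        requested = sum(cores * days * SECONDS_PER_DAY for cores, days in jobs)
        return requested <= max_ps(fairshare_cores)

    print(max_ps(8))                          # 6912000 seconds (80 core-days)
    print(fits_in_maxps([(1, 10)] * 8, 8))    # True  -- eight 10-day one-core jobs
    print(fits_in_maxps([(2, 10)] * 4, 8))    # True  -- four 10-day two-core jobs
    print(fits_in_maxps([(1, 5)] * 16, 8))    # True  -- sixteen 5-day one-core jobs
    print(fits_in_maxps([(1, 5)] * 17, 8))    # False -- one 5-day job too many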

The MaxPS has several significant benefits:

  • Very long jobs can run for up to 20 days on the cluster, and accounts can still use their full fairshare; for example, a fairshare of eight processor cores allows up to four one-processor-core jobs for the entire 20 days.
  • MaxPS allows accounts to burst onto processor cores that belong to the fairshare of other accounts, as long as this does not have a long-term negative impact on other users.
  • The shorter the job length, the greater the burst.
  • The MaxPS encourages all users to set reasonably accurate Job Wall Clock times.
  • If a user requests 20 days when they only need a day, their usage will be limited by MaxPS.
  • The scheduler cannot know whether a job will complete early, so it must schedule time based on the Job Wall Clock request; the sketch below illustrates the cost of over-requesting.
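
The cost of padding can be sketched as follows, assuming MaxPS is charged by the requested wall clock time; the variable names are ours.

    SECONDS_PER_DAY = 86_400
    maxps_8_cores = 8 * 10 * SECONDS_PER_DAY    # 6,912,000 s for an eight-core fairshare

    honest_request = 1 * SECONDS_PER_DAY        # one-core job requesting 1 day
    padded_request = 20 * SECONDS_PER_DAY       # the same job padded to 20 days

    print(maxps_8_cores // honest_request)      # 80 such requests fit under MaxPS
    print(maxps_8_cores // padded_request)      # only 4 fit when padded to 20 days

(In practice the MaxPROC cap described below also limits how many of those jobs can run at once.)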

Users should also be careful not to set their job’s wallclock time too short, since jobs are killed when they run out of wallclock time.

  • A good rule of thumb is to calculate your expected runtime and add 20% to create a workable walltime (see the sketch after this list).
  • If you find that your job is running out of time, you can request that time be added.
  • The time added plus the original submitted walltime cannot exceed 20 days.
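
A small sketch of that rule of thumb, assuming the 20-day cap stated above; the function name is illustrative.

    MAX_WALLTIME_HOURS = 20 * 24    # 20-day cap on original walltime plus added time

    def suggested_walltime_hours(estimated_hours, buffer=0.20):
        # estimated runtime plus a 20% safety margin, capped at 20 days
        return min(estimated_hours * (1.0 + buffer), MAX_WALLTIME_HOURS)

    print(suggested_walltime_hours(36))     # 43.2 hours for a job expected to take 36 hours
    print(suggested_walltime_hours(450))    # 480.0 -- capped at the 20-day maximum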

The default maximum number of processors in use at any one time per account, MaxPROC, is set to 4 times the fairshare purchase, in any combination of single- and multi-processor jobs.

Depending on the user’s limits, MaxPS may restrict them to significantly fewer processors than that multiple.
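
The interplay between the two limits can be sketched as follows; the helper names are illustrative, and the sketch assumes the 4x default and the 864,000-second MaxPS rule above.

    SECONDS_PER_DAY = 86_400

    def max_proc(fairshare_cores):
        # default cap: 4x the purchased fairshare
        return 4 * fairshare_cores

    def usable_cores(fairshare_cores, walltime_days):
        # cores an account can occupy at once for jobs of a given requested length
        max_ps = fairshare_cores * 10 * SECONDS_PER_DAY
        by_maxps = int(max_ps // (walltime_days * SECONDS_PER_DAY))
        return min(max_proc(fairshare_cores), by_maxps)

    print(max_proc(8))           # 32
    print(usable_cores(8, 2))    # 32 -- MaxPROC is the binding limit for 2-day jobs
    print(usable_cores(8, 10))   # 8  -- MaxPS is the binding limit for 10-day jobs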

Exceptions to this maximum are also made so that groups with a very large fairshare cannot overwhelm the cluster and deny other members access, while still ensuring they have fair access to their purchase.

The Principal Investigator (PI) in charge of an individual account may also request upper limits on both MaxPS and MaxPROC of users in that account.

If the account has 3 or more users, each user will be assigned 1/3 of the resources available to the account, unless otherwise specified by the PI.
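
A minimal sketch of that default split, assuming it applies to both MaxPS and MaxPROC; the function name is ours.

    def default_user_limits(account_maxps, account_maxproc, n_users):
        # with 3 or more users, each user defaults to 1/3 of the account's limits
        if n_users >= 3:
            return account_maxps // 3, account_maxproc // 3
        return account_maxps, account_maxproc

    # eight-core fairshare account (MaxPS 6,912,000 s, MaxPROC 32) with four users:
    print(default_user_limits(6_912_000, 32, 4))    # (2304000, 10)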

Other Limits

The maximum number of running jobs by any single user is set to 600.

Individual user limits in an account are set by the stakeholder and can be modified at any time.

MaxPROC for a single resource account cannot exceed 80% of the cluster. This is set so that a single large account cannot take over the entire cluster.

The maximum number of jobs, running or idle, in the default queue is set to 600.

If you need to submit more than 600 jobs (up to a maximum of 20,000 jobs), you will need to use the largeq, which acts as a funnel to deliver jobs to the default queue when there is space available for the user.
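
As a simple sketch of how those thresholds relate (this is not an official tool; the queue names follow the text above):

    def choose_queue(n_jobs):
        # up to 600 jobs go straight to the default queue; larger batches use largeq
        if n_jobs <= 600:
            return "default"
        if n_jobs <= 20_000:
            return "largeq"
        raise ValueError("more than 20,000 jobs is not supported")

    print(choose_queue(450))      # default
    print(choose_queue(5_000))    # largeq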

These values tend to increase as processors are added to the cluster.

Please be aware that restrictions may be placed at any time on a user account if jobs are causing any problem with the cluster hardware or are interfering with the jobs of other users.

  • In such cases, we notify the user as soon as possible (although in extreme cases, we must sometimes kill problem jobs before we have made contact with the user).
  • We then work with the user to monitor the progress of their jobs until they can be run normally on the system without causing problems.
  • Many times this means restricting an account to one running job until each new job runs without encountering problems.
  • We will then incrementally increase the number of jobs a user can have running simultaneously, only if their jobs cause no issues at that level.
  • Eventually, we reset the account to its maximum.

Limits can always be adjusted for work with deadlines that need resources beyond the current maximum assigned.

  • If this is the case, the PI should contact the Discovery team and request a reservation.
  • Try to give at least one week's notice prior to the resource need.
  • We will need that time to have the scheduler reserve the resources and to notify the other account members of possibly reduced resources for that period.