HPC Getting Started: Job Queues
High Performance Computing jobs often have long (or very long) run times and sometimes use many computing nodes. In addition, many researchers share our cluster, so the resources to run a job may not be available immediately. To share the cluster as fairly and equitably as possible, the batch system dispatches jobs in priority order, starting each one only when it is eligible to run and the resources it needs are available. To support this, several job queues have been set up. In general, jobs that need many computing nodes are allowed only a short run time, which prevents one job from hogging a large fraction of the cluster for a long time. Jobs that need only a small number of nodes are allowed much longer run times.
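To make the trade-off between job width (cores) and job length (run time) concrete, here is a toy Python model of how a scheduler might route a job to the first queue whose core and wall clock limits it fits. The queue names and limits below are invented for illustration only and are not this cluster's real queues; use the queue_wcl command described under Job Queue Limits to see the actual values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Queue:
    name: str
    max_cores: int    # largest core count the queue accepts
    max_minutes: int  # wall clock limit in minutes


# HYPOTHETICAL queues: narrow jobs may run long, wide jobs only briefly.
EXAMPLE_QUEUES = (
    Queue("narrow_long", max_cores=64, max_minutes=14 * 24 * 60),
    Queue("medium", max_cores=512, max_minutes=3 * 24 * 60),
    Queue("wide_short", max_cores=4096, max_minutes=12 * 60),
)


def route_job(cores: int, minutes: int,
              queues: Sequence[Queue] = EXAMPLE_QUEUES) -> Optional[str]:
    """Return the name of the first queue whose limits fit the request, or None."""
    for q in queues:
        if cores <= q.max_cores and minutes <= q.max_minutes:
            return q.name
    return None  # no queue allows this many cores for this long


print(route_job(16, 2 * 24 * 60))    # narrow_long
print(route_job(1000, 6 * 60))       # wide_short
print(route_job(1000, 2 * 24 * 60))  # None: too wide for a run that long
```

A real scheduler also weighs priority, quotas, and current load; this sketch only shows why a wide, long request may fit no queue at all.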
When you submit a job to the batch system, give the scheduler enough information to place it in the proper queue. Always specify the number of processors (cores) the job needs and the amount of time you expect it to run. Normally you should not specify the queue; let the scheduler pick the correct one. There are only a few cases where you should specify the job queue yourself, and these are noted below.
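As a sketch, assuming the batch system is Slurm (which provides sbatch), the core count and expected run time map onto sbatch's --ntasks and --time options. The helper names and the script name myjob.sh are hypothetical, and requesting one core per task with --ntasks is an assumption; a multithreaded program may need --cpus-per-task instead.

```python
import shlex
from typing import List


def format_walltime(minutes: int) -> str:
    """Format minutes in Slurm's days-hours:minutes:seconds time format."""
    if minutes <= 0:
        raise ValueError("run time must be positive")
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}-{hours:02d}:{mins:02d}:00"


def build_sbatch_command(script: str, cores: int, minutes: int) -> List[str]:
    """Hypothetical helper: request `cores` cores and `minutes` of wall clock time.

    No partition is given, so queue selection is left to the batch system.
    """
    return [
        "sbatch",
        f"--ntasks={cores}",                   # assumes one core per task
        f"--time={format_walltime(minutes)}",  # expected run time
        script,
    ]


cmd = build_sbatch_command("myjob.sh", cores=32, minutes=36 * 60)
print(shlex.join(cmd))
# sbatch --ntasks=32 --time=1-12:00:00 myjob.sh
```

Leaving out --partition mirrors the advice above: state what the job needs and let the scheduler choose the queue.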
In general, the more cores and the more run time a job requests, the lower its priority. A job that needs only a few nodes and a day or two of run time is likely to be dispatched quickly, while a job that needs hundreds of cores may wait longer in the queue.
When to specify a job queue
- Short runs that test a program or check its syntax may use the debug queue. These jobs run on a single node with a short run time limit.
- Gaussian jobs should specify the gauss queue.
- Jobs with large memory requirements should specify the FatComp queue.
- GPU-enabled jobs should specify the GPU queue.
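The special cases above can be expressed as a small lookup table. This Python sketch assumes that each queue named in the list is exposed as a Slurm partition with exactly that name and is selected with sbatch's --partition option; the job-kind keys and the function name partition_args are hypothetical.

```python
from typing import List, Optional

# Queue names from the list above; the job-kind keys are hypothetical labels.
SPECIAL_QUEUES = {
    "debug": "debug",           # short single-node test runs
    "gaussian": "gauss",        # Gaussian jobs
    "large_memory": "FatComp",  # jobs with large memory requirements
    "gpu": "GPU",               # GPU-enabled jobs
}


def partition_args(job_kind: Optional[str]) -> List[str]:
    """Return the partition option for a special job, or [] to let the scheduler choose."""
    if job_kind is None:
        return []
    try:
        return [f"--partition={SPECIAL_QUEUES[job_kind]}"]
    except KeyError:
        raise ValueError(f"unknown job kind: {job_kind!r}") from None


print(partition_args("gpu"))  # ['--partition=GPU']
print(partition_args(None))   # []
```

Returning an empty list for ordinary jobs keeps the default behavior: no queue is named and the scheduler picks one.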
Job quotas help ensure that each user and research group gets a fair share of time on the cluster. A job will be ineligible to run whenever the researcher or research group is over quota. See HPC Getting Started: Quotas for more information.
Submit a job
Use the sbatch command to submit your job. Specify the resources it needs either on the command line or in the job script. For more information on job scripts, see the HPC Getting Started: Job Scripts page. For example, if you are debugging a job with the job script abc.sh and expect it to run for less than 30 minutes, submit abc.sh to the debug queue with a run time limit of 30 minutes.
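Assuming the cluster runs Slurm and the debug queue is a partition named debug, the debugging submission could be written as sbatch --partition=debug --time=30 abc.sh, since Slurm reads a bare number given to --time as minutes. The Python sketch below builds that command, adds --parsable so sbatch prints just the job ID, and submits it only if sbatch is found on the PATH; the function name debug_submit_command is hypothetical.

```python
import shutil
import subprocess
from typing import List


def debug_submit_command(script: str = "abc.sh", minutes: int = 30) -> List[str]:
    """Build an sbatch command for a short debugging run in the debug queue."""
    return [
        "sbatch",
        "--partition=debug",  # assumed partition name for the debug queue
        f"--time={minutes}",  # a bare number is read by Slurm as minutes
        "--parsable",         # print only the job ID (plus cluster name, if any)
        script,
    ]


if __name__ == "__main__":
    cmd = debug_submit_command()
    if shutil.which("sbatch"):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print("Submitted job", result.stdout.strip())
    else:
        print("sbatch not found; would run:", " ".join(cmd))
```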
After you submit the job, run checkjob -v job_id, where job_id is the job ID that sbatch reports, to see the job's status.
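If you submit jobs from a script, you can capture the job ID and pass it straight to checkjob. This Python sketch assumes sbatch was run with Slurm's --parsable option, which prints the job ID optionally followed by a semicolon and the cluster name; the function names and the sample output string are hypothetical.

```python
from typing import List


def parse_job_id(sbatch_output: str) -> str:
    """Extract the job ID from `sbatch --parsable` output such as '12345' or '12345;cluster'."""
    job_id = sbatch_output.strip().split(";", 1)[0]
    if not job_id.isdigit():
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return job_id


def checkjob_command(job_id: str) -> List[str]:
    """Build the verbose status query: checkjob -v job_id."""
    return ["checkjob", "-v", job_id]


print(checkjob_command(parse_job_id("12345;mycluster\n")))
# ['checkjob', '-v', '12345']
```

The resulting list can be passed to subprocess.run on a login node where checkjob is installed.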
Job Queue Limits
All of the queues have wall clock time limits and processor core limits. As discussed above, you will normally just specify the time and cores you need, except in a few cases. To see all the queues currently defined and their limits, use the queue_wcl command. The queues and their limits are subject to change at any time.