HPC FAQ - Batch Jobs
Supercomputer documentation is always a work in progress! Please email questions, corrections, or suggestions to the HPC support team at firstname.lastname@example.org as usual. Thanks!
Frequently Asked Questions
- 1. What is a batch job?
- 2. How do I submit a batch job?
- 3. Can I run jobs on the login node?
- 4. What types of jobs must be run in batch?
- 5. Which batch queue should I use?
- 6. How do I check the status of my job?
- 7. Why has my job been Pending for so long?
- 8. How many jobs can I run at once?
- 9. Can I kill a batch job?
- 10. How do I checkpoint or restart a job?
- 11. What nodes is my job running on?
- 12. What are MOAB and SLURM?
- 13. Where can I get more information on MOAB and SLURM?
- 14. How do I do a timing run?
- 15. Can I run a job using multiple-level parallelism?
A batch job is a program or series of programs run on the cluster without manual intervention. Usually a script supplies the input data and parameters necessary for the job to run. The script is submitted to the batch scheduler and runs when the resources become available. You don't need to specify which node or nodes the job will run on; the batch system selects appropriate nodes for the job. The batch system also allows for accounting and tracking of jobs. For more information, see Getting Started: Running Jobs.
A batch job is submitted to a batch queue using the sbatch command. For details on the commands for batch execution, see the examples (including batch job scripts) in /share/cluster/examples/dlx on the cluster and Getting Started: Job Scripts.
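As a minimal sketch, a batch job script might look like the following (the job name, core count, time limit, and program name are illustrative placeholders, not values specific to this cluster):

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # a name for the job
#SBATCH --ntasks=16             # number of cores (processors) requested
#SBATCH --time=01:00:00         # expected wall-clock time (hh:mm:ss)
#SBATCH --output=myjob-%j.out   # output file; %j expands to the job ID

# ./myprogram and input.dat are placeholders for your own code and data.
./myprogram < input.dat
```

Submit the script with sbatch myjob.sh; sbatch prints the ID of the newly created job.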
The login nodes are intended for editing files, compiling code, running short tests, and similar activities. Please run your jobs as batch jobs whenever you can. As a special case, interactive jobs of up to 120 CPU-minutes may be run on the login node. If your job exceeds this limit, it may be canceled. The login nodes are shared by all of the cluster users, so any job that does intensive computing, produces heavy I/O, or spawns a large number of processes will adversely affect the whole node. Please respect your fellow users!
Almost all jobs must be run through the batch system using the sbatch command. Non-batch jobs on any node except the login node will be killed, unless special permission has been obtained in advance. Send email to the Help HPC list at email@example.com to make arrangements.
Normally you won't specify the queue, but just let the scheduler pick the correct one for you. Be sure to give the scheduler enough information to put your job into the proper queue. Always specify the number of processors (cores) that the job needs and the amount of time you expect it to run. There are only a few cases where you should specify the job queue yourself, and these are described in Getting Started: Job Queues.
Use the queue_wcl command to show the names of the queues and the limits associated with them. The sinfo command also displays information about the queues. Note that the queues are defined for particular types of jobs, to make sure the jobs have the appropriate resources and don't interfere with one another. Please run your job in the appropriate queue! If you have questions, or you need to do something you can't do under the queuing system, please send email to the Help HPC list at firstname.lastname@example.org describing your problem or question.
Use the squeue command to show the status of batch jobs. The default is to show all pending and running jobs. Use squeue -u myloginid to see your own jobs. Use squeue --help or man squeue to get more information about the command.
Use the checkjob -v command to see why your job is pending.
There are very likely other pending jobs ahead of yours. You can see the pending jobs with the squeue -t pending command. The batch system manages the flow of jobs and allows jobs to run, in a fair and orderly fashion, when the system resources are available. When the system is less busy, your jobs might start quickly; when it is busier, they may wait for longer periods. The wait varies based on many factors.
If your job has been pending for longer than you expect, please try to find out why before you submit a trouble report. More often than not, your job is just waiting on the resources you requested. For more information, see Getting Started: Running Jobs.
Your job will be scheduled automatically when the resources are available.
Job queues and node allocations have been established to assure equitable distribution and access to the entire complex. See the Getting Started: Quotas for specific information.
Use scancel job_id to terminate a batch job. The time required for the job to actually terminate will vary some, depending on how busy the system and the network are, and on how many parallel processes the job is running. You can use the squeue command to find the jobid (see #6 above).
A checkpoint allows a job to be restarted when it is terminated for some reason outside of its control. There are many reasons why a job might be abnormally terminated. Here are a couple of examples: A hardware or software failure can stop a job before it finishes normally and writes out its results. Also, a job might still be running when a planned outage occurs. Since HPC jobs often run for days or weeks on hundreds of cores, an abnormal termination can result in the loss of a great deal of work and computing resources. It sometimes causes researchers to miss deadlines. The longer a job takes to run, and the more nodes involved, the more likely a problem becomes.
We strongly urge that all jobs use checkpoints.
A checkpoint / restart facility must be built into the code. It cannot be done automatically by our current operating system or batch system. Every so often, the code must write to disk enough internal state information and intermediate results, so that the job can be restarted from that point. If you write your own code, you are responsible for building this into your code. If you are using a 3rd party package, then please see the application documentation for its checkpoint / restart capability.
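As a rough sketch of the idea in shell (the file name checkpoint.dat, the step counter, and the loop limit are illustrative assumptions, not a real application): on startup the script resumes from the saved state if it exists, and it saves progress after each unit of work so a restart loses at most one step.

```shell
#!/bin/bash
# Minimal checkpoint/restart sketch; checkpoint.dat is a hypothetical state file.
CKPT=checkpoint.dat

if [ -f "$CKPT" ]; then
    step=$(cat "$CKPT")     # resume from the last saved step
else
    step=0                  # fresh start
fi

while [ "$step" -lt 10 ]; do
    step=$((step + 1))
    # ... do one unit of real work here ...
    echo "$step" > "$CKPT"  # save progress so a restart can pick up here
done

echo "finished at step $step"
```

Real applications save internal state and intermediate results rather than a simple counter, but the restart logic follows the same pattern.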
The squeue command shows the status of batch jobs (see above).
MOAB is the batch job scheduler that decides which jobs should run next. SLURM is the Resource Manager that allocates compute nodes upon request. Together they accept submitted jobs, select the most suitable hosts, and manage the individual tasks of parallel batch jobs.
A batch job is submitted to a queue with the sbatch command. The batch system then attends to all of the details associated with running it. A batch job may not run immediately after being submitted if the resources it needs (usually compute nodes) are not available. The job will wait until it reaches the front of the queue and the resources become available. At that time the batch job will be dispatched to the most suitable host or hosts available for execution.
The vendor's documentation on MOAB is at http://www.clusterresources.com/products/mwm/docs/.
Also see the Lawrence Livermore Moab Quick-Start User Guide at https://computing.llnl.gov/jobs/moab/QuickStartGuide.pdf.
The Lawrence Livermore page SLURM: A Highly Scalable Resource Manager is at https://computing.llnl.gov/linux/slurm/.
A timing run is a job used to determine how effectively and efficiently a program performs. The job must be run on a node that is not shared with any other job, or its timing could be affected by the other job. On some HPC systems, several jobs can share a single node, and exclusive use of a node for a timing run must be scheduled with the sysadmin. This is not an issue here, because the basic compute nodes, the fat nodes, and the GPU nodes are allocated exclusively to a single job by default.
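For example, a timing-run job script might wrap the program with the shell's time builtin (the core count, time limit, and program name are illustrative placeholders; the --exclusive flag is redundant on this cluster since nodes are already exclusive by default, but it makes the intent explicit):

```shell
#!/bin/bash
#SBATCH --ntasks=16            # cores requested (placeholder value)
#SBATCH --time=00:30:00        # expected run time
#SBATCH --exclusive            # request a whole node; redundant here, but explicit

# Time the run; ./myprogram and input.dat are placeholders for your own code.
time ./myprogram < input.dat
```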
A multiple-level parallel job is an MPI job whose sub-processes use parallelized library routines, OpenMP, or loop-parallelism (also known as mixed-mode parallelism). Consult the following links for more MPI information.
Open MPI documentation: http://www.open-mpi.org/doc/v1.4/.
LLNL Tutorial: https://computing.llnl.gov/tutorials/mpi/.
Another tutorial: http://www.lam-mpi.org/tutorials/.
MPI FORUM: http://www.mpi-forum.org/.
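As an illustrative sketch, a mixed-mode (MPI + OpenMP) job script might look like this (the task and thread counts and the program name ./hybrid_app are placeholder assumptions, not values specific to this cluster):

```shell
#!/bin/bash
#SBATCH --ntasks=4              # number of MPI ranks
#SBATCH --cpus-per-task=8       # cores (OpenMP threads) per rank
#SBATCH --time=02:00:00

# Tell OpenMP how many threads each MPI rank may use.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# srun launches one MPI process per task; ./hybrid_app is a placeholder.
srun ./hybrid_app
```

The total core request is ntasks times cpus-per-task (32 in this sketch), so be sure the product fits the nodes you intend to use.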