HPC Getting Started: Job Scripts
A job script, or batch script, is a Linux shell script that runs your code (or codes) when you submit it to the batch system. It usually also contains options (often called pragmas or directives) that tell the batch system what resources the job needs.
See also the Batch Job FAQ list.
A job script might look something like this:
#!/bin/bash
#SBATCH -n 16
#SBATCH -t 00:30:00
export PKGHOME=/usr/local/pkg9
module load pkg/9
- The first line specifies the shell, usually the bash shell. You may use more sophisticated scripting languages, such as Python, PHP, or Ruby.
- Then come any SLURM options. These shell comments are interpreted by the batch system.
- If your software needs any environment variables, set them.
- If your code requires any modules to be loaded, do that before executing the code. If the modules are loaded at login by your .bash file, then you do not need to load them in your job scripts.
- Finally, execute the program (or programs). You may also create directories, transfer files, and so on.
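The last step in the list above (creating directories, moving files, running the program) can be sketched as follows. This is a minimal illustration, not a site-specific recipe: the working directory layout, the WORKBASE variable, and the file names are hypothetical, and the tr command only stands in for your real program. SLURM_JOB_ID is set by the batch system inside a job; the fallback value lets the sketch run outside one.

```shell
#!/bin/bash
#SBATCH -n 16
#SBATCH -t 00:30:00

# SLURM_JOB_ID is set inside a running job; fall back to "test" elsewhere.
JOBID=${SLURM_JOB_ID:-test}

# Hypothetical per-job working directory (WORKBASE is an assumed variable).
WORKDIR=${WORKBASE:-${TMPDIR:-/tmp}}/job_$JOBID
mkdir -p "$WORKDIR"
cd "$WORKDIR" || exit 1

# Stand-in for staging an input file.
echo "input data" > input.txt

# Stand-in for the real program: reads input.txt, writes output.txt.
tr 'a-z' 'A-Z' < input.txt > output.txt
```

Using a separate directory per job keeps the output of concurrent jobs from overwriting each other.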
Use the sbatch command to submit your job. You can specify directives on the sbatch command (for example, sbatch -n 16 -t 5:00:00). When a directive is specified in both places, the value on the sbatch command will be used.
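As a hedged illustration of that precedence rule, the sketch below writes a small job script and then submits it with different values on the command line. The file name myjob.sh is hypothetical, and the submission is skipped with a message on machines where sbatch is not installed.

```shell
# Write a small job script (myjob.sh is a hypothetical file name).
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH -n 16
#SBATCH -t 00:30:00
echo "running on $(hostname)"
EOF

# Options on the command line take precedence over the directives in the
# script, so this submission asks for 32 cores and 5 hours.
SUBMIT_CMD="sbatch -n 32 -t 5:00:00 myjob.sh"

if command -v sbatch >/dev/null 2>&1; then
    $SUBMIT_CMD
else
    echo "sbatch not found; would run: $SUBMIT_CMD"
fi
```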
Here are some useful directives:
#SBATCH -n nnn
Request nnn processors (cores). This should normally be a multiple of the number of processors on a node (16 for compute nodes and 32 for fat nodes). For example: #SBATCH -n 32
#SBATCH -t time
Request a run time. The format is hh:mm:ss or ddd-hh:mm:ss. For example, #SBATCH -t 40:00:00 will request 40 hours.
#SBATCH -p qname
Put the job in this queue (partition). For example, #SBATCH -p debug
#SBATCH -J name
The name will appear along with the job id number when querying running jobs on the system. The default is the name of the batch script, or just "sbatch" when the script is read from standard input. For example, #SBATCH -J Chem01
#SBATCH -o fname
The script’s standard output will be put in fname. The default is to put both standard output and standard error into a file named "slurm-nnn.out", where nnn is the job number. For example, #SBATCH -o Chem01.out
#SBATCH -e fname
The script’s standard error will be put in fname. The default is to put both standard output and standard error into a file named "slurm-nnn.out", where nnn is the job number. For example, #SBATCH -e Chem01.err
#SBATCH --mail-type type
Notify the user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user. For example, #SBATCH --mail-type ALL
#SBATCH --mail-user user
Email address of the user to notify (see --mail-type). The default is the submitting user, in which case the mail goes to the address in your .forward file. For example, #SBATCH --mail-user firstname.lastname@example.org
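Put together, the directives above might appear at the top of a job script like this. It is only a sketch: the partition name debug, the job name Chem01, and the file names are the example values used above, and the two echo commands stand in for a real program so that the split between standard output and standard error is visible.

```shell
#!/bin/bash
#SBATCH -n 16
#SBATCH -t 00:30:00
#SBATCH -p debug
#SBATCH -J Chem01
#SBATCH -o Chem01.out
#SBATCH -e Chem01.err
#SBATCH --mail-type=END
#SBATCH --mail-user=firstname.lastname@example.org

# Under the batch system this line ends up in Chem01.out ...
echo "this line goes to standard output"
# ... and this one in Chem01.err.
echo "this line goes to standard error" >&2
```

Run outside the batch system, the #SBATCH lines are ordinary shell comments and both messages simply go to the terminal.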
Use the man sbatch command to get more information about sbatch and its options.
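The two time formats described for -t can express the same limit. The helper below is a small illustrative sketch (slurm_time_to_seconds is a hypothetical name, not part of the batch system) that converts either form to seconds, so you can check that 40:00:00 and 1-16:00:00 request the same run time. It handles only the two forms named above.

```shell
# Convert hh:mm:ss or d-hh:mm:ss to seconds.
# Hypothetical helper for checking a time limit; not an sbatch feature.
slurm_time_to_seconds() {
    case $1 in
        *-*)
            # Days are present: split on both "-" and ":".
            echo "$1" | awk -F'[-:]' '{ print $1*86400 + $2*3600 + $3*60 + $4 }'
            ;;
        *)
            echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'
            ;;
    esac
}

slurm_time_to_seconds 40:00:00      # 40 hours
slurm_time_to_seconds 1-16:00:00    # 1 day and 16 hours, the same duration
```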
- Run your job from the scratch disk (~/scratch) for the best performance, especially if it does any significant amount of disk I/O.
- You will normally use either OpenMPI or Intel MPI to run your job across more than one node. Put the mpirun command in your job script.
- Some jobs require more memory per core, so the cores have to be spread across more nodes. See the SLURM documentation for how to do this.
- If you have many single-node or serial jobs that will each run for approximately the same amount of time, you can bundle them. If your jobs vary greatly in run time, contact a consultant about how to bundle them.
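One common way to bundle serial jobs of similar length is to start them in the background from a single job script and wait for all of them to finish. The sketch below assumes a single 16-core node; run_task and the output file names are hypothetical stand-ins for your own program, and your site may recommend a different launcher or explicit core binding.

```shell
#!/bin/bash
#SBATCH -n 16
#SBATCH -t 01:00:00

# Hypothetical serial task; stands in for one run of your program.
run_task() {
    echo "task $1 done" > "task_$1.out"
}

# Start one task per requested core in the background.
i=1
while [ "$i" -le 16 ]; do
    run_task "$i" &
    i=$((i + 1))
done

# Wait for every background task before the job script exits;
# otherwise the job would end while tasks are still running.
wait
```

Because the job lasts as long as its slowest task, this pattern wastes cores when run times differ widely, which is why bundling works best for tasks of similar length.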