HPC NAMD README
Supercomputer documentation is always a work in progress! Please email questions, corrections, or suggestions to the HPC support team at firstname.lastname@example.org as usual. Thanks!
See also the NAMD Frequently Asked Questions list.
Getting Started with NAMD
NAMD is a parallel molecular dynamics code designed for high-performance computing. For any system with a parameterized force field, NAMD computes the forces on the atoms and integrates the equations of motion to follow how the system evolves over time. NAMD is maintained by a dedicated group at the University of Illinois at Urbana-Champaign (UIUC), who regularly release updates implementing some of the most current molecular simulation techniques available. For more information, see the UIUC NAMD page. The Theoretical and Computational Biophysics Group at UIUC maintains an extensive set of tutorials designed to help users both create the necessary simulation input files and perform appropriate data analysis; to take one of these tutorials, see the UIUC NAMD Tutorials page. Users are encouraged to work through these tutorials to become familiar with the required input files.
At this writing, two versions of NAMD have been compiled for use on the cluster: NAMD 2.9 for the basic compute nodes and NAMD 2.9 with CUDA for the GPU-enabled nodes. GPU usage will be discussed below. Each version is available through the module load command. To see which NAMD modules are available at any time, use the module avail namd command. To see which modules are loaded in your current login session, use the module list command. For more information, see the module command web page.
If you do not currently have the NAMD module you need loaded, you can load it for the current session with the module load NAMD/2.9 or the module load NAMD/2.9-cuda commands.
To load the NAMD module automatically each time you log in, add the module load command to your login profile (.bashrc, .bash_profile, .login, .cshrc, or the equivalent for your shell). These are hidden files in your home directory; type the command ls -al to see them. Unless you have changed it, you will be using the bash shell, so you would edit the .bashrc or .bash_profile file. Here is an example .bashrc file:
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
# User specific aliases and functions
# load any desired persistent modules here; see man module
# eg module load foo
# to see modules loaded by default run: module list
#module load amber/9/openmpi/default
if [ -n "$MODULESHOME" ]; then
    module load NAMD/2.9
fi
Once you add the module load command to your login profile, issue the source ~/.bashrc command to apply it to your current session. From then on, the module will be loaded automatically each time you log in. Use the module list or which namd2 command to confirm that the NAMD module has actually loaded. If not, something has gone awry!
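If you prefer not to edit the profile by hand, the step above can be scripted. The sketch below assumes bash and uses a hypothetical helper name, add_module_to_profile (not a system command). It appends the module load line only when the same uncommented line (ignoring indentation) is not already present, so running it twice does not duplicate the line.

```shell
# Hypothetical helper: append a line such as "module load NAMD/2.9" to a
# login profile unless an identical line (ignoring leading/trailing
# whitespace) is already there. A missing profile file is created.
add_module_to_profile() {
    local profile="$1" line="$2"
    if ! awk -v want="$line" '
            { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # ignore indentation
            $0 == want { found = 1 }
            END { exit !found }' "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

For example, run add_module_to_profile ~/.bashrc "module load NAMD/2.9" and then source ~/.bashrc. Unlike the example profile above, this appends the bare line without the MODULESHOME guard.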
Running NAMD 2.9 (for CPUs) on the cluster
Prepare all of the necessary input files as described in the NAMD tutorials in a directory on your scratch drive (/scratch/userid). Since /scratch is not backed up, make sure you have a copy of these in your /home directory. At a minimum, you will need
- a configuration file, md.inp, containing the NAMD commands to run your simulation
- an atomic coordinate file
- a protein structure file (PSF) specifying bonds, charges, and so on
- your batch script telling the scheduler what to do with your job
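Before submitting, it can save a wasted queue wait to confirm that the files md.inp points to actually exist. The sketch below uses a hypothetical helper, check_namd_inputs, and assumes md.inp names its PSF and coordinate files on plain structure and coordinates lines (no Tcl variables), with relative paths interpreted relative to the directory containing md.inp.

```shell
# Hypothetical pre-submission check. Assumes md.inp contains lines such as
#   structure    mysystem.psf
#   coordinates  mysystem.pdb
# (keyword matched case-insensitively, optional "=" allowed).
check_namd_inputs() {
    local conf="$1" dir key file status=0
    [ -f "$conf" ] || { echo "missing config file: $conf" >&2; return 1; }
    dir=$(dirname "$conf")
    for key in structure coordinates; do
        # Value of the first line whose first word is the keyword;
        # comment lines start with "#" and therefore never match.
        file=$(awk -v k="$key" 'tolower($1) == k { v = $2; if (v == "=") v = $3; print v; exit }' "$conf")
        if [ -z "$file" ]; then
            echo "no '$key' line in $conf" >&2; status=1; continue
        fi
        # Relative paths are resolved against md.inp's directory (assumption).
        case "$file" in /*) ;; *) file="$dir/$file" ;; esac
        [ -f "$file" ] || { echo "$key file not found: $file" >&2; status=1; }
    done
    return "$status"
}
```

Running check_namd_inputs md.inp prints a message for each missing file and returns non-zero if anything is wrong.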
NAMD 2.9 should be run on the basic compute nodes, which have 64 GB of memory and 16 cores per node. Except in very rare situations, you should not require more memory than this. As good practice, request a minimum of 16 cores (that is, all the cores of a node) when you run your simulation. We’ll talk about job performance (scaling) shortly.
You will prepare a batch script to tell the scheduler what resources you are requesting and what to do with your files. Here is an example batch script:
#!/bin/bash
#SBATCH --time=24:00:00 # Wall clock time (HH:MM:SS)
#SBATCH -n 32 # Number of processors (2 nodes)
#SBATCH --job-name myjob # Name of job (anything)
#SBATCH --mail-type=ALL # Send email when job starts/ends/dies
#SBATCH --mail-user=email@example.com # Put your email here
mpirun -np 32 namd2 md.inp > run.out
This is a very basic batch script. It tells the batch system to schedule 2 nodes (32 cores) for 24 hours to run your simulation. The line that actually runs your job is mpirun -np 32 namd2 md.inp > run.out. Here mpirun is the MPI launcher: it starts 32 processes of the parallel executable namd2 across the allocated nodes, with md.inp as the input file, and the output is directed to the file run.out. The md.inp file must also contain the locations (paths) of your other two input files. The job starts in the directory from which you run sbatch, so submit the script from the directory containing md.inp, or give the full path to md.inp in the script. Once you are happy with your input files, you can queue the batch script (saved here as myscript.sh) for execution with the sbatch myscript.sh command.
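When you need the same script at several core counts (for example, for the scaling tests described later), it can be generated rather than copied by hand. The sketch below uses a hypothetical helper, write_namd_job, and assumes 16 cores per node, as on the basic compute nodes. It also adds an #SBATCH -N line so the job gets whole nodes, and it names the log run_<cores>.out (a hypothetical convention) so runs in one directory do not overwrite each other.

```shell
# Hypothetical generator for a batch script like the example above.
# Usage: write_namd_job NCORES WALLTIME JOBNAME OUTPUT_SCRIPT
write_namd_job() {
    local ncores="$1" walltime="$2" jobname="$3" out="$4"
    local cores_per_node=16                                          # basic compute nodes
    local nodes=$(( (ncores + cores_per_node - 1) / cores_per_node ))  # round up
    cat > "$out" <<EOF
#!/bin/bash
#SBATCH --time=$walltime
#SBATCH -N $nodes
#SBATCH -n $ncores
#SBATCH --job-name $jobname
mpirun -np $ncores namd2 md.inp > run_${ncores}.out
EOF
}
```

Calling the helper in a loop over 32, 48, 64, 80, and 96 cores and submitting each resulting script with sbatch gives one short test job per core count.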
Routinely monitor your job’s progress to ensure it is actively writing output to the run.out file, which is your simulation log file. Under normal circumstances, NAMD stops on its own when a simulation encounters an error. However, if you need to kill a job manually, use the scancel jobid command. The sbatch command reports the jobid when the job is queued (Submitted batch job jobid), and you can look it up later with the showq -u userid command (or the standard Slurm command squeue -u userid).
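One quick way to confirm that a running job is still making progress is to pull the latest timestep from run.out and check it again a few minutes later. The sketch below assumes the standard NAMD energy output, in which each line beginning with ENERGY: carries the step number as its first value; last_namd_step is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recent step number in a NAMD log.
# Returns non-zero if no ENERGY: lines have been written yet.
last_namd_step() {
    awk '/^ENERGY:/ { step = $2 }
         END { if (step != "") print step; else exit 1 }' "$1"
}
```

If last_namd_step run.out returns the same number on two checks several minutes apart while the job is still listed as running, something is likely wrong.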
Running NAMD 2.9-CUDA (for GPUs) on DLX
Good Practices in Running NAMD (or any application)
In general, we should try to avoid angering our fellow computational scientists and to make the most efficient use of a sophisticated computational resource. Here, we present some basic good practices for computing on HPC resources. While this is located under the NAMD documentation, these are general practices that apply to any HPC application.
Location, location, location…
Most applications, and especially NAMD, should be run from the scratch disk. On most HPC systems, the scratch drive is designed to be significantly more responsive for I/O. This means your job will run faster (it can read input and write output faster), and you won’t be constantly accessing the home drive where your fellow scientists may be trying to work. While this is not exactly true of the DLX architecture, nearly every other HPC resource is designed this way, so you should be prepared to operate in this fashion.
Running on the login nodes
Don't do it. The cluster doesn’t prohibit you from running your jobs on the login nodes, but it is very bad practice to do so. This will really make your friends angry, not to mention the system administrators, as it makes the system very slow to respond to even the most basic commands. In general, it is acceptable to run single-processor jobs very briefly on the login nodes to test whether your configuration files are in order before you submit them to the queue. However, it is never acceptable to run parallel jobs on the login nodes.
Efficient use of community resources (scaling)
Performance of molecular dynamics codes does NOT scale linearly with the number of processors: doubling the number of processors does not necessarily make the simulation finish twice as fast. Performance depends on a number of factors, such as the number of atoms, the choice of non-bonded cutoffs, and the number of force evaluations per time step (i.e., multiple timestepping). For each different type of simulation you perform, you should determine the ideal number of processors on which to run the job. To do this, set up several very short (~10,000 steps) simulations of the exact same system and run them on different numbers of processors; start by trying 32, 48, 64, 80, and 96 processors. At the end of each log file (run.out) you will find your simulation statistics, on the line beginning with “WallClock:”. The time is reported in seconds. Once all your simulations have finished, plot the data (wallclock time vs. number of processors), an example of which is shown below. In the example, performance starts to decline beyond 64 processors. You should choose the number of processors immediately preceding a plateau in performance; for this particular simulation, we would choose to run on 64 processors.
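Collecting the wallclock times by hand gets tedious, so the sketch below gathers them into a table ready for plotting. It assumes each test wrote its log to run_<cores>.out in the current directory (a hypothetical naming scheme) and that each finished log contains NAMD's final WallClock: line. The helper name scaling_table is also hypothetical. Speedup and parallel efficiency are computed relative to the smallest core count, which makes the plateau easier to spot.

```shell
# Hypothetical helper: tabulate cores, wallclock (s), speedup and parallel
# efficiency from logs named run_<cores>.out in the current directory.
scaling_table() {
    local log cores wall
    printf 'cores\twallclock_s\tspeedup\tefficiency\n'
    for log in run_*.out; do
        [ -e "$log" ] || continue               # no matching logs at all
        cores=${log#run_}; cores=${cores%.out}  # run_64.out -> 64
        # Last "WallClock:" line; its second field is the time in seconds.
        wall=$(awk '$1 == "WallClock:" { w = $2 } END { print w }' "$log")
        if [ -n "$wall" ]; then printf '%s\t%s\n' "$cores" "$wall"; fi
    done | sort -n | awk -F'\t' '
        NR == 1 { c0 = $1; w0 = $2 }            # smallest run is the baseline
        { s = w0 / $2; printf "%s\t%s\t%.2f\t%.2f\n", $1, $2, s, s * c0 / $1 }'
}
```

Efficiency falling well below 1 while the wallclock time barely improves is the plateau described above.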
This information can also be used to estimate the amount of time required to run your simulation. As shown on the right axis, you can convert each test’s wallclock time (together with the number of steps performed) into the number of ns/day the simulation should produce as a function of the number of processors. If we were to run our simulation on 64 processors, we should get ~4 ns/day. For a 10 ns simulation, that means 10 ns ÷ 4 ns/day = 2.5 days, so we would set the wallclock time in our batch script to a minimum of 60 hours. You may want to add a “buffer” of 4-8 hours to ensure the job runs to completion.
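The estimate above is simple arithmetic: ns/day = steps × timestep (fs) × 10⁻⁶ ÷ (wallclock seconds ÷ 86,400), and hours needed = target ns ÷ (ns/day) × 24 + buffer. The sketch below wraps both in hypothetical helpers, ns_per_day and hours_needed. You supply the timestep yourself (the value of timestep in md.inp, in fs). The numbers in the usage note are illustrative inputs, not measured results.

```shell
# Hypothetical helper: ns/day from a benchmark run.
# Usage: ns_per_day STEPS TIMESTEP_FS WALLCLOCK_S
ns_per_day() {
    awk -v n="$1" -v dt="$2" -v w="$3" 'BEGIN {
        # 1 fs = 1e-6 ns; 86400 s per day
        printf "%.2f\n", n * dt * 1e-6 / (w / 86400)
    }'
}

# Hypothetical helper: whole hours to request for a target length.
# Usage: hours_needed TARGET_NS NS_PER_DAY [BUFFER_HOURS]
hours_needed() {
    awk -v t="$1" -v r="$2" -v b="${3:-0}" 'BEGIN {
        h = t / r * 24 + b      # days needed, converted to hours, plus buffer
        c = int(h)
        if (h - c > 1e-6) c++   # round up, ignoring floating-point noise
        printf "%d\n", c
    }'
}
```

For example, a 10,000-step test with a 2 fs timestep that took 432 s gives ns_per_day 10000 2 432 = 4.00, and hours_needed 10 4 6 = 66, which you would request as #SBATCH --time=66:00:00.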
This documentation was prepared by Christina M. Payne, Assistant Professor of Chemical Engineering, University of Kentucky.