GROMACS
Our recommendation: follow these four steps, in this order, to get the fastest program.
- Use the most recent version of GROMACS that we provide, or build your own.
- Use the newest hardware: the partitions zen2_0256_a40x2 or zen3_0512_a100x2 on VSC5 have plenty of nodes available.
- Read our article on multi-GPU setup and do some performance analysis.
- Run on multiple nodes with MPI, each with 1 GPU; additionally use multiple GPUs per node.
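The last two steps can be sketched in a single batch script. This is only a sketch: the node and task counts are assumptions, and the partition and module names are taken from the example further below; adjust them to your setup.

```shell
#!/bin/bash
# Sketch: 2 nodes with 2 GPUs each, one MPI rank per GPU (assumed layout).
#SBATCH --job-name=gmx_multi
#SBATCH --partition=zen2_0256_a40x2
#SBATCH --qos=zen2_0256_a40x2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:2

module purge
module load cuda/11.5.0-gcc-11.2.0-ao7cp7w openmpi/4.1.4-gcc-11.2.0-ub765vm gromacs/2022.2-gcc-11.2.0-4x2vwol

# mpirun inherits the SLURM allocation: 4 ranks spread across 2 nodes
mpirun gmx_mpi mdrun -s topol.tpr
```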
GPU Partition
First you have to decide which hardware GROMACS should run on; we call
this a partition, described in detail at SLURM. On any login node, type sinfo
to get a list of the available partitions. The partition has to be set
in the batch script, see the example below. Be aware that each partition
has different hardware, so choose the parameters accordingly. GROMACS
mostly decides on its own how it wants to work, so don't be surprised if
it ignores settings like environment variables.
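For a quick overview, sinfo can also be asked for just the partition names, node counts, and GPUs. The format flags below are standard SLURM options; the actual output depends on the cluster.

```shell
# %P = partition name, %D = number of nodes, %G = generic resources (GPUs)
sinfo -o "%P %D %G"
```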
Batch Script
Write a batch script (example below) including:
- some SLURM parameters: the #SBATCH … part
- exporting environment variables: e.g. export CUDA_VISIBLE_DEVICES=0
- cleaning the environment: module purge
- loading modules: module load gcc/7.3 …
- last but not least, starting the program in question: gmx_mpi …

mybatchscript.sh:

#!/bin/bash
#SBATCH --job-name=myname
#SBATCH --partition=zen2_0256_a40x2
#SBATCH --qos=zen2_0256_a40x2
#SBATCH --gres=gpu:1
unset OMP_NUM_THREADS
export CUDA_VISIBLE_DEVICES=0
module purge
module load cuda/11.5.0-gcc-11.2.0-ao7cp7w openmpi/4.1.4-gcc-11.2.0-ub765vm python/3.8.12-gcc-11.2.0-rvq5hov gromacs/2022.2-gcc-11.2.0-4x2vwol
gmx_mpi mdrun -s topol.tpr
Type sbatch mybatchscript.sh
to submit your batch script to
SLURM. You get the job id, and your job will be scheduled and
executed automatically.
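A typical submit-and-check session might look like this (a sketch; the job id is a placeholder for the number that sbatch prints):

```shell
sbatch mybatchscript.sh      # prints: Submitted batch job <jobid>
squeue -u $USER              # show your pending and running jobs
scontrol show job <jobid>    # details: assigned nodes, GRES, working dir
```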
CPU / GPU Load
There is a whole page dedicated to monitoring the CPU and GPU; for GROMACS the relevant sections are Live and GPU.
Performance
As a short example we ran gmx_mpi mdrun -s topol.tpr
with
different options, where topol.tpr
is just some sample topology;
we don't actually care about the result. Without any options GROMACS
already runs fine (a). Setting the number of tasks (b) is not needed;
if set wrongly it can even slow the calculation down significantly (c)
due to over-provisioning! We would advise to enforce pinning; in our
example it does not show any effect (d), so we assume that the
tasks are pinned automatically already. The only further improvement
we could get was the -update gpu
option, which puts more
load on the GPU (e).
| # | cmd         | ns / day | CPU load / % | GPU load / % | notes                              |
|---|-------------|----------|--------------|--------------|------------------------------------|
| a | –           | 160      | 100          | 80           |                                    |
| b | -ntomp 8    | 160      | 100          | 80           |                                    |
| c | -ntomp 16   | 140      | 40           | 70           | GROMACS warning: over-provisioning |
| d | -pin on     | 160      | 100          | 80           |                                    |
| e | -update gpu | 170      | 100          | 90           |                                    |
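The measurements in the table can be reproduced with a small loop over the option sets. This is a sketch: topol.tpr is the sample topology from the text, the bench_* output names via -deffnm are illustrative, and the Performance line is taken from the mdrun log file.

```shell
# Run the same topology with each option set and print the achieved
# performance (ns/day) that mdrun reports at the end of its log file.
for opts in "" "-ntomp 8" "-ntomp 16" "-pin on" "-update gpu"; do
    name="bench_${opts// /_}"                 # e.g. bench_-pin_on
    gmx_mpi mdrun -s topol.tpr -deffnm "$name" $opts
    grep "Performance" "${name}.log"
done
```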