GROMACS

Our recommendation: Follow these three steps to get the fastest program:

  1. Use the most recent version of GROMACS that we provide or build your own (see the sketch after this list).
  2. Use the newest hardware: the partitions zen2_0256_a40x2 or zen3_0512_a100x2 on VSC-5 have plenty of nodes available with 2 GPUs each.
  3. Do some performance analysis to decide if a single GPU node (likely) or multiple CPU nodes via MPI (unlikely) better suits your problem.
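
For step 1, the module system shows which GROMACS builds we provide; a minimal sketch, assuming the standard module commands (module spider only exists if the cluster uses Lmod):

module avail gromacs     # list the GROMACS modules on the current cluster
module spider gromacs    # Lmod only: search all module trees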

In most cases it does not make sense to run on multiple GPU nodes with MPI, whether you use only one or both GPUs per node.

First you have to decide on which hardware GROMACS should run; we call this a partition, described in detail at SLURM. On any login node, type sinfo to get a list of the available partitions. The partition has to be set in the batch script, see the example below. Be aware that each partition has different hardware, so choose the parameters accordingly. GROMACS mostly decides on its own how it wants to work, so don't be surprised if it ignores settings like environment variables.
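
For example, on a login node the following lists the partitions and how many nodes each has; sinfo is a standard SLURM command, and the columns shown here are just one possible selection:

sinfo                          # overview of all partitions
sinfo -o "%P %a %l %D %t"      # partition, availability, time limit, node count, state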

Write a batch script (example below) including:

  • some SLURM parameters: the #SBATCH … part
  • exporting environment variables: e.g. export CUDA_VISIBLE_DEVICES=0
  • cleaning the environment: module purge
  • loading modules: module load gcc/7.3 …
  • last but not least starting the program in question: gmx_mpi …
mybatchscript.sh
#!/bin/bash
#SBATCH --job-name=myname
#SBATCH --partition=zen2_0256_a40x2
#SBATCH --qos=zen2_0256_a40x2
#SBATCH --gres=gpu:1
 
unset OMP_NUM_THREADS
export CUDA_VISIBLE_DEVICES=0
 
module purge
module load cuda/11.5.0-gcc-11.2.0-ao7cp7w openmpi/4.1.4-gcc-11.2.0-ub765vm python/3.8.12-gcc-11.2.0-rvq5hov gromacs/2022.2-gcc-11.2.0-4x2vwol
 
gmx_mpi mdrun -s topol.tpr

Type sbatch mybatchscript.sh to submit your batch script to SLURM. You get the job ID, and your job will be scheduled and executed automatically.
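
A minimal submit-and-check cycle could look like this; the job ID is illustrative, squeue and scancel are standard SLURM commands:

sbatch mybatchscript.sh        # prints e.g. "Submitted batch job 1234567"
squeue -u $USER                # list your queued and running jobs
scancel 1234567                # cancel the job if needed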

There is a whole page dedicated to monitoring the CPU and GPU; for GROMACS the relevant sections are Live and GPU.
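
For a quick look yourself, you can find the node your job runs on and watch the load there; this sketch assumes that ssh to a compute node running one of your jobs is permitted, nvidia-smi and top are standard tools, and <nodename> is a placeholder:

squeue -u $USER -o "%i %N"     # job ID and node name of your running jobs
ssh <nodename>                 # log in to that node
nvidia-smi                     # GPU utilisation and memory
top -u $USER                   # CPU load of your processes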

As a short example we ran gmx_mpi mdrun -s topol.tpr with different options, where topol.tpr is just some sample topology; we don't actually care about the result. Without any options GROMACS already runs fine (a). Setting the number of tasks (b) is not needed; if set incorrectly, it can even slow the calculation down significantly (c) due to over-provisioning! We would advise enforcing pinning; in our example it does not show any effect though (d), so we assume that the tasks are pinned automatically already. The only further improvement we could get was using the -update gpu option, which puts more load on the GPU (e).

#   cmd           ns/day   CPU load / %   GPU load / %   notes
a   –             160      100            80
b   -ntomp 8      160      100            80
c   -ntomp 16     140       40            70             GROMACS warning: over-provisioning
d   -pin on       160      100            80
e   -update gpu   170      100            90
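
For reference, the command lines behind rows (b) to (e); -ntomp, -pin and -update are regular mdrun options, and the achieved ns/day is printed at the end of the mdrun log:

gmx_mpi mdrun -s topol.tpr                 # (a) defaults
gmx_mpi mdrun -s topol.tpr -ntomp 8        # (b) 8 OpenMP threads per rank
gmx_mpi mdrun -s topol.tpr -ntomp 16       # (c) over-provisioned, slower
gmx_mpi mdrun -s topol.tpr -pin on         # (d) enforce thread pinning
gmx_mpi mdrun -s topol.tpr -update gpu     # (e) run the update step on the GPU as well
grep "Performance:" md.log                 # ns/day of the finished run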