GROMACS

Our recommendation: Follow these five steps, in this order, to get the fastest program.

  1. Use the most recent version of GROMACS that we provide or build your own.
  2. Use the newest hardware: the partitions gpu_a40dual or gpu_gtx1080single on VSC3 have plenty of nodes available.
  3. Read our article on multi-GPU setup and do some performance analysis.
  4. Run on multiple nodes with MPI, each with 1 GPU.
  5. Additionally, use multiple GPUs per node.

First you have to decide on which hardware GROMACS should run; we call this a partition, described in detail at SLURM. On any login node, type sinfo to get a list of the available partitions. The partition has to be set in the batch script, see the example below. Be aware that each partition has different hardware: for example the partition gpu_gtx1080single on VSC3 has 1 GPU and a single socket with 4 cores and 2 hyperthreads per core, as listed at GPU Partitions on VSC3. It therefore makes sense to let GROMACS run on 8 threads (-ntomp 8), but it makes little sense to force more threads than that, as this would lead to oversubscription. GROMACS mostly decides on its own how it wants to work, so don't be surprised if it ignores settings like environment variables.
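A quick way to check what a partition offers before writing the batch script is a formatted sinfo call. This is just a minimal sketch; the format string is one possible choice:

sinfo -o "%P %D %c %G"        # partition, node count, CPUs per node, generic resources (GPUs)
sinfo -p gpu_gtx1080single    # show only one partition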

Write a batch script (example below) including:

  • some SLURM parameters (the #SBATCH … part)
  • exporting environment variables: e.g. export CUDA_VISIBLE_DEVICES=0
  • cleaning the environment: module purge
  • loading modules: module load gcc/7.3 …
  • last but not least starting the program in question: gmx_mpi …
mybatchscript.sh
#!/bin/bash
#SBATCH --job-name=myname
#SBATCH --partition=gpu_gtx1080single
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
 
unset OMP_NUM_THREADS
export CUDA_VISIBLE_DEVICES=0
 
module purge
module load gcc/7.3 nvidia/1.0 cuda/10.1.168 cmake/3.15.4 openmpi/4.0.5 python/3.7 gromacs/2021.2_gtx1080
 
gmx_mpi mdrun -s topol.tpr

Type sbatch mybatchscript.sh to submit your batch script to SLURM. You get the job ID, and your job will be scheduled and executed automatically.
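A typical submit and check sequence could look like this (the job ID below is just a placeholder):

sbatch mybatchscript.sh    # prints: Submitted batch job <jobid>
squeue -u $USER            # list your pending and running jobs
scancel <jobid>            # cancel a job if something went wrong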

There is a whole page dedicated to monitoring the CPU and GPU; for GROMACS the relevant sections are Live and GPU.
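As a minimal sketch, assuming you are allowed to ssh to the compute node your job runs on (the node name is a placeholder), the live load can be checked like this:

squeue -u $USER    # shows on which node your job runs
ssh <nodename>     # log in to that compute node
top                # live CPU load per process
nvidia-smi -l 5    # GPU utilisation and memory, refreshed every 5 seconds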

There is a whole article about the Performance of GROMACS on multi GPU systems.

As a short example we ran gmx_mpi mdrun -s topol.tpr with different options, where topol.tpr is just some sample topology; we don't actually care about the result. Without any options GROMACS already runs fine (a). Setting the number of tasks (b) is not needed; if set wrong, it can even slow the calculation down significantly (c) due to overprovisioning! We would advise enforcing pinning, although in our example it does not show any effect (d); we assume the tasks are already pinned automatically. The only further improvement we could get was the -update gpu option, which puts more load on the GPU (e).

#  cmd          ns / day  CPU load / %  GPU load / %  notes
a               160       100           80
b  -ntomp 8     160       100           80
c  -ntomp 16    140       40            70            GROMACS warning: overprovisioning
d  -pin on      160       100           80
e  -update gpu  170       100           90
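Based on this small test, a reasonable starting point on this partition could be the following call; re-run such a comparison for your own system, as the effect of each option depends on hardware and input:

gmx_mpi mdrun -s topol.tpr -pin on -update gpu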

With GROMACS 2020 the following environment variables need to be set when using multiple GPUs. From GROMACS 2021 onwards it is not necessary to set them; they are already included, and setting them explicitly might actually decrease performance again.

export GMX_GPU_PME_PP_COMMS=true
export GMX_GPU_DD_COMMS=true
export GMX_GPU_FORCE_UPDATE_DEFAULT_GPU=true
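As a rough sketch of how this fits into a batch script for GROMACS 2020 on a dual-GPU node (the partition name, GPU count and module name are assumptions; adapt them to the modules actually installed):

#!/bin/bash
#SBATCH --job-name=myname
#SBATCH --partition=gpu_a40dual   # assumed dual-GPU partition, see the list above
#SBATCH --gres=gpu:2              # request both GPUs on the node
#SBATCH --nodes=1

# direct GPU communication, only needed for GROMACS 2020
export GMX_GPU_PME_PP_COMMS=true
export GMX_GPU_DD_COMMS=true
export GMX_GPU_FORCE_UPDATE_DEFAULT_GPU=true

module purge
module load gromacs/2020          # placeholder; load the full toolchain as in the example above

gmx_mpi mdrun -s topol.tpr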