GROMACS
GPU Partition
First you have to decide on which hardware GROMACS should run; we call this a partition. On a login node, type sinfo to get a list of the available partitions. Be aware that each setup has different hardware: for example, the partition gpu_gtx1080single on VSC3 has 1 GPU and a single socket with 4 cores and 2 hyperthreads per core, as listed at GPU Partitions on VSC3.
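As a quick sketch of how to inspect the hardware per partition (the format string is standard Slurm, but which columns are most useful is our choice here):

sinfo -o "%P %D %c %G"    # partition, number of nodes, cores per node, GPUs (generic resources)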
The partition has to be set in the batch script; see the example below. On this partition it therefore makes sense to let GROMACS run on 8 threads (-ntomp 8), yet it makes little sense to force more threads than that, as this would lead to oversubscription. GROMACS mostly decides on its own how it wants to work, so don't be surprised if it ignores settings like environment variables.
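For illustration, a run using all 8 hardware threads of this partition looks like this (topol.tpr is just a placeholder topology, as in the performance tests below):

gmx_mpi mdrun -s topol.tpr -ntomp 8    # one thread per hardware thread on gpu_gtx1080single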
Batch Script
In order to be scheduled efficiently with SLURM, one writes a shell script (see the text file myscript.sh below) consisting of:
- some SLURM parameters: the #SBATCH … part
- exporting environment variables: export CUDA_VISIBLE_DEVICES=0
- cleaning the environment: module purge
- loading modules: module load gcc/7.3 …
- last but not least, starting the program in question: gmx_mpi …
myscript.sh:
#!/bin/bash
#SBATCH --job-name=myname
#SBATCH --partition=gpu_gtx1080single
#SBATCH --gres=gpu:1
#SBATCH --nodes=1

unset OMP_NUM_THREADS
export CUDA_VISIBLE_DEVICES=0

module purge
module load gcc/7.3 nvidia/1.0 cuda/10.1.168 cmake/3.15.4 openmpi/4.0.5 python/3.7 gromacs/2021.2_gtx1080

gmx_mpi mdrun -s topol.tpr
Type sbatch myscript.sh to submit such a batch script to SLURM. You get the job ID back, and your job will be scheduled and executed automatically.
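A minimal interaction might look like this (the job ID shown is of course only an example):

sbatch myscript.sh
Submitted batch job 123456
squeue -u $USER    # check the state of your jobs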
CPU / GPU Load
There is a whole page dedicated to monitoring CPU and GPU usage; for GROMACS the relevant sections are Live and GPU.
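As a rough sketch (whether you may ssh to a compute node while your job runs depends on the cluster policy, and the node name here is hypothetical):

squeue -u $USER    # find the node your job is running on
ssh n372-001       # log in to that node (hypothetical node name)
top                # live CPU load per process
nvidia-smi         # live GPU utilisation and memory use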
Performance
As an example we ran gmx_mpi mdrun -s topol.tpr with different options, where topol.tpr is just some sample topology; we don't actually care about the result. Without any options GROMACS already runs fine (a). Setting the number of tasks (b, c) is not needed; if set wrongly it can even slow the calculation down significantly (over-provisioning)! Enforcing pinning does not show any effect either (d); we assume that the tasks are already pinned automatically. The only improvement we got was with the -update gpu option (e), which puts more load on the GPU. This might not work, however, if we use more than one GPU.
# | cmd | ns / day | CPU load / % | GPU load / % | notes
---|---|---|---|---|---
a | – | 160 | 100 | 80 | 
b | -ntomp 8 | 160 | 100 | 80 | 
c | -ntomp 16 | 140 | 40 | 70 | GROMACS warning: over-provisioning
d | -pin on | 160 | 100 | 80 | 
e | -update gpu | 170 | 100 | 90 | 
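The fastest variant (e) thus corresponds to a call like the one below; as noted above, offloading the update step to the GPU may not work when more than one GPU is used:

gmx_mpi mdrun -s topol.tpr -update gpu    # run coordinate updates on the GPU as well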