====== GROMACS ======

Our recommendation: Follow these steps, in this order, to get the fastest program.

  - Use the most recent version of GROMACS that we provide, or build your own
  - Use the newest hardware: the partitions ''gpu_a40dual'' or ''gpu_gtx1080single'' on VSC3 have plenty of nodes available
  - Read our article on the [[doku:gromacs_multi_gpu|multi GPU setup]] and do some performance analysis
  - Run on multiple nodes with MPI, each with a GPU
  - Additionally use multiple GPUs per node
  
===== GPU Partition =====
  
First you have to decide on which hardware GROMACS should run; we call
this a ''partition'', described in detail at [[doku:slurm | SLURM]]. On
any login node, type ''sinfo'' to get a list of the available
partitions. The partition has to be set in the batch script, see the
example below. Be aware that each partition has different hardware: for
example, the partition ''gpu_gtx1080single'' on VSC3 has 1 GPU and a
single socket with 4 cores and 2 hyperthreads per core, listed at
[[doku:vsc3gpuqos | GPU Partitions on VSC3]]. It therefore makes sense
to let GROMACS run on 8 threads (''-ntomp 8''), yet it makes little
sense to force more threads than that, as this would lead to
oversubscription. GROMACS mostly decides on its own how it wants to
work, so don't be surprised if it ignores settings like environment
variables.
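
For example, checking the partitions and matching the thread count to this hardware could look like this (a sketch; the ''sinfo'' format string is just one way to show the GPU resources):

<code bash>
# on a login node: list partitions, their node counts and generic resources (GPUs)
sinfo -o "%P %D %G"

# inside a job on gpu_gtx1080single: one rank with 8 OpenMP threads
gmx_mpi mdrun -s topol.tpr -ntomp 8
</code>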
  
===== Batch Script =====

Write a ''batch script'' (example below) including:

  * some SLURM parameters: the ''#SBATCH ...'' part
  * exporting environment variables: e.g. ''export CUDA_VISIBLE_DEVICES=0''
  * cleaning the environment: ''module purge''
  * loading modules: ''module load gcc/7.3 ...''
  * last but not least starting the program in question: ''gmx_mpi ...''
  
<code bash mybatchscript.sh>
#!/bin/bash
#SBATCH --job-name=myname
#SBATCH --partition=gpu_gtx1080single
#SBATCH --gres=gpu:1
#SBATCH --nodes=1

unset OMP_NUM_THREADS
export CUDA_VISIBLE_DEVICES=0

module purge
module load gcc/7.3 nvidia/1.0 cuda/10.1.168 cmake/3.15.4 openmpi/4.0.5 python/3.7 gromacs/2021.2_gtx1080

gmx_mpi mdrun -s topol.tpr
</code>
  
Type ''sbatch mybatchscript.sh'' to submit your batch script to
[[doku:SLURM]]. You get the job ID, and your job will be scheduled and
executed automatically.
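
A submission then looks roughly like this (the job ID is of course only an example):

<code bash>
$ sbatch mybatchscript.sh
Submitted batch job 1234567
$ squeue -u $USER      # check the state of your jobs
$ scancel 1234567      # cancel the job, should you need to
</code>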
  
===== CPU / GPU Load =====

There is a whole page dedicated to [[doku:monitoring]] the CPU and
GPU; for GROMACS the relevant sections are
[[doku:monitoring#Live]] and [[doku:monitoring#GPU]].
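
For a quick look yourself, the standard command line tools are enough (assuming you have a shell on the compute node where the job is running):

<code bash>
top              # CPU load per process
nvidia-smi -l 5  # GPU utilisation, refreshed every 5 seconds
</code>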
  
===== Performance =====

There is a whole article about the [[doku:gromacs_multi_gpu|Performance of GROMACS on multi GPU systems]].

As a short example we ran ''gmx_mpi mdrun -s topol.tpr'' with
different options, where ''topol.tpr'' is just some sample topology;
we don't actually care about the result. Without any options GROMACS
already runs fine (a). Setting the number of tasks (b) is not needed;
if set wrong, it can even slow the calculation down significantly (c)
due to over-provisioning! We would advise enforcing pinning; in our
example it does not show any effect though (d), so we assume that the
tasks are already pinned automatically. The only further improvement
we could get was the ''-update gpu'' option, which puts more load on
the GPU (e).
  
^ # ^ cmd         ^ ns/day ^ CPU load / % ^ GPU load / % ^ notes                              ^
| a | --          | 160    | 100          | 80           |                                    |
| b | -ntomp 8    | 160    | 100          | 80           |                                    |
| c | -ntomp 16   | 140    | 40           | 70           | GROMACS warning: over-provisioning |
| d | -pin on     | 160    | 100          | 80           |                                    |
| e | -update gpu | 170    | 100          | 90           |                                    |
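
Spelled out, the runs behind these rows correspond roughly to the following command lines (reconstructed from the options column; the exact invocations are an assumption):

<code bash>
gmx_mpi mdrun -s topol.tpr               # (a) defaults
gmx_mpi mdrun -s topol.tpr -ntomp 8      # (b) 8 OpenMP threads
gmx_mpi mdrun -s topol.tpr -ntomp 16     # (c) over-provisioned
gmx_mpi mdrun -s topol.tpr -pin on       # (d) explicit pinning
gmx_mpi mdrun -s topol.tpr -update gpu   # (e) update step on the GPU
</code>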
  
==== GROMACS 2020 ====

The following environment variables need to be set with GROMACS 2020
when using multiple GPUs. It is not necessary to set these variables
for GROMACS 2021 onwards; they are already included, and setting them
explicitly might actually decrease performance again.
  
<code bash>
export GMX_GPU_PME_PP_COMMS=true
export GMX_GPU_DD_COMMS=true
export GMX_GPU_FORCE_UPDATE_DEFAULT_GPU=true
</code>
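
As a rough illustration, a GROMACS 2020 run using two GPUs on one node could combine these variables with an MPI launch like the following (a minimal sketch; the rank, thread and GPU counts are assumptions and need tuning for the actual node):

<code bash>
# after exporting the three GMX_GPU_* variables above:
export CUDA_VISIBLE_DEVICES=0,1   # make two GPUs visible to the job
mpirun -np 2 gmx_mpi mdrun -s topol.tpr -ntomp 8 -npme 1 -nb gpu -pme gpu
</code>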