====== GROMACS ======

Our recommendation: Follow these steps, in this order, to get the fastest program.

  - Use the most recent version of GROMACS that we provide, or build your own
  - Use the newest hardware: the partitions ''gpu_a40dual'' or ''gpu_gtx1080single'' on VSC3 have plenty of nodes available
  - Read our article on the [[doku:gromacs_multi_gpu|multi GPU setup]] and do some performance analysis
  - Run on multiple nodes with MPI, each with a GPU
  - Additionally use multiple GPUs per node
  
===== GPU Partition =====
  
First you have to decide on which hardware GROMACS should run; we call
this a ''partition'', described in detail at [[doku:slurm | SLURM]]. On
any login node, type ''sinfo'' to get a list of the available
partitions. The partition has to be set in the batch script, see the
example below. Be aware that each partition has different hardware: for
example, the partition ''gpu_gtx1080single'' on VSC3 has 1 GPU and a
single socket with 4 cores and 2 hyperthreads per core, listed at
[[doku:vsc3gpuqos | GPU Partitions on VSC3]]. It therefore makes sense
to let GROMACS run on 8 threads (''-ntomp 8''), yet it makes little
sense to force more threads than that, as this would lead to
oversubscription. GROMACS mostly decides on its own how it wants to
work, so don't be surprised if it ignores settings like environment
variables.
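
For example, checking the partitions and matching the thread count to this hardware could look like this (a sketch; the ''sinfo'' format string is just one way to show the GPU resources):

<code bash>
# on a login node: list partitions, their node counts and generic resources (GPUs)
sinfo -o "%P %D %G"

# inside a job on gpu_gtx1080single: one rank with 8 OpenMP threads
gmx_mpi mdrun -s topol.tpr -ntomp 8
</code>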
  
===== Batch Script =====

Write a ''batch script'' (example below) including:

  * some SLURM parameters: the ''#SBATCH ...'' part
  * exporting environment variables: e.g. ''export CUDA_VISIBLE_DEVICES=0''
  * cleaning the environment: ''module purge''
  * loading modules: ''module load gcc/7.3 ...''
  * last but not least starting the program in question: ''gmx_mpi ...''
  
<code bash mybatchscript.sh>
#!/bin/bash
#SBATCH --job-name=myname
#SBATCH --partition=gpu_gtx1080single
#SBATCH --gres=gpu:1
#SBATCH --nodes=1

unset OMP_NUM_THREADS
export CUDA_VISIBLE_DEVICES=0

module purge
module load gcc/7.3 nvidia/1.0 cuda/10.1.168 cmake/3.15.4 openmpi/4.0.5 python/3.7 gromacs/2021.2_gtx1080

gmx_mpi mdrun -s topol.tpr
</code>
  
Type ''sbatch mybatchscript.sh'' to submit your batch script to
[[doku:SLURM]]. You get the job ID, and your job will be scheduled and
executed automatically.
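
A submission then looks roughly like this (the job ID is of course only an example):

<code bash>
$ sbatch mybatchscript.sh
Submitted batch job 1234567
$ squeue -u $USER      # check the state of your jobs
$ scancel 1234567      # cancel the job, should you need to
</code>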
  
===== CPU / GPU Load =====

There is a whole page dedicated to [[doku:monitoring]] the CPU and
GPU; for GROMACS the relevant sections are
[[doku:monitoring#Live]] and [[doku:monitoring#GPU]].
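
For a quick look yourself, the standard command line tools are enough (assuming you have a shell on the compute node where the job is running):

<code bash>
top              # CPU load per process
nvidia-smi -l 5  # GPU utilisation, refreshed every 5 seconds
</code>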
  
===== Performance =====

There is a whole article about the [[doku:gromacs_multi_gpu|Performance of GROMACS on multi GPU systems]].

As a short example we ran ''gmx_mpi mdrun -s topol.tpr'' with
different options, where ''topol.tpr'' is just some sample topology;
we don't actually care about the result. Without any options GROMACS
already runs fine (a). Setting the number of tasks (b) is not needed;
if set wrong, it can even slow the calculation down significantly (c)
due to over-provisioning! We would advise enforcing pinning; in our
example it does not show any effect though (d), so we assume that the
tasks are already pinned automatically. The only further improvement
we could get was the ''-update gpu'' option, which puts more load on
the GPU (e).
  
^ # ^ cmd         ^ ns/day ^ CPU load / % ^ GPU load / % ^ notes                              ^
| a | --          | 160    | 100          | 80           |                                    |
| b | -ntomp 8    | 160    | 100          | 80           |                                    |
| c | -ntomp 16   | 140    | 40           | 70           | GROMACS warning: over-provisioning |
| d | -pin on     | 160    | 100          | 80           |                                    |
| e | -update gpu | 170    | 100          | 90           |                                    |
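
Spelled out, the runs behind these rows correspond roughly to the following command lines (reconstructed from the options column; the exact invocations are an assumption):

<code bash>
gmx_mpi mdrun -s topol.tpr               # (a) defaults
gmx_mpi mdrun -s topol.tpr -ntomp 8      # (b) 8 OpenMP threads
gmx_mpi mdrun -s topol.tpr -ntomp 16     # (c) over-provisioned
gmx_mpi mdrun -s topol.tpr -pin on       # (d) explicit pinning
gmx_mpi mdrun -s topol.tpr -update gpu   # (e) update step on the GPU
</code>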
  
==== GROMACS 2020 ====

The following environment variables need to be set with GROMACS 2020
when using multiple GPUs. It is not necessary to set these variables
for GROMACS 2021 onwards; they are already included, and setting them
explicitly might actually decrease performance again.
  
<code bash>
export GMX_GPU_PME_PP_COMMS=true
export GMX_GPU_DD_COMMS=true
export GMX_GPU_FORCE_UPDATE_DEFAULT_GPU=true
</code>
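
As a rough illustration, a GROMACS 2020 run using two GPUs on one node could combine these variables with an MPI launch like the following (a minimal sketch; the rank, thread and GPU counts are assumptions and need tuning for the actual node):

<code bash>
# after exporting the three GMX_GPU_* variables above:
export CUDA_VISIBLE_DEVICES=0,1   # make two GPUs visible to the job
mpirun -np 2 gmx_mpi mdrun -s topol.tpr -ntomp 8 -npme 1 -nb gpu -pme gpu
</code>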