doku:gromacs — revision 2023/11/23 12:27 (current) by msiegel
Our recommendation:
  - Use the **most recent version** of GROMACS that we provide or build your own.
  - Use the newest hardware: use **1 GPU** on the partitions ''
  - Do some **performance analysis** to decide whether a single GPU node (likely) or multiple CPU nodes via MPI (unlikely) better suits your problem.
In most cases it does not make sense to run GROMACS on multiple GPU nodes with MPI, whether using one or two GPUs per node.
===== CPU or GPU Partition? =====
First you have to decide which hardware GROMACS should run on; we call this a ''
===== Installations =====
Type ''
Because of the low efficiency of GROMACS on many nodes with many GPUs via MPI, we do not provide

We provide the following GROMACS variants:

==== GPU but no MPI ====

We recommend GPU nodes; use the ''

**cuda-zen**:
  * Gromacs +cuda ~mpi, all compiled with **GCC**

Since the ''

==== MPI but no GPU ====

For Gromacs on CPU only but with MPI, use ''

**zen**:
  * Gromacs +openmpi +blas +lapack ~cuda, all compiled with **GCC**
  * Gromacs +openmpi +blas +lapack ~cuda, all compiled with **AOCC**

**skylake**:
  * Gromacs +**open**mpi +blas +lapack ~cuda, all compiled with **GCC**
  * Gromacs +**open**mpi +blas +lapack ~cuda, all compiled with **Intel**
  * Gromacs +**intel**mpi +blas +lapack ~cuda, all compiled with **GCC**
  * Gromacs +**intel**mpi +blas +lapack ~cuda, all compiled with **Intel**

In some of these packages, there is no ''
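The variant strings above (''+cuda'', ''~mpi'', …) follow Spack's spec syntax, so the installed variants can be inspected and loaded with Spack. A sketch of how this might look — the version and variant string in the ''spack load'' line are hypothetical examples, not the exact packages on our clusters:

```shell
# List the GROMACS packages Spack knows about, including their variants
# (+cuda, +openmpi, ...); the output depends on the local installation.
spack find -v gromacs

# Load one specific variant into the current shell, e.g. a CUDA build
# without MPI (version and variant string are hypothetical):
spack load gromacs@2023 +cuda ~mpi
```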
===== Batch Script =====
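A minimal job script for the single-GPU setup recommended above might look as follows. This is a sketch: the partition name and CPU count are placeholders to adapt to your system, and the environment setup assumes the Spack installations from the previous section.

```bash
#!/bin/bash
#SBATCH --job-name=gromacs
#SBATCH --partition=<gpu-partition>   # placeholder: one of the GPU partitions
#SBATCH --gres=gpu:1                  # 1 GPU, as recommended above
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16            # adjust to the cores available per GPU

# Hypothetical environment setup; see the Installations section.
spack load gromacs +cuda

# Single rank, offloading the main work to the GPU.
gmx mdrun -s topol.tpr -ntomp $SLURM_CPUS_PER_TASK -nb gpu -pme gpu -update gpu
```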
benchmark various scenarios:
  - R-143a in hexane (20,248 atoms) with very high output rate
  - a short RNA piece with explicit water (31,889 atoms)
  - a protein inside a membrane surrounded by explicit water (80,289 atoms)
  - a VSC user's test case (50,897 atoms)
  - a protein in explicit water (170,320 atoms)
  - a protein membrane channel with explicit water (615,924 atoms)
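Each scenario can be benchmarked with a short, restart-free run. A sketch — the ''.tpr'' file name is hypothetical, while the flags are standard ''gmx mdrun'' options:

```bash
# Run a fixed number of steps and reset the timers halfway through, so the
# reported ns/day excludes load-balancing startup effects; skip writing the
# final configuration, since only the performance numbers matter here.
gmx mdrun -s benchmark.tpr -nsteps 10000 -resethway -noconfout

# The performance estimate (ns/day) is printed at the end of the log:
grep -A1 'Performance' md.log
```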
}, {
  name: 'Test 3',
  data: [ 94.069, 99.788, 97.9, 100.509, 95.666, 83.485 ]
}, {
  name: 'Test 4',
  data: [ 115.179, 117.999, 115.028, 114.967, 103.8, 0 ]
}, {
  name: 'Test 5',
},
title: {
  text: '
},
xaxis: {
}
</
Note: the computation for test case 4 on 32 nodes timed out before GROMACS was able to estimate a performance. We can safely assume that this case, too, performs worse on 32 nodes than on fewer.
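The poor multi-node scaling can be made explicit by relating each measurement to ideal linear scaling. A small sketch, assuming the six ''Test 3'' values above are ns/day measured on 1, 2, 4, 8, 16 and 32 nodes respectively (the node counts are an assumption based on the note above):

```shell
# Parallel efficiency = perf(n) / (n * perf(1)); performance values are
# taken from the Test 3 data series above.
base=94.069
for pair in "1 94.069" "2 99.788" "4 97.9" "8 100.509" "16 95.666" "32 83.485"; do
  n=${pair%% *}; p=${pair#* }
  awk -v n="$n" -v p="$p" -v b="$base" \
      'BEGIN { printf "%2d nodes: efficiency %5.1f%%\n", n, 100 * p / (b * n) }'
done
```

Even though the absolute ns/day barely changes, the efficiency drops to a few percent at 32 nodes — 32 times the hardware for roughly the same throughput — which is why we recommend a single GPU node instead.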
},
title: {
  text: '
},
xaxis: {
}
</
===== Links =====

The benchmarks are based on three articles by NHR@FAU, featuring in-depth analyses of GROMACS performance on various GPU systems, multi-GPU setups, and comparisons with CPU:

https://

https://

https://
+ | |||