====== GROMACS ======

Our recommendation:
  - Use the **most recent version** of GROMACS that we provide or build your own.
  - Use the newest hardware: use **1 GPU** on the partitions ''…''.
  - Do some **performance analysis** to decide whether a single GPU node (likely) or multiple CPU nodes via MPI (unlikely) better suits your problem.
In most cases it does not make sense to run on multiple GPU nodes with MPI, whether using one or two GPUs per node.
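For a quick **performance analysis**, a short timed run is usually enough. A minimal sketch; the step counts and the input file ''topol.tpr'' are arbitrary placeholders:

<code bash>
# short benchmark run: stop after 10000 steps, reset the timers halfway through
# so the performance estimate excludes initialization, and skip writing the
# final configuration
gmx mdrun -s topol.tpr -nsteps 10000 -resethway -noconfout
</code>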
===== CPU or GPU Partition? =====

First you have to decide on which hardware GROMACS should run; we call this a ''partition''.
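On a Slurm based system you can, for example, list the partitions together with their GPU resources to see which hardware is available (a generic sketch; the actual partition names differ per VSC system):

<code bash>
# show partition name, generic resources (GPUs) and node count
sinfo -o "%P %G %D"
</code>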
===== Installations =====

Type ''…'' to list the available GROMACS installations.
Because of the low efficiency of GROMACS on many nodes with many GPUs via MPI, we do not provide a variant with both CUDA and MPI.

We provide the following GROMACS variants:
==== GPU but no MPI ====

We recommend GPU nodes; use the ''cuda-zen'' environment:

**cuda-zen**:
  * Gromacs +cuda ~mpi, all compiled with **GCC**

Since the ''…'' …
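As an illustration, a single-GPU run with this variant could look like the following. This is only a sketch: the Spack spec and the input file ''topol.tpr'' are assumptions to adapt.

<code bash>
# load the CUDA-enabled Gromacs without MPI (hypothetical Spack spec)
spack load gromacs+cuda~mpi

# one thread-MPI rank, OpenMP threads on the CPU cores; nonbonded, PME and
# bonded interactions are offloaded to the single GPU
gmx mdrun -s topol.tpr -ntmpi 1 -ntomp 16 -nb gpu -pme gpu -bonded gpu
</code>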
==== MPI but no GPU ====

For Gromacs on CPU only, but with MPI, use the ''zen'' or ''skylake'' environments:

**zen**:
  * Gromacs +openmpi +blas +lapack ~cuda, all compiled with **GCC**
  * Gromacs +openmpi +blas +lapack ~cuda, all compiled with **AOCC**

**skylake**:
  * Gromacs …
  * Gromacs …
  * Gromacs +**intel**mpi +blas +lapack ~cuda, all compiled with **GCC**
  * Gromacs +**intel**mpi +blas +lapack ~cuda, all compiled with **Intel**
In some of these packages, there is no ''gmx'' binary, only ''gmx_mpi''.
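To pick up one of these variants in a job, a sketch could look like this (the Spack spec is again an assumption):

<code bash>
# load a CPU-only, MPI-enabled Gromacs build (hypothetical Spack spec)
spack load gromacs+mpi~cuda ^openmpi

# MPI-enabled Gromacs builds install the gmx_mpi binary instead of gmx
gmx_mpi --version
</code>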
===== Batch Script =====
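A minimal sketch of such a batch script for a single-GPU run; the partition name, the Spack spec and the input file are placeholders to adapt, not the actual VSC settings:

<code bash>
#!/bin/bash
#SBATCH --job-name=gromacs
#SBATCH --partition=<gpu_partition>   # placeholder: one of the GPU partitions mentioned above
#SBATCH --gres=gpu:1                  # a single GPU, as recommended
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

# hypothetical Spack spec for the CUDA-enabled, MPI-free variant
spack load gromacs+cuda~mpi

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
gmx mdrun -s topol.tpr -ntmpi 1 -ntomp $OMP_NUM_THREADS -nb gpu -pme gpu -bonded gpu
</code>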
We benchmark various scenarios:
  - R-143a in hexane (20,248 atoms) with very high output rate
  - a short RNA piece with explicit water (31,889 atoms)
  - a protein inside a membrane surrounded by explicit water (80,289 atoms)
  - a VSC user's test case (50,897 atoms)
  - a protein in explicit water (170,320 atoms)
  - a protein membrane channel with explicit water (615,924 atoms)
In most cases one node is **better** than more nodes.
In some cases, for example a large molecule like Test 7, you might want to run GROMACS on multiple nodes in parallel using MPI, with multiple GPUs (one per node). We strongly encourage you to test whether you actually benefit from running with GPUs on many nodes; GROMACS can perform considerably worse on many nodes in parallel than on a single one!

Run GROMACS on multiple nodes with:
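As an illustration only, a run over several nodes with one GPU each could be set up as follows. This assumes a GROMACS build with both MPI and CUDA support (for example your own build); node count, thread split and file names are placeholders:

<code bash>
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1       # one MPI rank per node ...
#SBATCH --gres=gpu:1              # ... driving one GPU on each node
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# gmx_mpi is the MPI-enabled binary; nonbonded work is offloaded to each node's GPU
srun gmx_mpi mdrun -s topol.tpr -ntomp $OMP_NUM_THREADS -nb gpu
</code>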
}, {
    name: 'Test 3',
    data: [ 94.069, 99.788, 97.9, 100.509, 95.666, 83.485 ]
}, {
    name: 'Test 4',
    data: [ 115.179, 117.999, 115.028, 114.967, 103.8, 0 ]
}, {
    name: 'Test 5',
}
</
Note: the computation timed out for Test 4 with 32 nodes before GROMACS was able to estimate a performance; we can safely assume that this test case would also perform worse on 32 nodes than on fewer.

===== Links =====

The benchmarks are based on three articles by NHR@FAU, featuring in-depth analyses of GROMACS performance on various GPU systems, multi-GPU setups, and comparisons with CPUs:

https://…

https://…

https://…
+ | |||