====== Core Pinning ======
Various tools and applications, e.g. MPI and OpenMP, provide mechanisms for binding (pinning) processes and threads to specific cores. This page summarizes how to control core pinning in Slurm jobs on the VSC clusters.

===== Need for processor affinity and/or pinning =====

To improve job performance, it can help to control where the processes and threads of a job run, because:

  - processes/threads stay close to the memory (NUMA node) they work on,
  - the operating system does not migrate processes/threads between cores during the run.

To optimize program parallelization, the placement of processes and threads should therefore match the hardware layout of the compute nodes.
==== Cluster compute nodes and cores ====

== VSC 4 ==
Physical cores for processes/threads of Socket 0 are numbered from 0 to 23.
Physical cores for processes/threads of Socket 1 are numbered from 24 to 47.
Virtual cores for processes/threads are numbered from 48 to 95.

== VSC 5 (Cascade Lake) ==
Physical cores for processes/threads of Socket 0 are numbered from 0 to 47.
Physical cores for processes/threads of Socket 1 are numbered from 48 to 95.
Virtual cores for processes are numbered from 96 to 191.

== VSC 5 (Zen) ==
Physical cores for processes/threads of Socket 0 are numbered from 0 to 63.
Physical cores for processes/threads of Socket 1 are numbered from 64 to 127.
Virtual cores are numbered from 128 to 255.
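
Whether a node really has this layout can be checked directly on the node; this is a minimal sketch using standard Linux tools (lscpu, and numactl where installed):
<code>
# show sockets, cores per socket, threads per core, and NUMA layout
lscpu | grep -E "Socket|Core|Thread|NUMA"

# list which CPU IDs (physical and virtual) belong to which NUMA node
numactl --hardware
</code>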

**Environment variables:**
For MPI, OpenMP, and hybrid job applications, pinning is controlled via environment variables. The relevant variables differ between compilers and MPI implementations; see the examples in the following sections.

===== Types of parallel jobs =====

  - pure OpenMP jobs
  - pure MPI jobs
  - hybrid jobs

==== 1. Pure OpenMP jobs ====

**OpenMP threads** are pinned with compiler-specific environment variables, e.g. KMP_AFFINITY for Intel compilers and GOMP_CPU_AFFINITY for GCC, or with the standard OpenMP variables OMP_PLACES and OMP_PROC_BIND.

== Compiler examples supporting OpenMP ==
The //spack compilers// command lists the available compilers. However, a more common practice is to use //module avail gcc// or //module avail intel// and then load the desired compiler:

**ICC Example**
<code>
module load intel-oneapi-compilers/<version>
icc -fopenmp -o myprogram myprogram.c
</code>
**GCC Example**
<code>
module load --auto gcc/<version>
gcc -fopenmp -o myprogram myprogram.c
</code>
Note the flag -fopenmp, which is necessary to instruct the compiler to enable OpenMP functionality.

**Example Job Script for ICC**

<code>
#!/bin/bash
#SBATCH -J pureOMP
#SBATCH -N 1

export OMP_NUM_THREADS=4
export KMP_AFFINITY="..."

./myprogram
</code>

**Example Job Script for GCC**

<code>
#!/bin/bash
#SBATCH -J pureOMP
#SBATCH -N 1

export OMP_NUM_THREADS=4
export GOMP_CPU_AFFINITY="..."
./myprogram
</code>
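
As an illustration only, typical values for the two affinity variables could look as follows; the core IDs 0-3 are an assumption and have to be adapted to the node types listed above:
<code>
# Intel OpenMP: pin 4 threads explicitly to cores 0-3 and report the placement
export KMP_AFFINITY="verbose,granularity=fine,proclist=[0,1,2,3],explicit"

# GNU OpenMP: equivalent pinning for gcc-compiled programs
export GOMP_CPU_AFFINITY="0-3"
</code>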
**OMP_PROC_BIND AND OMP_PLACES**

<code>
# Example: place threads on cores in a round-robin fashion
export OMP_PLACES="..."

# Specify whether threads may be moved between CPUs using OMP_PROC_BIND
# "true" keeps each thread on the place it was assigned to; "false" allows migration
export OMP_PROC_BIND=true
</code>

OMP_PLACES is set to specify the placement of threads. In this example, each thread is assigned to a specific core in a round-robin fashion.
OMP_PROC_BIND is set to "true" so that threads are not moved between CPUs once they have been placed.
The rest of your batch script should remain the same.
Note that you might need to adjust the OMP_PLACES configuration based on your specific hardware architecture and the desired thread placement strategy.
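
For illustration, a complete setting for four threads on the first four physical cores could look like this; the place list is an assumed example and must match the hardware:
<code>
export OMP_NUM_THREADS=4
# one place per core; the braces enumerate the allowed CPU IDs of each place
export OMP_PLACES="{0},{1},{2},{3}"
# "close" packs threads onto neighbouring places, "spread" distributes them
export OMP_PROC_BIND=close
</code>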

Make sure to check the OpenMP documentation and the specifics of your system when configuring thread affinity.
==== 2. Pure MPI jobs ====

**MPI processes**: In a distributed computing environment, MPI processes communicate with each other across the cores and nodes of the cluster.
There are several MPI implementations, e.g. OpenMPI, Intel MPI, and MPICH, each providing its own mechanisms for pinning processes to cores.

To choose the optimal MPI implementation for your parallelized application, consider the following points:

  * Understand Your Application's Requirements: e.g. scalability, communication pattern, and the libraries it depends on.
  * Explore Available MPI Implementations: on the clusters, e.g. OpenMPI, Intel MPI, and MPICH (check //module avail//).
  * Check Compatibility: the MPI implementation must fit the compiler and the libraries your application uses.
  * Experiment with Basic Commands: After selecting an MPI implementation, compile and run a small test program first (see the sketch after this list).
  * Seek Assistance: Don't hesitate to seek help if you have questions or face challenges.
  * Additional Resources: Explore MPI tutorials at: [[https://...]]
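
As a first experiment, a small MPI "hello world" can be compiled and run with a handful of processes; this is only a sketch, and the file name and module version are placeholders:
<code>
cat > hello_mpi.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

module load openmpi/<version>
mpicc -o hello_mpi hello_mpi.c
# run with 4 processes (inside a Slurm job or allocation)
srun -n 4 ./hello_mpi
</code>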

The default pin processor list is given by the MPI implementation in use; it can be overridden with the srun options and environment variables shown in the examples below.

==== Examples ====
=== Compatibility and Compilers ===
Various MPI compilers and implementations exist, catering to different programming languages such as C, C++, and Fortran. The compiler wrappers of the different MPI implementations are, for example:

**OpenMPI**
  * C: mpicc
  * C++: mpic++ or mpiCC
  * Fortran: mpifort, or mpif77 for Fortran 77 and mpif90 for Fortran 90

**Intel MPI**
  * C: mpiicc
  * C++: mpiicpc
  * Fortran: mpiifort

**MPICH**
  * C: mpicc
  * C++: mpic++
  * Fortran: mpifort

Use the compiler wrapper that matches your programming language and the MPI implementation you have loaded.
Following are a few Slurm script examples written for C applications with various compiler versions. These examples provide a glimpse into writing batch scripts and serve as a practical guide for creating your own.
Note that environment variables differ between MPI implementations (OpenMPI, Intel MPI, and MPICH), and the Slurm scripts also vary between srun and mpiexec. Adjust your Slurm scripts accordingly.
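
To double-check which compiler and MPI library sit behind a given wrapper, the wrappers can be queried directly; a small sketch (the module names are examples):
<code>
module avail openmpi intel-mpi mpich   # list the installed MPI modules
mpicc --version                        # compiler behind the OpenMPI/MPICH wrapper
mpiicc --version                       # compiler behind the Intel MPI wrapper
mpirun --version                       # report the MPI implementation and version
</code>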
=== OpenMPI ===
**SRUN**
<code>
#!/bin/bash
#SBATCH -J pureMPI
#SBATCH -N 2
#SBATCH --ntasks-per-node 4
#SBATCH --ntasks-per-core 1

NUMBER_OF_MPI_PROCESSES=8

module purge
module load openmpi/<version>

mpicc -o openmpi openmpi.c
srun -n $NUMBER_OF_MPI_PROCESSES --mpi=pmi2 --cpu_bind=map_cpu:<cpulist> ./openmpi
</code>
Note: The //--mpi=pmi2// option selects the PMI-2 interface for launching the OpenMPI processes, and //--cpu_bind=map_cpu:...// pins the MPI processes to the listed cores.
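
To verify where the MPI processes actually end up, srun can report the binding it applies; this sketch assumes an explicit list of cores 0-3 on each node:
<code>
# print the chosen binding for every task, then pin the 4 tasks per node to cores 0-3
srun -n $NUMBER_OF_MPI_PROCESSES --mpi=pmi2 --cpu_bind=verbose,map_cpu:0,1,2,3 ./openmpi
</code>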

**MPIEXEC**
<code>
#!/bin/bash
#SBATCH -J pureMPI
#SBATCH -N 2
#SBATCH --ntasks-per-node 4
#SBATCH --ntasks-per-core 1

NUMBER_OF_MPI_PROCESSES=8
export OMPI_MCA_hwloc_base_binding_policy=core
export OMPI_MCA_hwloc_base_cpu_set=<cpulist>

module purge
module load openmpi/<version>

mpicc -o openmpi openmpi.c
mpiexec -n $NUMBER_OF_MPI_PROCESSES ./openmpi
</code>

=== Intel MPI ===
**SRUN**
<code>
#!/bin/bash
#SBATCH -J pureMPI
#SBATCH -M vsc5
#SBATCH -N 2
#SBATCH --ntasks-per-node 4
#SBATCH --ntasks-per-core 1

export I_MPI_DEBUG=4
NUMBER_OF_MPI_PROCESSES=8
export I_MPI_PIN_PROCESSOR_LIST=<cpulist>

module purge
module load intel/<version>
module load intel-mpi/<version>

mpiicc -o myprogram myprogram.c
srun -n $NUMBER_OF_MPI_PROCESSES --cpu_bind=map_cpu:<cpulist> ./myprogram
</code>
**MPIEXEC**
<code>
#!/bin/bash
#SBATCH -J pureMPI
#SBATCH -N 2
#SBATCH --ntasks-per-node 4
#SBATCH --ntasks-per-core 1

export I_MPI_DEBUG=4
NUMBER_OF_MPI_PROCESSES=8
export I_MPI_PIN_PROCESSOR_LIST=<cpulist>

module purge
module load intel/<version>
module load intel-mpi/<version>

mpiicc -o myprogram myprogram.c
mpiexec -n $NUMBER_OF_MPI_PROCESSES ./myprogram
</code>

=== MPICH ===
**SRUN**
<code>
#!/bin/bash
#SBATCH -J pureMPI
#SBATCH -N 2
#SBATCH --ntasks-per-node 4
#SBATCH --ntasks-per-core 1

NUMBER_OF_MPI_PROCESSES=8

module purge
module load --auto mpich/<version>

mpicc -o mpich mpich.c
srun -n $NUMBER_OF_MPI_PROCESSES --cpu_bind=map_cpu:<cpulist> ./mpich
</code>

Note: The flag //--auto// loads all dependencies of the module.


==== 3. Hybrid jobs ====

MPI (Message Passing Interface) is used for communication between processes across multiple nodes. Running OpenMP within each node can be advantageous: typically one MPI process is started per node (or per socket), which spawns several OpenMP threads that share that node's memory.

<code>
#!/bin/bash
#
#SBATCH -J mapCPU
#SBATCH -N 3
#SBATCH -n 3
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=3
#SBATCH --time=00:10:00

export I_MPI_DEBUG=1
NUMBER_OF_MPI_PROCESSES=3
export OMP_NUM_THREADS=3

module load intel/<version>
module load intel-mpi/<version>
mpiicc -qopenmp -o myprogram myprogram.c

srun -n $NUMBER_OF_MPI_PROCESSES --cpu_bind=mask_cpu:<mask> ./myprogram
</code>
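
Each entry in the //mask_cpu// list is a hexadecimal bit mask (one per task on a node) in which bit //n// stands for CPU //n//; the OpenMP threads of a task stay within that task's mask. As an assumed example for the job above, with 3 OpenMP threads per MPI process on the first cores of each node:
<code>
# binary 111 = 0x7: the one task per node may use CPUs 0-2, matching OMP_NUM_THREADS=3
srun -n $NUMBER_OF_MPI_PROCESSES --cpu_bind=mask_cpu:0x7 ./myprogram
</code>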