Differences

doku:slurm_multisite_admin [2022/12/22 23:00] fsattari
doku:slurm_multisite_admin [2023/06/23 14:54] (current) – [Overview of Multi-Clustering] fsattari
Line 1: Line 1:
 ====== Overview of Multi-Clustering ====== ====== Overview of Multi-Clustering ======
 +  In a multi-cluster environment, it is possible to share resources such as compute nodes among the clusters. This maximizes resource utilization and reduces idle time.
 +  
   * Multi-Clustering allows running several slurm clusters from the same control node.   * Multi-Clustering allows running several slurm clusters from the same control node.
   * In this case, different slurmctld daemons will be running on the same machine, and the system users can target commands to any (or all) of the clusters.   * In this case, different slurmctld daemons will be running on the same machine, and the system users can target commands to any (or all) of the clusters.
Line 22: Line 24:
 </code> </code>
  
 +{{:doku:vsc4-vsc5-multiclustering.png?600|}}
 ==== Node allocation policy ==== ==== Node allocation policy ====
  
-  * The multi-cluster functionality requires the use of the SlurmDBD.
+  * To enable the multi-cluster functionality, SlurmDBD and MUNGE or authentication keys are required.
   * When sbatch, salloc or srun is invoked with a cluster list, Slurm submits the job to the cluster that offers the earliest start time considering its queue of pending and running jobs   * When sbatch, salloc or srun is invoked with a cluster list, Slurm submits the job to the cluster that offers the earliest start time considering its queue of pending and running jobs
   * BUT Slurm will make no subsequent effort to migrate the job to a different cluster whose resources become available when running jobs finish before their scheduled end times.   * BUT Slurm will make no subsequent effort to migrate the job to a different cluster whose resources become available when running jobs finish before their scheduled end times.
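
A minimal sketch of a submission with a cluster list, as described in the policy above. The cluster names ''vsc4'' and ''vsc5'' and the script name ''job.sh'' are assumptions for illustration only. The sketch builds and prints the command rather than running it, so it can be tried on a machine without Slurm.

```shell
#!/bin/sh
# Hypothetical cluster names and job script; replace with your site's values.
CLUSTERS="vsc4,vsc5"
JOB_SCRIPT="job.sh"

# --clusters (short form -M) takes a comma-separated cluster list.
# Slurm then picks the cluster that offers the earliest expected start time.
SUBMIT_CMD="sbatch --clusters=${CLUSTERS} ${JOB_SCRIPT}"

# Dry run: print the command instead of executing it.
echo "${SUBMIT_CMD}"
```

Since the job is not migrated after submission, passing the same ''--clusters'' list to ''squeue'' is one way to check on which cluster it was placed.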
Line 120: Line 123:
  
 </code> </code>
 +{{:doku:federation.png?600|}}
 ==== Federation Job Submission ==== ==== Federation Job Submission ====
  
Line 166: Line 169:
  
 Burst-Buffer plugin adds a layer between the compute nodes and the parallel file system to improve network performance, I/O, and data staging. Burst-Buffer plugin adds a layer between the compute nodes and the parallel file system to improve network performance, I/O, and data staging.
 +
 +{{:doku:bb-process.png?700|}}
  • doku/slurm_multisite_admin.txt
  • Last modified: 2023/06/23 14:54
  • by fsattari