Differences
This shows you the differences between two versions of the page.
doku:slurm_multisite [2022/12/22 22:55] – [Federated Slurm] fsattari | doku:slurm_multisite [2022/12/22 23:01] (current) – fsattari | ||
---|---|---|---|
Line 21: | Line 21: | ||
</ | </ | ||
- | + | | |
- | ==== Node allocation policy ==== | + | |
- | + | ||
- | * The multi-cluster functionality requires the use of the SlurmDBD. | + | |
- | * When sbatch, salloc or srun is invoked with a cluster list, Slurm submits the job to the cluster that offers the earliest start time, given its queue of pending and running jobs. | + |
- | * However, Slurm makes no subsequent effort to migrate the job to a different cluster, even if that cluster's resources become available sooner because running jobs finish before their scheduled end times. | + |
- | * Originally, job IDs are not unique across multiple clusters. | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
| | ||
Line 110: | Line 98: | ||
* A cluster can only be part of one federation at a time | * A cluster can only be part of one federation at a time | ||
* Embed cluster ID within the originally 32-bit job ID | * Embed cluster ID within the originally 32-bit job ID | ||
- | {{: | + | {{: |
< | < | ||
Line 146: | Line 134: | ||
</ | </ | ||
- | |||
- | |||
- | ==== Slurm Federation Workflow ==== | ||
- | {{: | ||
Line 158: | Line 142: | ||
- | |||
- | ===== Slurm Burst-Buffer ===== | ||
- | |||
- | I/O components are much slower than the compute parts of a supercomputer, | ||
- | |||
- | Data staging generates a large volume of traffic on the network that connects the compute nodes, because input and output data must be moved between them. Inter-process communication traffic flows over the same network, so the two kinds of traffic can interfere with each other and degrade network performance. For example, bursts of staging traffic increase the latency of inter-process communication. Both kinds of traffic also compete for network bandwidth, which increases communication time. | ||
- | |||
- | The Burst-Buffer plugin adds a layer between the compute nodes and the parallel file system to improve network performance, 