Differences
This shows you the differences between two versions of the page.
doku:slurm_multisite [2022/12/22 22:28] – fsattari | doku:slurm_multisite [2022/12/22 23:01] – fsattari | ||
---|---|---|---|
Line 70: | Line 70: | ||
CLUSTER: vsc4 | CLUSTER: vsc4 | ||
| | ||
- | | + | |
- | | + | |
. | . | ||
. | . | ||
Line 95: | Line 95: | ||
< | < | ||
- | [username@node ~]$ sbatch -M vscdev/vscdev2 | + | [username@node ~]$ sbatch -M vsc4/vsc5 job.sh |
</ | </ | ||
- | To submit a job to a specific cluster (here vscdev | + | To submit a job to a specific cluster (here vsc4 or vsc5) |
===== Federated Slurm ===== | ===== Federated Slurm ===== | ||
Line 110: | Line 110: | ||
* A cluster can only be part of one federation at a time | * A cluster can only be part of one federation at a time | ||
* The cluster ID is embedded in the original 32-bit job ID | * The cluster ID is embedded in the original 32-bit job ID ||
- | + | {{:doku:orig-federated-jobid.png?600|}} | |
- | {{:doku:slurmfederation.png?700|}} | + | |
< | < | ||
[...]# sacctmgr list cluster withfed | [...]# sacctmgr list cluster withfed | ||
- | | + | |
---------- --------------- ------------ ----- -------- ------- ------------- ------- ------- ------------- -------- ----------- -------------------- | ---------- --------------- ------------ ----- -------- ------- ------------- ------- ------- ------------- -------- ----------- -------------------- | ||
- | vscdev | + | vsc4 |
- | vscdev2 | + | vsc5 |
</ | </ | ||
Line 130: | Line 129: | ||
< | < | ||
[...]# squeue -M vscdev, | [...]# squeue -M vscdev, | ||
- | CLUSTER: | + | CLUSTER: |
- | | + | |
- | ** 67109080** test test | + | |
- | CLUSTER: | + | |
- | | + | CLUSTER: |
- | ** 134217981** | + | |
+ | | ||
</ | </ | ||
Line 142: | Line 142: | ||
[root@node]# | [root@node]# | ||
Federation: vscdev_fed | Federation: vscdev_fed | ||
- | Self: vscdev2:X.X.X.X:X ID:2 FedState: | + | Self: vsc4:X.X.X.X:X ID:2 FedState: |
- | Sibling: | + | Sibling: |
</ | </ | ||
- | |||
- | |||
- | ==== Slurm Federation Workflow ==== | ||
- | {{: | ||
===== Multi-Cluster vs Federation implementation ===== | ===== Multi-Cluster vs Federation implementation ===== | ||
- | {{: | ||
At a basic level, multi-cluster operation is a single interface for submitting jobs to several separate Slurm clusters, which may share one Slurm database or each have their own. Federation instead merges the job and scheduling information of its member clusters into one, and it requires a single shared Slurm database. | At a basic level, multi-cluster operation is a single interface for submitting jobs to several separate Slurm clusters, which may share one Slurm database or each have their own. Federation instead merges the job and scheduling information of its member clusters into one, and it requires a single shared Slurm database. ||
- | |||
- | ===== Slurm Burst-Buffer ===== | ||
- | |||
- | I/O components are much slower than the compute parts of a supercomputer, | ||
- | |||
- | Data staging generates heavy traffic on the network that connects the compute nodes, because input and output data must be moved to and from those nodes. Inter-process communication also flows over this network, so the two types of traffic can interfere with each other and degrade network performance. For example, burst traffic from data staging increases the latency of inter-process communication, and because both types of traffic compete for network bandwidth, communication time increases. |
- | |||
- | Burst-Buffer plugin adds a layer between the compute nodes and the parallel file system to improve network performance, |
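In the sbatch example above, `vsc4/vsc5` is best read as shorthand for "vsc4 or vsc5". On the command line, `-M`/`--clusters` expects either one cluster name or several names separated by commas. When several names are given, sbatch submits the job to the cluster that offers the earliest expected start time.

The helper `clusters_arg` below is hypothetical and not part of Slurm. It is a minimal sketch that only builds the comma-separated value.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): join cluster names with commas,
# the separator that -M/--clusters expects when several clusters are given.
# The body runs in a subshell, so changing IFS does not leak to the caller.
clusters_arg() (
  IFS=,
  printf '%s\n' "$*"
)

# Usage on a multi-cluster setup (commented out; needs the Slurm client tools):
#   sbatch -M "$(clusters_arg vsc4)" job.sh        # submit to vsc4 only
#   sbatch -M "$(clusters_arg vsc4 vsc5)" job.sh   # vsc4 or vsc5, earliest start wins
#   squeue -M "$(clusters_arg vsc4 vsc5)"          # list jobs on both clusters
clusters_arg vsc4 vsc5
```

Running the script prints `vsc4,vsc5`. That is the value `-M` receives in the second usage line.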
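The federated job IDs in the squeue output above can be decoded by hand. The sketch below assumes that the lower 26 bits of the 32-bit job ID hold the job ID local to the origin cluster, and that the bits above them hold the cluster's federation ID.

The IDs shown are consistent with that assumption:
  * 67109080 decodes to cluster 1, local job 216.
  * 134217981 decodes to cluster 2, local job 253, which matches the `ID:2` on the `Self:` line of the federation listing.

The function names are illustrative only.

```shell
#!/bin/sh
# Sketch: decode and encode federated Slurm job IDs.
# Assumption: bits 0-25 hold the local job ID and the bits above hold the
# federation cluster ID (consistent with the squeue output above).

FED_LOCAL_BITS=26                                  # assumed width of the local job ID
FED_LOCAL_MASK=$(( (1 << FED_LOCAL_BITS) - 1 ))    # 67108863 = 0x3FFFFFF

# fed_cluster_id JOBID: federation cluster ID stored in the upper bits
fed_cluster_id() {
  echo $(( $1 >> FED_LOCAL_BITS ))
}

# fed_local_id JOBID: job ID local to the origin cluster
fed_local_id() {
  echo $(( $1 & FED_LOCAL_MASK ))
}

# fed_job_id CLUSTER_ID LOCAL_ID: combine both parts into one federated job ID
fed_job_id() {
  echo $(( ($1 << FED_LOCAL_BITS) | $2 ))
}

for id in 67109080 134217981; do
  printf '%s -> cluster %s, local job %s\n' \
    "$id" "$(fed_cluster_id "$id")" "$(fed_local_id "$id")"
done
```

Running the script prints `67109080 -> cluster 1, local job 216` and then `134217981 -> cluster 2, local job 253`.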