Differences

This shows you the differences between two versions of the page.

--- doku:slurm_multisite [2022/12/22 22:49] – [Federation Job Submission] fsattari
+++ doku:slurm_multisite [2022/12/22 23:01] – fsattari
@@ Line 110: / Line 110: @@
   * A cluster can only be part of one federation at a time
   * Embed cluster ID within the originally 32-bit job ID
-{{ :doku:orig-federated-jobid.png?600 |}}
+{{:doku:orig-federated-jobid.png?600|}}
 <code>
@@ Line 131: / Line 131: @@
 CLUSTER: vsc4
              JOBID          PARTITION     NAME     USER   ST    TIME  NODES  NODELIST(REASON)
-           '' 67109080''      skylake_0 V_0.3_U_     nobody PD    0:05     3   n4905-025,n4906-020
+             67109080       skylake_0 V_0.3_U_     nobody PD    0:05     3   n4905-025,n4906-020
 CLUSTER: vsc5
              JOBID          PARTITION     NAME     USER   ST    TIME  NODES  NODELIST(REASON)
-             //134217981//      zen3_2048              nobody PD    0:25     8   n3511-[011-013,015-020]
+             134217981      zen3_2048              nobody PD    0:25     8   n3511-[011-013,015-020]
 </code>
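
The job IDs in the listing above can be split back into a cluster ID and a local job ID. The following is a minimal sketch, assuming the layout described in the Slurm federation documentation (bits 0-25 hold the local job ID, bits 26-31 hold the ID of the origin cluster); check this against the Slurm version in use. The function and constant names are hypothetical and are not part of any Slurm API.

```python
from typing import Tuple

# Assumed layout of a federated job ID (verify against your Slurm version):
#   bits 0-25  -> local job ID on the origin cluster
#   bits 26-31 -> ID of the origin cluster within the federation
LOCAL_ID_BITS = 26
LOCAL_ID_MASK = (1 << LOCAL_ID_BITS) - 1


def split_federated_job_id(job_id: int) -> Tuple[int, int]:
    """Return (cluster_id, local_job_id) for a 32-bit federated job ID."""
    if not 0 <= job_id < 2**32:
        raise ValueError("job_id must fit in 32 bits")
    return job_id >> LOCAL_ID_BITS, job_id & LOCAL_ID_MASK


def make_federated_job_id(cluster_id: int, local_job_id: int) -> int:
    """Combine a cluster ID and a local job ID into one federated job ID."""
    if not 0 <= cluster_id < 2 ** (32 - LOCAL_ID_BITS):
        raise ValueError("cluster_id must fit in 6 bits")
    if not 0 <= local_job_id <= LOCAL_ID_MASK:
        raise ValueError("local_job_id must fit in 26 bits")
    return (cluster_id << LOCAL_ID_BITS) | local_job_id


if __name__ == "__main__":
    # The two job IDs shown in the squeue output above.
    for job_id in (67109080, 134217981):
        cluster_id, local_id = split_federated_job_id(job_id)
        print(f"{job_id}: cluster ID {cluster_id}, local job ID {local_id}")
```

Under this assumed layout, 67109080 decodes to cluster ID 1 with local job ID 216, and 134217981 decodes to cluster ID 2 with local job ID 253.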
@@ Line 142: / Line 142: @@
 [root@node]# scontrol show fed --sibling job
 Federation: vscdev_fed
-Self:       vscdev2:X.X.X.X:X ID:2 FedState:ACTIVE Features:synced:yes
+Self:       vsc4:X.X.X.X:X ID:2 FedState:ACTIVE Features:synced:yes
-Sibling:    vscdev:X.X.X.X:X  ID:1 FedState:ACTIVE Features:synced:yes PersistConnSend/Recv:Yes/Yes Synced:Yes
+Sibling:    vsc5:X.X.X.X:X ID:1 FedState:ACTIVE Features:synced:yes PersistConnSend/Recv:Yes/Yes Synced:Yes
 </code>
-==== Slurm Federation Workflow ====
-{{:doku:federationworkflow.png?700|}}
 ===== Multi-Cluster vs Federation implementation =====
-{{:doku:multiclustervsfederation.png?500|}}
 At a basic level, multi-cluster provides a single interface for submitting jobs to multiple separate Slurm clusters, and the Slurm database can either be shared or be dedicated to each cluster. Federation, by contrast, merges the job and scheduling information of its member clusters into one view and requires a single shared Slurm database.
-===== Slurm Burst-Buffer =====
-I/O components are much slower than the compute parts of a supercomputer, therefore they can create bottlenecks if the bandwidth is saturated.
-Data staging generates a large volume of traffic on the network connecting the compute nodes, because input and output data are moved between them. Inter-process communication traffic flows over the same network, so the two types of traffic can interfere with each other and degrade network performance. For example, bursts of staging traffic increase the delay of inter-process communication. Both types of traffic also compete for network bandwidth, which increases communication time.
-The Burst-Buffer plugin adds a layer between the compute nodes and the parallel file system to improve network performance, I/O, and data staging.