
On the current cluster (“smmpmech.unileoben.ac.at”):

  • X server (VNC) runs on head node
  • X clients (=applications, e.g. Fluent) run on compute nodes
  • X clients communicate with X server
    • over “physical” network (Infiniband)
    • with the inefficient X protocol
    • many clients with one server
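
For illustration, this is roughly what happens today (a simplified sketch; the display number :1 is an assumption and depends on the actual VNC session):

  # on the compute node, the job points its X clients at the VNC/X server
  # on the head node; every graphics update then travels over the network
  # using the X protocol
  export DISPLAY=smmpmech.unileoben.ac.at:1
  fluent   # all GUI traffic goes to the head node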

Problems with the current method:

  • X clients (=applications) die when connection is lost
    • therefore the head node cannot be rebooted without killing jobs
    • sometimes VNC crashes or gets stuck
  • many clients are displayed on one server
    • one misbehaving client can block the server
  • communication (X server ↔ X client)
    • takes a lot of CPU power on the head node
    • can slow down the application (we have experienced up to 60% performance loss with Fluent)
    • this is why minimizing the Fluent window helps (no graphic updates ⇒ no communication)

On the new cluster we have:

  • X servers (Xpra) run on compute nodes
    • one server per application
  • X clients (applications) run on compute nodes
  • client and server communicate directly on the same machine
    • each client with its own server
    • no “physical” network involved
  • to see the actual output you must attach to the Xpra server with an Xpra client
    • use sbatch+display to submit and display a job
    • use display-all to display graphical output of all your jobs
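
Under the hood this corresponds roughly to the following sketch (the sbatch+display wrapper takes care of these steps for you; the display number :100 and the use of Fluent are just examples):

  # inside the job, on the compute node: start a private Xpra server and
  # launch the application as its only X client
  xpra start :100 --start-child=fluent
  # client and server now talk to each other locally on the compute node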

Solved problems with this method:

  • X clients (=applications) no longer die when connection is lost
    • login nodes can be rebooted at any time
    • simply detach the Xpra connection while you are not watching, so that it does not slow down the application (e.g. Fluent)
  • misbehaving X clients can only block their own server
  • communication (X server ↔ X client) stays on the compute node ⇒ fast
  • communication Xpra server ↔ Xpra client
    • can be detached and reattached any time
    • uses efficient Xpra protocol
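
Attaching and detaching might look like this (a sketch; the node name node042, the display number 100 and the ssh transport are assumptions — on the cluster the sbatch+display and display-all helpers handle this for you):

  # from a login node: attach to the job's Xpra server on its compute node
  xpra attach ssh://node042/100

  # detach again at any time by interrupting the attach command (Ctrl+C)
  # or with xpra's detach subcommand; the application keeps running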

When users can allocate only whole nodes:

  • everything is easier for the admins
  • no jobs are getting in the way of each other on the same node
  • memory is given implicitly by number and kind of nodes
    • no need for the user to specify memory in the job script (see the example below)
  • no fragmentation and problems related to fragmentation
    • e.g. partial nodes are free but the user needs a whole node
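
For comparison, requesting a whole node in Slurm could look like this (a minimal sketch; the program name is a placeholder):

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --exclusive      # whole node: all cores and all memory belong to this job
  # memory is implied by the node type, so no --mem is needed in this model
  srun ./my_simulation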

But there are also disadvantages with whole nodes:

  • single-core jobs are more complicated for the user
    • users must manage the execution of many small jobs on one node themselves (see the sketch after this list)
  • if a cluster consists of relatively few nodes, its utilization will be worse
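
With whole-node allocation, packing many serial runs into one job is the user's responsibility, e.g. (a minimal sketch assuming 28 cores per node and a hypothetical serial program ./my_tool):

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --exclusive
  # start 28 independent serial runs and wait for all of them to finish
  for i in $(seq 1 28); do
      ./my_tool input_$i > output_$i &
  done
  wait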

Therefore we have decided to allow the use of partial nodes.


This means that you must specify the memory requirement of your jobs:

  • because of shared node usage (also referred to as “single core jobs”)
  • to avoid “killing” nodes by using too much memory
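
A job on a shared node should therefore state its memory requirement explicitly, e.g. (a sketch; the numbers and the program name are placeholders):

  #!/bin/bash
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=1
  #SBATCH --mem=4G         # memory this job is allowed to use on the shared node
  srun ./my_tool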

How fair-share scheduling affects your jobs:

  • as long as there are free resources (e.g. cores): no effect can be seen
  • as soon as jobs have to compete for resources:
    • history of user / group comes into play
    • scheduler prefers jobs of users/groups which have not had their fair share yet
  • in the long run this allocates resources according to the predefined percentages
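
You can inspect your current fair-share standing with Slurm's sshare command, e.g.:

  # long listing of the fair-share factors for your own user
  sshare -l -u $USER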

Why memory must also be taken into account:

  • not only cores but also memory can “block” nodes
  • e.g. 1 core and 120 GB of RAM block an entire E5-2690v4 node and are in this way equivalent to 28 cores
  • therefore memory must also count
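
For example, such a high-memory request might look like this (a sketch; it assumes roughly 128 GB of RAM per E5-2690v4 node and uses a placeholder program name):

  #!/bin/bash
  #SBATCH --ntasks=1
  #SBATCH --mem=120G   # leaves almost no memory for other jobs on the node,
                       # so its remaining 27 cores are effectively blocked as well
  srun ./my_memory_hungry_tool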
