This version (2022/06/20 09:01) was approved by msiegel.
Out of memory
A node ran out of memory during the execution of a user job.
An e-mail is sent to the user, containing a line similar to:
Sep 11 10:28:16 r01n08 (mpirun) Starting on r01n09: Sep 11 08:46:43 jz r01n09 254681 3
The parts of this line are:
'Sep 11 10:28:16' is the time of the out of memory event.
'r01n08' is the node of the out of memory event. This may be a slave node of the job.
'(mpirun)' is the executable which was killed. All children of this executable have died, too. (Only one message is sent per node, i.e., for the first executable killed.)
'Starting on r01n09:' names the master node of the job.
'Sep 11 08:46:43' is the time the job was started.
'jz' is the user id.
'r01n09' is again the master node.
'254681' is the job id.
'3' is the task id (if applicable).
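The field layout described above can be split mechanically. The following is a minimal sketch, assuming the notification line always follows the format of the example; the function name `parse_oom_line` and the dictionary keys are illustrative choices and not part of the system's mail format.

```python
import re

# Timestamp as it appears in the example, e.g. "Sep 11 10:28:16".
_TIME = r"\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}"

# Pattern derived from the single example line; real messages may differ.
OOM_LINE = re.compile(
    rf"^(?P<oom_time>{_TIME}) "           # time of the out of memory event
    r"(?P<oom_node>\S+) "                 # node where the event happened
    r"\((?P<executable>[^)]*)\) "         # executable which was killed
    r"Starting on (?P<master_node>[^\s:]+): "  # master node of the job
    rf"(?P<start_time>{_TIME}) "          # time the job was started
    r"(?P<user>\S+) "                     # user id
    r"(?P<master_node_again>\S+) "        # master node, repeated
    r"(?P<job_id>\d+)"                    # job id
    r"(?: (?P<task_id>\d+))?$"            # task id, only if applicable
)


def parse_oom_line(line):
    """Return the fields of a notification line as a dict, or None if it does not match."""
    match = OOM_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = ("Sep 11 10:28:16 r01n08 (mpirun) Starting on r01n09: "
               "Sep 11 08:46:43 jz r01n09 254681 3")
    for key, value in parse_oom_line(example).items():
        print(f"{key}: {value}")
```

If the task id is missing, the `task_id` entry is `None`; comparing `oom_node` with `master_node` shows whether the event happened on a slave node.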
If your job requires more memory per core, you might consider using only 8 or even 4 cores per node.
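The effect of using fewer cores per node can be estimated with simple arithmetic. This is only a sketch: the node memory of 32 GiB is a hypothetical value chosen for illustration (this page does not state the actual memory per node), and it assumes the memory is shared evenly between the processes of one job, ignoring what the operating system needs.

```python
def memory_per_core_gib(node_memory_gib, cores_used):
    """Memory available to each process if node memory is split evenly."""
    if cores_used <= 0:
        raise ValueError("cores_used must be positive")
    return node_memory_gib / cores_used


if __name__ == "__main__":
    NODE_MEMORY_GIB = 32  # hypothetical value, check the real hardware description
    for cores in (8, 4):
        share = memory_per_core_gib(NODE_MEMORY_GIB, cores)
        print(f"{cores} cores per node: about {share:.1f} GiB per core")
```

Halving the number of cores used per node doubles the memory available to each process, at the cost of needing more nodes for the same number of processes.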
Please direct further questions to the system administration.