This version is outdated by a newer approved version.DiffThis version (2022/06/20 09:01) was approved by msiegel.

This is an old revision of the document!


Monitor where threads/processes are running

There are several ways to monitor your job, either in live time directly on the compute node or by modifying the job script or the application code:

  • <html><font color=#cc3300>live </font> <span style=“color:blue;font-size:100%;”>&dzigrarr;</span> submit the job and connect with the compute node</html>

[xy@l32]$ sbatch job.sh
[xy@l32]$ squeue -u xy
JOBID    PARTITION  NAME      USER  ST  TIME  NODES  NODELIST(REASON)
1098917  mem_0096   vasp_xx   xy    R   0:02    1    n408-041
[xy@l32]$ ssh n408-041
[xy@nn408-041]$ top     

When typing <html><span style=“color:blue;font-size:100%;”>&dzigrarr;</span> <font color=#cc3300>1</font></html> during the top call, per-core-information is obtained, e.g., about the cpu-usage, compare with the picture to the right. The user can select the parameters to be shown from a list displayed when typing <html><span style=“color:blue;font-size:100%;”>&dzigrarr;</span> <font color=#cc3300>f</font></html>.

The columns VIRT and RES indicate the virtual, resident memory usage of each process (per default in kB). The column COMMAND lists the name of the command or application.

  • <html><font color=#cc3300>batch script</font> (Intel-MPI) <span style=“color:blue;font-size:100%;”>&dzigrarr;</span> set: I_MPI_DEBUG=4 </html>
  • <html><font color=#cc3300>code</font> <span style=“color:blue;font-size:100%;”>&dzigrarr;</span> via library functions information about the locality of processes and threads can be obtained (libraries: mpi.h or in C-code hwloc.h (hardware locality) or sched.h (scheduling parameters)) </html>
#include "mpi.h"
...
MPI_Get_processor_name(processor_name, &namelen);
#include <sched.h>
...
CPU_ID = sched_getcpu();
#include <hwloc.h>
...
    hwloc_topology_t topology;
    hwloc_cpuset_t cpuset;
    hwloc_obj_t obj;
    hwloc_topology_init ( &topology);
    hwloc_topology_load ( topology);
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    hwloc_obj_t pu;
    err = hwloc_get_proc_cpubind(topology, getpid(), set, HWLOC_CPUBIND_PROCESS);
    pu = hwloc_get_pu_obj_by_os_index(topology, hwloc_bitmap_first(set));
    int my_coreid = hwloc_bitmap_first(set);
    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topology);

//  compile: mpiicc -qopenmp -o ompMpiCoreIds ompMpiCoreIds.c -lhwloc
  • doku/monitoring.1621000339.txt.gz
  • Last modified: 2024/10/24 10:21
  • (external edit)