# Monitoring Processes & Threads

## CPU Load
There are several ways to monitor the CPU load distribution over the
threads of your job: either live, via the job script, or from within
the application code.
### Live

So we assume your program runs, but could it be faster? SLURM gives you
a `Job ID`; type `squeue --job myjobid` to find out on which node your
job runs, say n4905-007. Type `ssh n4905-007` to connect to the given
node, then type `top` to start a simple task manager:
```
[myuser@l42]$ sbatch job.sh
[myuser@l42]$ squeue -u myuser
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   1098917 ...
[myuser@l42]$ ssh n4905-007
[myuser@n4905-007]$ top
```
Within `top`, press `1` (per-CPU view) and `H` (threads view); now you
should be able to see the load on all the available CPUs, as an
example:
```
top - 16:31:51 up 181 days,  1:04,  3 users, ...
Threads: 239 total, ...
...
18810 root      20 ...
...
```
In our example all 8 threads are utilised, which is good. The opposite
would be a high load on only one or two cores and hardly any load on
most CPUs!
The columns `VIRT` and `RES` show the *virtual* and *resident* memory
usage of each process (unless noted otherwise in kB). The column
`COMMAND` shows the name of the application.
In the following screenshot we can see stats for all 32 threads of a
compute node running a job:

*(screenshot: `top` output showing the load of all 32 threads of a compute node)*
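If you only want a quick, one-off snapshot instead of the interactive
`top` view, `ps` can list every thread of your program together with
the core it was last running on (a minimal sketch; `my_program` is a
placeholder for your executable's name):

```
# one line per thread: thread ID, last used core (PSR), CPU share, command
ps -C my_program -L -o tid,psr,pcpu,comm
```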

### Job Script

If you are using `Intel MPI` you might include this option in your
batch script:

```
I_MPI_DEBUG=4
```
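A minimal sketch of how this could look in a Slurm batch script (the
`#SBATCH` values and the program name are placeholders, not recommended
settings); with `I_MPI_DEBUG=4`, Intel MPI reports, among other things,
how the ranks are pinned to nodes and cores at startup:

```
#!/bin/bash
#SBATCH --job-name=pinning_check   # placeholder job name
#SBATCH --ntasks=8                 # placeholder task count

# make Intel MPI print rank placement / pinning info to the job output
export I_MPI_DEBUG=4

mpirun ./my_program                # my_program is a placeholder
```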

### Application Code

If your application code is in `C`, information about the locality of
processes and threads can be obtained via library functions using any
of the following libraries:

#### mpi.h

```
#include "mpi.h"
... MPI_Get_processor_name(processor_name, &name_len);
```

#### sched.h (scheduling parameters)

```
#include <sched.h>
... CPU_ID = sched_getcpu();
```

#### hwloc.h (Hardware locality)

```
#include <hwloc.h>
...
// compile: mpiicc -qopenmp -o ompMpiCoreIds ompMpiCoreIds.c -lhwloc
```
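As a minimal, self-contained sketch of such a locality check (this is
not the `ompMpiCoreIds` source referenced above; the file name, compile
line, and launch line are assumptions), the following hybrid MPI/OpenMP
program combines `MPI_Get_processor_name()` and `sched_getcpu()` to
print which node and core every thread ends up on:

```
// coreIds.c -- print node and core ID for every MPI rank / OpenMP thread
// compile (assumption): mpiicc -qopenmp -o coreIds coreIds.c
// run     (assumption): mpirun -np 2 ./coreIds
#define _GNU_SOURCE
#include <mpi.h>
#include <omp.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);  // node this rank runs on

    #pragma omp parallel
    {
        int thread = omp_get_thread_num();
        int cpu_id = sched_getcpu();                     // core this thread runs on right now
        printf("node %s  rank %d  thread %d  core %d\n",
               processor_name, rank, thread, cpu_id);
    }

    MPI_Finalize();
    return 0;
}
```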
+ | |||
+ | ===== GPU Load ===== | ||
+ | |||

We assume your program uses a GPU and that it runs as expected, so
could it be faster? On the same node where your job runs (see the CPU
load section above), maybe in a new terminal, type `watch nvidia-smi`
to start a simple task manager for the graphics card. `watch` repeats
a command every 2 seconds and acts as a live monitor for the GPU. In
our example below the GPU utilisation is around 80% most of the time,
which is already very good.
+ | |||
+ | < | ||
+ | Every 2.0s: nvidia-smi | ||
+ | Wed Jun 22 16:42:52 2022 | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | NVIDIA-SMI 460.32.03 | ||
+ | |-------------------------------+----------------------+----------------------+ | ||
+ | | GPU Name Persistence-M| Bus-Id | ||
+ | | Fan Temp Perf Pwr: | ||
+ | | | ||
+ | |===============================+======================+======================| | ||
+ | | | ||
+ | | 36% | ||
+ | | | ||
+ | +-------------------------------+----------------------+----------------------+ | ||
+ | |||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | Processes: | ||
+ | | GPU | ||
+ | | ID | ||
+ | |=============================================================================| | ||
+ | | 0 | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | </ | ||
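If you prefer plain numbers over the full table, `nvidia-smi` can also
loop on its own and print only selected values (a sketch; the fields
and the 2 second interval are just one possible choice):

```
# print time, GPU and memory utilisation, and used memory every 2 seconds
nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory,memory.used --format=csv -l 2
```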