This version (2022/06/20 09:01) was approved by msiegel.

There are two nodes (r28n41, r28n42) installed in VSC-1, each equipped with two Intel Xeon Phi cards. The cards belong to the Intel Xeon Phi Coprocessor 5100 Series and have 60 cores running at 1.053GHz with four threads each (240 threads in total). More detailed specifications can be found here: http://ark.intel.com/products/71992/Intel-Xeon-Phi-Coprocessor-5110P-8GB-1_053-GHz-60-core

The host systems r28n41 and r28n42 each have two CPU sockets with 8-core Sandy Bridge E5-2680 processors @ 2.70GHz and 256GB of memory.

  1. You need to be a member of the 'mic' group. Please contact system administration, specifying the username that should be added.
  2. In order to access the Intel MIC cards you need to generate an ssh key, e.g. as described in 'Generating a ssh key pair'. It is not possible to access the MICs with a password.
  3. When submitting a job script, you need to specify the project 'mic' and the queue 'mic.q' by adding the following lines:
    #$ -P mic
    #$ -q mic.q
    #$ -pe mpich 16 (or 8)

    Then submit your job using the qsub.py wrapper script:

    $ qsub.py jobscript
    you are member of requested group mic
    {number of slots is 16}
    {Parallel job requests a multiple of 8 slots.}
    {Host exclusive access granted !}
    {using project: mic}
    {no job runtime limit found, assuming default value Inf days}
    {using runtime limit of INFINITY seconds}
    Your job 123456 ("Jobname") has been submitted
  4. When using qlogin or qrsh, you have to specify the same options on the command line:
    $ qlogin.py -q mic.q -P mic -pe mpich 16
    local configuration l01 not defined - using global configuration
    JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been started
    {number of slots is 16}
    {Parallel job requests a multiple of 8 slots.}
    {Host exclusive access granted !}
    {using project: mic}
    {no job runtime limit found, assuming default value Inf days}
    {using runtime limit of INFINITY seconds}
    JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been stopped
    Your job 123456 ("QLOGIN") has been submitted
    waiting for interactive job to be scheduled ...
    Your interactive job 123456 has been successfully scheduled.
    Establishing builtin session to host r28n41 ...
    username@r28n41 ~ $
  5. You are now connected to the host system, which is equipped with two Intel Xeon Phi cards called 'mic0' and 'mic1'. Connect to either of the cards with your ssh key via:
    username@r28n41 ~ $ ssh mic0
    username@r28n41-mic0 ~ $
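
The steps above can also be combined into a batch job. The following is a minimal sketch under stated assumptions, not a tested site configuration: it writes a job script carrying the three directives from step 3, which then starts a natively compiled binary on mic0 via ssh. The file name mic_job.sh and the binary name my_app_xphi are hypothetical, and the sketch assumes that key-based ssh to the card works from within a job and that the binary sits in your home directory as seen from the card.

```shell
#!/bin/bash
# Write a job script for the mic.q queue.
# mic_job.sh and my_app_xphi are hypothetical names used for illustration.
# The quoted 'EOF' keeps $LD_LIBRARY_PATH literal inside the generated file.
cat > mic_job.sh <<'EOF'
#!/bin/bash
#$ -P mic
#$ -q mic.q
#$ -pe mpich 16

# Run the binary on the first card. The environment settings are the ones
# used in the interactive example on this page (assumption: they also
# suit your own application).
ssh mic0 '
export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:$LD_LIBRARY_PATH
export OMP_NUM_THREADS=240
export KMP_AFFINITY=scatter
./my_app_xphi
'
EOF

echo "wrote mic_job.sh; submit it with: qsub.py mic_job.sh"
```

Generating the file from a quoted here-document keeps the directives and the remote commands in one place; the script itself only writes the file, submission remains a separate qsub.py call.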

General information about the MIC cards:

$ /opt/intel/mic/bin/micinfo

Status information on many parameters such as core frequency, CPU utilization, temperature, etc.:

$ /opt/intel/mic/bin/micsmc -a

Current status of MIC cards; the cards can only be used if status is 'online':

$ micctrl -s
mic0: online (mode: linux image: /lib/firmware/mic/uos.img)
mic1: online (mode: linux image: /lib/firmware/mic/uos.img)
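
Because a card is usable only in the 'online' state, it can be convenient to check this before starting a run. The sketch below is a small helper, not part of the Intel tools: the function name mic_offline_cards is hypothetical, and it assumes that micctrl -s prints one line per card in the "micN: state (...)" format shown above.

```shell
#!/bin/bash
# Print the name of every card whose state is not "online".
# Reads `micctrl -s` style output on stdin; the line format
# "micN: <state> (...)" is an assumption based on the listing above.
mic_offline_cards() {
    awk -F': ' '/^mic[0-9]+: / {
        split($2, f, " ")              # f[1] is the state word
        if (f[1] != "online") print $1
    }'
}

# On the host this would be used as: micctrl -s | mic_offline_cards
# A captured sample is piped in here so the sketch runs on any machine.
sample='mic0: online (mode: linux image: /lib/firmware/mic/uos.img)
mic1: online (mode: linux image: /lib/firmware/mic/uos.img)'

offline=$(printf '%s\n' "$sample" | mic_offline_cards)
if [ -z "$offline" ]; then
    echo "all cards online"
else
    echo "not online: $offline"
fi
```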

Compile on the host system using the Intel compiler with the '-mmic' option:

user@r28n41 $ ifort -align array64byte -openmp -vec-report=3 -O3 -mmic helloflops3.f90 -o helloflops3f_xphi
helloflops3.f90(45): (col. 3) remark: LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: PEEL LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: REMAINDER LOOP WAS VECTORIZED.
helloflops3.f90(65): (col. 11) remark: LOOP WAS VECTORIZED.
helloflops3.f90(62): (col. 7) remark: loop was not vectorized: not inner loop.
helloflops3.f90(56): (col. 4) remark: loop was not vectorized: not inner loop.

SSH to one of the MIC cards, set appropriate environment variables, and run the code there:

user@r28n41 $ ssh mic0
user@r28n41-mic0 $ export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:$LD_LIBRARY_PATH
user@r28n41-mic0 $ export OMP_NUM_THREADS=240
user@r28n41-mic0 $ export KMP_AFFINITY=scatter
user@r28n41-mic0 $ ./helloflops3f_xphi
 Initializing
 Starting Compute on          240  threads
GFlops =   6144.000 Secs =      5.947 GFlops per sec =   1033.107

Setting the LD_LIBRARY_PATH variable is essential because the MICs require special libraries for execution, which are located at /opt/intel/composerxe/lib/mic. The number of threads is optimally set to 4*(number of cores), i.e. 240 for these cards, but you can experiment with it. The KMP_AFFINITY variable can also take the values balanced or compact, as detailed here: http://software.intel.com/en-us/node/463210 or here: http://software.intel.com/en-us/node/463446.
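
As a small worked example of the thread-count rule, the sketch below derives OMP_NUM_THREADS from the core count and steps through the three affinity types. The variable names CORES and THREADS_PER_CORE are illustrative only. The run line is merely echoed so that the snippet can be tried on any machine; on the card you would execute the binary instead.

```shell
#!/bin/bash
CORES=60               # cores per card, from the specification above
THREADS_PER_CORE=4     # hardware threads per core

# 4 * (number of cores) = 240 threads for these cards
export OMP_NUM_THREADS=$((CORES * THREADS_PER_CORE))
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Step through the affinity types mentioned in the text.
# This only prints the command lines; it does not run the binary.
for affinity in scatter balanced compact; do
    echo "KMP_AFFINITY=$affinity ./helloflops3f_xphi"
done
```

Which affinity type performs best depends on the application, so timing each variant on the card is the practical way to choose.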

Supplementary information on the Xeon Phi platform and on the various programming models can be found e.g. at the following locations:

doku/xeonphi.txt · Last modified: 2014/10/22 15:03 by malexand