===== Running jobs on Intel Xeon Phi =====
==== Xeon Phi model specifications ====
Two nodes in VSC-1 (r28n41 and r28n42) are each equipped with two Intel Xeon Phi cards.
The cards belong to the Intel Xeon Phi Coprocessor 5100 Series and have 60 cores running at 1.053GHz with four hardware threads each (240 threads in total). More detailed specifications can be found here: http://ark.intel.com/products/71992/Intel-Xeon-Phi-Coprocessor-5110P-8GB-1_053-GHz-60-core
The host systems r28n41 and r28n42 each have two CPU sockets populated with 8-core Sandy Bridge E5-2680 processors @ 2.70GHz, and 256GB of memory.
==== How to submit jobs to the mic.q on VSC-1 ====
- You need to be a member of the '**mic**' group. Please **[[doku:contact|contact system administration]]**, specifying the username that should be added.
- To access the Intel MIC cards you need to generate an SSH key, e.g. as described in [[doku:sshkeygen|Generating a ssh key pair]]. It is not possible to access the MICs with a password.
- When submitting a job script, you need to specify the project 'mic' and the queue 'mic.q' and request either 16 or 8 slots by adding the following lines:
#$ -P mic
#$ -q mic.q
#$ -pe mpich 16
Then submit your job using the ''qsub.py'' wrapper script:
$ qsub.py jobscript
you are member of requested group mic
{number of slots is 16}
{Parallel job requests a multiple of 8 slots.}
{Host exclusive access granted !}
{using project: mic}
{no job runtime limit found, assuming default value Inf days}
{using runtime limit of INFINITY seconds}
Your job 123456 ("Jobname") has been submitted
- When using ''qlogin'' or ''qrsh'', you have to specify these options on the command line:
$ qlogin.py -q mic.q -P mic -pe mpich 16
local configuration l01 not defined - using global configuration
JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been started
{number of slots is 16}
{Parallel job requests a multiple of 8 slots.}
{Host exclusive access granted !}
{using project: mic}
{no job runtime limit found, assuming default value Inf days}
{using runtime limit of INFINITY seconds}
JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been stopped
Your job 123456 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 123456 has been successfully scheduled.
Establishing builtin session to host r28n41 ...
username@r28n41 ~ $
- You are now connected to the host system, which is equipped with two Intel Xeon Phi cards called 'mic0' and 'mic1'. Connect to either card with your SSH key via:
username@r28n41 ~ $ ssh mic0
username@r28n41-mic0 ~ $
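The submission steps above can be combined into a single batch job script. The following is a minimal sketch, not a tested site recipe: the job name ''mic_example'' is hypothetical, the binary ''helloflops3f_xphi'' is the example built later on this page, and it assumes that key-based ''ssh mic0'' works non-interactively from inside a batch job and that the binary is visible in the login directory on the card.

```shell
#!/bin/sh
# Write an example job script named "jobscript" into the current directory.
# The quoted 'EOF' keeps $LD_LIBRARY_PATH unexpanded until the command runs on the card.
cat > jobscript <<'EOF'
#!/bin/bash
#$ -N mic_example
#$ -P mic
#$ -q mic.q
#$ -pe mpich 16

# Assumption: the SSH key allows a non-interactive login to mic0 from the job.
ssh mic0 'export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:$LD_LIBRARY_PATH; export OMP_NUM_THREADS=240; export KMP_AFFINITY=scatter; ./helloflops3f_xphi'
EOF

echo "Wrote jobscript; submit it with: qsub.py jobscript"
```

The generated file can be inspected and adjusted (for example to request 8 slots instead of 16) before it is submitted.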
==== How to gather information about the MIC cards ====
General information about the MIC cards:
$ /opt/intel/mic/bin/micinfo
Status information on parameters such as core frequency, CPU utilization, and temperature:
$ /opt/intel/mic/bin/micsmc -a
Current status of MIC cards; the cards can only be used if status is 'online':
$ micctrl -s
mic0: online (mode: linux image: /lib/firmware/mic/uos.img)
mic1: online (mode: linux image: /lib/firmware/mic/uos.img)
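Since the cards are only usable in the 'online' state, a script may want to check this before starting work. The helper below is a sketch that assumes the output format shown above (one line per card, with the card name followed by its state); the function name ''all_mics_online'' is hypothetical. It reads the status text from standard input, so it can be tried with sample text as well as with real ''micctrl -s'' output.

```shell
#!/bin/sh
# all_mics_online: read `micctrl -s` style output on stdin.
# Succeeds only if at least one card is listed and every listed card is "online".
all_mics_online() {
  awk '
    /^mic[0-9]+:/ { seen++; if ($2 != "online") bad++ }
    END { exit ((seen == 0 || bad > 0) ? 1 : 0) }
  '
}

# Demonstration with the sample output from this page.
if all_mics_online <<'EOF'
mic0: online (mode: linux image: /lib/firmware/mic/uos.img)
mic1: online (mode: linux image: /lib/firmware/mic/uos.img)
EOF
then
  echo "all cards online"
else
  echo "at least one card is not online"
fi
```

On the host system the same check would be run as ''micctrl -s | all_mics_online''.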
==== How to compile and run jobs on the MIC cards ====
Compile on the host system using the Intel compiler with the ''-mmic'' option:
user@r28n41 $ ifort -align array64byte -openmp -vec-report=3 -O3 -mmic helloflops3.f90 -o helloflops3f_xphi
helloflops3.f90(45): (col. 3) remark: LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: PEEL LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: REMAINDER LOOP WAS VECTORIZED.
helloflops3.f90(65): (col. 11) remark: LOOP WAS VECTORIZED.
helloflops3.f90(62): (col. 7) remark: loop was not vectorized: not inner loop.
helloflops3.f90(56): (col. 4) remark: loop was not vectorized: not inner loop.
SSH to one of the MIC cards, set appropriate environment variables, and run the code there:
user@r28n41 $ ssh mic0
user@r28n41-mic0 $ export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:$LD_LIBRARY_PATH
user@r28n41-mic0 $ export OMP_NUM_THREADS=240
user@r28n41-mic0 $ export KMP_AFFINITY=scatter
user@r28n41-mic0 $ ./helloflops3f_xphi
Initializing
Starting Compute on 240 threads
GFlops = 6144.000 Secs = 5.947 GFlops per sec = 1033.107
Setting the ''LD_LIBRARY_PATH'' variable is essential because binaries built for the MIC cards need special runtime libraries, which are located at ''/opt/intel/composerxe/lib/mic''. The number of threads is usually best set to 4*(number of cores), i.e. 240 for these 60-core cards, but you can experiment with other values. The ''KMP_AFFINITY'' variable can also take the values ''balanced'' or ''compact'', as detailed here: http://software.intel.com/en-us/node/463210 or here: http://software.intel.com/en-us/node/463446.
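To experiment with thread counts and affinity settings as suggested above, a small loop can step through the combinations. This is a sketch in plain POSIX shell, intended to be run on the card after ''LD_LIBRARY_PATH'' has been set; the helper name ''mic_threads'' is hypothetical, and the core count of 60 is taken from the specifications on this page. The benchmark is only started if the binary exists, so the loop can also be dry-run elsewhere to see which settings it would use.

```shell
#!/bin/sh
MIC_CORES=60                  # cores per card (Xeon Phi 5100 Series)
BINARY=./helloflops3f_xphi    # example binary compiled with -mmic

# mic_threads CORES THREADS_PER_CORE: print the total number of threads.
mic_threads() {
  echo $(( $1 * $2 ))
}

for affinity in scatter balanced compact; do
  for per_core in 1 2 3 4; do
    KMP_AFFINITY=$affinity
    OMP_NUM_THREADS=$(mic_threads "$MIC_CORES" "$per_core")
    export KMP_AFFINITY OMP_NUM_THREADS
    echo "KMP_AFFINITY=$KMP_AFFINITY OMP_NUM_THREADS=$OMP_NUM_THREADS"
    # Run the benchmark only where the binary is present and executable.
    if [ -x "$BINARY" ]; then
      "$BINARY"
    fi
  done
done
```

With four threads per core the loop reaches the 240 threads used in the run above; comparing the reported GFlops per second across the combinations shows which setting suits a given code.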
==== Where to find additional information and examples ====
Supplementary information on the Xeon Phi platform and on the various programming models can be found e.g. at the following locations:
* http://go-parallel.com/
* http://software.intel.com/en-us/intel-parallel-universe-magazine (page 29 in the pdf)
* http://www.prace-ri.eu/Best-Practice-Guide-Intel-Xeon-Phi-HTML