Running jobs on Intel Xeon Phi
Xeon Phi model specifications
Two nodes in VSC-1 (r28n41, r28n42) are each equipped with two Intel Xeon Phi cards. The cards are from the Intel Xeon Phi Coprocessor 5100 Series and have 60 cores running at 1.053GHz with four threads each (240 threads in total). More detailed specifications can be found here: http://ark.intel.com/products/71992/Intel-Xeon-Phi-Coprocessor-5110P-8GB-1_053-GHz-60-core
The host systems r28n41 and r28n42 each have two CPU sockets, each equipped with an 8-core Sandy Bridge E5-2680 @ 2.70GHz, and 256GB of memory.
How to submit jobs to the mic.q on VSC-1
- You need to be a member of the 'mic' group. Please contact system administration and specify the username that should be added.
- In order to access the Intel MIC cards you need to generate an SSH key, e.g. as described in Generating a ssh key pair. It is not possible to access the MICs with a password.
- When submitting a job script, you need to specify the project 'mic' and the queue 'mic.q' by adding the following lines:
#$ -P mic
#$ -q mic.q
#$ -pe mpich 16 (or 8)
then submit your job using the qsub.py wrapper script:
$ qsub.py jobscript
you are member of requested group mic
{number of slots is 16}
{Parallel job requests a multiple of 8 slots.}
{Host exclusive access granted !}
{using project: mic}
{no job runtime limit found, assuming default value Inf days}
{using runtime limit of INFINITY seconds}
Your job 123456 ("Jobname") has been submitted
- Using qlogin or qrsh you have to specify the options on the command line:
$ qlogin.py -q mic.q -P mic -pe mpich 16
local configuration l01 not defined - using global configuration
JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been started
{number of slots is 16}
{Parallel job requests a multiple of 8 slots.}
{Host exclusive access granted !}
{using project: mic}
{no job runtime limit found, assuming default value Inf days}
{using runtime limit of INFINITY seconds}
JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been stopped
Your job 123456 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 123456 has been successfully scheduled.
Establishing builtin session to host r28n41 ...
username@r28n41 ~ $
- Now you are connected to the host system which is equipped with two Intel Xeon Phi Cards called 'mic0' and 'mic1'. Connect to either of the cards with your ssh key via:
username@r28n41 ~ $ ssh mic0
username@r28n41-mic0 ~ $
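The submission steps above can be combined into a single batch script. The following is only a minimal sketch under stated assumptions, not a tested VSC-1 recipe: it writes a hypothetical file mic_job.sh that carries the three directives from this section and then starts a program on mic0 via ssh. The file name, the job name mic_test, and the binary name helloflops3f_xphi in a home directory that is visible on the card are illustrative assumptions; running ssh non-interactively also assumes a key that does not prompt for a passphrase.

```shell
#!/bin/bash
# Write a minimal job script for the mic.q queue (file name is hypothetical).
cat > mic_job.sh <<'EOF'
#!/bin/bash
#$ -N mic_test
#$ -P mic
#$ -q mic.q
#$ -pe mpich 16

# Assumption: the binary sits in a home directory that is also visible on the card.
ssh mic0 'export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:$LD_LIBRARY_PATH
export OMP_NUM_THREADS=240
export KMP_AFFINITY=scatter
./helloflops3f_xphi'
EOF

# Submit on VSC-1 with:  qsub.py mic_job.sh
```

The quoted 'EOF' keeps the shell from expanding $LD_LIBRARY_PATH while the file is written, so the variable is evaluated on the card when the job runs.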
How to gather information about the mic cards
General information about the MIC cards:
$ /opt/intel/mic/bin/micinfo
Status information about many parameters like core frequency, cpu utilization, temperature etc.:
$ /opt/intel/mic/bin/micsmc -a
Current status of MIC cards; the cards can only be used if status is 'online':
$ micctrl -s
mic0: online (mode: linux image: /lib/firmware/mic/uos.img)
mic1: online (mode: linux image: /lib/firmware/mic/uos.img)
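Because the cards are only usable in the 'online' state, it can help to check this automatically before starting work. The helper below is a sketch: the function name mics_all_online is hypothetical, and it assumes the status output has one "micN: state (...)" line per card, as in the example above.

```shell
#!/bin/sh
# Succeeds only if at least one "micN:" line is read from stdin
# and every such line reports the state "online".
mics_all_online() {
    awk '
        /^mic[0-9]+:/ { seen++; if ($2 != "online") bad++ }
        END { exit !(seen > 0 && bad == 0) }
    '
}

# Usage on the host system:
#   micctrl -s | mics_all_online && echo "all cards online"
```

Reading from standard input keeps the check independent of where micctrl is installed, so it can also be tried against saved output.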
How to compile and run jobs on the mic cards
Compile on the host system using the Intel compiler with the '-mmic' option:
user@r28n41 $ ifort -align array64byte -openmp -vec-report=3 -O3 -mmic helloflops3.f90 -o helloflops3f_xphi
helloflops3.f90(45): (col. 3) remark: LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: PEEL LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: REMAINDER LOOP WAS VECTORIZED.
helloflops3.f90(65): (col. 11) remark: LOOP WAS VECTORIZED.
helloflops3.f90(62): (col. 7) remark: loop was not vectorized: not inner loop.
helloflops3.f90(56): (col. 4) remark: loop was not vectorized: not inner loop.
SSH to one of the MIC cards, set appropriate environment variables, and run the code there:
user@r28n41 $ ssh mic0
user@r28n41-mic0 $ export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:$LD_LIBRARY_PATH
user@r28n41-mic0 $ export OMP_NUM_THREADS=240
user@r28n41-mic0 $ export KMP_AFFINITY=scatter
user@r28n41-mic0 $ ./helloflops3f_xphi
Initializing
Starting Compute on 240 threads
GFlops = 6144.000 Secs = 5.947 GFlops per sec = 1033.107
Setting the LD_LIBRARY_PATH variable is essential because the MICs require special libraries for execution, which are located at /opt/intel/composerxe/lib/mic. The number of threads is optimally set to 4*(number of cores), but you can experiment with it. The KMP_AFFINITY variable can also take the values balanced or compact, as detailed here: http://software.intel.com/en-us/node/463210 or here: http://software.intel.com/en-us/node/463446.
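The thread-count rule can be written as a small helper, which makes it easier to experiment with fewer threads per core. This is a sketch only: the function name mic_threads is hypothetical, and its defaults (60 cores, 4 threads per core) are taken from the card specifications on this page.

```shell
#!/bin/sh
# Print the OpenMP thread count: cores times hardware threads per core.
# Defaults match the 60-core, 4-threads-per-core cards described on this page.
mic_threads() {
    echo $(( ${1:-60} * ${2:-4} ))
}

# 60 cores * 4 threads per core = 240
export OMP_NUM_THREADS=$(mic_threads)
# Keep an already chosen affinity; otherwise use scatter (balanced and compact are the alternatives).
export KMP_AFFINITY=${KMP_AFFINITY:-scatter}
```

For example, mic_threads 60 2 yields 120, which corresponds to two threads per core.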
Where to find additional information and examples
Supplementary information on the Xeon Phi platform and on the various programming models can be found e.g. at the following locations:
- http://software.intel.com/en-us/intel-parallel-universe-magazine (page 29 in the pdf)