Two nodes (r28n41, r28n42) are installed in VSC-1, each equipped with two Intel Xeon Phi cards. The cards are Intel Xeon Phi Coprocessor 5110P models (5100 Series) with 60 cores running at 1.053 GHz and four hardware threads per core (240 threads in total). More detailed specifications can be found here: http://ark.intel.com/products/71992/Intel-Xeon-Phi-Coprocessor-5110P-8GB-1_053-GHz-60-core
The host systems r28n41 and r28n42 have two CPU sockets each, equipped with 8-core Sandy Bridge Xeon E5-2680 processors @ 2.70 GHz, and 256 GB of memory.
To run jobs on the MIC nodes, the following options have to be set in the job script:
#$ -P mic
#$ -q mic.q
#$ -pe mpich 16    (or 8)
Then submit your job using the qsub.py wrapper script:
$ qsub.py jobscript
you are member of requested group mic
{number of slots is 16}
{Parallel job requests a multiple of 8 slots.}
{Host exclusive access granted !}
{using project: mic}
{no job runtime limit found, assuming default value Inf days}
{using runtime limit of INFINITY seconds}
Your job 123456 ("Jobname") has been submitted
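For orientation, a complete job script could look like the following minimal sketch; the job name, the binary ./my_program and the mpirun call (including the machine file $TMPDIR/machines) are placeholders that may need to be adapted to the MPI installation in use:

#!/bin/bash
#$ -N mic_job
#$ -P mic
#$ -q mic.q
#$ -pe mpich 16

# hypothetical payload: start an MPI program on the granted slots
mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./my_program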
For interactive sessions via qlogin or qrsh you have to specify the options on the command line:
$ qlogin.py -q mic.q -P mic -pe mpich 16
local configuration l01 not defined - using global configuration
JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been started
{number of slots is 16}
{Parallel job requests a multiple of 8 slots.}
{Host exclusive access granted !}
{using project: mic}
{no job runtime limit found, assuming default value Inf days}
{using runtime limit of INFINITY seconds}
JSV "/opt/sge/default/common/shared-util/jsv.tcl" has been stopped
Your job 123456 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 123456 has been successfully scheduled.
Establishing builtin session to host r28n41 ...
username@r28n41 ~ $
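The output above was produced by the qlogin.py wrapper; an equivalent plain qrsh call would look like the following sketch (whether the wrapper scripts or the plain SGE commands are required may depend on the current site configuration):
$ qrsh -q mic.q -P mic -pe mpich 16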
From the host system you can log in to the MIC cards directly via ssh:
username@r28n41 ~ $ ssh mic0
username@r28n41-mic0 ~ $
General information about the MIC cards:
$ /opt/intel/mic/bin/micinfo
Status information about many parameters such as core frequency, CPU utilization, temperature, etc.:
$ /opt/intel/mic/bin/micsmc -a
Current status of MIC cards; the cards can only be used if status is 'online':
$ micctrl -s
mic0: online (mode: linux image: /lib/firmware/mic/uos.img)
mic1: online (mode: linux image: /lib/firmware/mic/uos.img)
To build a native executable for the MIC cards, use the Intel compiler on the host system with the '-mmic' option:
user@r28n41 $ ifort -align array64byte -openmp -vec-report=3 -O3 -mmic helloflops3.f90 -o helloflops3f_xphi
helloflops3.f90(45): (col. 3) remark: LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: PEEL LOOP WAS VECTORIZED.
helloflops3.f90(45): (col. 3) remark: REMAINDER LOOP WAS VECTORIZED.
helloflops3.f90(65): (col. 11) remark: LOOP WAS VECTORIZED.
helloflops3.f90(62): (col. 7) remark: loop was not vectorized: not inner loop.
helloflops3.f90(56): (col. 4) remark: loop was not vectorized: not inner loop.
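For comparison, the same source can also be compiled for the host CPUs by simply omitting the '-mmic' flag (the output name helloflops3f_host is only an illustrative choice):
user@r28n41 $ ifort -align array64byte -openmp -vec-report=3 -O3 helloflops3.f90 -o helloflops3f_host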
SSH to one of the MIC cards, set appropriate environment variables, and run the code there:
user@r28n41 $ ssh mic0
user@r28n41-mic0 $ export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:$LD_LIBRARY_PATH
user@r28n41-mic0 $ export OMP_NUM_THREADS=240
user@r28n41-mic0 $ export KMP_AFFINITY=scatter
user@r28n41-mic0 $ ./helloflops3f_xphi
Initializing
Starting Compute on 240 threads
GFlops = 6144.000 Secs = 5.947 GFlops per sec = 1033.107
Setting the LD_LIBRARY_PATH variable is essential, as the MICs require special libraries for execution, which are located at /opt/intel/composerxe/lib/mic. The number of threads is optimally set to 4*(number of cores), but you can experiment with it. The KMP_AFFINITY variable can also take the values 'balanced' or 'compact', as detailed here: http://software.intel.com/en-us/node/463210 or here: http://software.intel.com/en-us/node/463446.
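For example, to experiment with a different thread count and affinity setting (the values below are only illustrative; measure what works best for your code):
user@r28n41-mic0 $ export OMP_NUM_THREADS=120
user@r28n41-mic0 $ export KMP_AFFINITY=balanced
user@r28n41-mic0 $ ./helloflops3f_xphi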
Supplementary information on the Xeon Phi platform and on the various programming models can be found e.g. at the following locations: