====== GPU computing and visualization ======

The following GPU devices are available:

^  Tesla c2050 (fermi)  ^^
|  Total amount of global memory|2687 MBytes |
|  (14) Multiprocessors, (32) CUDA Cores/MP|448 CUDA Cores |
|  GPU Clock rate|1147 MHz |
|  Maximum number of threads per block|1024 |
|  Device has ECC support|Enabled |


^  Tesla k20m (kepler)  ^^
|  Total amount of global memory|4742 MBytes |
|  (13) Multiprocessors, (192) CUDA Cores/MP|2496 CUDA Cores |
|  GPU Clock rate|706 MHz |
|  Maximum number of threads per block|1024 |
|  Device has ECC support|Enabled |


^  Tesla m60 (maxwell)  ^^
|  Total amount of global memory|8114 MBytes |
|  (16) Multiprocessors, (128) CUDA Cores/MP|2048 CUDA Cores |
|  GPU Clock rate|1.18 GHz |
|  Maximum number of threads per block|1024 |
|  Device has ECC support|Disabled |


^  Consumer grade GeForce GTX 1080 (pascal)  ^^
|  Total amount of global memory|8113 MBytes |
|  (20) Multiprocessors, (128) CUDA Cores/MP|2560 CUDA Cores |
|  GPU Clock rate|1.73 GHz |
|  Maximum number of threads per block|1024 |
|  Device has ECC support|Disabled |


  * One node, n25-009, is equipped with two Tesla c2050 (fermi) GPUs. The host system has two Intel Xeon X5650 @ 2.67GHz CPUs with 6 cores each and 24GB of RAM.
  * Two nodes, n25-[005,006], each with two Tesla k20m (kepler) GPUs. The host systems are equipped with two Intel Xeon E5-2680 @ 2.70GHz CPUs, each with 8 cores, and 256GB of RAM.
  * <html><font color=#cc3300>One node, n25-007,</font></html> with two Tesla m60 (maxwell) GPUs. n25-007 is equipped with two Intel Xeon E5-2650 v3 @ 2.30GHz CPUs, each with 10 cores, and 256GB of host RAM, while n25-010 features two Intel Xeon E5-2660 v3 @ 2.60GHz CPUs, also with 10 cores each, and 128GB of host RAM.
  * Ten nodes, n25-[011-020], each with a single GTX-1080 (pascal) GPU. The host systems are single-socket 4-core Intel Xeon E5-1620 @ 3.5 GHz machines with 64GB of RAM.
  * Two private-shared nodes, n25-[021-022], each equipped with 8 GTX-1080 (pascal) devices, hosted on dual-socket 4-core Intel Xeon E5-2623 @ 2.6 GHz systems with 128GB of RAM.
<html><font color=#cc3300><sup>*</sup>effective September 22, 2017</font></html>

[[https://github.com/NVIDIA/gdrcopy|gdrdrv]] is loaded by default (also see these [[https://devtalk.nvidia.com/default/topic/919381/gdrcpy-problem/|notes]] regarding gtx1080 cards).
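
The figures in the tables above can be cross-checked directly on an allocated node. A minimal sketch, assuming ''nvidia-smi'' is available in the default path of the GPU nodes (the exact wording of the output depends on the driver version):
<code># list name, memory, maximum SM clock and ECC mode of each device on the node
nvidia-smi --query-gpu=name,memory.total,clocks.max.sm,ecc.mode.current --format=csv
</code>
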
-------
==== Slurm integration ====

There is one partition called ''gpu'' which includes all available GPU nodes:

<code>[user@l31 ~]$ sinfo -p gpu
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu          up   infinite     10   alloc n25-[011-020]
gpu          up   infinite      5   idle n25-[005-007,009-010]
</code>
It has to be specified via:
<code>#SBATCH --partition=gpu</code>

GPU nodes are selected via the **generic resource (--gres=)** and **constraint (-C, --constraint=)** options:
  * c2050 (fermi) GPU node: <code>#SBATCH -C c2050
#SBATCH --gres=gpu:2
</code>
  * k20m (kepler) GPU nodes: <code>#SBATCH -C k20m
#SBATCH --gres=gpu:2
</code>
  * m60 (maxwell) GPU nodes: <code>#SBATCH -C m60
#SBATCH --gres=gpu:2
</code>
  * gtx1080 (pascal) GPU nodes: <code>#SBATCH -C gtx1080
#SBATCH --gres=gpu:1
</code>
  * at idle times of the private-shared gtx1080 (pascal) GPU nodes: <code>#SBATCH --partition=p70971_gpu
#SBATCH -C gtx1080
#SBATCH --gres=gpu:8
</code>
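
For quick tests the same selection options can also be used interactively; a minimal sketch with ''salloc'', assuming interactive allocations are permitted in the ''gpu'' partition:
<code># request one gtx1080 node interactively for 30 minutes and open a shell on it
salloc -N 1 --partition=gpu -C gtx1080 --gres=gpu:1 --time=00:30:00
srun --pty bash
</code>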


To use a GPU node for computing purposes, the quality of service (QoS) ''gpu_compute'' is available, which provides a **maximum runtime of three days**:
<code>#SBATCH --qos=gpu_compute</code>
For visualization, the QoS ''gpu_vis'' has to be used; with it a GPU node can be occupied for up to **twelve hours** of interactive visualization:
<code>#SBATCH --qos=gpu_vis</code>
When a job is submitted with the ''gpu_vis'' QoS, an X server is started on the GPU node.
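
The configured runtime limits can be checked on the login node; a minimal sketch (the available fields may vary with the Slurm version):
<code># show the maximum wall time of the GPU QoS definitions
sacctmgr show qos format=Name,MaxWall | grep gpu
</code>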

--------------------------------------------------

===== Visualization =====

To make use of a GPU node for visualization, perform the following steps.

  - Set a VNC password; it is needed when connecting to the VNC server. **This has to be done only once**: <code>module load TurboVNC/2.0.1
mkdir ${HOME}/.vnc
vncpasswd
Password: ******
Warning: password truncated to the length of 8.
Verify:   ******
Would you like to enter a view-only password (y/n)? n
</code>
  - Allocate GPU nodes with this script: <code>sviz -a</code>
  - Start the VNC server: <code>sviz -r</code>
  - Follow the instructions on the screen and connect from your local machine with a VNC viewer, as in the consolidated example below: <code>vncviewer -via <user>@vsc3.vsc.ac.at <node>::<port></code>
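
Put together, a typical session looks like the following minimal sketch (the node name and port are placeholders; use the values printed by ''sviz -r''):
<code># on the VSC-3 login node: allocate a GPU node and start the VNC server
sviz -a
sviz -r

# on your local machine: connect through the login node to the reported node and port
vncviewer -via <user>@vsc3.vsc.ac.at n25-011::5901
</code>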

All options for ''sviz'':
<code>
sviz -h
usage: /opt/sw/x86_64/generic/bin/sviz
Parameters:
-h      print this help
-a      allocate gpu nodes
-r      start vnc server on allocated nodes

options for allocating:
-t      set gpu type; default=gtx1080
-n      set gpu count; default=1

options for vnc server:
-g      set geometry; default=1920x1080
</code>
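
For example, to allocate a node with two m60 GPUs and run the VNC server at a higher resolution, the options above can be combined; a minimal sketch:
<code># allocate an m60 node with two GPUs, then start the VNC server with a 2560x1440 desktop
sviz -a -t m60 -n 2
sviz -r -g 2560x1440
</code>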


==== Linux/Windows - TightVNC ====

On your local (Linux) workstation you can use any VNC client that supports a gateway parameter (usually a ''-via'' option), e.g. TightVNC. You will first be asked for your VSC cluster password (and possibly your OTP, if it has not been entered within the last 12 hours), and then for the VNC password you set in the previous step:
<code>user@localhost:~$ vncviewer -via user@vsc3.vsc.ac.at n25-001::5901
Password: 
Connected to RFB server, using protocol version 3.8
Enabling TightVNC protocol extensions
Performing standard VNC authentication
Password: 
Authentication successful
Desktop name "TurboVNC: n25-001:1 (user)"
VNC server default format:
  32 bits per pixel.
  Least significant byte first in each pixel.
  True colour: max red 255 green 255 blue 255, shift red 16 green 8 blue 0
Warning: Cannot convert string "-*-helvetica-bold-r-*-*-16-*-*-*-*-*-*-*" to type FontStruct
Using default colormap which is TrueColor.  Pixel format:
  32 bits per pixel.
  Least significant byte first in each pixel.
  True colour: max red 255 green 255 blue 255, shift red 16 green 8 blue 0
Tunneling active: preferring tight encoding</code>

You should now see a desktop like this:
{{:doku:tightvnc_screenshot.png?700|}}

Windows versions of TightVNC are also available.
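
If your VNC client does not offer a ''-via'' gateway option, the same connection can be built with a manual SSH tunnel; a minimal sketch (node name and port are examples; the VNC port is 5900 plus the display number):
<code># forward local port 5901 through the login node to the VNC server on the GPU node
ssh -L 5901:n25-001:5901 user@vsc3.vsc.ac.at

# then, in a second terminal on your local machine
vncviewer localhost::5901
</code>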


==== OS X/Linux/Windows - TurboVNC ====

Under OS X it is suggested to use the [[http://sourceforge.net/projects/turbovnc/files/|TurboVNC]] client, but it may be used under Linux or Windows as well.
This is how you can set up the client connection to the VNC server:
  - Set up the connection: {{:doku:turbovnc:turbovnc_setup.png?600|}}
  - Enter your cluster password: {{:doku:turbovnc:turbovnc_clusterpw.png?300|}}
  - Enter your OTP: {{:doku:turbovnc:turbovnc_otp.png?300|}}
  - Enter your VNC password: {{:doku:turbovnc:turbovnc_vncpasswd.png?300|}}

A desktop will be displayed on your screen:
{{:doku:turbovnc:turbovnc_desktop.png?650|}}

==== VirtualGL ====

Load the module:

<code>
module load VirtualGL/2.5.2
</code>

The following environment variables need to be set:

<code>
export VGL_DISPLAY=:0
export DISPLAY=:1
</code>

To make use of VirtualGL, your application needs to be started with ''vglrun'':
<code>[user@n25-001 ~]$ vglrun <path_to_your_X_application></code>
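
To verify that rendering actually happens on the GPU, the reported renderer string can be inspected; a minimal sketch, assuming ''glxinfo'' and ''glxgears'' are installed on the node:
<code># the renderer string should name the NVIDIA device rather than a software rasterizer
vglrun glxinfo | grep -i "opengl renderer"

# quick visual test inside the VNC desktop
vglrun glxgears
</code>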

---------------------
===== GPU computing =====

==== CUDA ====
CUDA toolkits are available in versions 5.5, 7.5, 8.0.27 and 8.0.61 (providing, e.g., the ''nvcc'' compiler) and are accessible by loading the corresponding cuda module:
<code>module load cuda/5.5</code>
or
<code>module load cuda/7.5</code>
or
<code>module load cuda/8.0.27</code>
or
<code>module load cuda/8.0.61</code>
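
After loading a module, a CUDA source file can be compiled for the installed devices; a minimal sketch (''saxpy.cu'' stands for your own source file; suitable ''-arch'' values are ''sm_20'' for c2050, ''sm_35'' for k20m, ''sm_52'' for m60 and ''sm_61'' for gtx1080):
<code># compile for the pascal (gtx1080) nodes; adjust -arch for the other device types
module load cuda/8.0.61
nvcc -arch=sm_61 -o saxpy saxpy.cu
</code>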

==== Batch jobs ====

A sample job script ''my_gpu_job_script'' for submitting batch jobs:
<code>#!/bin/sh
#SBATCH -J gpucmp
#SBATCH -N 1
#SBATCH --partition=gpu
#SBATCH --qos=gpu_compute
#SBATCH --time=00:10:00
#SBATCH --gres=gpu:2
#SBATCH -C k20m

./my_code_which_runs_on_a_gpu
</code>
Submit the job:
<code>[user@l31 ~]$ sbatch my_gpu_job_script</code>
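
After submission, the job can be monitored from the login node; a minimal sketch (''<jobid>'' is the ID printed by ''sbatch''):
<code># list your queued and running jobs
squeue -u $USER

# show the details of a specific job (allocated node, requested generic resources, ...)
scontrol show job <jobid>
</code>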


==== Checking GPU utilization with nvidia-smi ====

The standard way to check whether an application actually makes use of the GPUs, and to what extent, is to call
<code>
nvidia-smi
</code>
or, better, in a separate terminal
<code>
watch nvidia-smi
</code>
For further details please see the man page of ''nvidia-smi''.
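
For a more compact, continuously updated view, ''nvidia-smi'' can also print selected fields only; a minimal sketch:
<code># print utilization and memory usage of all devices, refreshed every second
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 1
</code>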

==== CUDA C References ====
{{:doku:cuda-docu:cuda_c_best_practices_guide.pdf|CUDA C best practices}}\\
{{:doku:cuda-docu:cuda_c_programming_guide.pdf|CUDA C programming guide}}\\
{{:doku:cuda-docu:cuda-gdb.pdf|CUDA GDB user manual}}\\

==== CUDA Library References ====
{{:doku:cuda-docu:cublas_library.pdf|CUBLAS library}}\\
{{:doku:cuda-docu:cufft_library.pdf|CUFFT library}}\\
{{:doku:cuda-docu:curand_library.pdf|CURAND library}}\\
{{:doku:cuda-docu:cusparse_library.pdf|CUSPARSE library}}\\

==== Additional Documentation ====
More details can be found in the directory ''...cuda/8.0.61/doc/pdf'' after loading the corresponding cuda module.
  