====== AI on VSC ======

VSC5 is a high performance cluster that consists of different kinds of nodes. When you log in using SSH, you reach one of the login nodes, which can be used to prepare the software environment and to submit jobs to the compute nodes.
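
A login from a Linux or macOS terminal looks roughly like the following sketch (the hostname ''vsc5.vsc.ac.at'' and the username are placeholders; use the values from your VSC account):

<code bash>
# Log in to a VSC-5 login node (replace the username, and the hostname if yours differs):
ssh myuser@vsc5.vsc.ac.at
</code>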

If you are outside of the university network, you first have to connect to a jump host (for example with PuTTY on Windows) before you can reach the VSC login nodes.

To copy files to and from VSC, you can use SFTP to connect to the login nodes, for example with WinSCP on Windows.
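
On the command line, ''scp'' or ''sftp'' can be used for the same purpose (a minimal sketch; the hostname and file names are placeholders):

<code bash>
# Copy a local file to your home directory on VSC-5:
scp my_script.py myuser@vsc5.vsc.ac.at:~/

# Or start an interactive SFTP session:
sftp myuser@vsc5.vsc.ac.at
</code>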

At VSC5, SLURM is used as the scheduler for queuing jobs. You can find an introduction to SLURM in the course material of the VSC introduction course at:

[[https://

But to make things easier, a summary of the most important commands is given below.
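
As a quick reference, the following commands cover most day-to-day situations (a minimal sketch; the script name and job ID are examples):

<code bash>
# Submit a job script to the queue:
sbatch my_job.slurm

# List your own pending and running jobs:
squeue -u $USER

# Cancel a job by its job ID:
scancel 1234567

# Show detailed information about a job:
scontrol show job 1234567
</code>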

Every SLURM job needs a job description file. These files are essentially shell scripts with some additional boilerplate at the top. Here is an example file:

<file bash gpu_job_template.slurm>
#!/bin/bash

## Specify job name:
#SBATCH --job-name=GPU_job

## Specify GPU:
## For Nvidia A40:
##SBATCH --partition=zen2_0256_a40x2
##SBATCH --qos=zen2_0256_a40x2
## For Nvidia A100:
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos=zen3_0512_a100x2

## Specify maximum run time (days-hours:minutes:seconds):
## Note: Job will be killed once the run time limit is reached.
## Shorter values might reduce queuing time.
#SBATCH --time=3-00:00:00

## Specify number of GPUs (1 or 2):
#SBATCH --gres=gpu:1

## Optional: Get notified via mail when the job runs and finishes:
##SBATCH --mail-type=ALL
##SBATCH --mail-user=user@example.com
+ | |||
+ | # Start in a clean environment | ||
+ | module purge | ||
+ | # List available GPUs: | ||
nvidia-smi | nvidia-smi | ||
+ | |||
+ | # Load conda: | ||
module load miniconda3 | module load miniconda3 | ||
eval " | eval " | ||
+ | |||
+ | # Load a conda environment with Python 3.11.6, PyTorch 2.1.0, TensorFlow 2.13.1 and other packages: | ||
+ | export XLA_FLAGS=--xla_gpu_cuda_data_dir=/ | ||
conda activate / | conda activate / | ||
+ | |||
+ | # Run AI scripts: | ||
python -c " | python -c " | ||
</ | </ |
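
Once the file is adapted to your needs, submit it from a login node. By default, SLURM writes the job's output to a file named ''slurm-<jobid>.out'' in the directory from which the job was submitted; the job ID below is only an example:

<code bash>
# Submit the job; sbatch prints the job ID:
sbatch gpu_job_template.slurm

# Follow the job output (replace 1234567 with your job ID):
tail -f slurm-1234567.out
</code>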