====== Python / Conda at VSC ======
Previously we recommended using spack python packages, but we decided to **deprecate** this approach since it is much easier and more user friendly to use ''conda'', which is already widely known.
Have a look at the new **quickstart guide** below or the material from the python4hpc training: [[https://gitlab.tuwien.ac.at/vsc-public/training/python4hpc|python4hpc public repo]].
The old approach is now //deprecated// but kept on this page for reference.
===== Quickstart: Using Python with Conda =====
To install a new environment, start with creating an **environment yaml file** and store it in a safe location (ideally in your source code repository).
This step is optional, but we strongly **recommend** it so that you and everyone in your team can reproduce the same python environment.
To find valid conda package names look at [[https://anaconda.org/search|Anaconda repo package search]].
Example (add your own packages below dependencies):
name: my_env
channels:
- conda-forge
dependencies:
- python=3.10
- tensorflow=2.15.0
You can then install your environment using the following commands:
# load the miniconda package
$ module load miniconda3
# executes bash hooks for the tool to function
# and enters the "base" environment
$ eval "$(conda shell.bash hook)"
# create your environment via environment file
# this will place the environment in your ~/.conda/envs folder
# note:
# - make sure to use "conda >env< create" - "conda create" is a different command
# - the name is taken from the yaml file; a custom name can be specified with "-n my-custom-name"
(base) $ conda env create --file my_env.yaml
# after creation you can activate the environment to run code in it
(base) $ conda activate my_env
# test python version to make sure we have the right one
(my_env) $ python --version
Python 3.10.11
You can now start developing and even use the environment in a slurm script:
#!/bin/bash
#SBATCH --job-name=slurm_conda_example
#SBATCH --time=00-00:05:00
#SBATCH --ntasks=2
#SBATCH --mem=2GB
# modify SBATCH options according to needs
# see "Setup Conda" above or consult "module avail miniconda3" to get the right package name
module load miniconda3
eval "$(conda shell.bash hook)"
conda activate my_env
# print out some info of the python executable in use
# this should point to the python version from "my_env"
which python
python --version
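A job that silently runs the wrong interpreter can waste its whole allocation, so it can pay off to fail fast at the top of your Python program. A minimal sketch (the function name is just for illustration; the version numbers and environment name are whatever you pinned in your environment file, and ''CONDA_DEFAULT_ENV'' is the variable set by ''conda activate''):

```python
import os
import sys

def assert_environment(expected_major, expected_minor, expected_env=None):
    """Abort early if the job is not running in the intended interpreter/env."""
    if sys.version_info[:2] != (expected_major, expected_minor):
        raise RuntimeError(
            "expected Python %d.%d, running %s from %s"
            % (expected_major, expected_minor,
               sys.version.split()[0], sys.executable)
        )
    # 'conda activate' exports CONDA_DEFAULT_ENV; verify it if a name was given
    if expected_env is not None and os.environ.get("CONDA_DEFAULT_ENV") != expected_env:
        raise RuntimeError(
            "expected conda env %r, got %r"
            % (expected_env, os.environ.get("CONDA_DEFAULT_ENV"))
        )

# e.g. at the top of your main script:
# assert_environment(3, 10, expected_env="my_env")
```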
==== Submitting jobs from JupyterHub ====
Due to the way JupyterHub & Jupyter Notebooks work, submitting a job from a Jupyter Terminal on our JupyterHub instance requires a bit more work. Jupyter itself is a Python application and requires a certain setup to see our slurm infrastructure. In addition, the whole jupyter server runs inside a Slurm job, so you have to deactivate the current (conda) environment and unset the slurm environment variables to submit a job without interference.
We recommend creating a script (e.g. ''unload_jupyter_env.sh'') and putting the following code inside it:
#!/usr/bin/env bash
##
# source this script from a jupyter terminal or notebook cell
# to unset all jupyter related env variables and functions
##
if conda -V >/dev/null 2>&1; then
eval "$(conda shell.bash hook)"
for i in $(seq ${CONDA_SHLVL:-0}); do
conda deactivate
done
echo "deactivated all conda envs ..."
else
echo "no conda found."
fi
PREVIOUS_IFS="$IFS"
IFS=$'\n'
SLURM_VARS=$( env | sort | grep -E "^SLURM_.*=" | sed "s/=.*//g" )
for var in $SLURM_VARS; do
unset $var
done
echo "unset all SLURM_* env variables ..."
IFS="$PREVIOUS_IFS"
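To see what the ''SLURM_*'' cleanup loop actually does, you can exercise the same technique on dummy variables in any bash shell (a self-contained sketch, no Slurm needed; the variable values are made up):

```bash
# set two fake Slurm variables to simulate a job environment
export SLURM_JOB_ID=12345
export SLURM_NTASKS=2

# same technique as in the script above: collect all SLURM_* names, then unset them
PREVIOUS_IFS="$IFS"
IFS=$'\n'
SLURM_VARS=$( env | sort | grep -E "^SLURM_.*=" | sed "s/=.*//g" )
for var in $SLURM_VARS; do
    unset "$var"
done
IFS="$PREVIOUS_IFS"

# verify: no SLURM_* variables remain in the environment
if env | grep -q "^SLURM_"; then
    echo "some SLURM_ variables are still set"
else
    echo "no SLURM_ variables left"
fi
```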
You only have to source this script once from a Jupyter Terminal before using ''sbatch'':
$ source unload_jupyter_env.sh
$ sbatch my_job_script.sh
===== More info about Conda =====
For more information about conda, check out the conda notebook of the python4HPC training material: [[https://gitlab.tuwien.ac.at/vsc-public/training/python4hpc/-/blob/main/D1_02_env_03_conda.ipynb|python4HPC Conda Environments]]
===== FAQ =====
* **''eval "$(conda shell.bash hook)"'' fails with ''CommandNotFoundError ...''**
Make sure to use the currently provided ''miniconda3'' module via ''module load miniconda3''.
* **I don't get the right python version and packages in my slurm batch file environment - what is wrong?**
Make sure that one of the first things your sbatch script does is load the miniconda3 package via ''module load miniconda3'', and that you execute ''eval "$(conda shell.bash hook)"'' before using any other conda commands.
* **I don't get GPU/CUDA enabled packages when installing a conda environment** / **I get conda installation errors when I select GPU/CUDA enabled builds for my conda environment**
In order to install packages from conda-forge that require CUDA on a machine that does not have GPUs you have to set an environment variable before creating the conda environment:
name: my-pytorch-gpu-env
channels:
- pytorch
- conda-forge
dependencies:
- python=3.12
- pytorch=2.2.*=*cuda11.8*
# for example if you use cuda 11.8
CONDA_OVERRIDE_CUDA="11.8" conda env create -n my-pytorch-gpu-env --file my-pytorch-gpu-env.yaml
====== Deprecated Information (work in progress) ======
===== Python Installations =====
Python is a comparatively fast-evolving programming language, so different versions behave very differently. We provide multiple varieties of ''python'' installations; please always use [[doku:spack]] to find and load them.
===== Python Packages =====
The VSC team makes sure to have the most used packages readily available via spack. The installed python packages are always named in the format ''py-mypackagename'', e.g. ''py-numpy'' or ''py-scipy''. If you can, please always consider using those packages first. See [[doku:spack]] on how to find and load them.
There are many additional python packages, some with long dependency chains. Because of this we simply cannot install all of them for all the different python versions we provide. If you need a specific package and it's a very popular one, consider dropping us a mail so we can make it generally available.
===== Using virtual environments =====
Apart from loading packages via ''spack'', you should always consider creating a **virtual environment** for your project. This makes it easier to install other packages or specific package versions, and also makes it possible to track them exactly to produce consistent results. For most python packages this is the easiest way to get up and running in no time.
Note: Before you start, make sure that you have loaded the python version you need! The virtual environment will be created using this version.
cd my_project_folder
python -m venv venv --system-site-packages
source venv/bin/activate
pip install autopep8
The above commands create a new virtual environment in the folder 'venv' (including the system provided packages), activate it and install the package ''autopep8'' into it.
To be able to reproduce the venv, consider specifying the exact versions of the packages as well as tracking your packages in a ''requirements.txt'' file (also see [[https://gitlab.tuwien.ac.at/vsc-public/training/python4hpc/-/blob/main/D1_08_CPU_Development_Tools_Lecture.ipynb|python4HPC Development Tools Lecture]] and [[https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment|Installing packages using pip and virtual environments]] for more information).
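The same venv workflow can also be driven from Python itself via the standard library ''venv'' module, which is handy in project setup scripts. A minimal sketch (the function name is just for illustration):

```python
import venv
from pathlib import Path

def create_project_venv(project_dir):
    """Create a 'venv' folder inside project_dir, mirroring 'python -m venv venv'."""
    venv_dir = Path(project_dir) / "venv"
    # system_site_packages=True mirrors the --system-site-packages flag above;
    # with_pip=False keeps creation fast (pip can be bootstrapped later if needed)
    venv.create(venv_dir, system_site_packages=True, with_pip=False)
    # return the path of the interpreter inside the new environment (POSIX layout)
    return venv_dir / "bin" / "python"
```

Afterwards the environment is activated exactly as above with ''source venv/bin/activate''.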
===== Using Conda =====
Sometimes, especially in a scientific context, there will be cases where you cannot use ''pip'', e.g. because some packages need to be compiled. Again, this creates the problem that we simply cannot install each and every package and package version in our infrastructure.
In this case you can use ''conda'' (see [[https://anaconda.org/|Anaconda]]) instead of ''pip'' to set up a consistent local python environment. Conda provides ready made binary distributions for many scientific packages and can thus be used to circumvent this problem.
==== Setup Conda ====
To use the ''conda'' tool on our clusters search for the ''miniconda3'' package. At the time of writing the ''miniconda3'' package is available in all environments on both VSC-4 & VSC-5 and can be loaded with
module load miniconda3
If you plan to use conda more frequently you can simply add the load statement to your ''~/.bashrc'' file to have it loaded automatically after logging in.
=== Optional: execute conda runtime hooks on login ===
To fully utilize the conda command, it needs to load its runtime hooks. If you want conda to be completely initialized directly after logging in, you can execute the following statements to add the conda startup code to your ''~/.bashrc'' file as well.
conda init bash --dry-run --verbose | grep "# >>> conda initialize" -A 100 | grep "# <<< conda initialize" -B 100 | sed 's/^+//' >> ~/.bashrc
source ~/.bashrc
After executing these steps you will see that your prompt changes to ''(base) [myname@l51 ~]$'', which signifies that conda is initialized and the ''base'' environment is active.
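The grep/sed pipeline above simply cuts out the block between the two ''conda initialize'' markers from the dry-run diff and strips the leading ''+'' diff marker from each line (anchored with ''^'' so plus signs inside paths survive). On a small simulated input (not real conda output) it behaves like this:

```bash
# simulated diff-style output, as produced by "conda init --dry-run --verbose"
sample='some unrelated line
+# >>> conda initialize >>>
+__conda_setup="..."
+# <<< conda initialize <<<
another unrelated line'

# extract the marker-delimited block and remove the leading "+" from each line
printf '%s\n' "$sample" \
  | grep "# >>> conda initialize" -A 100 \
  | grep "# <<< conda initialize" -B 100 \
  | sed 's/^+//'
```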
If you already have an environment, you can also add ''conda activate myenv'' to your ''~/.bashrc'' file.
==== Channels ====
The ''defaults'' conda channel points to Anaconda's repository, which **requires licensing (since 2024)** and does not always contain the latest packages. There are a number of community driven channels (e.g. ''conda-forge'') that have many more packages and often provide newer versions.
Popular channels
- ''conda-forge'' - [[https://conda-forge.org/feedstock-outputs/]]
- ''bioconda'' - [[https://bioconda.github.io/conda-package_index.html]]
To use e.g. ''conda-forge'' you need to specify ''--channel conda-forge'' when executing conda install commands.
If you want to set e.g. ''conda-forge'' as default for your user you can achieve this by executing the following statement:
conda config --remove channels defaults
conda config --add channels conda-forge
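After these two commands your ''~/.condarc'' should look roughly like the following (exact contents may vary with your conda version):

```yaml
channels:
  - conda-forge
```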
==== Create your own custom conda environment ====
=== Create conda env using environment files ===
This is the recommended method to create new conda environments since it makes environment creation reproducible. The file can easily be shared or e.g. added to your version control system.
First decide which python version and packages you need. In case you don't know the exact versions upfront, you can also just create a first draft of the env file without pinned versions to get the latest libraries. With that information in mind we now write our environment file called ''myenv.yml'':
name: myenv
channels:
- conda-forge
dependencies:
- python=3.10
- pytorch=1.13.1
After this we run the conda solver to create and install the new environment
conda env create -f myenv.yml
Conda will now take its time to solve the environment and then download and install the packages. After this is done, the environment can be activated with
conda activate myenv
=== Create conda env from commandline ===
**Note:** this method is only for illustrating how conda works and is not recommended, since it does not create a reproducible environment specification.
In order to create your own user environment you need to do the following steps. To also give a short example of a package which we do not provide via spack, we will install ''phono3py'' (available on [[https://anaconda.org/conda-forge|conda forge]]) into our conda environment ''(myenv)'':
# create conda env 'myenv', set conda-forge channel as default and use the latest python 3.11
conda create --name myenv --channel conda-forge python=3.11
conda activate myenv
conda install --channel conda-forge numpy phono3py
With the above statements conda will create a new environment, activate it, and install the requested packages into it. You should see that your prompt now changes to
''(myenv) [myname@l51 ~]$''
The following commands provide a bit of introspection to make sure that everything is setup as expected:
(myenv) [myname@l51 ~]$ which python
~/.conda/envs/myenv/bin/python
(myenv) [myname@l51 ~]$ python --version
Python 3.11.0
(myenv) [myname@l51 ~]$ which phono3py
~/.conda/envs/myenv/bin/phono3py
Starting python in this conda environment ''(myenv)'' and loading the packages also works:
(myenv) myname@l51:~$ python
Python 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import phono3py
>>> exit()
=== Pytorch ===
To install the latest pytorch with conda please follow the instructions on this page:
[[https://pytorch.org/get-started/locally/|Latest Pytorch installation]]
For older and specific combinations of pytorch and cuda please have a look at this page:
[[https://pytorch.org/get-started/previous-versions/#v182-with-lts-support|Previous Pytorch versions]]
E.g. to install v1.13.1 with cuda 11.6 use:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
As of 2023-08-25: you can also install the current pytorch version with an older cuda version via:
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c nvidia
===== SLURM =====
See the following minimal example to use conda with slurm in your batch script. See [[doku:slurm]] for detailed information about slurm in general.
#!/bin/bash
#SBATCH --job-name=slurm_conda_example
#SBATCH --time 00-00:05:00
#SBATCH --ntasks=2
#SBATCH --mem=2GB
# modify SBATCH options according to needs
# see "Setup Conda" above or consult "module avail miniconda3" to get the right package name
module load MINICONDA3_PACKAGE_NAME
eval "$(conda shell.bash hook)"
conda activate myenv
# print out some info of the python executable in use
# this should point to the python version from "myenv"
which python
python --version
===== JupyterHub =====
In case you need visualization capabilities or want to do some preprocessing, also consider using our JupyterHub service: [[doku:jupyterhub]].
Please note that you should still use slurm and batch processing for actual computation runs since JupyterHub is mainly reserved for interactive use and runs on shared nodes.