====== Linux Primer ======

  * Article written by Balázs Lengyel (VSC Team) (last update 2017-10-11 by bl).

==== Note ====

Please use the ''%%Table of Contents%%'' or your browser’s ''%%quick search%%'' to find what you’re looking for, as this document is auto-generated from a presentation and context may not always be recognizable without the corresponding talk.

----

===== Filesystems =====

==== Filesystem 101 ====

The job of a filesystem is to keep data in a structured way. Every filesystem has a filesystem root, directories and files. The filesystem root is a special object which acts as the entry point for using that filesystem. All other objects (directories and files) are structured below the root.

Two main concepts emerged for using more than one filesystem on a single machine:
  - Special handling (a.k.a. drive letters): this is how Windows handles filesystems. It explicitly shows which drive you’re working on.
  - Virtual filesystem: this is a more subtle approach, used by Linux, where you specify at the beginning (when booting) which filesystem will be used for which part of the virtual filesystem.

Linux’s virtual filesystem tree represents all the files and directories that are reachable from the system. The nice part is that you can work on a Linux machine and not care about whether your file is on the network or on a local filesystem. The main difference for users is the performance delivered by different filesystems.

This is how the (virtual) filesystem looks on Linux:

  - Everything starts at the root
    * the root is a directory
    * “''%%/%%''” denotes the root directory
  - the filesystem has different kinds of objects
    - files
    - directories
      * containers for multiple objects
    - links to objects, which either
      * add a second name for the same object
      * point to a position in the filesystem
  - objects can be referenced by their path
    * absolute: ''%%/dir1/dir2/object%%''
    * relative: ''%%dir2/object%%''
  - special objects in directories:
    * “''%%.%%''” is a reference to the directory itself
    * “''%%..%%''” is a reference to the parent directory
  - the system may consist of multiple filesystems
    * filesystems may be mounted at any (empty) directory

==== Further concepts ====

In addition to basic storage and organization of data, filesystems have different properties and functionality. Most filesystems provide a way to store and access attributes and different kinds of special files, and some filesystems provide various advanced features:

  * Attributes
    * Ownership
    * Access rights
    * Filesystem limits
    * Size
    * Timestamps
  * Special files
    * device
    * fifo pipe
    * socket
  * Advanced FS features
    * data integrity
    * device management
    * sub-volume support

==== Filesystem tree ====

{{.:linux_directories.jpg}}

==== Special filesystems used on VSC ====

=== NFS ===

  * old and reliable network filesystem
  * much slower than any local filesystem
  * simultaneous usage possible

=== TMPFS ===

  * very fast filesystem
  * uses RAM instead of other media
  * lost at shutdown

The home directory of the user is located on an NFS filesystem, which ensures that all parts of the cluster have a consistent view of files.
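To see which kind of filesystem actually holds a directory, you can ask the system directly. The following is a minimal sketch that assumes GNU coreutils ''%%stat%%'' is installed; ''%%fs_type%%'' is a hypothetical helper name chosen for this example, not a standard command.

```shell
# Print the type of the filesystem that holds a given path.
# Assumption: GNU coreutils `stat` (option -f reports on the filesystem,
# format %T prints its type in human-readable form).
# `fs_type` is a hypothetical helper name.
fs_type() {
    stat -f -c %T -- "${1:-.}"
}

fs_type "$HOME"   # e.g. "nfs" for a home directory on a network filesystem
fs_type /         # type of the filesystem mounted at the root
```

Called without an argument, the helper reports on the current directory.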
The filesystem behind the ''%%$SCRATCH%%'' variable is located on a tmpfs filesystem, which is a double-edged sword. On the one hand it’s fast, but since it uses RAM as its storage device, it reduces the amount of memory available for programs. Also, data stored in a tmpfs can only be used on the host itself.

===== Shell =====

==== Prompt ====

> This is how the prompt looks by default:

<code>
[myname@l3_ ~]$
</code>

  * tells you:
    * who you are
    * which computer you’re on
    * which directory you’re in
  * can be configured
    * variable ''%%$PS1%%''
    * default: ''%%echo $PS1%%''

> Ways to get help when you’re stuck:

Most of the time, when a command doesn’t act as expected, it shows an error message. From this point you have multiple approaches:

  * Think about why the program failed - maybe you (un-)intentionally tried to force the program to do something it’s not intended for?
  * Just copy that message into your favorite search engine and don’t forget to remove the parts that are specific to your environment (e.g. directory and user names).
  * Most programs support a ''%%-h%%''/''%%--help%%'' flag
  * ''%%man COMMAND%%'' will be available for most programs too; if not, ''%%man -K KEYWORD%%'' will search all man pages that contain the keyword. (FYI: press ‘q’ to quit)
  * An alternative to ''%%man%%'' is ''%%info COMMAND%%'', which is like a browser from the ’80s
  * Ask colleagues for help

==== Execution ====

To execute a program, we call it:

<code>
gcc FizzBuzz.c -o FizzBuzz
./FizzBuzz
false
echo $?
</code>

The examples show compiling a program, executing the result, running a command that fails (''%%false%%'' stands in here for something like trying to load a non-existent module on our cluster) and checking whether the previous command succeeded.

  * ''%%gcc%%'' is a program that is in a directory specified by the ''%%$PATH%%'' variable and will be found without specifying its exact location.
  * ''%%./FizzBuzz%%'' is a newly compiled executable, which is not found by looking at ''%%$PATH%%'', so we explicitly add ''%%./%%'' to show that we want to execute it from the current directory.
  * ''%%false%%'' always fails, just as ''%%module load non-existent-module%%'' would fail on our cluster because the module command can’t find ''%%non-existent-module%%''. Whenever a command fails, its ''%%return value%%'' is set to a value other than zero. The manual for some commands has a map from return value to error description to aid the user in debugging.
  * ''%%echo $?%%'' is a command that prints the return value of the previous command.

=== History ===

> Your shell keeps a log of all the commands you executed.

  * the ''%%history%%'' command is used to access this history
  * for fast reuse of commands try the ''%%Ctrl-R%%'' keys (reverse search) or the arrow keys

==== Parameters ====

> The default way to apply parameters to a program is to write a space-separated list of parameters after the program when calling it.

These parameters are either

  - Single-character
  - Multi-character
  - Strings

where some parameters also take additional arguments.

== Combining parameters ==

For most commands you can combine multiple single-character parameters. This doesn’t change the meaning of the parameters, but is limited to single-character parameters which don’t take extra arguments.

<code>
COMMAND -j 2 -a -b -c
COMMAND -j 2 -abc
</code>

== Ordering parameters ==

One thing to look out for is the order of parameters. Most of the time no specific order is required, but you should watch out for mistakes like copying the target over the source file. Also take care to keep parameters and their arguments together.
<code>
COMMAND SOURCE TARGET       # OK
COMMAND TARGET SOURCE       # PROBABLY WRONG
COMMAND -j 2 --color auto   # OK
COMMAND -j auto --color 2   # PROBABLY WRONG
</code>

==== Escapes & Quotes ====

Whenever a parameter has to contain a character that is either unprintable or reserved for the shell, you can use:

  - Backslash escape:
    * Escapes a character that would otherwise have a special meaning
    * Can be used inside of double quotes
  - Double quotes:
    * Similar to escaping all whitespace characters
  - Single quotes:
    * Additionally prevent expansion of variables

<code>
COMMAND This\ is\ a\ single\ parameter
COMMAND "This is a single parameter"
COMMAND 'This is a single parameter'
</code>

==== Aliases ====

You can define aliases in your shell. These are usually used to shorten names for commands which are used often with a fixed set of parameters or where you have to be careful to get things right. These aliases are accessible as if they were commands.

<code>
alias ll='ls -alh'
alias rm='rm -i'
alias myProject='cd $ProjectDir; testSuite; compile && testSuite; cd -'
</code>

After that, you can use the aliases synonymously.

<code>
ll          # Same as 'ls -alh'
rm          # Same as 'rm -i'
myProject   # Same as 'cd $ProjectDir; testSuite; compile && testSuite; cd -'
</code>

==== Patterns ====

Patterns are an easy way of defining multiple arguments which are mostly the same. A pattern is matched against existing file names and will match anything in its place. The other concept is an expansion. In this case only the explicitly listed strings are generated, whether or not matching files exist.

  * the most important patterns are:
    * ''%%?%%'': matches one character
    * ''%%*%%'': matches any character sequence
  * the most important expansions are:
    * ''%%A{1,9}Z%%'': expands to A1Z A9Z
    * ''%%A{1..9}Z%%'': expands to A1Z A2Z … A9Z

You can try these commands and see what they do. These are all totally safe, even if you modify the arguments.

<code>
ls file.???
ls *.*
echo {{A..Z},{a..z}}
echo {{A,B},{X,Y}}
echo {A..Z},{a..z}
echo {A,B}{X,Y}
</code>

==== Regular Expressions ====

Often you need to specify some string, but patterns and expansions aren’t enough to cover all possibilities.
In these cases you can use a regular expression, also known as a regex. Regexes are used by editors for search and replace, by the ''%%egrep%%'' command for filtering through files, and inside many scripts to validate parameters.

<code>
.+                  # match any character, once or more
\.                  # match a dot
(A|a)p{2}le         # apple, Apple
^[^aeiouAEIOU]+$    # any line of only non-vowels
</code>

For a detailed explanation see [[https://en.wikipedia.org/wiki/Regular_expression|Wikipedia]], [[https://www.regular-expressions.info/|Regular-expressions.info]] or try [[https://regex101.com/|regex101]]. If you want to challenge yourself, try [[https://regexcrossword.com/|Regex Crossword]]!

==== Control Flow ====

In the shell language there are a few ways to organize the execution path. The most important ones are:

  - chaining of commands
  - looping constructs
  - conditionals (if/case)

=== Chaining Commands ===

The simplest mechanism for control flow is to chain commands together, as in ''%%if COMMAND then NEXTCOMMAND else ERRORCOMMAND%%''. Since this would be cumbersome to write, most shells provide a simple syntax for this: ''%%COMMAND && NEXTCOMMAND || ERRORCOMMAND%%''. Note that ''%%ERRORCOMMAND%%'' also runs when ''%%NEXTCOMMAND%%'' fails, so this is not an exact equivalent of if/else. If a command should be run regardless of the return value of its predecessor, it’s written ''%%COMMAND; NEXTCOMMAND%%''. And if you only want to execute further commands in one case (but not the other), you don’t even have to specify both branches.

<code>
false ;  echo "Should I be Printed?"
false && echo "Should I be Printed?"
false || echo "Should I be Printed?"
</code>

<code>
Should I be Printed?
Should I be Printed?
</code>

=== Loops ===

The other way to control the execution of commands is loops. You can loop over files or numeric arguments, and a loop runs until either the loop condition is false or a ''%%break%%'' is encountered.
<code>
for i in *
do
  mv "$i"{,.bak}
done

while true
do
  echo "Annoying Hello World"
  sleep 3
done
</code>

The same loops written on a single line:

<code>
for i in *; do mv "$i"{,.bak}; done
while true; do echo "Annoying Hello World"; sleep 3; done
</code>

=== Conditionals ===

''%%If%%'' is similar to the previous chaining of commands, except that it is more verbose and nicer to read if you have many commands to execute in one branch of the decision. For more conditions the ''%%elif%%'' (else if) statement can be used. If you use a lot of ''%%elif%%''s and you only check one variable with them, you should consider using a ''%%case%%'' statement.

<code>
if [ "$VARIABLE1" ]
then
  COMMAND1
elif [ "$VARIABLE2" ]
then
  COMMAND2
else
  COMMAND3
fi
</code>

''%%Case%%'' statements are for querying all states of a single variable and making a decision based on that. They can match simple patterns, which do NOT follow the general syntax of bash expansions. Alternative matches are separated with ''%%|%%'' (pipe character).

<code>
case $VARIABLE in
  [0-9] | [1-2][0-9])
    COMMAND1
    ;;
  *)
    COMMAND2
    ;;
esac
</code>

==== Streams ====

=== Redirects ===

> Write **output** to a **file** or **file-descriptor**

^Command^Redirect ^Append ^Description ^
|program|''%%> std.log%%''|''%%>> std.log%%''|redirect ''%%stdout%%'' to a file |
|program|''%%2> err.log%%''|''%%2>> err.log%%''|redirect ''%%stderr%%'' to a file |
|program|''%%2>&1%%'' | |redirect ''%%stderr%%'' to ''%%stdout%%''|

=== Pipes ===

> Write **output** into the **input**-stream of another process

^Command^Pipe ^Description ^
|program|''%%| grep -i foo%%'' |pipe ''%%stdout%%'' into ''%%grep%%''|
|program|''%%| tee file1 file2%%''|write to ''%%stdout%%'' and overwrite the files |
|program|''%%| tee -a file%%'' |write to ''%%stdout%%'' and append to the file |

===== Environment Variables =====

==== Setting, getting and unsetting ====

=== Set ===

<code>
LANG=en_US.UTF-8 bash     # set only for this one command (here: a new bash)
export LANG=en_US.UTF-8   # set for this shell and all its child processes
</code>

=== Get ===

<code>
env
echo ${LANG}
echo $PWD
</code>

=== Unset ===

<code>
unset LANG
env -u LANG
</code>

==== Use cases ====

> Some variables that could affect you are:

<code>
$EDITOR           # the default editor for the CLI
$PAGER            # utility to read long streams
$PATH             # program paths, in priority order
</code>

> if you’re aiming for programming, these could be more interesting:

<code>
$LIBRARY_PATH     # directories searched for libraries when linking
$LD_LIBRARY_PATH  # directories searched for libraries at runtime
$CC               # sometimes used to set default C compiler
$CFLAGS           # default flags for compiling C
</code>
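The difference between a plain assignment, a per-command assignment, ''%%export%%'' and ''%%env -u%%'' is easiest to see by asking a child process what it receives. The sketch below uses ''%%DEMO_VAR%%'', a hypothetical variable name made up for this demonstration, so that nothing in your real environment is changed.

```shell
# DEMO_VAR is a hypothetical variable used only for this demonstration.
unset DEMO_VAR

# 1. Plain assignment: visible in this shell, but not passed to child processes.
DEMO_VAR=local
bash -c 'echo "child sees: [${DEMO_VAR}]"'        # child sees: []

# 2. Prefix assignment: set for that single command only.
DEMO_VAR=once bash -c 'echo "child sees: [${DEMO_VAR}]"'   # child sees: [once]
echo "shell still has: [${DEMO_VAR}]"             # shell still has: [local]

# 3. export: passed on to every child process started from now on.
export DEMO_VAR=exported
bash -c 'echo "child sees: [${DEMO_VAR}]"'        # child sees: [exported]

# 4. env -u: start one command without the variable, leaving the shell untouched.
env -u DEMO_VAR bash -c 'echo "child sees: [${DEMO_VAR}]"'  # child sees: []

# Clean up.
unset DEMO_VAR
```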

> if you have a lot of self-compiled binaries:

<code>
export PATH="./:$HOME/bin/:$PATH"
</code>
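Since ''%%$PATH%%'' is searched in priority order, the directory listed first wins when two directories contain a program with the same name. The following sketch illustrates this with two throwaway directories; the program name ''%%hello%%'', the directory names and the function name ''%%path_priority_demo%%'' are all hypothetical, and it assumes the temporary directory allows executing files.

```shell
# Demonstrate that the first matching directory in PATH wins.
# All names here (hello, first, second, path_priority_demo) are hypothetical.
path_priority_demo() {
    local demo
    demo=$(mktemp -d) || return 1

    # Two directories, each with its own program called `hello`.
    mkdir "$demo/first" "$demo/second"
    printf '#!/bin/sh\necho "from first"\n'  > "$demo/first/hello"
    printf '#!/bin/sh\necho "from second"\n' > "$demo/second/hello"
    chmod +x "$demo/first/hello" "$demo/second/hello"

    # Subshell: the PATH change does not leak into the calling shell.
    (
        PATH="$demo/first:$demo/second:$PATH"
        command -v hello    # prints the full path of the program that will run
        hello               # prints: from first
    )

    rm -r "$demo"
}

path_priority_demo
```

Swapping the two directories in the ''%%PATH%%'' assignment makes the second ''%%hello%%'' run instead.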
===== Scripting =====

==== Ownership and Permissions ====

> Just to ensure that you are able to run your scripts

=== chown ===

> Change the owner of files and directories by:

<code>
# only works with root privileges
chown user file
chown -R user:group dirs files
</code>

=== chmod ===

> Change the mode of files and directories by:

<code>
chmod -R u=rwx,g+w,o-rwx dirs files
chmod 640 files
chmod 750 dirs
chmod 750 executables
</code>

  * there are three bits for:
    * read (4)
    * write (2)
    * execute (1)
  * three times for:
    * user
    * group
    * other

==== Shebang ====

A little test program, which we mark as executable and hand over to the corresponding interpreter:

<code>
# 'EOF' is quoted so that ${LANG} and ${PATH} are written to the file literally
# instead of being expanded while the file is created
cat << 'EOF' > test.sh
echo "${LANG}"
echo "${PATH}"
EOF
chmod +x test.sh
bash test.sh
</code>

> Don’t we have an OS capable of executing everything it recognizes as an executable?

> Yes, we do!

<code>
cat << 'EOF' > test.sh
#!/bin/bash
echo "${LANG}"
echo "${PATH}"
EOF
chmod +x test.sh
./test.sh
</code>

==== Functions (more like procedures) ====

Programming in bash would be cumbersome without functions, so here we go:

<code>
allNumbersFromTo () {
  echo "1 2 3"
}
</code>

> This isn’t good, as we’re only getting a fixed amount of numbers. Let’s try a recursive approach:

<code>
allNumbersFromTo () {
  local num=$1
  local max=$2
  echo "${num}"
  if [ "$num" -lt "$max" ]; then
    allNumbersFromTo "$((num + 1))" "$max"
  fi
}
</code>

The same function written with a loop instead of recursion:

<code>
allNumbersFromTo () {
  local min=$1
  local max=$2
  for num in $(seq "$min" "$max")
  do
    echo "${num}"
  done
}
</code>

<code>
allNumbersFromTo 1 10
</code>

===== .bashrc =====

==== .bashrc ====

<code>
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
  . /etc/bashrc
fi

# User specific aliases and functions
alias sq='squeue -u $USER'
alias rm='rm -i'

export PATH="./:$HOME/bin:$PATH"
</code>
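Because ''%%.bashrc%%'' is read again by every nested interactive shell, a plain ''%%export PATH=...%%'' line prepends the same directories once more each time. One possible way to avoid the duplicates is sketched below; ''%%path_prepend%%'' is a hypothetical helper written for this example, not something provided by bash or by the cluster setup.

```shell
# path_prepend is a hypothetical helper, not part of bash.
# It puts a directory at the front of PATH unless PATH already lists it.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise put it in front
    esac
}

path_prepend "$HOME/bin"
export PATH
```

Wrapping ''%%$PATH%%'' in colons on both sides lets a single ''%%case%%'' pattern match the directory at the start, in the middle or at the end of the list.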