Scripts, functions, and variables

Shell scripts

We now know a lot of UNIX commands! Wouldn’t it be great if we could save certain commands so that we could run them later without typing them out again? As it turns out, this is extremely easy to do. A file containing a list of saved commands is called a “shell script”. Shell scripts can be run whenever we want, and are a great way to automate our work.

$ cd ~/Desktop/data-shell/molecules
$ nano process.sh
	#!/bin/bash         # this is called sha-bang; can be omitted for generic (bash/csh/tcsh) commands
	echo Looking into file octane.pdb
	head -15 octane.pdb | tail -5       # what does it do?
$ bash process.sh   # the script ran!

Alternatively, you can change file permissions:

$ chmod u+x process.sh
$ ./process.sh

Let’s pass an arbitrary file to it:

$ nano process.sh
	#!/bin/bash
	echo Looking into file $1       # $1 means the first argument to the script
	head -15 $1 | tail -5
$ ./process.sh cubane.pdb
$ ./process.sh propane.pdb

head -15 "$1" | tail -5 # placing in double-quotes lets us pass filenames with spaces
head $2 $1 | tail $3 # what will this do?
$# holds the number of command-line arguments
$@ means all command-line arguments to the script (words in a string)
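
To see these variables in action without creating a script file, the sketch below uses `set --` to assign the positional parameters by hand, as if a script had been started with two arguments. The script name args.sh and the two file names are placeholders chosen for illustration only.

```shell
#!/bin/bash
# Simulate running:  ./args.sh "my file.pdb" cubane.pdb
# (set -- replaces the positional parameters of the current shell)
set -- "my file.pdb" cubane.pdb

echo "Number of arguments: $#"
echo "First argument: $1"

# Quoted "$@" keeps each argument intact, even if it contains spaces
for arg in "$@"
do
  echo "Argument: $arg"
done
```

This prints "Number of arguments: 2", then "First argument: my file.pdb", then one "Argument: ..." line per argument. Without the quotes around $@, the first file name would be split into two words.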

Question `file permissions`

Let’s talk more about file permissions.

Question 34

In the molecules directory (download link mentioned here), create a shell script called scan.sh containing the following:

#!/bin/bash
head -n $2 $1
tail -n $3 $1

While in that directory, you type the following command (with a space between the two 1s):

./scan.sh  '*.pdb'  1  1

What output would you expect to see?

All of the lines between the first and the last lines of each file ending in .pdb in the current directory
The first and the last line of each file ending in .pdb in the current directory
The first and the last line of each file in the current directory
An error because of the quotes around *.pdb

You can watch a video for this topic after the workshop.

If statements

Let’s write and run the following script:

$ nano check.sh
    for f in "$@"       # quoting "$@" keeps arguments with spaces intact
    do
      if [ -e "$f" ]    # make sure to have spaces around each bracket!
      then
        echo $f exists
      else
        echo $f does not exist
      fi
    done
$ chmod u+x check.sh
$ ./check.sh a b c check.sh

Full syntax is:

if [ condition1 ]
then
  command 1
  command 2
  command 3
elif [ condition2 ]
then
  command 4
  command 5
else
  default command
fi

Some examples of conditions (make sure to have spaces around each bracket!):

[ $myvar == 'text' ] checks if variable is equal to 'text' (string comparison)
[ $myvar -eq number ] checks if variable is equal to number (integer comparison)
[ -e fileOrDirName ] checks if fileOrDirName exists
[ -d name ] checks if name is a directory
[ -f name ] checks if name is a regular file
[ -s name ] checks if file name exists and has size greater than 0
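
The file tests above can be combined with if/elif/else, as in the following sketch. It works in a throwaway directory created with mktemp -d; the file names empty.txt, full.txt and missing.txt are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Create a temporary directory with one empty and one non-empty file
tmpdir=$(mktemp -d)
touch "$tmpdir/empty.txt"
echo "some text" > "$tmpdir/full.txt"

for name in "$tmpdir" "$tmpdir/empty.txt" "$tmpdir/full.txt" "$tmpdir/missing.txt"
do
  if [ -d "$name" ]; then
    echo "$name is a directory"
  elif [ -s "$name" ]; then
    echo "$name is a non-empty file"
  elif [ -f "$name" ]; then
    echo "$name is an empty file"
  else
    echo "$name does not exist"
  fi
done

rm -r "$tmpdir"   # clean up the temporary directory
```

The order of the tests matters here: -d is checked first because -s is also true for a directory.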

Question 23

Write a script that complains when it does not receive arguments.

Variables

We already saw variables that were specific to scripts ($1, $@, …) and to loops ($file). Variables can be used outside of scripts:

$ myvar=3        # no spaces permitted around the equality sign!
$ echo myvar     # will print the string 'myvar'
$ echo $myvar    # will print the value of myvar

Sometimes you will see the notation:

$ export myvar=3

Using ‘export’ makes sure that all child processes of this shell (including any scripts it runs) have access to this variable. Try defining the variable newvar first without and then with ‘export’, running the following script each time:

$ nano process.sh
	#!/bin/bash
    echo $newvar
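
The same experiment can be sketched without a script file. Here `bash -c` starts a child bash process that stands in for running process.sh; the value 5 is arbitrary.

```shell
unset newvar                  # start from a clean state
newvar=5                      # plain shell variable, not exported
bash -c 'echo "child sees: [$newvar]"'    # prints: child sees: []

export newvar                 # mark the variable for export to child processes
bash -c 'echo "child sees: [$newvar]"'    # prints: child sees: [5]
```

The single quotes matter: they stop the current shell from expanding $newvar, so the expansion happens inside the child process.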

You can assign a command’s output to a variable to use in another command (this is called command substitution) – we’ll see this later when we play with the ‘find’ command.
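
As a quick preview, here is a minimal sketch of command substitution with the $( ... ) syntax. The variable name today is made up for this example.

```shell
# Store the output of a command in a variable
today=$(date +%Y-%m-%d)
echo "Today is $today"

# Use a command's output directly inside another command
echo "This directory contains $(ls | wc -l) entries"
```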

$ printenv    # print all environment (exported) variables
$ env         # same
$ unset myvar   # unset a variable

Question `using a variable inside a string`

var="sun"
echo $varshine
echo ${var}shine
echo "$var"shine

Question `variable manipulation`

myvar="hello"
echo $myvar
echo ${myvar:offset}
echo ${myvar:offset:length}
echo ${myvar:2:3}    # 3 characters starting at offset 2 (offsets count from 0)
echo ${myvar/l/L}    # replace the first match of a pattern
echo ${myvar//l/L}   # replace all matches of a pattern

Environment variables are those that affect the behaviour of the shell and user interface:

$ echo $HOME
$ echo $PATH
$ echo $PWD
$ echo $PS1

It is best to define custom environment variables inside your ~/.bashrc file. It is loaded every time you start a new interactive shell.

Question 22

Play with variables and their values. Change the prompt, e.g. PS1="\u@\h \w> ".

You can watch a video for this topic after the workshop.

Functions

Functions are similar to scripts, but there are some differences. A bash script is an executable file sitting at a given path. A bash function is defined in your environment. Therefore, when running a script, you need to prepend its path to its name (unless its directory is listed in $PATH), whereas a function – once defined in your environment – can be called by its name without a need for a path. Both scripts and functions can take command-line arguments.

A convenient place to put all your function definitions is the ~/.bashrc file, which is run every time you start a new shell (local or remote).

Like in any programming language, in bash a function is a block of code that you can access by its name. The syntax is:

functionName() {
  command 1
  command 2
  ...
}

Inside a function you can access its arguments with the variables $1 $2 … $# $@ – exactly the same as in scripts. Functions are very convenient because you can define them inside your ~/.bashrc file. Alternatively, you can place them into a file and then source them whenever needed:

$ source allMyFunctions.sh

Here is our first function:

greetings() {
  echo hello
}
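
A function becomes more useful once it reads its arguments. The following sketch is a variation on the function above; the name greet is hypothetical and not part of the workshop material.

```shell
greet() {
  # $# is the number of arguments passed to the function
  if [ $# -eq 0 ]; then
    echo "Usage: greet name"
    return 1        # return a non-zero error code
  fi
  # $1 is the first argument passed to the function
  echo "hello, $1"
}

greet Alice      # prints: hello, Alice
```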

Let’s write a function ‘combine()’ that takes all the files we pass to it, copies them into a randomly-named directory and prints that directory to the screen:

combine() {
  if [ $# -eq 0 ]; then
    echo "No arguments specified. Usage: combine file1 [file2 ...]"
    return 1        # return a non-zero error code
  fi
  dir=$RANDOM$RANDOM
  mkdir $dir
  cp "$@" $dir      # quoting "$@" keeps file names with spaces intact
  echo look in the directory $dir
}
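
A directory name built from $RANDOM$RANDOM could, in principle, collide with an existing name. One possible variant, shown here only as a sketch and not as the workshop's version, asks mktemp -d to create a uniquely named directory instead. The function name combine_safe and the template combined.XXXXXX are assumptions made for this example.

```shell
combine_safe() {
  if [ $# -eq 0 ]; then
    echo "No arguments specified. Usage: combine_safe file1 [file2 ...]"
    return 1
  fi
  # mktemp -d creates a new, uniquely named directory and prints its name;
  # the XXXXXX in the template is replaced with random characters
  local dir
  dir=$(mktemp -d combined.XXXXXX) || return 1
  cp "$@" "$dir"
  echo "look in the directory $dir"
}
```

It is called the same way as combine, e.g. combine_safe cubane.pdb propane.pdb.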

Question `swap file names`

Write a function to swap two file names. Add a check that both files exist, before renaming them.

Question `archive()`

Write a function archive() to replace directories with their gzipped archives.

$ ls -F
chapter1/  chapter2/  notes/
$ archive chapter* notes/
$ ls
chapter1.tar.gz  chapter2.tar.gz  notes.tar.gz

Question `countfiles()`

Write a function countfiles() to count files in all directories passed to it as arguments (need to loop through all arguments). At the beginning add the check:

    if [ $# -eq 0 ]; then
        echo "No arguments given. Usage: countfiles dir1 dir2 ..."
        return 1
    fi

You can watch a video for this topic after the workshop.

Scripts in other languages

As a side note, it is possible to incorporate scripts in other languages into your bash code, e.g. consider this:

function test() {
    randomFile=${RANDOM}${RANDOM}.py
    cat << EOF > $randomFile
#!/usr/bin/python3
print("do something in Python")
EOF
    chmod u+x $randomFile
    ./$randomFile
    /bin/rm $randomFile
}

Here EOF is an arbitrary delimiter string, and << tells bash to read the following lines as input until it reaches a line containing only the delimiter (this construct is called a here-document). For example, try the following:

cat << the_end
This text will be
printed in the terminal.
the_end
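
One detail worth knowing when generating scripts this way: whether variables inside the text are expanded depends on quoting the delimiter. The sketch below illustrates this; the variable name is made up for the example.

```shell
name="world"

# Unquoted delimiter: variables are expanded inside the text
cat << EOF
hello $name
EOF

# Quoted delimiter: the text is passed through literally
cat << 'EOF'
hello $name
EOF
```

The first command prints "hello world", the second prints "hello $name". The quoted form is the safer choice when the embedded code itself contains dollar signs.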