4  Working on the Farm

Learning Objectives
  • Use different software tools to work on a remote server: terminal, text editor and file transfer software.
  • Log in to the farm and navigate its filesystems.
  • Edit scripts on the HPC using Nano.
  • Move files in and out of the farm using Filezilla or rsync/scp.

Useful tools for working on the farm or any remote HPC server. The terminal is used to log in to the HPC and interact with it (e.g. submit jobs, navigate the filesystem). Nano is a text editor that can be used from the terminal. Visual Studio Code is an alternative text editor that can connect to a remote server so that we can edit scripts stored on the HPC. Filezilla is an FTP application, which can be used to transfer files between the HPC and your local computer.

4.1 Connecting to the HPC

All interactions with the farm happen via the terminal (or command line). To connect to the HPC we use the program ssh. The syntax is:

ssh your-hpc-username@hpc-address

To log onto the Sanger farm, you’ll need to add “-login” to the name of the HPC, using this command:

ssh sanger-username@farm5-login

The Sanger service desk has set up your laptop to assume your Sanger ID, so you can use the simplified command to access the farm:

ssh farm5-login
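
How the service desk configured this is not described here. One common mechanism, shown purely as an assumption, is an entry in the OpenSSH client configuration file ~/.ssh/config on your laptop. The username below is a placeholder.

```text
# ~/.ssh/config (hypothetical entry; the actual setup on your laptop may differ)
Host farm5-login
    User sanger-username
```

With an entry like this, ssh farm5-login behaves the same as ssh sanger-username@farm5-login.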

The first time you connect to an HPC, you may receive a message about the ECDSA key fingerprint. By typing yes you’ll add the ‘fingerprint’ of this HPC to your local computer’s saved list of approved hosts.

After running this ssh command and approving any ECDSA key questions, you will be asked for your Sanger password and after typing it you will be logged in to the farm.

We will be using a test server called gen3 in this course. It is a small HPC, similar in structure to the larger farm5, run by the Sanger for scientists to learn about the farm and test LSF scripts. Everyone with a Sanger ID has access to gen3, but farm5, the main HPC, is only accessible once you complete the Farm Induction course.

Logging in to the HPC using the terminal. 1) Use the ssh program to log in to gen3. 2) If prompted, approve the ECDSA key fingerprint. 3) After you run the command you will be asked for your password. Nothing shows on the screen as you type the password, which is normal. 4) You will receive a login message, and your terminal prompt will now show your HPC username and the name of the HPC server.

4.1.1 Exercise

Exercise 1 - Connecting to the HPC

Q1. Connect to gen3 using ssh

Q2. Take some time to explore your home directory to identify what files and folders are in there. Can you identify and navigate through the scratch (Lustre) and NFS directories?

Q3. Print the path to your home directory.

Q4. Create a directory called hpc_workshop in your own home directory.

Q5. Use the commands free -h (available RAM) and nproc --all (number of CPU cores available) to check the capabilities of the login node of our HPC. Check how many people are logged in to the HPC login node using the command who.

A1.

To log in to the HPC we run the following from the terminal:

ssh USERNAME@gen3

(replace “USERNAME” with your Sanger username)

A2.

We can get a detailed list of the files in our home directory:

ls -l

Further, we can explore the NFS directory using:

ls /nfs

And check out the scratch folders available in Lustre:

ls -l /lustre

A3.

To find the path of your home directory, move to it and then use the pwd command to print the entire path:

cd
pwd

It should be /nfs/users/nfs_[first_initial_of_username]/[username]

A4.

Once we are in the home directory, we can use mkdir to create our workshop sub-directory:

cd
mkdir hpc_workshop
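
Running mkdir a second time on a directory that already exists gives an error. As a minimal sketch, the helper below (make_workshop_dir is a hypothetical name, not part of the course) uses the -p flag so that it can be re-run safely, and takes an optional base directory that defaults to your home directory.

```shell
# Create the workshop directory under a base directory (default: $HOME).
# `-p` means "no error if it already exists", so re-running is safe.
make_workshop_dir() {
    base_dir="${1:-$HOME}"
    mkdir -p "$base_dir/hpc_workshop"
    # -d lists the directory itself rather than its contents
    ls -ld "$base_dir/hpc_workshop"
}
```

Calling make_workshop_dir with no argument creates hpc_workshop in the home directory, the same result as the two commands above.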

A5.

We run these commands to find out how much memory and how many CPU cores the login node we are connected to has. Usually, the login node is not very powerful, and we should be careful not to run any analysis on it.
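
The two commands from the question can be run one after the other. The sketch below wraps them in a small function (node_summary is a hypothetical name) and assumes a Linux node where nproc is available. It skips free if that program is not installed.

```shell
# Print a short summary of the node we are currently logged in to.
node_summary() {
    # nproc --all reports the number of installed CPU cores
    echo "CPU cores: $(nproc --all)"
    # free -h reports total, used and available memory in human-readable units;
    # skip it quietly on systems where it is not installed
    if command -v free >/dev/null 2>&1; then
        free -h
    fi
}

node_summary
```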

To see how many people are currently on the login node we can combine the who and wc commands:

# pipe the output of `who` to `wc`
# the `-l` flag instructs `wc` to count "lines" of its input
who | wc -l
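
Note that who prints one line per login session, so a person with several terminals open is counted more than once. A possible refinement, sketched here with a hypothetical helper name, counts distinct usernames instead.

```shell
# Count distinct usernames rather than login sessions.
count_unique_users() {
    # field 1 of each `who` line is the username; sort -u removes repeats
    who | awk '{print $1}' | sort -u | wc -l
}

echo "Login sessions: $(who | wc -l)"
echo "Distinct users: $(count_unique_users)"
```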

You should notice that several people are using the same login node as you. This is why we should never run resource-intensive applications on the login node of an HPC.

4.2 Editing Scripts Remotely

Most of the work you will be doing on an HPC is editing script files. These may be scripts that you are developing to do a particular analysis or simulation (in Python, R, Julia, etc.). But also, and more relevant for this course, you will be writing shell scripts containing the commands that you want to be executed on the compute nodes.

There are several ways to edit text files on a remote server. A simple one is to use the program Nano directly from the terminal. This is a simple text editor available on most Linux distributions, and it is what we will use in this course.

Although Nano is readily available and easy to use, it offers limited functionality and is not as user friendly as a full-featured text editor. You can use more full-featured text editors from the command line, such as vim, but these come with a steeper learning curve. Alternatively, we recommend Visual Studio Code, an open-source editor with a wide range of functionality and several extensions, including an extension for working on remote servers.

4.2.1 Nano

To create a file with Nano you can run the command:

nano test.sh

This opens a text editor, where you can type the code that you want to save in the file. Once we’re happy with our code, we can press Ctrl+O to write our data to disk. We’ll be asked what file we want to save this to: press Enter to confirm the filename. Once our file is saved, we can use Ctrl+X to quit the editor and return to the shell.

We can check with ls that our new file is there.

Screenshot of the command line text editor Nano. In this example, we also included #!/bin/bash in the first line of the script. This is called a shebang, and it tells the system to run this script with the program bash.

Note that because we saved our file with .sh extension (the conventional extension used for shell scripts), Nano does some colouring of our commands (this is called syntax highlighting) to make it easier to read the code.

4.2.2 Exercise

Exercise 2 - Editing scripts
  1. Create a new script file called check_hostname.sh. Copy the code shown below into this script and save it.
  2. From the terminal, run the script using bash.

#!/bin/bash
echo "This job is running on:"
hostname

A1.

To create a new script in Nano we use the command:

nano check_hostname.sh

This opens the editor, where we can copy/paste our code. When we are finished we can press Ctrl+X to exit the program, and it will ask if we would like to save the file. We can type “Y” (Yes) and then press Enter to confirm the file name.

A2.

We can run the script from the terminal using:

bash check_hostname.sh

This should print the following:

This job is running on:
gen3-head1

(the output might be slightly different if you were assigned to a different login node of the HPC)
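
Because the script begins with a shebang, it can also be made executable and run directly, without typing bash in front of it. The sketch below is self-contained: it works in a temporary directory and writes the same script with a here-document instead of Nano, which is an assumption made only so the example can be pasted and run as is.

```shell
# Work in a throwaway directory so nothing in the home directory is touched
workdir=$(mktemp -d)
cd "$workdir"

# Write the script with a here-document instead of opening Nano
cat > check_hostname.sh <<'EOF'
#!/bin/bash
echo "This job is running on:"
hostname
EOF

# Option 1: pass the script to bash explicitly (no execute permission needed)
bash check_hostname.sh

# Option 2: mark it executable and let the shebang choose the interpreter
chmod +x check_hostname.sh
./check_hostname.sh
```

Both options print the same two lines. The second form only works once the file has execute permission.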

4.3 Summary

Key Points
  • The terminal is used to connect and interact with the HPC.
    • To connect to the HPC we use ssh username@remote-hostname.
  • Nano is a text editor that is readily available on HPC systems.
    • To create a new file or edit an existing one we use the command nano path/to/filename.sh.
    • Keyboard shortcuts are used to save the file (Ctrl + O) and to exit (Ctrl + X).