- Review shell commands
- Review HPC concepts
Let’s look at some of the basic HPC concepts with the first few slides in this slide deck, then we will come back and get some practice.
Connecting to a login node on O2
Type in the following command with your username to log in:
You will receive a prompt for your password, and you should type in your associated password; note that the cursor will not move as you type in your password.
A warning might pop up the first time you try to connect to a remote machine; if it does, type “Yes” or “Y”.
Once logged in, you should see the O2 icon, some news, and the command prompt:
ssh stands for secure shell. All of the information (like your password) going between your computer and the O2 login computer is encrypted when using ssh.
A “node” on a cluster is essentially a computer in the cluster of computers.
A login node’s only function is to enable users to log in to a cluster; it is not meant to be used for any actual work or computing.
Connecting to a compute node on O2
There are multiple ways to connect with, and do work on, a compute node; a compute node is where all work should be performed. To connect to a compute node, users have to interact with a job scheduler like Slurm using commands like srun or sbatch, and by specifying what resources they require.
- The srun command with a few mandatory parameters will create an “interactive session” on O2. This is essentially a way for us to do work on the compute node directly from the terminal. If connectivity to the cluster is lost while a command is running in an interactive session, that work will be lost.
- The sbatch command with a few mandatory parameters, plus a specialized shell script, will result in the script being run on a compute node. This “job” will not be accessible directly from the terminal and will run in the background. Users do not need to remain connected to the cluster while such a “batch job” is running.
You will get practice with running batch jobs; for now, we are going to start an interactive session on O2 using srun:
$ srun --pty -p interactive -t 0-8:00 --mem 1G --reservation=HBC2 /bin/bash
In the above command the parameters we are using are requesting specific resources:
- --pty - Start an interactive session
- -p interactive - on the “partition” called “interactive” (a partition is a group of computers dedicated to certain types of jobs: interactive, long, short, high-memory, etc.)
- -t 0-8:00 - time needed for this work: 0 days, 8 hours, 0 minutes
- --mem 1G - memory needed: 1 gigabyte
- --reservation=HBC2 - this is only for this workshop; make sure you don’t use it in the future with your own accounts
- /bin/bash - You want to interact with the compute node using the bash shell
These resources are listed slightly differently in the specialized script that is submitted directly using sbatch. We will be reviewing the arguments above and what that specialized script looks like at the end of this lesson.
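As a rough sketch of how similar resource requests might appear in a script submitted with sbatch, each option goes on its own #SBATCH line at the top of the file. The partition name short, the job name, the time limit, and the echo command are placeholders chosen for illustration, not values from this workshop.

```shell
#!/bin/bash
# Sketch of a batch script (all values below are illustrative assumptions).
#SBATCH -p short          # partition (assumed name; use one available to you)
#SBATCH -t 0-0:10         # time limit: 0 days, 0 hours, 10 minutes
#SBATCH --mem 1G          # memory: 1 gigabyte
#SBATCH -J sketch_job     # job name (hypothetical)

# The #SBATCH lines are ordinary comments to bash, so this script also
# runs outside Slurm; Slurm reads them only when the script is submitted.
message="Hello from a batch job"
echo "$message"
```

If these lines were saved in a file, it would be handed to the scheduler with sbatch followed by the file name, rather than typed into an interactive session.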
Make sure that your command prompt is now preceded by a character string that contains the word “compute”:
Copying example data folder
Your accounts were erased after the command-line workshop, so we are starting fresh this time. Let’s copy over the same data folder we worked with in the shell workshop to our home directories:
$ cp -r /n/groups/hbctraining/unix_lesson/ .
- In the above command, what does the . at the end mean? Is it essential?
- Why did we have to run the
- Is the path to the unix_lesson/ directory a “full” path or a “relative” path?
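To experiment with these questions without touching the workshop data, one option is to recreate the situation in a throwaway directory. The sketch below uses made-up folder and file names (demo_src, workdir, sample.txt) under a temporary directory created with mktemp; only cp -r and the trailing . mirror the command above.

```shell
#!/bin/bash
# Build a scratch area so nothing real is modified (all names are hypothetical).
scratch=$(mktemp -d)
mkdir -p "$scratch/demo_src/nested" "$scratch/workdir"
echo "example" > "$scratch/demo_src/nested/sample.txt"

# Move into the directory we want to copy into.
cd "$scratch/workdir"

# "." means "the current directory", so the copy lands in workdir.
# -r copies the folder together with everything inside it.
cp -r "$scratch/demo_src" .

# Show what arrived, including the nested folder.
ls -R demo_src
```

Note that the source is given with a path starting at /, while the destination . depends on where you are when you run the command.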
Reviewing shell commands
We are going to start this review with more exercises, this time hands on! Remember, there are likely multiple ways to do the same thing and we will try to cover at least a few.
- Change directory into the
- Use the tree command to get a directory structure of
- Take a quick look at the
unix_lesson/, without changing directories.
- Move up to your home directory (parent of
- With a single command change directories to the
- With the shortest possible command, change directories back to the home directory.
- What does the ~ in the command prompt mean?
- What is the full path to your home directory?
- List, in long listing format, the contents of /n/groups/hbctraining/intro_rnaseq_hpc/full_dataset/ using tab completion.
- Modify the above command using the * wildcard to only list those files that have “oe” in their names.
- How many and which commands have you run so far today?
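The wildcard exercise above can be tried safely in a scratch directory first. This sketch assumes nothing about the workshop data: the file names are invented so that some contain “oe” and some do not.

```shell
#!/bin/bash
# Create a scratch directory with invented file names.
scratch=$(mktemp -d)
touch "$scratch/geneA_oe_1.fq" "$scratch/geneA_oe_2.fq" "$scratch/geneA_kd_1.fq"

# Long listing of everything in the directory.
ls -l "$scratch"

# The * wildcard matches any run of characters, so *oe* keeps only
# names that contain "oe" somewhere.
ls -l "$scratch"/*oe*

# Count the matches to confirm the pattern did what we expected.
oe_count=$(ls "$scratch"/*oe* | wc -l)
echo "Files with oe in the name: $oe_count"
```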
Loops and shell scripts
- Use a for loop to iterate over each FASTQ file in ~/unix_lesson/raw_fastq/ and do the following:
- Print the name of the current file
- Dump out the first 40 lines into a new file that will be saved in
- Place the above for loop into a shell script using vim and run it.
- Display the contents of the
- Use the which command to check where the executable file for the pwd command lives in the directory structure.
- How does shell know where to find the executable file for the
- Display the contents of the variable that stores the various paths to folders containing executable command files.
- Can you run the bowtie2 command? What do you think you might need to do to run this command?
- Load the
- Load the
- List the modules that are loaded.
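The loop exercise above could be approached along these lines. This is a sketch under assumptions, not the workshop solution: it creates its own small stand-in files in a temporary directory instead of reading ~/unix_lesson/raw_fastq/, and the output folder name first40 and the sample names are made up.

```shell
#!/bin/bash
# Set up stand-in input files (hypothetical names, 60 numbered lines each).
scratch=$(mktemp -d)
mkdir -p "$scratch/raw_fastq" "$scratch/first40"
for sample in sampleA sampleB; do
    seq 1 60 > "$scratch/raw_fastq/$sample.fq"
done

# Loop over every .fq file in the input folder.
for filename in "$scratch"/raw_fastq/*.fq; do
    # Print the name of the current file.
    echo "$filename"

    # Strip the folder and the .fq ending to build the output name.
    base=$(basename "$filename" .fq)

    # Write the first 40 lines to a new file in the output folder.
    head -n 40 "$filename" > "$scratch/first40/${base}_first40.fq"
done

# Check the result by counting lines in each output file.
wc -l "$scratch"/first40/*.fq
```

Saving the loop in a file and running it with bash followed by the file name executes the same steps as a script.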
Some setting up for the rest of the workshop
Add a path to
We need to use one tool that is unavailable as a module on O2 but is available in a folder on O2, so we are going to add that folder to our $PATH. If we just add it using the export command, it will only be available to us in this specific interactive session. However, if we place that export command in a script that is run every time a new interactive session is started, the folder will be added automatically and we will not need to retype the command.
- Add the following line at the end of the file
- Save and quit out of
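To see what the export command does to $PATH, the sketch below adds a folder for the current shell only and then checks that a command inside it can be found. The folder and the tool name hello_tool are invented for the example; on O2 you would use the folder given in the workshop instead. The bash builtin command -v is used here to locate the tool, in the same spirit as which.

```shell
#!/bin/bash
# Make a stand-in "tools" folder containing one tiny executable (hypothetical).
tools_dir=$(mktemp -d)
printf '#!/bin/bash\necho "hello from hello_tool"\n' > "$tools_dir/hello_tool"
chmod +x "$tools_dir/hello_tool"

# Before: the shell searches only the folders listed in $PATH.
echo "$PATH"

# Append the folder. This change lasts only for this shell session.
export PATH="$PATH:$tools_dir"

# After: the shell can locate and run the tool by name alone.
found_at=$(command -v hello_tool)
echo "Found at: $found_at"
hello_tool
```

Placing the same kind of export line in a file that the shell reads at startup makes the change apply to every new session.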
Resources on O2 and asking Slurm for them
Finally, let’s review some of the information for O2 and Slurm in the rest of the slides.
This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.