
Learning Objectives

Logging into O2

For this workshop we will be using our own personal accounts to log into O2. If you do not have your own personal account, you can use this workshop as a demo and/or bookmark these materials for future reference. The O2 cluster is managed by the HMS Research Computing (HMS-RC) team and this workshop is created in collaboration with them.

If you are interested in getting your own personal account on O2, please follow the instructions provided here after this module.

Let’s get started with the hands-on component by typing the following command to log into O2 from the command line:

ssh username@o2.hms.harvard.edu

You will receive a prompt for your password, and you should type in your associated password.

Note: the cursor will not move as you type your password.

A warning might pop up the first time you try to connect to a remote machine; type yes, then hit Enter/Return.

You will then be asked how you would like to complete two-factor authentication:

Duo two-factor login for <username>

Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-XXXX
 2. Phone call to XXX-XXX-XXXX
 3. SMS passcodes to XXX-XXX-XXXX

Passcode or option (1-3): 

Select either 1, 2 or 3 depending on your preferred method to authenticate and hit Enter/Return.

Once logged in, you should see the O2 icon, some news, and the command prompt, e.g. [username@login01 ~]$.

Note: ssh stands for secure shell. All of the information (like your password) sent between your computer and the O2 login node is encrypted when using ssh.

Logging into an interactive node

Now that we have logged in to the login node, we need to start an interactive session. A login node’s primary function is to let users log in to the cluster; it is not meant to be used for any actual work/computing. Since we will be doing some work, let’s get on to a compute node:

srun --pty -p interactive -t 0-3:00 --mem 1G /bin/bash

Make sure that your command prompt is now preceded by a character string that contains the word compute.
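If you want the same check from inside a script, a small bash sketch like the one below tests the hostname for the word compute. It assumes, based only on the prompt described above, that O2 compute node hostnames contain "compute" and login node hostnames do not; the function name on_compute_node is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when the given hostname,
# or the current one ($HOSTNAME, set by bash) if none is given, looks like
# an O2 compute node. Assumption: compute node hostnames contain "compute".
on_compute_node() {
    local host="${1:-$HOSTNAME}"
    case "$host" in
        *compute*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example usage: warn before doing real work on a login node
if ! on_compute_node; then
    echo "Warning: this does not look like a compute node; consider running srun first." >&2
fi
```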

scratch space

During the course of your analyses, you might find that you create many large intermediate files, such as SAM files. Often these files are only needed temporarily, but they can take up lots of space on the cluster (>100 GB each). These intermediate files can quickly fill your allotted space on the cluster, so it is recommended that you use “scratch” space on the cluster for them.

What is scratch?

Scratch space on the cluster is much like scratch paper that you might use during an exam: it is a space where you can do your “scratch” work. The files on scratch are not backed up and are deleted 45 days after they were last modified. However, you can be allocated up to 25 TiB of space, which is great for large intermediate files. We will be using the scratch space extensively today, but we will not be putting large files there, so as not to needlessly consume space on the cluster.
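Because files on scratch are removed based on when they were last modified, it can be useful to see which of your files are getting close to that limit. The bash sketch below only uses the standard find -mtime test; the function name stale_files and the 38-day default (about a week before the 45-day limit) are illustrative assumptions, and the actual deletion is handled by HMS-RC, not by this script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files under a directory whose last
# modification is older than N days (default 38).
stale_files() {
    local dir="${1:-.}"
    local days="${2:-38}"
    # -mtime +N matches files whose age, rounded down to whole days, is greater than N
    find "$dir" -type f -mtime +"$days"
}

# Example: check your scratch directory (path built as in this lesson)
# stale_files "/n/scratch/users/${USER:0:1}/${USER}" 38
```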

Creating your own scratch space

To create our space on /n/scratch/, we need to run a script provided by the HMS Research Computing team. You MUST be on a login node in order to create a space on /n/scratch, so we will exit our interactive node, even though that is usually where we want to perform any computing work.

 exit # if you are on an interactive node, you need to exit to the login node
 sh /n/cluster/bin/scratch_create_directory.sh 

It will prompt you with the following:

Do you want to create a scratch directory under /n/scratch/users? [y/N]> 

To this you will respond y, then hit Enter/Return.

Next, it will prompt you with:

By typing 'YES' I will comply with HMS RC guidelines for using Scratch.
I also confirm that I understand that files in my scratch directory WILL NOT BE BACKED UP IN ANY WAY.
I also understand that 45 DAYS after I last modify a given file or directory in my scratch directory,
it will be DELETED with NO POSSIBILITY of retrieval.

I understand HMS RC guidelines for using Scratch: 

Type YES, then hit Enter/Return.

It should return:

Your scratch directory was created at /n/scratch/users/<users_first_letter>/<username>.
This has a limit of 25TiB of storage and 2.5 million files.
You can check your scratch quota using the quota-v2 command.

Note: You might notice that your storage limit is 25TiB and be confused by this unit. Generally speaking, you can think of a KiB as roughly a kB, a MiB as roughly a MB, a GiB as roughly a GB and a TiB as roughly a TB. This nomenclature comes from the fact that computers measure space in binary (base 2), while the familiar prefixes are derived from the metric (base 10) system. So, a KiB is actually 1,024 bytes of space, while a kB is 1,000 bytes. The table below further demonstrates these differences.

| Unit | Size in bytes | Unit | Size in bytes |
|---|---|---|---|
| Kilobyte (kB) | 1,000^1 = 1,000 | Kibibyte (KiB) | 1,024^1 = 1,024 |
| Megabyte (MB) | 1,000^2 = 1,000,000 | Mebibyte (MiB) | 1,024^2 = 1,048,576 |
| Gigabyte (GB) | 1,000^3 = 1,000,000,000 | Gibibyte (GiB) | 1,024^3 = 1,073,741,824 |
| Terabyte (TB) | 1,000^4 = 1,000,000,000,000 | Tebibyte (TiB) | 1,024^4 = 1,099,511,627,776 |
| Petabyte (PB) | 1,000^5 = 1,000,000,000,000,000 | Pebibyte (PiB) | 1,024^5 = 1,125,899,906,842,624 |
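To make the difference concrete, the bash arithmetic below converts the 25 TiB scratch limit into bytes and compares it with 25 TB. It only uses shell integer arithmetic (** is exponentiation in bash); the function names are illustrative.

```shell
#!/usr/bin/env bash
# Convert binary (TiB) and decimal (TB) units to bytes with integer arithmetic.
tib_to_bytes() { echo $(( $1 * 1024**4 )); }
tb_to_bytes()  { echo $(( $1 * 1000**4 )); }

limit=25
binary=$(tib_to_bytes "$limit")    # 27487790694400 bytes
decimal=$(tb_to_bytes "$limit")    # 25000000000000 bytes
echo "25 TiB = $binary bytes"
echo "25 TB  = $decimal bytes"
echo "Difference: $(( binary - decimal )) bytes"
```

So 25 TiB is roughly 27.5 TB, about 10% more than the decimal figure.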

We can navigate to our newly created scratch space using this command:

cd /n/scratch/users/${USER:0:1}/${USER}/

Writing to Scratch

Just like any other storage area on O2, we can copy data to scratch. For example, let’s copy some scripts over that we will be using in some later exercises:

cp -r /n/groups/hbctraining/sleep_scripts . 

Note: ${USER} is an environment variable in bash that holds your username, and ${USER:0:1} is bash shorthand (substring expansion) for the first letter of your username.
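The substring expansion can be seen on its own in the bash sketch below, which builds the scratch path for any username. The function name scratch_dir_for and the username jsmith are hypothetical; the path layout is the one reported when the scratch directory was created.

```shell
#!/usr/bin/env bash
# ${var:offset:length} is bash substring expansion: ${user:0:1} takes
# 1 character starting at offset 0, i.e. the first letter of the username.
scratch_dir_for() {
    local user="${1:-$USER}"
    printf '/n/scratch/users/%s/%s/\n' "${user:0:1}" "$user"
}

scratch_dir_for jsmith   # prints /n/scratch/users/j/jsmith/
```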

Aliases and .bashrc profile

Now that we have created a space on scratch, let’s log back in to an interactive node:

srun --pty -p interactive -t 0-3:00 --mem 1G  /bin/bash

With the scratch folder in place, you might be interested in having a shortcut to get there, just like the shortcut you have for getting to your home directory:

cd ~

where the ~ is a shorthand for your home directory.

This is where aliases can be very helpful. Aliases are shortcuts that you might employ to make common, long commands easier to use. Let’s go ahead and make an alias to help us change directories to our scratch directory.

alias cd_scratch='cd /n/scratch/users/${USER:0:1}/${USER}/'

The alias command lets us make an alias: we name our alias cd_scratch and then set what we want it to be shorthand for, cd /n/scratch/users/${USER:0:1}/${USER}/. Currently, we should be in our home directory, and we can confirm that with:

pwd

The return should look like /home/your_username. Now we can use our newly created alias to change directories to our scratch space:

cd_scratch

Now we should be able to see that we are within our scratch space by using:

pwd

And now it should say that we are within /n/scratch/users/<first_letter_of_username>/<username>/
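If you lose track of what an alias does, the alias builtin can print its definition, and unalias removes it. The short bash sketch below only uses those two builtins; aliases defined this way still exist only in the current shell.

```shell
#!/usr/bin/env bash
# Define the alias (single quotes keep ${USER} unexpanded until the alias is used)
alias cd_scratch='cd /n/scratch/users/${USER:0:1}/${USER}/'

# Print the definition of one alias
alias cd_scratch

# Running alias with no arguments lists every alias in the current shell
alias

# To remove it again in an interactive session:
# unalias cd_scratch
```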

This is great, but the alias is not saved anywhere: it exists only in the shell session on the compute node you are currently using. If we exit the compute node with:

exit

And now we try our alias again:

cd_scratch

It will return:

-bash: cd_scratch: command not found

Ideally, we would like a way to save our aliases, and this is one of the things our .bashrc profile is for!

Before moving on, let’s log back into the interactive node again; it may be repetitive for this module, but it is best practice to work on an interactive node:

srun --pty -p interactive -t 0-3:00 --mem 1G  /bin/bash

.bashrc profile

Much like one might have a routine when coming home, like taking off one’s shoes and jacket, when you log into your computer or any computing cluster, the computer looks for a file containing your preferences, called .bashrc. This file is located in your home directory, and its name begins with a period (.), which means that it is a “hidden file”. You can use the -a option with ls in order to see all files:

# Back to home directory
cd ~

# List all files
ls -a

You can see that there are a number of these hidden files, each responsible for different settings. One of them is called .bashrc, and this is the one that we will be adding some preferences to.

.bashrc versus .bash_profile

You might also notice a file called .bash_profile. .bash_profile is executed for login shells, while .bashrc is executed for interactive non-login shells. When you log in to O2 (typing your username and password), .bash_profile is executed; the shell started by srun --pty ... /bin/bash is a non-login shell, so it reads .bashrc instead. So if you want an alias to be available only when you log in (for example an srun command that starts an interactive session), you will want to put it in your .bash_profile.
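If you are ever unsure which kind of shell you are in, bash can tell you. The sketch below uses the read-only login_shell option of shopt and the i flag in $-; the function name shell_kind is hypothetical. The last comment shows a widespread convention for .bash_profile that also loads .bashrc, which we have not checked against O2's default files.

```shell
#!/usr/bin/env bash
# Report whether the current bash shell is a login shell and whether it is interactive.
# login_shell is a read-only shopt option that bash sets for login shells;
# $- contains the letter "i" when the shell is interactive.
shell_kind() {
    local login="non-login" inter="non-interactive"
    shopt -q login_shell && login="login"
    [[ $- == *i* ]] && inter="interactive"
    echo "$login $inter"
}

shell_kind

# Common convention in ~/.bash_profile so login shells also get .bashrc settings:
# if [ -f ~/.bashrc ]; then . ~/.bashrc; fi
```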

So let’s open up our .bashrc using vim:

vim .bashrc

Enter insert mode by typing i, then we can add our alias anywhere in our .bashrc:

# Alias for navigating to our scratch space
alias cd_scratch='cd /n/scratch/users/${USER:0:1}/${USER}/'

Now, we can exit vim by pressing ESC and then typing :wq followed by Enter/Return. Next, we need to do one of two things in order for our alias to work:

1) We could log out and log back into O2 and since we are “walking into our house”, O2 would automatically source this file, or 2) If we would like to stay logged in, we can run the source command:

source .bashrc

Now if we type:

cd_scratch

We can see that it takes us to our scratch space, which we can confirm with:

pwd
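A typo in .bashrc (for example an unmatched quote in an alias) produces errors every time a new shell reads the file. One cautious habit, sketched below with a hypothetical safe_source function, is to let bash check the file's syntax with bash -n before sourcing it.

```shell
#!/usr/bin/env bash
# Check a startup file for syntax errors before loading it into the current shell.
# bash -n reads the file and reports syntax errors without executing any commands.
safe_source() {
    local file="${1:-$HOME/.bashrc}"
    if bash -n "$file"; then
        source "$file"
    else
        echo "Not sourcing $file: fix the syntax errors reported above." >&2
        return 1
    fi
}

# Usage: safe_source ~/.bashrc
```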

Now you can create lots of aliases. For example, one that can be useful is:

# Requests an interactive node on O2 for 12 hours and 4GB of memory
alias o2i='srun --pty -p interactive -t 0-12:00 --mem 4G /bin/bash'

This alias requests an interactive job on O2 for 12 hours with 4GB of memory. This way you don’t need to remember all of the options that you need for an interactive job. We can test out the o2i alias a little later in the lesson!
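An alias cannot take arguments in the middle of its command, so if you sometimes want a different time or memory request, a bash function is a common alternative. The sketch below keeps the same srun options as the alias; the argument order and defaults are our own choice, and requests still have to fit the limits of the interactive partition. Use it instead of the alias rather than alongside it, since an alias and a function with the same name conflict.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to the o2i alias: a function whose run time (hours)
# and memory can be changed per call, falling back to the alias's values
# (12 hours, 4G) when no arguments are given.
o2i() {
    local hours="${1:-12}"
    local mem="${2:-4G}"
    srun --pty -p interactive -t "0-${hours}:00" --mem "$mem" /bin/bash
}

# Usage:
#   o2i          # 12 hours, 4G (same as the alias)
#   o2i 2 8G     # 2 hours, 8G
```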

Let’s once again exit our interactive node:

exit

What else do people put in their .bashrc profile?

Placing aliases in a .bashrc file is quite common, but aliases aren’t the only thing people put there. For example, some people specify the location of their R libraries if they use R on the O2 cluster:

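# DO NOT ADD TO BASHRC - EXAMPLE ONLY
# Example of how people might specify the location of their R packages on O2
# (replace R-4.1.2 with the R version you use)
# Option 1: run once at the command line to record the setting in ~/.Renviron, which R reads at startup
# (>> appends, so any existing settings in ~/.Renviron are kept)
echo 'R_LIBS_USER="~/R-4.1.2/library"' >> $HOME/.Renviron
# Option 2: set it in .bashrc as an environment variable instead
export R_LIBS_USER="$HOME/R-4.1.2/library"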

Sometimes people want software that they installed in their home directory to always be available to them, so they add the path to that software’s bin directory to their $PATH variable:

# DO NOT ADD TO BASHRC - EXAMPLE ONLY
# Example of how to add a path to your $PATH variable
PATH=${PATH}:/home/${USER}/my_software_package/bin/
export PATH

These are just a few examples of items that one might commonly see in other people’s .bashrc profiles.
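One thing to watch with PATH lines like the one above is that every time .bashrc is sourced again (for example with source ~/.bashrc), the same directory is appended to $PATH once more. A small guard, sketched here with a hypothetical path_append function, only adds a directory if it is not already present.

```shell
#!/usr/bin/env bash
# Append a directory to PATH only if it is not already one of its entries.
# Wrapping PATH in colons lets the pattern match whole entries only.
path_append() {
    local dir="${1%/}"   # drop a trailing slash so /x/bin/ and /x/bin compare equal
    case ":${PATH}:" in
        *":${dir}:"*|*":${dir}/:"*) ;;   # already present, do nothing
        *) PATH="${PATH:+${PATH}:}${dir}" ;;
    esac
    export PATH
}

# Example (hypothetical software location):
# path_append "/home/${USER}/my_software_package/bin/"
```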

Now let’s try and get back on an interactive node using our alias:

source ~/.bashrc

o2i

Checking Quotas

Now that we have logged into an interactive node and created our scratch space on O2, let’s discuss how to find out how much space we are using in our various directories.

quota-v2

The first helpful command is unique to O2, but it does an excellent job summarizing a user’s disk usage on the cluster. This command is quota-v2 and it will outline your disk usage in your home directory, scratch directory and any groups that you belong to. Let’s try it out:

quota-v2

An example output might look like:

                   Active Compute usage for <USER> (As of 2024-05-06 09:00:00)                    
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ path                     ┃ type  ┃ username ┃ usage      ┃ storage limit ┃ last update         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ /home                    │ user  │ <USER>   │ 90.96 GiB  │ 100.00 GiB    │ 2024-05-06 09:10:03 │
├──────────────────────────┼───────┼──────────┼────────────┼───────────────┼─────────────────────┤
│ /n/app                   │ user  │ <USER>   │ 122.70 KiB │               │ 2024-05-06 09:10:02 │
├──────────────────────────┼───────┼──────────┼────────────┼───────────────┼─────────────────────┤
│ /n/data1/cores/bcbio     │ user  │ <USER>   │ 7.94 TiB   │ 140.00 TiB    │ 2024-05-06 09:00:00 │
│                          │ group │          │ 120.32 TiB │               │                     │
├──────────────────────────┼───────┼──────────┼────────────┼───────────────┼─────────────────────┤
│ /n/data1/cores/bpf-bcbio │ group │          │ 30.24 GiB  │ 1.00 TiB      │ 2024-05-06 09:00:00 │
├──────────────────────────┼───────┼──────────┼────────────┼───────────────┼─────────────────────┤
│ /n/groups/bcbio          │ group │          │ 1.94 TiB   │ 5.00 TiB      │ 2024-05-06 09:00:00 │
├──────────────────────────┼───────┼──────────┼────────────┼───────────────┼─────────────────────┤
│ /n/groups/hbctraining    │ user  │ <USER>   │ 61.62 GiB  │ 1.00 TiB      │ 2024-05-06 09:00:00 │
│                          │ group │          │ 962.09 GiB │               │                     │
└──────────────────────────┴───────┴──────────┴────────────┴───────────────┴─────────────────────┘
                            Scratch usage for <USER> (As of 2024-05-06 09:00:00)                             
┏━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ path       ┃ type ┃ username ┃ usage    ┃ used inodes ┃ storage limit ┃ inode limit ┃ last update         ┃
┡━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ /n/scratch │ user │ <USER>   │ 2.11 TiB │ 131         │ 25.00 TiB     │ 2500000     │ 2024-05-06 09:08:48 │
└────────────┴──────┴──────────┴──────────┴─────────────┴───────────────┴─────────────┴─────────────────────┘

In this example, we can see that this user is using 90.96 GiB of their allotted 100 GiB in the home directory and 2.11 TiB of their allotted 25 TiB in their scratch space.

Note: If you have trouble with quota-v2 and get an error message like bash: quota-v2: command not found, you need to make sure that /n/cluster/bin/ is in your $PATH variable. You can check your $PATH variable with:

# ONLY RUN IF YOUR quota-v2 IS NOT WORKING
echo $PATH

If it does not have /n/cluster/bin/, then you can add it with:

# ONLY RUN IF YOUR quota-v2 IS NOT WORKING
PATH=${PATH}:/n/cluster/bin/
export PATH

However, you will need to do this each time you log onto the cluster. The easier way will be to put the above commands into your .bashrc profile.
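Reading a long $PATH on one line can be tedious. The bash sketch below prints one entry per line, then adds /n/cluster/bin/ (the directory named above) only when quota-v2 cannot already be found; the function name show_path is hypothetical.

```shell
#!/usr/bin/env bash
# Print each PATH entry on its own line, which makes it easy to spot /n/cluster/bin
show_path() {
    tr ':' '\n' <<< "$PATH"
}

show_path

# Suitable for ~/.bashrc: add /n/cluster/bin/ only if quota-v2 is not already on PATH
if ! command -v quota-v2 >/dev/null 2>&1; then
    PATH="${PATH}:/n/cluster/bin/"
    export PATH
fi
```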

Summarizing storage limits with du

The quota-v2 command is a very useful summary of your storage usage and your storage limits. However, it comes with two caveats:

1) It is O2-specific. This command is something that the folks at HMS-RC have written to help us. Other clusters may or may not have a similar command, but if they do, it will almost certainly look different from this one. If you plan on working on other computing clusters, this command is unlikely to be portable.

2) It only gives the user a broad overview of how their disk usage allocation is being used. If you want to dig further into which directories/files within a given directory are taking up the most space then we will need to use a different command.

The du command stands for “disk usage”. It traverses the directory you are currently in, along with all of its subdirectories, and reports how much disk space each of them uses. On its own, this command has two drawbacks:

1) If you are within a directory that has a large file system underneath it, du may take a while to run, and it will report the size of every one of possibly thousands of subdirectories, which will be difficult to sort through.

2) By default it reports sizes as counts of 1 KiB blocks rather than in human-readable units, which is not always the most intuitive way to visualize data sizes.

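Fortunately, the du command has a few options that are extremely useful, including -h (print sizes in human-readable units such as K, M and G), -s (print only a total for each directory given) and --max-depth=N (only report directories up to N levels below the starting directory).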

Let’s go ahead and try this by looking at the disk usage of our current scratch directory:

du -h --max-depth=1 .

This should give you a good idea of what directories within your scratch directory are consuming the most space. Now let’s consider if we wanted to see how much space our scratch directory was using:

du -sh .

We can change the directory that we run du on by providing a path to the directory instead of using . (the current directory). For example, we can summarize the disk usage of our home directory with:

du -sh /home/${USER}

For more information on options to use for du we recommend using the man command.
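When a directory has many subdirectories, it can help to sort du’s output so the largest ones come first. The sketch below pipes du -h into GNU sort -h, which understands human-readable sizes such as 512K and 2.1G; the function name biggest_dirs and the default of 10 lines are our own choices.

```shell
#!/usr/bin/env bash
# List the largest immediate subdirectories of a directory, biggest first.
# The first line is usually the directory's own total.
biggest_dirs() {
    local dir="${1:-.}"
    local n="${2:-10}"
    # 2>/dev/null hides "Permission denied" messages for unreadable directories
    du -h --max-depth=1 "$dir" 2>/dev/null | sort -hr | head -n "$n"
}

# Usage: biggest_dirs "/n/scratch/users/${USER:0:1}/${USER}" 5
```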

Retrieving a backup with .snapshot

When we discussed .bashrc profiles, we discussed hidden files. In addition to hidden files, O2 has a hidden directory called .snapshot, which you can’t see even with ls -a, that contains back-ups of your file system. These back-ups are only made for the /home/, /n/data1/, /n/data2/ and /n/groups/ file systems; daily back-ups are kept for the past 14 days and weekly back-ups for the past 60 days. Importantly, there are NO back-ups of /n/scratch/.

In order to see what is in .snapshot, let’s go to our home directory and type:

ls .snapshot/

This should return 20 directories that look like:

o2_home_<date>_<time>

If we look inside one of these directories:

# Edit the date and time to match one of your snapshot directories
ls -lh .snapshot/o2_home_<date>_<time>/

This should look like what your home directory looked like at the date and time you selected.

If you have a file you’d like to recover, you can simply copy it to the present like:

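# EXAMPLE CODE: DO NOT RUN
# Copy an old version of a file out of a snapshot into your current directory
cp .snapshot/o2_home_<date>_<time>/old_copy_of_file filename_to_be_brought_to_present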

Note that every directory within the file systems that have snapshot back-ups has its own .snapshot directory. So, to retrieve a file that is three subdirectories below your home directory, you can either start in your home directory and follow that path down inside .snapshot, or first move down those three subdirectories and then use the .snapshot directory there.
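If you are not sure which snapshot still has the version you want, you can loop over the snapshot directories and list every copy of the file. The bash sketch below assumes the layout described above (one subdirectory per snapshot inside .snapshot); find_in_snapshots and my_script.sh are hypothetical names, and the file path is given relative to the directory whose .snapshot you are searching.

```shell
#!/usr/bin/env bash
# List every snapshot that contains a copy of a file, with size and modification time.
#   $1: path of the file relative to the current directory, e.g. scripts/run.sh
#   $2: snapshot directory to search (defaults to .snapshot in the current directory)
find_in_snapshots() {
    local rel="$1"
    local snapdir="${2:-.snapshot}"
    local snap
    for snap in "$snapdir"/*/; do
        if [ -e "${snap}${rel}" ]; then
            ls -l "${snap}${rel}"
        fi
    done
}

# Usage (from your home directory, my_script.sh is a hypothetical file name):
#   find_in_snapshots my_script.sh
# Then copy the version you want under a new name, keeping its timestamps with -p:
#   cp -p .snapshot/o2_home_<date>_<time>/my_script.sh my_script.sh.restored
```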




This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.