Approximate time: 40 minutes
Learning Objectives
- Grant or restrict access to files on a multi-user UNIX system
- View “Environment Variables” in Shell
- Describe the $PATH variable and how to append to it
Permissions
Unix controls who can read, modify, and run files by assigning permissions to files and directories. On a shared system, these permissions are what protect each user’s data.
In this lesson, we are going to learn more about how those permissions are set and how they can be modified.
To start, every file and directory on a Unix computer belongs to one owner (referred to as “user”) and one group. Along with each file’s content, the operating system stores the information about the user and group that own it, i.e. the “metadata” for a given file.
Users of a multi-user Unix system (i.e. the O2 cluster) can belong to any number of groups.
Let’s see what groups we all belong to. Type groups into the command prompt.
$ groups
Depending on our affiliation, we all belong to at least a couple of groups. Since we are all using training accounts you will likely see the groups listed below:
- rc_training01
- training
- genomebrowser-uploads
- domain-users
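As a small complement to groups, the id command reports the same membership together with your user name and primary group. This is only a sketch; the names printed will depend on your account.

```shell
# Show who we are and which groups we belong to.
id -un    # our user name
id -gn    # our primary group
id -Gn    # every group we belong to (the same information as `groups`)
```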
The user-and-group model means that for each file/directory every user on the system falls into one of three categories:
- user or u: the owner
- group or g: a member of the group the file/directory belongs to
- others or o: everyone else
For each of these three categories, the computer keeps track of whether people in that category can read the file (r), write to the file (w), or execute the file (i.e., run the program written in it) (x). More about this aspect of permissions is coming up later in this lesson.
Let’s look at this model in action by running the command ls -l /n/groups/hbctraining/ to list the files in that directory:
$ ls -l /n/groups/hbctraining/
total 30G
drwxrwsr-x 4 mm573 hbctraining 831 Feb 29 2016 bcbio-rnaseq
drwxrwsr-x 14 mm573 hbctraining 382 Jul 10 17:03 chip-seq
-rw-r--r-- 1 root hbctraining 0 Apr 5 2015 copy_me.txt
drwxrwsr-x 3 rsk27 hbctraining 201 Apr 5 2015 exercises
drwxrwsr-x 6 rsk27 hbctraining 293 Oct 27 2017 for_chipseq
drwxrwsr-x 14 mm573 hbctraining 494 May 21 2018 intro_rnaseq_hpc
.
.
.
.
As we have learned, the -l flag tells ls to give us a long-form listing. Let’s take another look at the columns in this output, starting from the right side and moving left.
- File/directory names
- Times and dates last modified. Backup systems and other tools use this information in a variety of ways, but you can use it to tell when you (or anyone else with permission) last changed a file.
- File size in bytes.
- Name of the group that owns the file.
- User name of the file’s owner.
- File’s number of hard links (not important for this class).
- Permissions to the file
NOTE: When listing the contents of a directory using the ls -l command, the file size reported for a directory does not reflect the size of the data inside it. The number actually represents the space on disk used to store the metadata for the directory. The command you’ll want to use to get the size of a directory’s contents is du -sh; du is short for “disk usage”.
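The difference between the two numbers can be seen on a throwaway directory. The sketch below assumes mktemp and head -c are available (they are on most Linux systems); the directory and the file name big_file.bin are made up for illustration.

```shell
# Create a scratch directory holding one 1 MiB file of random data.
scratch=$(mktemp -d)
head -c 1048576 /dev/urandom > "$scratch/big_file.bin"

ls -ld "$scratch"    # size column: the directory entry itself, not its contents
du -sh "$scratch"    # total disk usage of everything inside the directory

# Keep the usage in kilobytes so it can be compared with the ls figure.
contents_kb=$(du -sk "$scratch" | cut -f1)
echo "contents: ${contents_kb} KB"

rm -r "$scratch"     # clean up the scratch directory
```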
Let’s list the contents of the unix_lesson directory:
$ ls -l ~/unix_lesson/
drwxrwxr-x 2 rc_training01 rc_training01 78 Oct 6 10:57 genomics_data
drwxrwxr-x 2 rc_training01 rc_training01 73 Oct 6 10:57 other
drwxrwxr-x 5 rc_training01 rc_training01 302 Oct 6 11:53 raw_fastq
-rw-rw-r-- 1 rc_training01 rc_training01 377 Oct 6 10:57 README.txt
drwxrwxr-x 2 rc_training01 rc_training01 62 Oct 6 10:57 reference_data
Who is the owner of the files in this directory? Which group do the files belong to?
Basically, O2 has you (your account ID) listed both as an owner and a group, and this is usually the assignment for the files and folders in your personal directory. Essentially, when a new user is created on a Unix system, a group of the same name is created and personal files for that user are also “owned” by that user’s group.
Interpreting the permissions string
Let’s have a closer look at one of those permission strings in the first column for the README.txt file:
-rw-rw-r--
- The first character indicates the type of file. Among the different types, a leading dash (-) means a regular file, while a d indicates a directory. In our case, it is - which means README.txt is a regular file.
- The next 9 characters are usually some combination of r, w and x, where:
r = read permission
w = write/edit permission
x = execute permission (run a script/program or traverse a directory).
The first triplet is the permissions for the file’s owner (u). Here, the owner can read and write the file: rw-. If the permission is turned off, we see a dash, so rw- means “read and write, but not execute”.
rw-
The second triplet shows us the group’s permissions (g). Here, the group can read and write the file. (In this case the group and the owner are the same so it makes sense that this is the same for both.)
rw-
The final triplet shows us what everyone else (o) can do. In this case, it’s r--, so everyone else on the system can only read the file’s contents.
“Everyone else” refers to other users on the system who are neither the file’s owner nor in the group that the file belongs to.
r--
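The breakdown above can be sketched as a small shell function that cuts a 10-character permission string into its four parts. describe_mode is a hypothetical helper name used only for this illustration, not a standard command.

```shell
# Split a 10-character permission string into file type and the three triplets.
describe_mode() {
    mode=$1
    echo "type:   $(printf '%s' "$mode" | cut -c1)"      # - for a file, d for a directory
    echo "user:   $(printf '%s' "$mode" | cut -c2-4)"    # owner's permissions
    echo "group:  $(printf '%s' "$mode" | cut -c5-7)"    # group's permissions
    echo "others: $(printf '%s' "$mode" | cut -c8-10)"   # everyone else's permissions
}

describe_mode "-rw-rw-r--"
```

You could also pass it the string of a directory, such as drwxrwxr-x, to see the same three triplets behind a leading d.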
The execute permissions
We don’t see the execute permission set here since we are not working with executable files. To see an example of a file that is actually executable, try ls -l /bin/ls.
Sometimes the x is replaced by another character, but it is beyond the scope of today’s class. You can get more information here, if you are interested.
Is the permissions string interpreted in the same way for directories?
If we take a look at the permissions for directories (e.g. drwxrwsr-x), the x in the permissions indicates that “execute” is turned on. What does that mean, given that a directory isn’t a program or an executable file that we can “execute”?
Well, x means something different for directories. It gives someone the right to traverse the directory, but not to look at (or list) its contents. This is beyond the scope of today’s class, but note that you can give someone access to a file that’s deep inside a directory structure without allowing them to see what other files exist in the sub-directories which are part of the path.
Changing permissions
To change permissions, we use the chmod command (whose name stands for “change mode”). The arguments we provide chmod include:
- Whose permissions are we changing? (“user” u, “group” g, or “other” o)
- Are we adding permissions (+) or removing permissions (-)?
- Which permissions (or combination of) would we like to add/remove? (“read” r, “write” w, and “execute” x)
Let’s make our README.txt file inaccessible to all users other than you and the group the file belongs to. Currently, everyone else is able to read the file.
$ ls -l ~/unix_lesson/README.txt
-rw-rw-r-- 1 rc_training01 rc_training01 377 Oct 6 10:57 /home/rc_training01/unix_lesson/README.txt
$ chmod o-r ~/unix_lesson/README.txt # the "-" after o denotes removing that permission
$ ls -l ~/unix_lesson/README.txt
-rw-rw---- 1 rc_training01 rc_training01 377 Oct 6 10:57 /home/rc_training01/unix_lesson/README.txt
The o signals that we’re changing the privileges of “others”, the category we have been calling “everyone else” throughout this lesson.
Let’s change it back to allow it to be readable by others:
$ chmod o+r ~/unix_lesson/README.txt # the "+" after o denotes adding/giving that permission
$ ls -l ~/unix_lesson/README.txt
-rw-rw-r-- 1 rc_training01 rc_training01 377 Oct 6 10:57 /home/rc_training01/unix_lesson/README.txt
If we wanted to make this an executable file for ourselves (the file’s owner) we would say chmod u+x, where the u signals that we are changing permission for the file’s owner. To change permissions for the “group”, you’d use the letter g, e.g. remove write permissions for the group with chmod g-w.
The fact that something is marked as executable doesn’t actually mean it contains or is a program of some kind. We could easily mark the ~/unix_lesson/raw_fastq/Irrel_kd_1.subset.fq file as executable using chmod. Depending on the operating system we’re using, trying to “run” it will fail (because it doesn’t contain instructions the computer recognizes, i.e. it is not a script of some type).
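To practice the chmod forms above without touching the lesson files, here is a minimal sketch on a throwaway file. It assumes mktemp is available. The "=" operator, which sets permissions exactly, is not covered in this lesson and is used only to put the file in a known starting state. show is a hypothetical helper.

```shell
# Work on a throwaway file so nothing in unix_lesson is changed.
demo=$(mktemp)
chmod u=rw,g=rw,o=r "$demo"    # start from -rw-rw-r-- ("=" sets permissions exactly)

# Hypothetical helper: print only the 10-character permission string.
show() { ls -l "$demo" | cut -c1-10; }

show                  # -rw-rw-r--
chmod o-r "$demo"     # remove read permission for others
show                  # -rw-rw----
chmod u+x "$demo"     # add execute permission for the owner
show                  # -rwxrw----
chmod g-w "$demo"     # remove write permission for the group
final=$(show)
echo "$final"         # -rwxr-----

rm "$demo"            # clean up
```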
Exercise
If ls -l myfile.php returns the following details:
-rwxr-xr-- 1 caro zoo 2312 2014-10-25 18:30 myfile.php
Which of the following statements is true?
- members of caro (a group) can read, write, and execute myfile.php
- members of zoo (a group) cannot execute myfile.php
- caro (the owner) can read, write, and execute myfile.php
Answer
The third statement is true.
Environment Variables
Environment variables are, in short, variables that describe the environment in which programs run, and they are predefined for a given computer or cluster that you are on. You can reset them to customize the environment.
Let’s see the full list of environment variables on O2:
$ env
It’s a pretty long list! In the context of the shell the environment variables are usually all in upper case.
In this lesson, we are going to focus on the two most commonly encountered environment variables: $HOME and $PATH.
- $HOME defines the full path for the home directory of a given user.
- $PATH defines a list of directories to search in when looking for a command/program to execute.
Environment variables, in most systems, are called or denoted with a “$” before the variable name, just like a regular variable. Let’s use the echo command to see what is stored in $HOME:
$ echo $HOME
You should see the path to your home directory. $HOME can be used instead of ~ (if you want to type 4 more characters).
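As a quick check that these really are the same value, the sketch below prints it three ways. printenv is a standard utility that reads a single variable from the environment; the path shown will be your own home directory.

```shell
# Three ways to look at the same value.
echo "$HOME"       # expand the variable
printenv HOME      # ask the environment for the variable by name (no "$" here)
echo ~             # ~ is shorthand that the shell expands to the same directory
```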
$HOME is pretty straightforward. Now let’s take a look at what is stored in the $PATH variable:
$ echo $PATH
/n/cluster/bin:/opt/singularity/bin:/usr/local/rvm/gems/ruby-2.4.9/bin:/usr/local/rvm/gems/ruby-2.4.9@global/bin:/usr/local/rvm/rubies/ruby-2.4.9/bin:/n/cluster/bin:/opt/singularity/bin:/usr/local/bin:/usr/bin:/opt/puppetlabs/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/sbin:/home/rc_training01/.local/bin:/home/rc_training01/bin
This output is a lot more complex! Let’s break it down. When you look closely at the output of echo $PATH, you should see a list of full paths separated from each other by a “:”.
Here is the list of paths in a more readable format:
/n/cluster/bin
/opt/singularity/bin
/usr/local/rvm/gems/ruby-2.4.9/bin
/usr/local/rvm/gems/ruby-2.4.9@global/bin
/usr/local/rvm/rubies/ruby-2.4.9/bin
/n/cluster/bin
/opt/singularity/bin
/usr/local/bin
/usr/bin
/opt/puppetlabs/bin
/usr/local/rvm/bin
/usr/local/sbin
/usr/sbin
/home/rc_training01/.local/bin
/home/rc_training01/bin
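A one-path-per-line view like this can be produced directly in the shell. This is a minimal sketch using tr, which swaps every “:” for a newline; your own list will differ from the one shown in this lesson.

```shell
# Print each directory in $PATH on its own line by turning ":" into a newline.
echo "$PATH" | tr ':' '\n'
```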
Each of these paths refers to a directory, and in this case a lot of them are named bin.
What are all these paths? And what do they represent?
These are the directories that the shell will look through (in the same order as they are listed) for any given command or executable file that you type on the command prompt.
For example, we have been using the ls command to list contents in a directory. When we type ls at the command prompt, the shell searches through each path in $PATH until it finds an executable file called ls. So which of those paths contains that executable file?
For any command you execute on the command prompt, you can find out where the executable file is located using the which command.
$ which ls
What path was returned to you? Does it match any of the paths stored in $PATH?
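The search the shell performs can be imitated with a short loop. This is a simplified sketch, not how the shell or which is actually implemented: find_in_path is a hypothetical helper that walks the directories in $PATH in order and stops at the first executable file with the requested name.

```shell
# Imitate the shell's command lookup: first match in $PATH wins.
find_in_path() {
    cmd=$1
    old_ifs=$IFS
    IFS=:                          # split $PATH on ":" inside the loop below
    for dir in $PATH; do
        # Accept only executable entries that are not directories.
        if [ -x "$dir/$cmd" ] && [ ! -d "$dir/$cmd" ]; then
            IFS=$old_ifs
            echo "$dir/$cmd"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                       # not found in any directory
}

find_in_path ls
```

Comparing its output with which ls is a good way to convince yourself that the order of directories in $PATH is what decides which file is run.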
Try it on a few of the basic commands we have learned so far:
$ which <your favorite command>
$ which <your favorite command>
Check the path /usr/bin/ and see what other executable files you recognize. (Note that executable files will be listed as green text or have a * after their name.)
$ ls -lF /usr/bin/
The path /usr/bin is usually where executables for commonly used commands are stored.
As pointed out earlier, a lot of the folders listed in the $PATH variable are called bin. This is because of a convention in Unix to call directories that contain commands (in binary format) bin.
Exercise
Are the directories listed by the which command within $PATH?
Answer
They should be. For example, if you would like to check the directory of the pwd command, the output of which pwd is /usr/bin/pwd, and /usr/bin is within $PATH.
Modifying Environment Variables
You can modify the contents of the $PATH environment variable with the export command.
The export command:
- Example: export PATH=$PATH:~/opt/bin (do not run this)
- The arguments or input to export should always include $PATH
  - This specifies that you want to maintain the existing contents.
  - If you don’t maintain all the bin directories, the shell will no longer find most of your commands!
- Use the “:” to separate added paths from one another, with no spaces
- The new path being added should not end in /. Even though ls ~/opt/bin and ls ~/opt/bin/ give you the same results, entries in the $PATH variable are written without the trailing / by convention.
- Order matters
  - If you run export PATH=$PATH:~/opt/bin, Shell will add the ~/opt/bin directory to the end of the pre-existing list within the $PATH environment variable.
  - Alternatively, if you use export PATH=~/opt/bin:$PATH, the same directory will be added to the beginning of the list. The order determines which directory Shell will look in first to find a program.
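The effect of ordering can be tried safely inside a subshell, written as ( ... ), so that your real $PATH is left untouched. In this sketch /tmp/demo/bin is a hypothetical directory and the starting value /usr/bin:/bin is a shortened stand-in for a real $PATH.

```shell
# Appending: the new directory is searched last.
(
    PATH=/usr/bin:/bin
    export PATH=$PATH:/tmp/demo/bin
    echo "$PATH"      # /usr/bin:/bin:/tmp/demo/bin
)

# Prepending: the new directory is searched first.
(
    PATH=/usr/bin:/bin
    export PATH=/tmp/demo/bin:$PATH
    echo "$PATH"      # /tmp/demo/bin:/usr/bin:/bin
)
```

Once each subshell exits, echo $PATH in your own session shows the original value again.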
This command is often used to add paths to a directory with commands you commonly want to use.
Let’s say you often use the bowtie2 command for alignment and it exists in /home/rsk27/installations/alignment_tools/dna/bowtie/.
If you want to run this tool, you will have to type:
$ /home/rsk27/installations/alignment_tools/dna/bowtie/bowtie2 <inputfile>
However, if /home/rsk27/installations/alignment_tools/dna/bowtie is part of the $PATH variable you can instead just type:
$ bowtie2 <inputfile>
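The same idea can be tried end to end with a stand-in tool instead of bowtie2. This is only a sketch: hello_tool is a made-up script created in a scratch directory, and it assumes that scratch directory allows files to be executed.

```shell
# Build a stand-in "tool" in a scratch directory.
tools=$(mktemp -d)
printf '#!/bin/sh\necho "hello from hello_tool"\n' > "$tools/hello_tool"
chmod u+x "$tools/hello_tool"      # make it executable for the owner

"$tools/hello_tool"                # works: the full path is given

export PATH=$PATH:$tools           # add the directory to the end of $PATH
hello_tool                         # now works by name alone
```

The change lasts only for the current shell session; remove the scratch directory with rm -r "$tools" when you are done.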
Closer look at the inner workings of the shell, in the context of $PATH
Each time you log in to a cluster, or start a new interactive session on a compute node, 2 special shell scripts are run automatically in the background. You have the ability to modify these scripts, so if you want to customize your environment you can add the customizing commands to these shell scripts. One common use of this is to add an export command that adds paths to the pre-existing contents of the $PATH environment variable.
So, what are these files and where are they located? These are called .bashrc and .bash_profile, and they are located in your home directory. You can create them if they don’t exist, and Shell will use them!
Check what hidden files exist in our home directory using the -a flag with the ls command:
$ ls -al ~/
You can use vim to modify these files and/or create them.
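If you prefer not to open an editor, a line can also be appended from the command line. The sketch below practices on a scratch file rather than your real ~/.bashrc, and reuses the ~/opt/bin example from earlier. The grep check is an assumption about good practice, added so that repeated runs do not pile up duplicate lines.

```shell
# Practice on a scratch file rather than the real ~/.bashrc.
rc_file=$(mktemp)
line='export PATH=$PATH:~/opt/bin'     # single quotes keep $PATH unexpanded in the file

# Append the line only if an identical line is not already present.
grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"
grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"   # second run adds nothing

cat "$rc_file"    # the export line appears exactly once
```

To apply the same change for real, you would point rc_file at ~/.bashrc; new sessions would then pick up the added path.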
In closing, permissions and environment variables, especially $PATH, are very useful and important concepts to understand in the context of UNIX and HPC.
This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- The materials used in this lesson were derived from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).
- Adapted from the lesson by Tracy Teal. Original contributors: Paul Wilson, Milad Fatenejad, Sasha Wood and Radhika Khetani for Software Carpentry (http://software-carpentry.org/)