Data Structures

Python programming
Data structures
Lists
Dictionaries

This lesson covers core Python data structures, focusing on lists, dictionaries and strings. It also demonstrates how to index, slice, modify, and nest them for data wrangling.

Authors

Noor Sohail

Will Gammerdinger

Published

March 16, 2026

Keywords

Indexing, Slicing, Mutable objects, Strings

Approximate time: 65 minutes

Learning Objectives

In this lesson, we will:

  • Create lists and dictionaries
  • Access elements in lists with slicing and indexing
  • Access values in dictionaries with keys

Overview of lesson

Real data comes in many shapes: a list of sample IDs, a mapping of species to genome lengths or a sentence of free text. Choosing the right structure for each of these makes your code cleaner and sometimes even more efficient under the hood. In this lesson, you will learn how to use each structure so that you can represent your own data in effective ways.

Data Structures

Python also has several more complex types, commonly referred to as data structures. Unlike the data types we have discussed so far, they can hold multiple values.

Table 1: Python data structures.
Data Structures What it is
list An ordered, changeable collection of elements.
dictionary A collection of key–value pairs.
tuple An ordered, unchangeable (immutable) collection of elements.
set An unordered collection of unique elements.

We will expand on lists and dictionaries in this lesson.

Lists

Lists can appear a bit daunting at first, but they are very useful. A list is a data structure that can hold any number of values of any type, including other data structures.

A common analogy for a list is a box with compartments, each holding a single item; each item in a list is called an element. There is no limit to how many elements a list can hold. A list is assigned to a single variable because, regardless of how many elements it contains, it is still a single entity (the box).

Lists also make repetitive tasks easier to automate. For example, if you have a task that you want to repeat multiple times, you can put all the items you want to perform the task on in a list and then apply the task to each element of the list, instead of writing the same code for each item individually. Automating repetitive tasks like this is one of the greatest strengths of using a programming language!
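As a small preview, here is a sketch of that idea using a for loop, a construct this lesson does not otherwise cover. It repeats one illustrative task, converting a genome length from kilobases to megabases, on every element of a list. It also uses append(), which is introduced later in this lesson, to collect the results; glengths_mb is a hypothetical name.

```python
# Genome lengths in kilobases (the values used later in this lesson)
glengths = [4.6, 3000, 50000]

# Start with an empty list to collect the converted lengths
glengths_mb = []

# Repeat the same task for every element: convert kb to Mb
for length in glengths:
    glengths_mb.append(length / 1000)

print(glengths_mb)
```

The conversion is written once and the loop applies it to however many elements the list holds.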

Creating a List

To create a list, we use square brackets []. We place all the elements we wish to include in the list within the brackets, separated by commas. There is no limit on the number of values a list can contain, and they can be of any data type.

[1, 6, 7]
[1, 6, 7]

If we inspect the type(), we can see that it is a list data structure:

type([1, 6, 7])
list

We are not limited to including just one data type in a list. We can include any number of any data types in a list:

list1 = [1, "hello", 3.14, True]
list1
[1, 'hello', 3.14, True]

Length of a List

Let us create a list of genome lengths in kilobases, which we will call glengths. The species corresponding to these genome lengths are ecoli, human, and corn. We will create two lists, one for the genome lengths and one for the species:

It is often useful to know how many elements are in a list. We can use the len() function to determine the number of elements in a list. The len() function needs a single argument, which in this case is the list we want to determine the length of. For example, to determine how many elements are in our glengths and species lists, we would use the following code:

# Create a numeric list as an object called glengths
glengths = [4.6, 3000, 50000]

# Create a list of species and assign it to a variable called species
species = ["ecoli", "human", "corn"]

# Calculate the number of elements in glengths and species
print(len(glengths))
print(len(species))
3
3

Adding Elements to a List

Suppose we find a new genome and want to add its length to our list of genome lengths. We can use the append() method to add an element to the end of the list. The append() method takes a single argument, which is the element we want to add to the list.

For example, if we wanted to add the length of the yeast genome, which is 12,000 kb, to our list of genome lengths, we would use the following code:

# Append a new genome length to glengths
glengths.append(12000)

# Print the updated list
print(glengths)
[4.6, 3000, 50000, 12000]

Notice that we used append() to add an element to the list, but we did not assign the result to a new variable. This is because append() modifies the original list in place, so there is no need to assign it to a new variable. Methods and functions that change the original object rather than creating a new one are said to work in place.
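Because append() changes the list in place, it does not hand back the updated list; it returns None. A common mistake is to assign that return value to a variable, which leaves you with None instead of a list. The sketch below rebuilds glengths so it runs on its own; other_lengths is a hypothetical name used only to show the pitfall.

```python
glengths = [4.6, 3000, 50000]

# append() modifies glengths directly and returns None
result = glengths.append(12000)
print(result)    # None
print(glengths)  # [4.6, 3000, 50000, 12000]

# Pitfall: assigning the return value replaces the list with None
other_lengths = [4.6, 3000]
other_lengths = other_lengths.append(50000)
print(other_lengths)  # None
```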

Additionally, notice that we used the syntax glengths.append(12000) instead of append(glengths, 12000). This is because append() is a method that belongs to list objects, while len() is a built-in function that can be used on many different types of objects. Since append() is a list method, we call it with the syntax list.method(). Since len() works on many types of objects, we call it with the syntax function(object).
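The sketch below contrasts the two calling styles. len() accepts both a list and a string, while append() is called on the list itself; strings do not have an append() method at all, which the built-in hasattr() confirms. The sample_ids list is a hypothetical example.

```python
# A hypothetical list of sample IDs
sample_ids = ["s1", "s2", "s3"]

# len() is a built-in function: pass the object in as an argument
print(len(sample_ids))  # 3
print(len("ecoli"))     # 5 (works on strings too)

# append() is a list method: call it on the list with a dot
sample_ids.append("s4")
print(sample_ids)       # ['s1', 's2', 's3', 's4']

# Strings have no append() method, so "ecoli".append("x") would fail
print(hasattr("ecoli", "append"))  # False
```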

Now that we have added a new genome length to glengths, we can see that there are 4 elements in our glengths list:

# Calculate the number of elements in glengths after appending a new element
len(glengths)
4

Indexing

There may be times when we want to access a specific element in a list. We can do this via indexing. Python uses what is called zero-based indexing, which means that the first element of a list is accessed with index 0, the second element with index 1, and so on.

Table 2: Indices of list species, showing how the first element is indexed at zero.
Index Value
0 ecoli
1 human
2 corn

We can access individual elements of a list using square brackets [] and the index of the element we want to access following the list name. For example, to access the first element of the species list, we would use species[0].

# Access the first element of species
species[0]
'ecoli'

Now if we wanted to access the second element of the species list, we would use species[1]. This can be a bit confusing at first, but will become easier with practice.

# Access the second element of species
species[1]
'human'

If you have used R before, you may be used to one-based indexing, which means that the first element of a list is accessed with index 1. As you have now seen, in Python, the first element is accessed with index 0. So when you go back and forth between R and Python, it is important to remember that the indexing is different in the two languages.

To access the last element of a list, you can use the index -1. Negative indices count backward from the end of the list, so -1 always refers to the last element, regardless of how many elements the list contains. For example, to access the last element of the glengths list, we would use glengths[-1].

# Access the last element of glengths
glengths[-1]
12000

We should see the 12,000 kb genome length that we added in the previous section with the append() function.
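Negative indices keep counting backward from the end: -2 is the second-to-last element, -3 the one before that, and so on. A short sketch using the four genome lengths from above, which also shows that glengths[-1] is shorthand for glengths[len(glengths) - 1] (last and second_last are hypothetical names):

```python
glengths = [4.6, 3000, 50000, 12000]

last = glengths[-1]         # last element
second_last = glengths[-2]  # second-to-last element
print(last, second_last)    # 12000 50000

# The same last element, reached with a positive index
print(glengths[len(glengths) - 1] == last)  # True
```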

  1. How would you access the fourth element of the species list and what is it?
  2. Add "yeast" to the species list and print the updated list.

Nested Lists

We can even have lists of lists! For example, we could have a list that contains both our glengths and species lists:

# Create a nested list that contains both glengths and species
nested_list = [glengths, species]
nested_list
[[4.6, 3000, 50000, 12000], ['ecoli', 'human', 'corn', 'yeast']]

If we wanted to access the first element of the nested_list, we would use nested_list[0], which would give us the glengths list. If we wanted to access the second element of the nested_list, we would use nested_list[1], which would give us the species list.

# Access species inside nested_list
nested_list[1]  
['ecoli', 'human', 'corn', 'yeast']

But perhaps we wanted to access the second element of the species list, which is human. To do this, we would first access the species list with nested_list[1], and then access the second element of that list with [1]. So the code to access human would be nested_list[1][1].

# Access the second element of species inside nested_list
nested_list[1][1]
'human'
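One subtlety worth knowing: a nested list stores the inner lists themselves, not copies of them. Because lists are mutable, changing an inner list after building the nested list also changes what the nested list shows. The sketch below uses hypothetical names (lengths, names, nested) so it does not alter the lesson's glengths and species, and the appended "mouse" entry is purely illustrative.

```python
lengths = [4.6, 3000, 50000, 12000]
names = ["ecoli", "human", "corn", "yeast"]
nested = [lengths, names]

# Modify the original names list in place
names.append("mouse")

# nested[1] is the very same list object as names, so it shows the change
print(nested[1])           # ['ecoli', 'human', 'corn', 'yeast', 'mouse']
print(nested[1] is names)  # True
```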

This structure is quite similar to how matrices are organized (we will discuss how to work with matrices in numpy in a future lesson). A matrix has rows and columns, and you access an individual element by specifying its row and column indices. In a nested list, you similarly access an element by specifying the index in the outer list and then the index in the inner list.
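To make the analogy concrete, here is a small hypothetical 2 x 3 "matrix" stored as a list of rows. The first index picks the row and the second picks the column.

```python
# A hypothetical 2 x 3 matrix stored as a list of rows
matrix = [
    [1, 2, 3],  # row 0
    [4, 5, 6],  # row 1
]

# Row index first, then column index
print(matrix[1][2])    # 6 (row 1, column 2)

# Number of rows and number of columns
print(len(matrix))     # 2
print(len(matrix[0]))  # 3
```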

Accessing Multiple Elements in a List

While we can directly access individual elements in a list with indexing, we may want to access multiple elements at once. We can do this with slicing.

Slicing

Slicing is a way to access a range of elements in a list. The syntax for slicing is list[start:stop], where start is the index of the first element we want and stop is the index of the first element we do not want. In other words, the element at stop is excluded from the slice.

So, to access the first three elements, we would use species[0:3].

# Create a slice to get the first three elements of species
species[0:3]
['ecoli', 'human', 'corn']

If you want to access everything from start to the end of the list, you can omit the stop index. For example, to access the elements from the second one to the end of the list, we would use species[1:].

# Create a slice to get all elements from the second element to the end
species[1:]
['human', 'corn', 'yeast']
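A few more slicing behaviors follow from the start/stop rule. The sketch below assumes species holds the four names used above; first_two is a hypothetical name.

```python
species = ["ecoli", "human", "corn", "yeast"]

# stop is excluded, so the slice has stop - start elements
print(species[1:3])       # ['human', 'corn']
print(len(species[1:3]))  # 2

# Omitting start begins the slice at index 0
print(species[:2])        # ['ecoli', 'human']

# A stop past the end is allowed in a slice (unlike plain indexing)
print(species[2:10])      # ['corn', 'yeast']

# A slice is a new list, so changing it leaves species untouched
first_two = species[:2]
first_two.append("mouse")
print(species)            # ['ecoli', 'human', 'corn', 'yeast']
```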
  1. How might you access the last two elements of the species list using slicing?

Hint: Recall that we can use negative indexing to access elements from the end of the list.

  2. You can actually slice with steps using the syntax list[start:stop:step]. This allows you to access every step-th element in the range from start to stop. How would you access every other element in the species list using slicing?

Strings Behave Like Lists

Earlier, we talked about strings as a data type that is used to represent text. Although a string is not actually a list, it is an ordered sequence of characters, so we can access individual characters with indexing and slicing just as we would with a list. Each character is itself a string of length one (Python has no separate character data type). Indexing and slicing are one of many ways to extract a substring from a string.

For example, if we have a string s = "hello", we can access the first character with s[0], which gives us h, and the first three characters with the slice s[0:3]:

# Create a slice to get the first three characters of a string
s = "hello"
s[0:3]
'hel'

Any other range works the same way. For example, s[1:4] gives us ell, the characters at indices 1 through 3.

# Create a slice to get characters from index 1 to index 3
s[1:4]
'ell'
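Strings differ from lists in one important way: they are immutable, so you cannot change a character in place. The sketch below shows that a single character is itself a string, that item assignment raises a TypeError, and that a modified copy can be built from a slice instead (s_new is a hypothetical name).

```python
s = "hello"

# A single character is a string of length 1
print(type(s[0]))  # <class 'str'>
print(len(s[0]))   # 1

# Unlike lists, strings are immutable: item assignment raises TypeError
try:
    s[0] = "H"
except TypeError as err:
    print("TypeError:", err)

# To "change" a string, build a new one from pieces of the old one
s_new = "H" + s[1:]
print(s_new)  # Hello
print(s)      # hello (the original is unchanged)
```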

Dictionaries

Dictionaries are another data structure that can hold multiple values. However, instead of storing elements by their position, a dictionary is a collection of key:value pairs. Each key in a dictionary must be unique and maps to a value. This means that instead of accessing elements by their integer index, we access values in a dictionary by their keys.

So let us begin by creating a dictionary that contains the species names as keys and their corresponding genome lengths as values. This can be a more intuitive way to organize our data, as in this case we can access the genome length for a specific species by using the species name as the key instead of using the index.

Creating a Dictionary

We can create a dictionary using curly braces {}, separating each key from its value with a colon : and separating the key:value pairs with commas. For example, we can create a dictionary called genome_dict like this:

# Create a dictionary with species as keys and genome lengths as values
genome_dict = {
    "ecoli": 4.6,
    "human": 3000,
    "corn": 50000,
    "yeast": 12000
}
genome_dict
{'ecoli': 4.6, 'human': 3000, 'corn': 50000, 'yeast': 12000}
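The same dictionary can also be built from the two lists created earlier, by pairing them element by element with the built-in zip() function and passing the pairs to dict(). This sketch assumes both lists are in the same order, so that each species lines up with its genome length.

```python
species = ["ecoli", "human", "corn", "yeast"]
glengths = [4.6, 3000, 50000, 12000]

# zip() pairs the i-th species with the i-th length;
# dict() turns each pair into a key:value entry
genome_dict = dict(zip(species, glengths))
print(genome_dict)  # {'ecoli': 4.6, 'human': 3000, 'corn': 50000, 'yeast': 12000}
```

If the two lists were in different orders, the species and lengths would be mismatched silently, which is one reason a dictionary is a safer place to keep paired data.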

We can represent our dictionary in a table similarly to how we did with the list, but this time we will use keys instead of indices.

Table 3: Dictionary, where keys are species name and values are length of the species’ genome in kilobases.
Key Value
ecoli 4.6
human 3000
corn 50000
yeast 12000

And if we investigate the type() of genome_dict, we can see that it is a dictionary data structure:

type(genome_dict)
dict

Accessing keys and values

In a dictionary, we can access the keys and values separately. The keys() method returns a view of all the keys in the dictionary, and the values() method returns a view of all the values. These views (dict_keys and dict_values) behave like read-only collections rather than ordinary lists. For example, to get the keys and values of our genome_dict, we would use the following code:

# Get the keys of the dictionary
print(genome_dict.keys()) 
# Get the values of the dictionary
print(genome_dict.values())
dict_keys(['ecoli', 'human', 'corn', 'yeast'])
dict_values([4.6, 3000, 50000, 12000])
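Because keys() and values() return views rather than lists, they cannot be indexed directly (genome_dict.keys()[0] raises a TypeError). The sketch below converts them with list() when indexing is needed, and uses the related items() method to loop over each key and value together; the for loop and the names species_list and lengths_list are illustrative only.

```python
genome_dict = {"ecoli": 4.6, "human": 3000, "corn": 50000, "yeast": 12000}

# Convert the views into ordinary lists so they can be indexed
species_list = list(genome_dict.keys())
lengths_list = list(genome_dict.values())
print(species_list[0], lengths_list[0])  # ecoli 4.6

# items() yields (key, value) pairs, one per entry
for name, length in genome_dict.items():
    print(name, length)
```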

This is how we grab all the keys and values in a dictionary. However, if we want to access the value for a specific key, we can use the syntax dict[key]. For example, to get the genome length for human, we would use genome_dict["human"].

# Access the value for the key "human"
genome_dict["human"]
3000
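Asking for a key that is not in the dictionary raises a KeyError. The sketch below shows two ways to guard against that, the in operator and the get() method with a default, and how assigning to a new key adds an entry. The key "new_species" and its value 1.0 are hypothetical placeholders, not real genome data.

```python
genome_dict = {"ecoli": 4.6, "human": 3000, "corn": 50000, "yeast": 12000}

# Check whether a key exists before using it
print("human" in genome_dict)  # True
print("mouse" in genome_dict)  # False

# get() returns a default value instead of raising an error
print(genome_dict.get("mouse", "not found"))  # not found

# Square brackets on a missing key raise KeyError
try:
    genome_dict["mouse"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'mouse'

# Assigning to a new key adds it; assigning to an existing key overwrites it
genome_dict["new_species"] = 1.0  # hypothetical placeholder value
print(len(genome_dict))  # 5
```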
  1. How would you access the genome length for corn in the genome_dict dictionary?


Reuse

CC-BY-4.0