Data Structures

Python programming

Data structures

Lists

Dictionaries

This lesson covers core Python data structures, focusing on lists, dictionaries, and strings. It also demonstrates how to index, slice, modify, and nest them for data wrangling.

Authors

Noor Sohail

Will Gammerdinger

Published

March 16, 2026

Keywords

Indexing, Slicing, Mutable objects, Strings

Approximate time: 30 minutes

Learning objectives

In this lesson, we will:

Create lists and dictionaries
Access elements in lists with slicing and indexing
Access values in dictionaries with keys

Overview of lesson

Real data comes in many shapes: a list of sample IDs, a mapping of species to genome lengths, or a sentence of free text. Choosing the right structure for each of these makes your code cleaner and sometimes even more efficient under the hood. In this lesson, you will learn how to use each structure so that you can represent your own data in effective ways.

Data structures

There exist several more types of variables that are commonly referred to as data structures. They can contain multiple values and are more complex than the data types we have discussed so far.

Table 1: Python data structures.

Data Structures	What it is
list	An ordered, changeable collection of elements.
dictionary	A collection of key–value pairs.
tuple	An ordered, unchangeable (immutable) collection of elements.
set	An unordered collection of unique elements.

We will expand further on lists and dictionaries in this lesson.
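As a rough illustration of the four structures in the table, the sketch below creates one of each. The variable names and values are hypothetical and chosen only for this example.

```python
# Hypothetical example values, used only for illustration
sample_list = ["s1", "s2", "s2"]      # list: ordered, changeable, duplicates allowed
sample_dict = {"s1": 10, "s2": 20}    # dictionary: key-value pairs
sample_tuple = ("s1", "s2")           # tuple: ordered, cannot be changed after creation
sample_set = {"s1", "s2", "s2"}       # set: duplicate elements are dropped

print(len(sample_list))   # 3
print(len(sample_set))    # 2
print(sample_dict["s2"])  # 20
print(sample_tuple[0])    # s1
```

Notice that the list keeps both copies of "s2", while the set keeps only one.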

Lists

Lists can appear a bit daunting at first, but they are very useful. A list can hold values of several data types in a single variable. This makes lists a very flexible data structure for storing and organizing data.

An analogy for a list is to imagine a drawer with different compartments that hold a single item each; each item in a compartment in a list is called an element. Each element consists of a single value and there is no limit to how many elements you can have in a list. A list is assigned to a single variable, because regardless of how many elements it contains, in the end it is still a single entity (drawer).

Lists are useful for making iterative processes more efficient. For example, if you have a task that you want to repeat multiple times, you can put all the elements you want to perform the task on in a list and then perform the task on the list instead of on each item individually. This is one of the greatest strengths of using a programming language: automating repetitive tasks!
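To make this idea concrete, here is a minimal sketch that repeats one task for every element of a list. It uses a for loop and the append() method, neither of which has been introduced yet, so treat it as a preview rather than something to memorize. The sample IDs and the ".fastq" file extension are hypothetical.

```python
# Hypothetical sample IDs, used only for illustration
sample_ids = ["sampleA", "sampleB", "sampleC"]

# Repeat the same task (building a file name) for every element in the list
file_names = []
for sample_id in sample_ids:
    file_names.append(sample_id + ".fastq")

print(file_names)  # ['sampleA.fastq', 'sampleB.fastq', 'sampleC.fastq']
```

If more samples are added to the list, the rest of the code does not need to change.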

Creating a list

To create a list, we use square brackets []. We can place all the elements we wish to include in the list within the brackets, separating them with commas. There are no limits on the number of values or data types that can be included in a list.

[1, 6, 7]

[1, 6, 7]

If we inspect the type(), we can see that it is a list data structure:

type([1, 6, 7])

list

We are not limited to including just one data type in a list. We can include any number of any data types in a list:

list1 = [1, "hello", 3.14, True]
list1

[1, 'hello', 3.14, True]

Length of a list

Let us create a list of genome lengths in kilobases, which we will call glengths. The species corresponding to these genome lengths are: ecoli, human, and corn. We will create two lists, one for the genome lengths and one for the species:

# Genome lengths in kilobases
glengths = [4.6, 3000, 50000]

# Species associated with glengths
species = ["ecoli", "human", "corn"]

It is often useful to know how many elements are in a list. We can use the len() function to determine this. The len() function requires a single argument: the list whose length we want to determine. For example, to determine how many elements are in our glengths and species lists, we would use the following code:

# Calculate the number of elements in glengths and species
print(len(glengths))
print(len(species))

3
3

Adding elements to a list

If we find a new genome and we want to add it to our list of genome lengths, we can use the append() method to add an element to the end of the list. The append() method takes a single argument, which is the element we want to add to the list.

For example, if we wanted to add the length of the yeast genome, which is 12,000 kb, to our list of genome lengths, we would use the following code:

# Append a new genome length to glengths
glengths.append(12000)

# Print the updated list
print(glengths)

[4.6, 3000, 50000, 12000]

Notice that we used append() to add an element to the list, but we did not assign the result to a new variable. This is because append() modifies the original list in place, so there is no need to assign it to a new variable. Operations that modify the original object without creating a new object are called in-place operations.

Additionally, notice that we called append() on the list object itself, as glengths.append(12000), rather than writing append(glengths, 12000). This is because append() is a method that belongs to the list object, while len() is a built-in function that can be used on many different types of objects. Essentially, since append() is a method that is specific to lists, we use the syntax list.method(). As len() is a function that can be used on many different types of objects, we use the syntax function(object).
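The short sketch below illustrates both points: append() changes the list in place and gives back nothing useful, while len() is called as a function on different kinds of objects. The list lengths_demo is a hypothetical list, kept separate from glengths so that running this example does not change the lists used in the rest of the lesson.

```python
# Hypothetical list, separate from glengths
lengths_demo = [4.6, 3000]

# append() changes the list itself and returns None
result = lengths_demo.append(50000)
print(result)         # None
print(lengths_demo)   # [4.6, 3000, 50000]

# len() is a built-in function that works on several kinds of objects
print(len(lengths_demo))  # 3
print(len("corn"))        # 4
```

Because append() returns None, writing something like lengths_demo = lengths_demo.append(50000) would replace the list with None, which is a common mistake.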

Now that we have added a new genome length to glengths, we can see that there are more elements than before.

# Calculate the number of elements in glengths after appending a new element
len(glengths)

Indexing

There may be times when we want to access a specific element in a list. We can do this via indexing. Python uses what is called zero-based indexing, which means that the first element of a list is accessed with index 0, the second element with index 1, and so on.

Table 2: Indices of list species, showing how the first element is indexed at zero.

Index	Value
0	ecoli
1	human
2	corn

We can access individual elements of a list using square brackets [] and the index of the element we want to access following the list name. For example, to access the first element of the species list, we would use species[0].

# Access the first element of species
species[0]

'ecoli'

Now if we wanted to access the second element of the species list, we would use species[1]. This can be a bit confusing at first, but will become easier with practice.

# Access the second element of species
species[1]

'human'

R is one-indexed

If you have used R before, you may be used to one-based indexing, which means that the first element of a list is accessed with index 1. As you have now seen, in Python, the first element is accessed with index 0. So when you go back and forth between R and Python, it is important to remember that the indexing is different in the two languages.

When you want to access the last element of a list, you can use the index -1. This is a special index that always refers to the last element, regardless of how many elements are in the list. For example, in order to access the last element of the glengths list, we would use glengths[-1].

# Access the last element of glengths
glengths[-1]

We should see the 12,000 kb genome length that we added in the previous section with the append() function.
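Negative indices count backwards from the end of the list. As a small sketch using a hypothetical list called letters, the index -1 refers to the same element as the index len(letters) - 1:

```python
# Hypothetical list, used only for illustration
letters = ["a", "b", "c", "d"]

print(letters[-1])                 # d (last element)
print(letters[-2])                 # c (second to last element)
print(letters[len(letters) - 1])   # d (same element as letters[-1])
```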

Exercise 1

How would you access the fourth element of the species list and what is it?
Add "yeast" to the species list and print the updated list.

Nested lists

We can even have lists of lists! For example, we could have a list that contains both our glengths and species lists:

# Create a nested list that contains both glengths and species
nested_list = [glengths, species]
nested_list

[[4.6, 3000, 50000, 12000], ['ecoli', 'human', 'corn', 'yeast']]

If we wanted to access the first element of the nested_list, we would use nested_list[0], which would give us the glengths list. If we wanted to access the second element of the nested_list, we would use nested_list[1], which would give us the species list.

# Access species inside nested_list
nested_list[1]

['ecoli', 'human', 'corn', 'yeast']

But perhaps we wanted to access the second element of the species list, which is human. To do this, we would first access the species list with nested_list[1], and then access the second element of that list with [1]. So the code to access human would be nested_list[1][1].

# Access the second element of species inside nested_list
nested_list[1][1]

'human'

This structure is actually quite similar to how matrices are structured (we will discuss how to utilize matrices with numpy in a future lesson). A matrix has rows and columns, and you can access individual elements by specifying the row and column indices. In a nested list, you have lists within lists, and you can similarly access elements by specifying the indices of the outer list and then the inner list.
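To illustrate the matrix analogy, the sketch below stores a small grid of numbers as a nested list, where each inner list plays the role of a row. The name counts and its values are hypothetical.

```python
# Hypothetical grid with 2 rows and 3 columns
counts = [
    [1, 2, 3],
    [4, 5, 6],
]

print(counts[0])       # [1, 2, 3] (the first row)
print(counts[1][2])    # 6 (second row, third column)
print(len(counts))     # 2 (number of rows)
print(len(counts[0]))  # 3 (number of columns in the first row)
```

The first index selects the row (the inner list) and the second index selects the position within that row.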

Accessing multiple elements in a list

While we can directly access individual elements in a list with indexing, we may want to access multiple elements at once. We can do this with slicing.

Slicing

Slicing is a way to access a range of elements in a list. We can use slicing to access a subset of the elements in a list. The syntax for slicing is list[start:stop], where start is the index of the first element we want to access and stop is the index of the first element we do not want to access.

So if we wanted to access the first three elements we would call species[0:3].

# Create a slice to get the first three elements of species
species[0:3]

['ecoli', 'human', 'corn']

If you wanted to access from start all the way to the end of the list, you can omit the stop index. For example, if we wanted to access from the second element to the end of the list, we would use species[1:].

# Create a slice to get all elements from the second element to the end
species[1:]

['human', 'corn', 'yeast']
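A few more properties of slices are sketched below with a hypothetical list called sample_ids. The start index can be omitted in the same way as the stop index, the number of elements returned is stop minus start, and a slice is a new list, so changing it does not change the original.

```python
# Hypothetical list, used only for illustration
sample_ids = ["s1", "s2", "s3", "s4", "s5"]

# Omitting start begins the slice at the first element
first_two = sample_ids[:2]
print(first_two)    # ['s1', 's2']

# stop is not included, so 1:4 returns 4 - 1 = 3 elements
middle = sample_ids[1:4]
print(middle)       # ['s2', 's3', 's4']

# A slice is a new list; the original is left untouched
first_two.append("extra")
print(first_two)    # ['s1', 's2', 'extra']
print(sample_ids)   # ['s1', 's2', 's3', 's4', 's5']
```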

Exercise 2

How might you access the last two elements of the species list using slicing?

Hint: Recall that we can use negative indexing to access elements from the end of the list.

You can actually slice with steps using the syntax list[start:stop:step]. This allows you to access every step-th element in the range from start to stop. How would you access every other element in the species list using slicing?

Strings are lists!

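Earlier, we talked about strings as a data type that is used to represent text. However, strings also behave much like lists! A string is a sequence of characters, and we can access those characters with indexing and slicing just like we would with the elements of a list. Python has no separate character data type, so each character we pull out is itself a string of length one. One important difference is that strings are immutable: unlike a list, a string cannot be changed in place. Indexing and slicing are among the many ways to access a substring of a string.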

For example, if we have a string s = "hello", we can access the first character with s[0], which would give us h.

# Use an index to get the first character of a string
s = "hello"
s[0]

'h'

We can also slice the string to get a subset of the characters. For example, s[1:4] would give us ell.

# Create a slice to get characters from index 1 to index 3
s[1:4]

'ell'
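Strings share indexing and slicing with lists, but they are not lists: a string cannot be modified in place. The sketch below shows the difference using a hypothetical variable called word. It uses try and except, which are not covered in this lesson, only so that the error is printed rather than stopping the code.

```python
word = "hello"

# A string can be converted to a true list of one-character strings
letters = list(word)
print(letters)     # ['h', 'e', 'l', 'l', 'o']
print(len(word))   # 5

# Assigning to a position in a string raises a TypeError
try:
    word[0] = "j"
except TypeError as error:
    print("TypeError:", error)

# Instead, we build a new string from pieces of the old one
new_word = "j" + word[1:]
print(new_word)    # jello
print(word)        # hello (the original string is unchanged)
```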

Dictionaries

Dictionaries are another data structure that is similar to a list. However, instead of being a collection of elements accessed by position, a dictionary is a collection of key:value pairs. Each key in a dictionary must be unique and maps to a value. This means instead of accessing elements by their integer index, we access values in a dictionary by their keys.

So let us begin by creating a dictionary that contains the species names as keys and their corresponding genome lengths as values. This can be a more intuitive way to organize our data, as in this case we can access the genome length for a specific species by using the species name as the key instead of using the index.

Creating a dictionary

We can create a dictionary using curly braces {} and separating the keys and values with colons :. For example, we can create a dictionary called genome_dict like this:

# Create a dictionary with species as keys and genome lengths as values
genome_dict = {
    "ecoli": 4.6,
    "human": 3000,
    "corn": 50000,
    "yeast": 12000
}
genome_dict

{'ecoli': 4.6, 'human': 3000, 'corn': 50000, 'yeast': 12000}

We can represent our dictionary in a table similarly to how we did with the list, but this time we will use keys instead of indices.

Table 3: Dictionary, where keys are species names and values are the lengths of the species’ genomes in kilobases.

Key	Value
ecoli	4.6
human	3000
corn	50000
yeast	12000

And if we investigate the type() of genome_dict, we can see that it is a dictionary data structure:

type(genome_dict)

dict
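If the keys and values already exist as two lists of the same length, a dictionary can also be built from them instead of typing each pair by hand. This is a minimal sketch using the built-in zip() and dict() functions, which this lesson does not otherwise cover. The variable names ending in _demo are hypothetical copies, so the lists and dictionary used in the rest of the lesson are not affected.

```python
# Hypothetical copies of the two lists from earlier in the lesson
species_demo = ["ecoli", "human", "corn", "yeast"]
glengths_demo = [4.6, 3000, 50000, 12000]

# zip() pairs up elements by position; dict() turns the pairs into key:value pairs
genome_dict_demo = dict(zip(species_demo, glengths_demo))
print(genome_dict_demo)
# {'ecoli': 4.6, 'human': 3000, 'corn': 50000, 'yeast': 12000}
```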

Accessing `keys` and `values`

In a dictionary, we can access the keys and values separately. We can use the keys() method to get all the keys in the dictionary, and the values() method to get all the values. Both methods return view objects (dict_keys and dict_values) rather than true lists; wrapping them in list() converts them to lists. For example, to get the keys and values of our genome_dict, we would use the following code:

# Get the keys of the dictionary
print(genome_dict.keys()) 
# Get the values of the dictionary
print(genome_dict.values())

dict_keys(['ecoli', 'human', 'corn', 'yeast'])
dict_values([4.6, 3000, 50000, 12000])

This is how we grab all the keys and values in a dictionary. However, if we want to access the value for a specific key, we can use the syntax dict[key]. For example, to get the genome length for human, we would use genome_dict["human"].

# Access the value for the key "human"
genome_dict["human"]
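A few related operations are sketched below on a small hypothetical dictionary called genome_demo: converting the keys to a list, checking whether a key exists with in, looking up a key that might be missing with the get() method, and adding a new key:value pair.

```python
# Hypothetical dictionary, separate from genome_dict
genome_demo = {"ecoli": 4.6, "human": 3000}

# Convert the keys view into a true list
species_names = list(genome_demo.keys())
print(species_names)              # ['ecoli', 'human']

# Check whether a key is present
print("human" in genome_demo)     # True
print("mouse" in genome_demo)     # False

# get() returns None for a missing key instead of raising an error
print(genome_demo.get("mouse"))   # None

# Assigning to a new key adds a key:value pair
genome_demo["yeast"] = 12000
print(genome_demo)                # {'ecoli': 4.6, 'human': 3000, 'yeast': 12000}
print(len(genome_demo))           # 3
```

By contrast, genome_demo["mouse"] would raise a KeyError, because square brackets require the key to exist.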

Exercise 3

How would you access the genome length for corn in the genome_dict dictionary?


Reuse

CC-BY-4.0

