Heatmaps

category_1
category_2
category_3
category_4

Write a description of the lesson here.

Authors

Noor Sohail

Will Gammerdinger

Published

March 14, 2026

Keywords

keyword_1, keyword_2, keyword_3, keyword_4, keyword_5, keyword_6

Approximate time: XX minutes

Learning Objectives

In this lesson, we will:

  • Learning Objective 1
  • Learning Objective 2
  • Learning Objective 3

Overview of lesson

When doing XYZ…

Heatmaps

# Load the data for heatmaps
import pandas as pd
rpkm_ordered = pd.read_csv("data/ordered_counts_rpkm.csv", index_col=0)
rpkm_ordered.head()
                      sample1    sample2    sample3    sample4    sample5    sample6  sample7   sample8   sample9   sample10   sample11   sample12
ENSMUSG00000000001  19.784800  19.265000  20.889500  24.076700  23.722200  20.819800  2.61161  5.849540  6.512630  24.198100  24.046500  26.915800
ENSMUSG00000000003   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000  0.00000  0.000000  0.000000   0.000000   0.000000   0.000000
ENSMUSG00000000028   0.937792   1.032290   0.892183   0.827891   0.826954   1.168630  1.13441  0.698754  0.925117   1.045920   0.975327   0.673563
ENSMUSG00000000031   0.035963   0.000000   0.000000   0.000000   0.000000   0.051193  0.00000  0.029845  0.059773   0.000000   0.000000   0.020438
ENSMUSG00000000037   0.151417   0.056033   0.146196   0.180883   0.047324   0.143884  0.00000  0.068594  0.049415   0.017004   0.020640   0.066232

First, let us identify the 6 most highly expressed genes across all of our samples. We do this by calculating each gene's average expression across all samples and then selecting the 6 genes with the highest averages:

# Calculate the average expression for each gene across all samples
gene_means = rpkm_ordered.mean(axis=1)

# Get the top 6 most highly expressed genes
important_genes = gene_means.sort_values(ascending=False).head(6).index
important_genes
Index(['ENSMUSG00000098973', 'ENSMUSG00000098178', 'ENSMUSG00000076258',
       'ENSMUSG00000076036', 'ENSMUSG00000076138', 'ENSMUSG00000093264'],
      dtype='str')
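To make the selection step easier to follow, here is a minimal sketch on a toy table. The gene names and values are made up for illustration and are not taken from the lesson data. It also uses `nlargest()`, which gives the same result here as `sort_values(ascending=False).head()`.

```python
import pandas as pd

# Toy expression table (hypothetical values): rows are genes, columns are samples
toy_rpkm = pd.DataFrame(
    {
        "sample1": [5.0, 0.0, 12.0, 1.0],
        "sample2": [7.0, 0.0, 10.0, 3.0],
        "sample3": [6.0, 0.0, 14.0, 2.0],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

# axis=1 averages across the columns, giving one mean per gene (row)
toy_means = toy_rpkm.mean(axis=1)

# Keep the names of the 2 genes with the highest mean expression
top_genes = toy_means.nlargest(2).index
print(list(top_genes))
```

Because `axis=1` collapses the sample columns, the result has one value per gene, which is what we need for ranking genes rather than samples.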

We can plot the expression of each of these genes across all of the samples using a heatmap, a graphical representation of data in which individual values are represented as colors. Here we use seaborn's clustermap() function, which draws a heatmap and also hierarchically clusters the rows and columns. Setting z_score=0 standardizes each gene (row), so the colors show expression relative to that gene's own mean rather than the raw RPKM values.

# Plot a heatmap of the important genes across all samples
import seaborn as sns
import matplotlib.pyplot as plt

# Subset the data to include only the important genes
important_genes_data = rpkm_ordered.loc[important_genes]

# Create the clustered heatmap, z-scoring each gene (row)
sns.clustermap(important_genes_data,
               cmap="viridis", z_score=0)
Figure 1: Heatmap of the RPKM values for the top 6 most highly expressed genes across all samples.
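The `z_score=0` argument in the `clustermap()` call above rescales each row before plotting. As a rough sketch of what that scaling does, the row-wise z-scores can be computed by hand with pandas. The toy values are hypothetical, and the sketch assumes the sample standard deviation (the pandas default) is used.

```python
import pandas as pd

# Toy expression table (hypothetical values): geneB is 10x more expressed than geneA
toy_rpkm = pd.DataFrame(
    {
        "sample1": [10.0, 100.0],
        "sample2": [20.0, 200.0],
        "sample3": [30.0, 300.0],
    },
    index=["geneA", "geneB"],
)

# Per-gene mean and standard deviation across samples
row_means = toy_rpkm.mean(axis=1)
row_sds = toy_rpkm.std(axis=1)

# Subtract each gene's mean and divide by its standard deviation
row_z = toy_rpkm.sub(row_means, axis=0).div(row_sds, axis=0)
print(row_z)
```

After scaling, both genes have the same pattern across samples even though their raw values differ tenfold, which is why row scaling helps when comparing genes with very different expression levels. Note that a gene with identical values in every sample (for example, all zeros) has a standard deviation of 0 and would produce missing values.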

But we have no idea which samples are which in this heatmap! We can add some extra information to our heatmap by adding a color bar that indicates the age of the mouse for each sample. To do this, we need to load in our metadata data frame that contains the age information for each sample:

# Load the metadata data frame
new_metadata = pd.read_csv("data/new_metadata.csv", index_col=0)
new_metadata.head()
         genotype  celltype  replicate  mean_expression  age_in_days
sample1        Wt     typeA          1        10.266102           40
sample2        Wt     typeA          2        10.849759           32
sample3        Wt     typeA          3         9.452517           38
sample4        KO     typeA          1        15.833872           35
sample5        KO     typeA          2        15.590184           41

Now we can add a color bar to our heatmap that indicates the age of the mouse for each sample:

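# Create the heatmap with a color bar indicating the age of the mouse for each sample
# Ages are continuous, so scale them onto a colormap instead of listing every value
ages = new_metadata["age_in_days"]
norm = plt.Normalize(ages.min(), ages.max())
age_colors = ages.map(lambda age: plt.cm.plasma(norm(age)))
sns.clustermap(important_genes_data,
               cmap="viridis", z_score=0,
               col_colors=age_colors)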
Figure 3: Heatmap of the RPKM values for the top 6 most highly expressed genes across all samples with a color bar indicating the age of the mouse for each sample.
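Categorical metadata can be turned into column colors in a similar way. The sketch below builds a color for each sample from a genotype column. The toy metadata rows and the color choices are assumptions made for illustration; only the `Wt` and `KO` labels come from the metadata shown above.

```python
import pandas as pd

# Toy metadata (hypothetical rows) using the genotype labels seen in new_metadata
toy_metadata = pd.DataFrame(
    {"genotype": ["Wt", "Wt", "KO", "KO"]},
    index=["sample1", "sample2", "sample3", "sample4"],
)

# One color per category; these color choices are arbitrary
genotype_palette = {"Wt": "gray", "KO": "red"}
genotype_colors = toy_metadata["genotype"].map(genotype_palette)

# Any label missing from the dictionary becomes NaN, so check before plotting
unmapped = genotype_colors.isna().sum()
print(genotype_colors)
print("Unmapped samples:", unmapped)

# The Series could then be passed to sns.clustermap(..., col_colors=genotype_colors)
```

Checking for unmapped samples is worthwhile because `map()` does not raise an error when a value is absent from the dictionary. It returns a missing value instead, and that sample is left without a meaningful color in the plot.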

Subsection 1A

Subsection 1B

  1. A question to evaluate Learning Objective 1
  2. A followup question to question #1

Topic 2

Subsection 2A

Subsection 2B

  1. A question to evaluate Learning Objective 2
  2. A followup question to question #1

Topic 3

Subsection 3A

Subsection 3B

  1. A question to evaluate Learning Objective 3
  2. A followup question to question #1

Reuse

CC-BY-4.0