    # Load the data for heatmaps
    import pandas as pd

    rpkm_ordered = pd.read_csv("data/ordered_counts_rpkm.csv", index_col=0)
    rpkm_ordered.head()

| | sample1 | sample2 | sample3 | sample4 | sample5 | sample6 | sample7 | sample8 | sample9 | sample10 | sample11 | sample12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000000001 | 19.784800 | 19.265000 | 20.889500 | 24.076700 | 23.722200 | 20.819800 | 2.61161 | 5.849540 | 6.512630 | 24.198100 | 24.046500 | 26.915800 |
| ENSMUSG00000000003 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| ENSMUSG00000000028 | 0.937792 | 1.032290 | 0.892183 | 0.827891 | 0.826954 | 1.168630 | 1.13441 | 0.698754 | 0.925117 | 1.045920 | 0.975327 | 0.673563 |
| ENSMUSG00000000031 | 0.035963 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.051193 | 0.00000 | 0.029845 | 0.059773 | 0.000000 | 0.000000 | 0.020438 |
| ENSMUSG00000000037 | 0.151417 | 0.056033 | 0.146196 | 0.180883 | 0.047324 | 0.143884 | 0.00000 | 0.068594 | 0.049415 | 0.017004 | 0.020640 | 0.066232 |

First, let us identify the six most highly expressed genes across all of our samples. We can do this by calculating each gene's average expression across all samples (the row means) and then selecting the six genes with the highest averages:
    # Calculate the average expression for each gene across all samples
    gene_means = rpkm_ordered.mean(axis=1)

    # Get the top 6 most highly expressed genes
    important_genes = gene_means.sort_values(ascending=False).head(6).index
    important_genes
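
The mean-then-rank step can be tried out on a small table first. The sketch below uses a made-up four-gene table (`toy_rpkm` and its values are hypothetical, not the lesson data) and also shows `Series.nlargest`, which should give the same selection as sorting and taking the head:

```python
import pandas as pd

# Toy expression table (hypothetical values): rows are genes, columns are samples
toy_rpkm = pd.DataFrame(
    {
        "sample1": [10.0, 0.0, 5.0, 1.0],
        "sample2": [12.0, 0.0, 7.0, 3.0],
        "sample3": [14.0, 0.0, 6.0, 2.0],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

# axis=1 averages across the columns, giving one mean per gene (row)
toy_means = toy_rpkm.mean(axis=1)

# Sort in descending order and keep the first two, as in the lesson
top_sorted = toy_means.sort_values(ascending=False).head(2).index

# nlargest does the sorting and the selection in one call
top_nlargest = toy_means.nlargest(2).index

print(list(top_sorted))
print(list(top_nlargest))
```

Checking the result by eye on a table this small is a quick way to confirm that `axis=1` averages over samples rather than over genes.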
We can plot the expression of each of these genes across all of the samples using a heatmap. A heatmap is a graphical representation of data where individual values are represented as colors. Here we use `sns.clustermap`, which also hierarchically clusters the rows and columns. Because we pass `z_score=0`, each gene (row) is standardized first, so the colors show how a gene's expression in each sample compares to that gene's own average rather than the raw RPKM values.
    # Plot a heatmap of the important genes across all samples
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Subset the data to include only the important genes
    important_genes_data = rpkm_ordered.loc[important_genes]

    # Create the heatmap
    sns.clustermap(important_genes_data, cmap="viridis", z_score=0)
Figure 1: Heatmap of the RPKM values for the top 6 most highly expressed genes across all samples.
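
To see what `z_score=0` does to the values before they are colored, here is a minimal sketch of row-wise standardization done by hand. The table `toy` is hypothetical, and using the pandas default sample standard deviation (`ddof=1`) is an assumption about how closely this matches seaborn's internal calculation:

```python
import pandas as pd

# Two hypothetical genes with the same pattern but very different magnitudes
toy = pd.DataFrame(
    {"s1": [1.0, 100.0], "s2": [2.0, 200.0], "s3": [3.0, 300.0]},
    index=["low_gene", "high_gene"],
)

# Per-gene (row) mean and standard deviation
row_means = toy.mean(axis=1)
row_sds = toy.std(axis=1)  # pandas default is the sample standard deviation (ddof=1)

# Subtract each row's mean and divide by each row's standard deviation
toy_z = toy.sub(row_means, axis=0).div(row_sds, axis=0)

print(toy_z)
```

After standardization both genes have identical rows, which is why a z-scored heatmap highlights patterns across samples rather than differences in overall expression level. A gene with the same value in every sample has a standard deviation of zero and would come out as missing values under this calculation.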
But we have no idea which samples are which in this heatmap! We can add some extra information to our heatmap by adding a color bar that indicates the age of the mouse for each sample. To do this, we need to load in our metadata data frame that contains the age information for each sample:
    # Load the metadata data frame
    new_metadata = pd.read_csv("data/new_metadata.csv", index_col=0)
    new_metadata.head()
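
| | genotype | celltype | replicate | mean_expression | age_in_days |
|---|---|---|---|---|---|
| sample1 | Wt | typeA | 1 | 10.266102 | 40 |
| sample2 | Wt | typeA | 2 | 10.849759 | 32 |
| sample3 | Wt | typeA | 3 | 9.452517 | 38 |
| sample4 | KO | typeA | 1 | 15.833872 | 35 |
| sample5 | KO | typeA | 2 | 15.590184 | 41 |
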
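
Before attaching metadata to a heatmap, it can help to confirm that the sample names in the metadata index match the column names of the expression table. The sketch below uses small hypothetical stand-ins (`toy_rpkm`, `toy_metadata`) for the two data frames rather than the lesson files:

```python
import pandas as pd

# Hypothetical stand-ins for the expression table and the metadata
toy_rpkm = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["geneA", "geneB"],
    columns=["sample1", "sample2", "sample3"],
)
toy_metadata = pd.DataFrame(
    {"age_in_days": [38, 40, 32]},
    index=["sample3", "sample1", "sample2"],  # deliberately out of order
)

# Samples present in one table but not in the other (both empty here)
missing_in_metadata = toy_rpkm.columns.difference(toy_metadata.index)
missing_in_counts = toy_metadata.index.difference(toy_rpkm.columns)
print(list(missing_in_metadata), list(missing_in_counts))

# Reorder the metadata rows to follow the expression table's column order
toy_metadata_aligned = toy_metadata.loc[toy_rpkm.columns]
print(toy_metadata_aligned)
```

If either difference is non-empty, some samples would have no matching annotation, which is worth resolving before plotting.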
Now we can add a color bar to our heatmap that indicates the age of the mouse for each sample:
    # Create the heatmap with a color bar indicating the age of the mouse for each sample
    # Note: .map() returns NaN for any age that is not a key in the dictionary
    sns.clustermap(
        important_genes_data,
        cmap="viridis",
        z_score=0,
        col_colors=new_metadata["age_in_days"].map({3: "blue", 18: "orange"}),
    )
Figure 3: Heatmap of the RPKM values for the top 6 most highly expressed genes across all samples with a color bar indicating the age of the mouse for each sample.
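
The dictionary passed to `.map()` only assigns colors to ages of exactly 3 or 18 days, while the ages in the metadata preview (32 to 41 days) are not among its keys. One possible alternative, sketched here with hypothetical bin edges and colors, is to group the ages into ranges with `pd.cut` so that every sample receives a color:

```python
import pandas as pd

# Ages copied from the first five rows of the metadata preview
toy_ages = pd.Series(
    [40, 32, 38, 35, 41],
    index=["sample1", "sample2", "sample3", "sample4", "sample5"],
    name="age_in_days",
)

# Hypothetical bin edges and colors, chosen only for illustration
age_bins = [30, 35, 40, 45]  # intervals (30, 35], (35, 40], (40, 45]
bin_colors = ["blue", "orange", "green"]

# Label each age with the color of the interval it falls into
age_groups = pd.cut(toy_ages, bins=age_bins, labels=bin_colors)

# Convert from a categorical to plain strings for use as colors
age_colors = age_groups.astype(object)

print(age_colors)
print(age_colors.isna().sum())  # number of samples left without a color
```

A series like `age_colors` could then be passed as `col_colors=age_colors` in the `sns.clustermap` call. The bin edges would need to cover the full range of ages in the real metadata.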
---
title: "Heatmaps"
description: |
  Write a description of the lesson here.
author:
  - Noor Sohail
  - Will Gammerdinger
date: "2026-03-14"
categories:
  - category_1
  - category_2
  - category_3
  - category_4
keywords:
  - keyword_1
  - keyword_2
  - keyword_3
  - keyword_4
  - keyword_5
  - keyword_6
license: "CC-BY-4.0"
editor_options:
  markdown:
    wrap: 72
---

```{r}
#| label: load_libraries_data
#| echo: false

# Load libraries and data
# Interfacing with R quarto and python futzing
library(reticulate)
use_condaenv("/opt/anaconda3/envs/intro_python", required = TRUE)
```

Approximate time: XX minutes

## Learning Objectives

In this lesson, we will:

- Learning Objective 1
- Learning Objective 2
- Learning Objective 3

## Overview of lesson

When doing XYZ...

## Heatmaps

```{python}
#| label: load_heatmap_data

# Load the data for heatmaps
import pandas as pd
rpkm_ordered = pd.read_csv("data/ordered_counts_rpkm.csv", index_col=0)
rpkm_ordered.head()
```

First let us identify the top 6 most highly expressed genes across all of our samples. We can do this by calculating the average expression for each gene across all samples and then selecting the top 6 genes with the highest average expression:

```{python}
#| label: top_genes

# Calculate the average expression for each gene across all samples
gene_means = rpkm_ordered.mean(axis=1)

# Get the top 6 most highly expressed genes
important_genes = gene_means.sort_values(ascending=False).head(6).index
important_genes
```

We can plot the gene expression for each of these genes across all of the samples using a heatmap. A heatmap is a graphical representation of data where individual values are represented as colors. In this case, we can use a heatmap to visualize the expression levels of our important genes across all of our samples.

```{python}
#| label: fig-top_genes_heatmap
#| fig-cap: "Heatmap of the RPKM values for the top 6 most highly expressed genes across all samples."

# Plot a heatmap of the important genes across all samples
import seaborn as sns
import matplotlib.pyplot as plt

# Subset the data to include only the important genes
important_genes_data = rpkm_ordered.loc[important_genes]

# Create the heatmap
sns.clustermap(important_genes_data, cmap="viridis", z_score=0)
```

But we have no idea which samples are which in this heatmap! We can add some extra information to our heatmap by adding a color bar that indicates the `age` of the mouse for each sample. To do this, we need to load in our metadata data frame that contains the age information for each sample:

```{python}
#| label: load_metadata

# Load the metadata data frame
new_metadata = pd.read_csv("data/new_metadata.csv", index_col=0)
new_metadata.head()
```

Now we can add a color bar to our heatmap that indicates the age of the mouse for each sample:

```{python}
#| label: fig-top_genes_heatmap_age
#| fig-cap: "Heatmap of the RPKM values for the top 6 most highly expressed genes across all samples with a color bar indicating the age of the mouse for each sample."

# Create the heatmap with a color bar indicating the age of the mouse for each sample
sns.clustermap(important_genes_data, cmap="viridis", z_score=0, col_colors=new_metadata["age_in_days"].map({3: "blue", 18: "orange"}))
```

### Subsection 1A

### Subsection 1B

:::{.callout-tip}
# [**Exercise 1**](13_lesson-Answer_key.qmd#exercise-1)

1. A question to evaluate Learning Objective 1
2. A followup question to question #1
3. ...
:::

## Topic 2

### Subsection 2A

### Subsection 2B

:::{.callout-tip}
# [**Exercise 2**](13_lesson-Answer_key.qmd#exercise-2)

1. A question to evaluate Learning Objective 2
2. A followup question to question #1
3. ...
:::

## Topic 3

### Subsection 3A

### Subsection 3B

:::{.callout-tip}
# [**Exercise 3**](13_lesson-Answer_key.qmd#exercise-3)

1. A question to evaluate Learning Objective 3
2. A followup question to question #1
3. ...
:::

***

[Back to Schedule](../schedule/schedule.qmd)