Bulk RNA-Seq

In this tutorial we are going to run through how UCDeconvolve can be used to aid in the analysis and annotation of bulk RNA-Sequencing data. For this tutorial, we are going to recreate part of the analysis in Figure 5 of the unicell deconvolve manuscript, where we analyze bulk-RNA-Seq data from patients at various stages of Type 2 Diabetes from a prior study GSE50244.

Loading Packages & Authenticating

The first step in this analysis will be to load scanpy and ucdeconvolve after following the installation and registration instructions, and authenticate our API. In this tutorial we saved our user access token in the variable TOKEN.

[17]:

import matplotlib.pyplot as plt
import scanpy as sc
import seaborn as sns
import requests
import io

import ucdeconvolve as ucd
ucd.api.authenticate(TOKEN)

2023-04-25 17:23:16,217|[UCD]|INFO: Updated valid user access token.

Download and Pre-Process Data

We will download a preprocessed version of the target dataset from an online repository. This dataset contains expression data from pancreatic islet biopsies from ~100 patients at varying stages of T2D.

[3]:

url = "https://github.com/dchary/ucdeconvolve_paper/raw/main/figure5/adata_t2d_GSE50244.h5ad"
adata = sc.read_h5ad(io.BytesIO(requests.get(url).content))

Run UCDBase For Context-Free Prediction

We begin by obtaining context-free cell type predictions using UCDBase.

[5]:

ucd.tl.base(adata)

2023-04-25 17:19:09,360|[UCD]|INFO: Starting UCDeconvolveBASE Run. | Timer Started.
Preprocessing Dataset | 100% (1 of 1) |##| Elapsed Time: 0:00:00 Time:  0:00:00
2023-04-25 17:19:10,062|[UCD]|INFO: Uploading Data | Timer Started.
2023-04-25 17:19:10,917|[UCD]|INFO: Upload Complete | Elapsed Time: 0.854 (s)
Waiting For Submission : UNKNOWN | Queue Size : 0 | \ |#| 2 Elapsed Time: 0:00:03
Waiting For Submission : QUEUED | Queue Size : 1 | / |#| 4 Elapsed Time: 0:00:06
Waiting For Submission : RUNNING | Queue Size : 1 | | |#| 4 Elapsed Time: 0:00:06
Waiting For Completion | 100% (77 of 77) || Elapsed Time: 0:00:14 Time:  0:00:14
2023-04-25 17:19:33,178|[UCD]|INFO: Download Results | Timer Started.
2023-04-25 17:19:33,417|[UCD]|INFO: Download Complete | Elapsed Time: 0.239 (s)
2023-04-25 17:19:34,084|[UCD]|INFO: Run Complete | Elapsed Time: 24.723 (s)

Examine Differences in Pancreatic Beta Cell Fractions Across Disease States

We want to look at differences in cell type fractions between patients stratified by disease state. Late-stage T2D is characterized by a loss of beta cells. Let’s see if this dataset is concordant with this hypothesis.

[34]:

# Load only beta cell predictions
preds = ucd.utils.read_results(adata, celltypes = ['type b pancreatic cell'])

# Append to adata as a column
adata.obs['type b pancreatic cell'] = preds['type b pancreatic cell']

# Plot boxplots
fig, ax = plt.subplots(figsize = (3.5,5))
sns.boxplot(data = adata.obs, x = 'DiseaseStatus', y = 'type b pancreatic cell',
            order = ['Normal', 'Pre-Diabetes', 'Diabetes'], ax = ax,
            palette="Set2", width = 0.5)
sns.stripplot(data = adata.obs, x = 'DiseaseStatus', y = 'type b pancreatic cell',
            order = ['Normal', 'Pre-Diabetes', 'Diabetes'], ax = ax, color = 'k', size = 4)
sns.despine(ax = ax, trim = True, offset = 5)

../_images/notebooks_bulk_example_7_0.png

Run UCDSelect For Contextualized Predictions

We may want to leverage a pancreas-specific reference dataset to explore more detail subtypes, but also to confirm the findings suggested by our context-free approach. We can use UCDSelect to do this, which leverages a transfer learning regime utilizing UCDBase as a feature extraciton engine to calculate cell-type features for an input target dataset and an annotated reference dataset.

UCDSelect comes with pre-built reference datasets for common tissue types. To view datasets available as prebuilt references, run the utility function ucd.utils.list_prebuilt_references().

Note

Would you like to have a particular study incorporated as a prebuilt reference? Email us at ucdeconvolve@gmail.com and let us know!

[35]:

ucd.utils.list_prebuilt_references()

[35]:

['allen-mouse-cortex', 'enge2017-human-pancreas', 'lee-human-pbmc-covid']

Running UCDSelect

Let’s go ahead and run ucdselect using the enge2017-human-pancreas reference.

[36]:

ucd.tl.select(adata, "enge2017-human-pancreas")

2023-04-25 17:28:26,171|[UCD]|INFO: Starting UCDeconvolveSELECT Run. | Timer Started.
Preprocessing Mix | 100% (1 of 1) |######| Elapsed Time: 0:00:00 Time:  0:00:00
Preprocessing Ref | 100% (1 of 1) |######| Elapsed Time: 0:00:00 Time:  0:00:00
2023-04-25 17:28:27,651|[UCD]|INFO: Uploading Data | Timer Started.
2023-04-25 17:28:28,784|[UCD]|INFO: Upload Complete | Elapsed Time: 1.132 (s)
Waiting For Submission : UNKNOWN | Queue Size : 0 | / |#| 0 Elapsed Time: 0:00:00
Waiting For Submission : QUEUED | Queue Size : 1 | - |#| 1 Elapsed Time: 0:00:01
Waiting For Submission : RUNNING | Queue Size : 1 | | |#| 1 Elapsed Time: 0:00:01
Waiting For Completion | 100% (77 of 77) || Elapsed Time: 0:00:22 Time:  0:00:22
2023-04-25 17:29:12,832|[UCD]|INFO: Download Results | Timer Started.
2023-04-25 17:29:12,958|[UCD]|INFO: Download Complete | Elapsed Time: 0.125 (s)
2023-04-25 17:29:13,621|[UCD]|INFO: Run Complete | Elapsed Time: 47.449 (s)

Let’s go ahead and read our results, where we see only pancreas-specific cell types being deconvoled which were annotated in the original reference.

[38]:

ucd.utils.read_results(adata, key = 'ucdselect').head(5)

[38]:

	beta	alpha	ductal	delta	acinar	mesenchymal
sample_id
GSM1216753	0.000859	0.048488	0.704008	0.119697	0.126949	0.000000
GSM1216755	0.126114	0.124226	0.000000	0.000000	0.710946	0.038713
GSM1216758	0.365763	0.102278	0.000000	0.000000	0.000000	0.531958
GSM1216760	0.566138	0.015259	0.000000	0.000000	0.207783	0.210820
GSM1216763	0.000000	0.066785	0.125175	0.000000	0.396808	0.411232

Let’s repeat our boxplot plotting and look at results for type b pancreatic cells when using a contextualized reference for deconvolution with fine-tuning.

[58]:

# Load only beta cell predictions
preds = ucd.utils.read_results(adata, key = 'ucdselect', celltypes = ['beta'])

# Append to adata as a column
adata.obs['beta'] = preds['beta']

# Plot boxplots
fig, ax = plt.subplots(figsize = (3.5,5))
sns.boxplot(data = adata.obs, x = 'DiseaseStatus', y = 'beta',
            order = ['Normal', 'Pre-Diabetes', 'Diabetes'], ax = ax,
            palette="Set2", width = 0.5)
sns.stripplot(data = adata.obs, x = 'DiseaseStatus', y = 'beta',
            order = ['Normal', 'Pre-Diabetes', 'Diabetes'], ax = ax, color = 'k', size = 4)
sns.despine(ax = ax, trim = True, offset = 5)

../_images/notebooks_bulk_example_16_0.png

We can see that islet samples in the Diabetes cohort exhibit almost no detectable beta cells, which patients who are otherwise healthy report a predicted beta cell median fraction of ~30%. Note that while the trend of relative proportions between ucdbase and ucdselect is the same, the absolute proportions were different. Healthy islets are often ~70% beta cells, however these bulk islet samples likely represent imperfect dissections, with ~50% of the tissue predicted to represent pancreatical acinar / ductal epithelial cells and mesencyhmal stroma.