library(DevKidCC)
library(tidyverse)
library(Seurat)
library(rmarkdown)
library(knitr)

Introduction

In this vignette I’ll run through how to use DevKidCC to classify kidney cells in scRNA-Seq data. I’ll then highlight the various visualisation tools that are included in the package.

getwd()
#> [1] "/group/kidn1/Group-Little_MCRI/People/Sean/PhD/R-projects/DevKidCC/vignettes"

DevKidCC

First we need to load the data and generate a Seurat object. To generate a Seurat object from raw files, follow the tutorials at the Satija Lab website

Here we will load the organoid dataset included with the package, from Howden et al. 2019

load("/group/kidn1/Group-Little_MCRI/People/Sean/PhD/R-projects/DevKidCC/data/organoid.rda")
organoid
#> An object of class Seurat 
#> 33697 features across 546 samples within 1 assay 
#> Active assay: RNA (33697 features, 5000 variable features)
#>  5 dimensional reductions calculated: cca, cca.aligned, tsne, pca, umap


This is a reduced dataset but it should work for demonstration purposes. This dataset has already been filtered for high quality cells, which should be done prior to any downstream analysis anyway so might as well do it here. What cells are in the dataset will not effect the classification outcome, but poor quality cells will skew dimensional reduction and gene expression analysis.

Running the classification tool is a simple one line command:

#organoid <- DevKidCC(organoid)

This runs through each classification stage of the model and outputs all the information including scores and tier outcomes in the metadata.

colnames(organoid@meta.data)
#>  [1] "orig.ident"          "nCount_RNA"          "nFeature_RNA"       
#>  [4] "nGene"               "nUMI"                "percent.mito"       
#>  [7] "percent.ribo"        "phase"               "G1"                 
#> [10] "G2M"                 "S"                   "age"                
#> [13] "res.0.2"             "res.0.4"             "res.0.6"            
#> [16] "res.0.8"             "res.1"               "res.1.2"            
#> [19] "res.1.4"             "res.1.6"             "res.1.8"            
#> [22] "res.2"               "Identity"            "scpred_Endo"        
#> [25] "scpred_Nephron"      "scpred_NPC"          "scpred_Stroma"      
#> [28] "scpred_UrEp"         "scpred_max"          "scpred_prediction"  
#> [31] "scpred_no_rejection" "LineageID"           "scpred_DN"          
#> [34] "scpred_EN"           "scpred_PN"           "scpred_RC"          
#> [37] "NephronID"           "scpred_EPT"          "scpred_PT"          
#> [40] "SegmentID"           "scpred_DT"           "scpred_EDT"         
#> [43] "scpred_LOH"          "scpred_EPod"         "scpred_PEC"         
#> [46] "scpred_Pod"          "scpred_CS"           "scpred_MesS"        
#> [49] "scpred_MS"           "scpred_SPC"          "StromaID"           
#> [52] "DKCC"                "sample.id"
organoid@meta.data[1:5, c(23, 25, 27, 28, 32, 53)]
#>                         Identity scpred_Nephron scpred_Stroma  scpred_UrEp
#> 18_AAACCTGAGTGTGAAT Dist T + LoH    0.999132837  0.0003101490 1.408198e-05
#> 18_AAACCTGCAAGTTCTG       Prox T    0.606596991  0.0112463345 9.026115e-05
#> 18_AAAGTAGCACCGTTGG     Stroma 5    0.002439798  0.7213967338 2.081041e-04
#> 18_AACACGTTCTCGTTTA   Cell Cycle    0.021803191  0.1974095816 4.504845e-04
#> 18_AACCGCGGTCTAAAGA   Cycling Ep    0.999713745  0.0001917537 4.249040e-05
#>                      LineageID sample.id
#> 18_AAACCTGAGTGTGAAT    Nephron How_T_D18
#> 18_AAACCTGCAAGTTCTG unassigned How_T_D18
#> 18_AAAGTAGCACCGTTGG     Stroma How_T_D18
#> 18_AACACGTTCTCGTTTA unassigned How_T_D18
#> 18_AACCGCGGTCTAAAGA    Nephron How_T_D18

Here we can see a comparison between the original identity of the cells Identity compared to the scores given for Nephron, Stroma and UrEp identities. The columns LineageID and DKCC show the assigned identities for the cells at the first (LineageID) and final (DKCC) classification levels.

These can now be used to visualise score or classification outcomes on TSNE or UMAP plots, once computed.
Hint: make cols = myColours() to replicate the colour scheme used in the paper and custom functions.

(DimPlot(organoid, group.by = "LineageID") | DimPlot(organoid, group.by = "DKCC", label = T, repel = T)) /
  (FeaturePlot(organoid, features = "scpred_Nephron", order = T) | FeaturePlot(organoid, features = "PAX2", order = T))
#> Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
#> Please use `as_label()` or `as_name()` instead.
#> This warning is displayed once per session.

Custom Visualisations

In an effort to make visual analysis of this data easier, we provide some functions for this.

ComponentPlot
This function will draw a bar chart showing contribution for annotated identities to a sample. This way you can directly compare two or more samples for the percentage or cell total contributions.

ComponentPlot(organoid, identity = "orig.ident", component = "LineageID", show.pct = T, do.label = T, feature = "PAX2")


Personally I prefer flipping the axes

ComponentPlot(organoid, identity = "orig.ident", component = "LineageID", show.pct = T, do.label = T, feature = "PAX2") + coord_flip()

There are a couple of features you can toggle in this function:
- show.pct will switch between a percentage contribution or total number of cells - do.label turns the labels inside the bars on or off - show.unassigned can remove the unassigned cells from the graph - show.gene.exp will switch the colours from the identities to a gene expression of the selected feature - feature changes the gene for the gene expression information if that is toggled on

ComponentPlot(organoid, identity = "orig.ident", component = "LineageID", show.pct = T, do.label = T, feature = "PAX2", show.gene.exp = T) + coord_flip()

DotPlotCompare
This function is a modified version of Seurat::DotPlot(). I’ve modified this by allowing you to express the original reference data and if wanted the previously published organoid datasets, as classified using DevKidCC. This function can be used to directly compare gene expression profiles between datasets.

DotPlotCompare(organoid, features = c("PAX2", "EPCAM", "NPHS1", "COL3A1", "GATA3"), columns = 1, size = "pct.exp")
#> Warning: Removed 15 rows containing missing values (geom_point).

I’ve tried to include multiple usage methods for this function but the simplest is to call your dataset and the genes you are interested in.
There are some fine tuning steps you can use if desired:
Select specific identities. There are key words you can use; “nephron” will select only the nephron segments, or you can select one or more specific identites. You can also switch the grouping from “gene” to “segment” and change the relative size of the dots with “dot.scale”.

DotPlotCompare(organoid, features = c("PAX2", "EPCAM", "NPHS1", "COL3A1", "GATA3"), columns = 1, size = "pct.exp", idents = "nephron") |
  DotPlotCompare(organoid, features = c("PAX2", "EPCAM", "NPHS1", "COL3A1", "GATA3"), columns = 1, 
                 size = "pct.exp", idents = "EPod", show = "segment", dot.scale = 10)
#> Warning: Removed 15 rows containing missing values (geom_point).
#> Warning: Removed 1 rows containing missing values (geom_point).


To compare your dataset to the existing organoid database, change compare.to.organoids. This function doesn’t filter specific organoids so it isn’t the easiest on the eye. Hopefully I can add that feature at a later date. Our test organoids are at the top or right (if flipped). Prior to using this call, you will need to download the organoid database. The database is stored as an rda file and can be downloaded at the following link: https://drive.google.com/file/d/1_6L0EKsYcHq2cS7cRK2NM5G6JCEZClW-/view?usp=sharing

DotPlotCompare(organoid, features = c("PAX2", "EPCAM", "NPHS1", "COL3A1", "GATA3"), compare.to.organoids = T, 
               columns = 1, size = "pct.exp", idents = "EPod", show = "segment") + coord_flip()
#> Warning in DotPlotCompare(organoid, features = c("PAX2", "EPCAM", "NPHS1", :
#> Please refer to https://github.com/KidneyRegeneration/DevKidCC for instructions
#> on downloading the database
#> Warning: Removed 1 rows containing missing values (geom_point).

IdentBoxPlot
This is a function to plot the assigned identities for multiple samples in a single Seurat object against eachother in a comparative manner. It doesn’t provide any statistics however, just a visual representation which can be useful with large dataset of many samples, unlike this example.

#IdentBoxPlot(organoid, group = "orig.ident", component = "LineageID", show.pct = T, column = 4)