Reviewers Responses

A second question stemming from this is how good (realistic) these organoids are. The scRNA-seq is in the main figures and text mainly (only?) discussed and interpreted from the nephron point of view (which indeed looks impressive but as argued before the comparison to established methods is minimal). Only Sup Fig 3 shows there is a huge stromal component in these organoids which is (at this moment) not addressed in the text. What is this? Where is it coming from? Are these specific renal stromal cell types or more general (‘stromal’ after all means something different to different people). Is this also the case with the old methods, if so is this stromal compartment increasing or not?

Further analysis of existing data: This is asking for more detailed analysis of the stroma, which is a little beyond the scope of this paper. However, the “what is this?” question could be addressed by further scRNAseq analysis - isolation of stromal component and comparison to existing mouse datasets where stroma has been well-characterised, as well as immunofluorescence for stromal markers. The “where is this coming from” would require detailed lineage tracing, part of a separate question that is not the focus of this paper. We could argue this. We cannot be expected to study the origin of individual cell types within these organoids. This is unreasonable. Sean has offered to do some direct stromal comparisons, but I feel this still remains largely supplementary data.

Only further in the manuscript (Fig 5) we return to this stromal core in the organoids as an explanation for the radial morphology of the PE-organoids as a source of WNT inhibition. Here it turns out there is mainly cartilage developing at the stromal core of the organoids, and only now it is made clear that clusters 2 and 5 of the seq data are in fact ‘cartilage clusters’ (line 422). Sure, other protocols show cartilage development (as mentioned in the discussion) but again the comparison to the established methods becomes essential and is lacking (Fig 5B only shows PE-organoids).

Further analysis of existing data potentially possible: The reviewer interpreted the data to show that ‘mainly cartilage’ develops in the centre of the organoid. I am not sure of the exact proportion, but it is not all of the core. Some further analysis of the core (e.g. SOX9 staining) could be beneficial here. The comparison with existing protocols could be addressed through further scRNAseq analysis and rolled together with the stromal analysis above.
Add to the above analysis.

Strategy

Re-evaluate the clustering and DKCC analysis from the original paper
Compare to the England et al. 2020 stromal genes
Perform an analysis in Azimuth using the developing fetal atlas

Part 1: Re-evaluate

library(knitr)
library(Seurat)

## This version of bslib is designed to work with shiny version 1.6.0 or higher.

library(tidyverse)

## Registered S3 method overwritten by 'cli':
##   method     from    
##   print.boxx spatstat

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

seed <- 250395

Read in the dataset

extdiff <- read_rds("/group/kidn1/Group-Little_MCRI/People/JessV/Profiling/DD156_1502cln2_ExtDiff_CDBLY2_scRNASeq/output/rds/ExtDiff_Merge.rds")
extdiff$SCT_snn_res.1 <- factor(extdiff$SCT_snn_res.1, levels = 0:23)

This dataset is classified using DevKidCC and Seurat.

(VlnPlot(extdiff, group.by = "LineageID", features = c("EPCAM", "NPHS1", "LRP2", "SLC12A1", "SIX1", "COL1A1", "COL3A1", "VIM", "DCN"), stack = T) + NoLegend()) |
(VlnPlot(extdiff, group.by = "SCT_snn_res.1", features = c("EPCAM", "NPHS1", "LRP2", "SLC12A1", "SIX1", "COL1A1", "COL3A1", "VIM", "DCN"), stack = T) + NoLegend())

Get unassigned cells from DevKidCC and recluster

We can grab both the “Stroma” and “unassigned” cells from the DevKidCC analysis. These represent the stroma and mesenchymal populations present, thus the cells we are interested in.

mes <- extdiff[, extdiff$LineageID %in% c("unassigned", "Stroma")]
mes <- CreateSeuratObject(mes@assays$RNA@counts, meta.data = mes@meta.data) %>% 
  NormalizeData() %>% 
  ScaleData(vars.to.regress = c("S.Score", "G2M.Score", "percent.mt")) %>% 
  FindVariableFeatures() %>% 
  RunPCA(npcs=30, seed.use = seed) %>% 
  RunUMAP(dims=1:30, seed.use = seed)
mes <- mes %>% FindNeighbors() %>% FindClusters(resolution = seq(0.1, 0.9, 0.1))
write_rds(mes, here::here("Stroma.rds"))

We can look at the gene expression within each subset of stroma.

(VlnPlot(mes, group.by = "DKCC", features = c("SIX1", "COL1A1", "COL3A1", "COL4A1", "VIM", "DCN", "GATA3", "ALCAM"), stack = T) + NoLegend())

How do these cells look when plotted out in the UMAP coords?

DimPlot(mes, group.by = "RNA_snn_res.0.2", label = T) | DimPlot(mes, group.by = "age", label = T)

## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.

So using a resolution of 0.2, what is the breakdown of cells from DevKidCC analysis in these clusters?

table(mes$DKCC, mes$RNA_snn_res.0.2)

##             
##                 0    1    2    3    4    5    6    7
##   CS            0 2538 1184 1374  129    0   18    0
##   MesS          0   90   97   66    0    0   17    4
##   MS            0   24   14    6    0    0    7    0
##   unassigned 4950  880 1433  310 1499 1148   63   72

Clusters 0, 4, 5 and 7 were predominantly unassigned, while clusters 1 and 3 were predominantly assigned as stroma. Clusters 2 and 6 were a mix.
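
The breakdown can be made explicit by converting the counts into per-cluster proportions. The following is a minimal sketch that re-enters the counts printed above by hand; the 75% cut-off for calling a cluster "predominant" is an arbitrary assumption, not a threshold used elsewhere in this analysis.

```r
# Counts copied from the table(mes$DKCC, mes$RNA_snn_res.0.2) output above
counts <- matrix(
  c(   0, 2538, 1184, 1374,  129,    0, 18,  0,
       0,   90,   97,   66,    0,    0, 17,  4,
       0,   24,   14,    6,    0,    0,  7,  0,
    4950,  880, 1433,  310, 1499, 1148, 63, 72),
  nrow = 4, byrow = TRUE,
  dimnames = list(DKCC = c("CS", "MesS", "MS", "unassigned"),
                  cluster = as.character(0:7))
)

# Fraction of each cluster that DevKidCC left unassigned
frac_unassigned <- counts["unassigned", ] / colSums(counts)

# Hypothetical cut-off: a cluster is "predominantly" one class at >= 75%
cutoff <- 0.75
cluster_call <- ifelse(frac_unassigned >= cutoff, "unassigned",
                ifelse(frac_unassigned <= 1 - cutoff, "stroma", "mixed"))

print(round(frac_unassigned, 2))
print(cluster_call)
```

With a different cut-off the borderline clusters may change category, so the labels are only as meaningful as the threshold chosen.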

The clusters are divided cleanly by the two ages of the samples, which is expected.

mes$RNA_snn_res.0.2 <- factor(mes$RNA_snn_res.0.2, levels = 0:(length(unique(mes$RNA_snn_res.0.2))-1))
markers <- FindAllMarkers(SetIdent(mes, value = "RNA_snn_res.0.2"), test.use = "t", random.seed = seed)
write_csv(markers, "StromaUnassignedMarkers.csv")

marker.genes <- (markers %>% group_by(cluster) %>% top_n(3, avg_logFC))$gene
(VlnPlot(mes, group.by = "RNA_snn_res.0.2", features = marker.genes, stack = T) + NoLegend())
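
For readers less familiar with the `group_by()` and `top_n()` idiom, the selection of the strongest markers per cluster can be sketched in base R. The data frame below is a toy example with invented genes and values, not output from this analysis, and `top_markers` is a hypothetical helper. Unlike `top_n()`, this version reorders rows by fold change and does not keep ties.

```r
# Toy marker table (hypothetical values, not results from this analysis)
toy <- data.frame(
  cluster   = c(0, 0, 0, 0, 1, 1, 1, 1),
  gene      = c("g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"),
  avg_logFC = c(0.5, 2.0, 1.2, 0.9, 1.1, 0.3, 2.4, 1.8),
  stringsAsFactors = FALSE
)

# Within each cluster, sort by decreasing avg_logFC and keep the first n rows
top_markers <- function(df, n = 3) {
  per_cluster <- lapply(split(df, df$cluster), function(d) {
    head(d[order(-d$avg_logFC), ], n)
  })
  do.call(rbind, per_cluster)
}

top_genes <- top_markers(toy, n = 3)$gene
print(top_genes)
```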

Some interesting and useful markers jump out here. OGN is expressed by developing cartilage, and CXCL12 and CRABP1 are strongly coexpressed with it; these cells are in clusters 1, 2, 3 and 7. Several collagens (COL2A1, COL3A1 and COL21A1) also come up as markers. Clusters 0 and 5 have a PAX8 signature, indicating these may be mesenchymal cells that go on to contribute to the nephron lineages.

map(0:5, ~VlnPlot(mes, group.by = "RNA_snn_res.0.2", features = (markers %>% filter(cluster==.x))$gene[1:50], stack = T) + NoLegend())

[Figures: stacked violin plots of the top 50 marker genes for each of clusters 0 to 5, one panel per cluster.]

Run through ToppFun and StringDB

Can export these gene lists and run them through a geneset analysis process like StringDB or ToppFun.

WriteXLS::WriteXLS(map(0:7, ~markers %>% filter(cluster==.x) %>% arrange(-avg_logFC)), ExcelFileName = "StromaUnassignedMarkers.xlsx", SheetNames = paste0("cluster_", 0:7))

The marker list is available at StromaUnassignedMarkers.xlsx.

I ran the top 50 DE genes through StringDB for cluster 1, which has OGN as its most DE gene. The result indicates an enrichment for biological processes such as Biomineral Tissue Development, Ossification and Skeletal System Development. Lots of collagens.

We can focus on completing this analysis if you think it’s useful. For now I’ll kick on with the single cell specific stuff.

Part 2: Comparisons to Stroma Publications

England et al. 2020 comparison

Here we can plot the genes that were identified as stromal population markers in the England et al., 2020 paper across the clusters of this dataset.

england.markers <- c("FOXD1", "NTN1", "FIBIN", "SMOC2", "DLK1", "LOX", "CLCA3A1", "PENK", "WNT4", "IGF1", "AKR1B10", "GATA6", "MEIS1", "MEIS2", "MEIS3", "SNAI2")
(VlnPlot(mes, group.by = "RNA_snn_res.0.2", features = england.markers, stack = T) + NoLegend())

## Warning in FetchData(object = object, vars = features, slot = slot): The
## following requested variables were not found: CLCA3A1
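
The warning shows that one of the upper-cased mouse symbols has no matching feature in the object; upper-casing is only a heuristic for mapping mouse symbols to human ones. A minimal sketch of checking a marker list before plotting is given below. `available_genes` is a hypothetical stand-in for `rownames(mes)` with invented contents, and only a subset of the marker list is shown.

```r
# Subset of the England et al. marker list used above
england.subset <- c("FOXD1", "NTN1", "FIBIN", "SMOC2", "DLK1", "LOX", "CLCA3A1", "PENK")

# Hypothetical stand-in for rownames(mes); the real feature list will differ
available_genes <- c("FOXD1", "NTN1", "FIBIN", "SMOC2", "DLK1", "LOX", "PENK", "COL1A1")

# Split the requested markers into those present in the object and those absent
present <- intersect(england.subset, available_genes)
absent  <- setdiff(england.subset, available_genes)

print(absent)
```

Passing only `present` to the plotting call avoids the warning, and `absent` gives a list of symbols to look up by hand.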

Cluster 2, with an enrichment of FIBIN, SMOC2, DLK1 is reminiscent of a subset of the cortical stroma in E18.5 mice.

Tanigawa et al. 2022 comparison

A recent paper reports differentiating stromal progenitors from mouse ESCs and then culturing them with other mouse ESC-derived populations (ureteric bud, nephron progenitors) to make higher-order organoids.
They published single-cell data and lists of DE genes, so we can use those.

tanigawa <- map(readxl::excel_sheets("/group/kidn1/Group-Little_MCRI/Data/SingleCellRNASeq/External_Downloaded/Tanigawa_2022/41467_2022_28226_MOESM4_ESM.xlsx"),
                ~readxl::read_xlsx(path = "/group/kidn1/Group-Little_MCRI/Data/SingleCellRNASeq/External_Downloaded/Tanigawa_2022/41467_2022_28226_MOESM4_ESM.xlsx", sheet = .x))

## New names:
## * `` -> ...1

## New names:
## * `` -> ...1
## * `ave.exp(P0)` -> `ave.exp(P0)...7`
## * `ave.exp(P0)` -> `ave.exp(P0)...8`

names(tanigawa) <- readxl::excel_sheets("/group/kidn1/Group-Little_MCRI/Data/SingleCellRNASeq/External_Downloaded/Tanigawa_2022/41467_2022_28226_MOESM4_ESM.xlsx")

tanigawa.markers <- map(tanigawa, ~.x$`...1` %>% toupper())
tanigawa.markers <- map(tanigawa.markers, ~.x[.x %in% rownames(mes)])
map(tanigawa.markers, ~FeaturePlot(mes, features = .x[1:9], order = T) + NoLegend())

## Warning in FeaturePlot(mes, features = .x[1:9], order = T): All cells have the
## same value (0) of MUSTN1.

## Warning in FeaturePlot(mes, features = .x[1:9], order = T): All cells have the
## same value (0) of MYOC.

## Warning in FeaturePlot(mes, features = .x[1:9], order = T): All cells have the
## same value (0) of PGR.

[Figures: feature plots of the first nine markers from each Tanigawa gene list, one set of panels per list: REN and Mes(#22), Mes(#17), SP(#8), CS(#7), OM(#5), IM(#2), US(#15), iSP specific(#35), nd-iS specific(#0), nd-iS specific(#9).]

Can’t see much of a pattern with these, but it is a limited gene list. I’ll plot the heatmaps, using the top 30 genes for each segment.

map(tanigawa.markers, ~DoHeatmap(mes, features = .x[1:30], group.by = "RNA_snn_res.0.2"))

[Figures: heatmaps of the top 30 genes from each Tanigawa gene list, grouped by cluster at resolution 0.2, one heatmap per list: REN and Mes(#22), Mes(#17), SP(#8), CS(#7), OM(#5), IM(#2), US(#15), iSP specific(#35), nd-iS specific(#0), nd-iS specific(#9).]

In general, some of the genes for each identity are expressed, but there are no definite patterns that identify a match between populations.

Try SCINA to predict identity

The last effort here would be using a predictive tool. SCINA takes lists of genes and predicts cell identity from these.

library(SCINA)
results <- SCINA(mes@assays$RNA@counts, map(tanigawa.markers, ~.x[1:50]), max_iter = 100, convergence_n = 10, 
    convergence_rate = 0.999, sensitivity_cutoff = 0.9, rm_overlap=TRUE, allow_unknown=TRUE, log_file='SCINA.log')
mes$SCINA <- results$cell_labels

DimPlot(mes[, mes$SCINA!="unknown"], group.by = "SCINA", label = T, repel = T) | DimPlot(mes[, mes$SCINA=="unknown"], group.by = "SCINA", label = T)

Well, that is curious! While the majority of cells come up as unknown, there are some clearly defined regions that have been identified as a particular stromal population. The D13 cells are split between an SP (stromal progenitor) population, an IM (inner medullary) population, and a few Mes (mesangial) cells. Moving into the older cells, D13p14, there is a small group of CS (cortical stroma), a larger group of OM (outer medullary) and some cells associated with the induced stroma (iSP and nd-iS).

table(mes$SCINA)

## 
##             CS(#7)             IM(#2)  iSP specific(#35)           Mes(#17) 
##                624                317                 48                543 
## nd-iS specific(#0)             OM(#5)   REN and Mes(#22)             SP(#8) 
##                240               1747                 89               2452 
##            unknown            US(#15) 
##               9826                 37
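
To put these counts in proportion, the table can be converted to percentages. This is a small sketch that re-enters the printed counts by hand rather than reading them from the `mes` object.

```r
# Counts copied from the table(mes$SCINA) output above
scina <- c("CS(#7)" = 624, "IM(#2)" = 317, "iSP specific(#35)" = 48,
           "Mes(#17)" = 543, "nd-iS specific(#0)" = 240, "OM(#5)" = 1747,
           "REN and Mes(#22)" = 89, "SP(#8)" = 2452, "unknown" = 9826,
           "US(#15)" = 37)

# Percentage of all cells carrying each SCINA label
total <- sum(scina)
pct <- round(100 * scina / total, 1)

# Labels ordered from most to least frequent
print(sort(pct, decreasing = TRUE))
```

Roughly 62% of the cells are left as unknown, and SP and OM are the two largest assigned groups.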

Part 3: Azimuth

Run cells through Azimuth

I can only run small subsets of the data through Azimuth (upload limits). As such, I took a 10% subsample of the cells, stratified by age so that both time points are sampled at the same rate.

subset_seu <- function(seu, x = 0.1, by = "capture", times = 1) {
  
  ds <- caret::createDataPartition(seu@meta.data[, by], times = times, p = x)
  if (times==1){
    seu <- seu[, ds$Resample1]
  } else {
    seu <- map(ds, ~seu[, .x])
  }
  return(seu)
}

mes.small <- subset_seu(mes, by = "age")
write_rds(mes.small, "Stroma_small.rds")
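
The idea behind the stratified subsample can be sketched in base R without caret. This is an illustration on a toy `age` vector with invented group sizes; `stratified_idx` is a hypothetical helper, and rounding the per-group sample size up is an assumption rather than a statement about how `createDataPartition` rounds.

```r
# Toy metadata with invented group sizes (not the real cell numbers)
set.seed(1)
age <- rep(c("D13", "D13p14"), times = c(200, 300))

# Draw a fraction p of the indices within each group separately
stratified_idx <- function(groups, p = 0.1) {
  idx_by_group <- split(seq_along(groups), groups)
  picked <- lapply(idx_by_group, function(i) {
    # sample.int on the length avoids the sample() pitfall for length-1 input
    i[sample.int(length(i), ceiling(p * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

keep <- stratified_idx(age, p = 0.1)
print(table(age[keep]))
```

Sampling within each group keeps the ratio between the two ages the same in the subset as in the full data, which a simple random draw of 10% would only do approximately.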

I ran this through Azimuth, using the “Human Fetal Development” reference.

az.umap <- read_rds("azimuth_umap.Rds")
az.meta <- read_tsv("azimuth_pred.tsv")
mes.small@meta.data <- bind_cols(mes.small@meta.data, az.meta)
mes.small@reductions$azimuth <- az.umap
write_rds(as.SingleCellExperiment(mes.small), "Mes_small_sce.rds")

mes.small <- read_rds("Mes_small_sce.rds")
mes.small <- as.Seurat(mes.small)

## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from PC__ to PC_

## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to PC_

## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from UMAP__ to UMAP_

## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to UMAP_

## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from UMAP__ to UMAP_

## Warning: All keys should be one or more alphanumeric characters followed by an
## underscore '_', setting key to UMAP_

## Warning: Cannot add objects with duplicate keys (offending key: UMAP_), setting
## key to 'azimuth_'

The above is some trickery to get the Version 4 Seurat object into Version 3.

DimPlot(mes.small, reduction = "AZIMUTH", group.by = "predicted.annotation.l1") /  DimPlot(mes.small, reduction = "UMAP", group.by = "predicted.annotation.l1")

The top plot uses the coordinates from the HFD dataset UMAP, while the bottom is the stromal dataset UMAP. All the cells get assigned their most similar population within the reference.

We can ask the question, which of these have predictions above, say, 0.5?

mes.small$pred.l1.score <- mes.small$predicted.annotation.l1.score>0.5
DimPlot(mes.small, reduction = "UMAP", group.by = "predicted.annotation.l1", split.by = "pred.l1.score")

All in all, not that many once the Metanephric cells are set aside.

Tabulate results

Ask the question: what are the cells classified as, and is the prediction score greater than 0.5?

table(mes.small$predicted.annotation.l1, mes.small$pred.l1.score)

##                                            
##                                             FALSE TRUE
##   Astrocytes                                   26    0
##   Bronchiolar and alveolar epithelial cells     1    0
##   CCL19_CCL21 positive cells                    3    0
##   Corneal and conjunctival epithelial cells     1    0
##   Endocardial cells                             1    0
##   ENS glia                                     86    5
##   ENS neurons                                  72    3
##   Epicardial fat cells                         91   94
##   Intestinal epithelial cells                   5    0
##   Mesangial cells                               5    0
##   Mesothelial cells                             4    0
##   Metanephric cells                           103  695
##   Neuroendocrine cells                         32    3
##   PAEP_MECOM positive cells                     1    0
##   Satellite cells                             157    3
##   Smooth muscle cells                          10   29
##   Squamous epithelial cells                     1    0
##   Stromal cells                                65   27
##   Vascular endothelial cells                    1    0
##   Visceral neurons                             69    0
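
The table can be summarised as the fraction of cells per annotation that pass the 0.5 threshold. This is a minimal sketch that re-enters the printed counts by hand; the variable names are illustrative only.

```r
# Annotation labels in the order printed above
labels <- c("Astrocytes", "Bronchiolar and alveolar epithelial cells",
            "CCL19_CCL21 positive cells", "Corneal and conjunctival epithelial cells",
            "Endocardial cells", "ENS glia", "ENS neurons", "Epicardial fat cells",
            "Intestinal epithelial cells", "Mesangial cells", "Mesothelial cells",
            "Metanephric cells", "Neuroendocrine cells", "PAEP_MECOM positive cells",
            "Satellite cells", "Smooth muscle cells", "Squamous epithelial cells",
            "Stromal cells", "Vascular endothelial cells", "Visceral neurons")

# Counts copied from the table above: FALSE column (score not above 0.5), TRUE column (above 0.5)
below <- c(26, 1, 3, 1, 1, 86, 72, 91, 5, 5, 4, 103, 32, 1, 157, 10, 1, 65, 1, 69)
above <- c(0, 0, 0, 0, 0, 5, 3, 94, 0, 0, 0, 695, 3, 0, 3, 29, 0, 27, 0, 0)
names(below) <- names(above) <- labels

# Per-annotation and overall fraction of cells above the threshold
frac_above <- above / (above + below)
overall <- sum(above) / sum(above + below)

# Share of all above-threshold cells that are Metanephric
metanephric_share <- above[["Metanephric cells"]] / sum(above)

print(round(sort(frac_above[above > 0], decreasing = TRUE), 2))
print(round(c(overall = overall, metanephric_share = metanephric_share), 2))
```

On these counts, 859 of 1,593 cells score above 0.5, and 695 of those 859 are Metanephric cells.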

VlnPlot(mes.small, features = "predicted.annotation.l1.score", group.by = "predicted.annotation.l1", pt.size = 0.1) + NoLegend() + geom_hline(yintercept = c(0.75, 0.5, 0.25))

## Warning: Groups with fewer than two data points have been dropped.

Summary

As expected, there is no “smoking gun” here, no crystal clear answer to what these cells are. There are genes that indicate a cell type is present similar to that of cartilage or pre-cartilage, and many collagen genes are being expressed on top of that. What we could draw from this is that many of the non-kidney cells in the organoid are over-expressing ECM genes such as collagens, which may be leading to a cartilaginous identity crisis or a differentiation pathway towards cartilage from a general mesenchymal precursor.

I don’t think there is much in the way of a defined stromal sub-population signature, at least when using the gene lists provided through England or Tanigawa. Genes known to be enriched in the renal stroma are certainly expressed, and in some cases many of them within one population.

Finally, when we look at the projection onto a fetal development dataset, only one population is reliably identified: the “Metanephric” cells at day 13. The day 13+14 cells almost all have low similarity scores for their best correlation.

Stromal analysis in response to reviewers' comments

Sean Wilson

2022-02-08