Revealing the evolutionary history of the cancer cells is challenging given the standard approach of single time point sequencing, as we cannot observe changes to cancer cells across time. This challenge has not deterred researchers from tackling the problem: using acute statistical models, researchers have developed methods to reconstruct clonal trees using only the cells sequenced at a single time point. In the early stages, the clone reconstruction approaches relied solely on the bulk DNA-seq data; however, with recent advances in single-cell technologies, it is now possible to resolve the ancestral structure with greater accuracy than ever before.

The single-cell transcriptomics (scRNA-seq) is gaining popularity as it can characterize cancer in its functional state. This trend provides an opportunity to integrate scRNA-seq with bulk DNA-seq for clone tree reconstruction. PhylEx is a Bayesian method for clone tree reconstruction on single nucleotide variants (SNVs) found in bulk DNA-seq with scRNA-seq data. Using the bulk data, we can identify SNVs and the variant allele frequencies of each of the SNVs with high confidence. From the scRNA-seq data, we can take advantage of the co-occurrence of SNVs in single cells to identify branching events with high accuracy.

We deployed PhylEx to analyze high-grade serous ovarian cancer (HGSOC) cell-line data that consists of a primary as well as a metastatic tumor [1]. We performed Smart-seq3, a sequencing method that performs full-length sequencing [2].

Fig Figure 1. A. Ground truth tree labeled with clone names and the number of SNVs assigned to each clone. B. PhylEx reconstructed tree using bulk DNA-seq and scRNA-seq. The number of SNVs of each type is labeled. C. ZINB-WaVE dimensionality reduction on gene expression data with trajectory analysis results overlayed (arrows). D. ZINB-WaVE dimension plot with color-coding using Gaussian mixture clustering and with trajectory analysis result overlayed.

PhylEx showed high-concordance to the ground truth tree (Figure 1A-B). The clone ABCD and its descendants are from the primary region, whereas clones EFGHI and EF are from the metastatic region in Figure 1A. Aside from merging EFGHI and EF clones, PhylEx reconstructs the tree topology correctly.

We performed dimensionality reduction using the zero-inflated negative binomial model (ZINB-WaVE) on the raw expression counts. Then, we performed pseudo-time trajectory analysis on the reduced dimension using Slingshot; to that end, we first assigned cells to clones by matching the cells’ observed genotypes to the genotypes of the clones based on the SNVs assigned to each clone. As pseudo-time trajectory analysis attempts to reconstruct the clones’ temporal ordering from the single-cell gene expression data, it serves as a useful comparison. We found that the trajectory inferred was generally incorrect (Figure 1C); in particular, cells assigned to clone CD were inferred as a sibling to ABCD clone. Furthermore, if model-based clustering was applied to label the cells before performing Slingshot, the cells were grouped arbitrarily based on expression profile. Hence, the trajectory analysis did not yield any meaningful results. We have performed gene set enrichment analysis using the MSigDB Gene ontology set to compare the clones [3].

Our results show that cells in the metastatic region were found to down-regulate pathways related to the immune system (Table 1). Together, these analysis demonstrate that reconstructing the evolutionary path taken by the cells can reveal insights that would otherwise be unavailable using only the gene expression data.

Table 1: Top 10 down-regulated pathways in the metastatic region compared to the primary region.

# Gene ontology P-value FDR
1 MHC protein complex 4.85e-15 1.32e-11
2 Antigen processing and presentation of endogenous antigen 6.40e-15 1.32e-11
3 MHC class I protein complex 6.57e-14 8.08e-11
4 Response to type I interferon 2.86e-11 1.60e-08
5 Antigen processing and presentation of endogenous peptide antigen 2.61e-10 1.14e-07
6 Positive regulation of T cell mediated cytotoxicity 7.96e-09 2.51e-06
7 Interferon Gamma mediated signaling pathway 2.08e-08 5.72e-06
8 Regulation of T cell mediated cytotoxicity 2.37e-08 6.07e-06
9 Positive regulation of antigen processing and presentation 4.99e-07 9.59e-05
10 Detection of other organism 6.87e-07 1.28e-04

[1] Laks E, McPherson A, Zahn H, Lai D, Steif A, Brimhall J, Biele J, Wang B, Masud T, Ting J, Grewal D, Nielsen C, Leung S, Bojilova V, Smith M, Golovko O, Poon S, Eirew P, Kabeer F, Ruiz de Algara T, Lee SR, Taghiyar MJ, Huebner C, Ngo J, Chan T, Vatrt-Watts S, Walters P, Abrar N, Chan S, Wiens M, Martin L, Scott RW, Underhill TM, Chavez E, Steidl C, Da Costa D, Ma Y, Coope RJN, Corbett R, Pleasance S, Moore R, Mungall AJ, Mar C, Cafferty F, Gelmon K, Chia S; CRUK IMAXT Grand Challenge Team, Marra MA, Hansen C, Shah SP, Aparicio S. Clonal Decomposition and DNA Replication States Defined by Scaled Single-Cell Genome Sequencing. Cell. 2019 Nov 14;179(5):1207-1221.e22. doi: 10.1016/j.cell.2019.10.026. PMID: 31730858; PMCID: PMC6912164.

[2] Hagemann-Jensen, M., Ziegenhain, C., Chen, P. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat Biotechnol 38, 708–714 (2020). https://doi.org/10.1038/s41587-020-0497-0

[3] Arthur Liberzon, Aravind Subramanian, Reid Pinchback, Helga Thorvaldsdóttir, Pablo Tamayo, Jill P. Mesirov, Molecular signatures database (MSigDB) 3.0, Bioinformatics, Volume 27, Issue 12, 15 June 2011, Pages 1739–1740, https://doi.org/10.1093/bioinformatics/btr260