Human lung tumour specimens

This study was approved by the institutional review board of the University of Cologne. We analysed 160 tumours and patient-matched blood samples from 65 patients with SCLC (Fig. 1a). The samples were collected from multiple collaborating hospitals and clinical facilities under institutional review board-approved protocols, and all patients provided written informed consent. For some patients the material was collected as part of an ongoing clinical trial (BIOLUMA, study no. NCT03083691), and those patients received as second- or third-line treatment anti-PD-1 either alone or in combination with anti-CTLA-4 immune checkpoint inhibitors. The course of treatment for all patients and information on all samples are detailed below and summarized in Extended Data Table 1 and Supplementary Tables 1 and 2.

All tumour samples were pathologically reviewed by at least two independent expert pathologists who inspected the histomorphology based on haematoxylin and eosin and immunohistochemical staining. All tumours were confirmed with SCLC histology; tumours from three patients were diagnosed with additional morphological components of LCNEC or adenocarcinoma (Extended Data Table 1 and Supplementary Table 1). All patient-matched, multiregional tumour and normal blood samples were confirmed as belonging to the same patient by short tandem repeat (STR) analysis conducted at the Institute of Legal Medicine at the University of Cologne, Germany, and further confirmed by genome sequencing data.

In the majority of cases we analysed at least two tumour samples per patient, which were acquired at either single or multiple time points throughout the clinical course of treatment (Supplementary Table 2). More than two tumour samples were acquired for 37% of patients (n = 24 of 65). For five patients we analysed tumour samples at three distinct time points (n = 5 of 65, 8%; Extended Data Table 1 and Supplementary Table 2). Samples were acquired as biopsies and lung resections, and we additionally engrafted tumour tissue from fine-needle biopsies (n = 2, one pleural and one lymph node metastasis) and circulating tumour cells (CTCs; n = 29 of 160, 18%) onto immune-compromised mice (NSG mice) to establish patient-derived xenografts (PDX; in total n = 31 of 160, 19%; Fig. 1a); this approach allowed for enrichment of limited tumour material for in-depth genomic studies. Samples analysed as PDX are listed in Supplementary Tables 2 and 3 and are highlighted in Fig. 2d. As previously described12,13, sampling a patient’s blood for CTCs provides a minimally invasive approach towards analysis of tumour cells under therapy, and xenotransplant models have been shown to recapitulate the genomic profiles of the patient’s tumour. Xenotransplant models were established following an approach previously described12; tumour cells were engrafted subcutaneously into the flanks of 7–14-week-old NSG mice (male and female, NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ; Jackson Laboratories), and tumours were harvested at a maximum volume of 1.5 cm3. Tumour histology was confirmed by pathological review, and STR profiling with patient-matched normal and tumour samples confirmed the identity of the engrafted patient-derived material. All animals were housed in a specific-pathogen-free facility under ambient temperature and humidity while maintaining a 12/12 h light/dark cycle.
Animal experiments were approved by, and conducted in accordance with, the regulations of the local animal welfare authorities (State Agency for Nature, Environment and Consumer Protection of the State of North Rhine-Westphalia, nos. AZ: 84-02.04.2012.A281, 84-02.04.2015.A172 and 84-02.04.2018.A002).

Samples were categorized by location: we referred to the primary lung tumour and grouped metastatic sites as intrapulmonary metastases, including pulmonary and lung and mediastinal lymph node metastases; tumour sites grouped as extrapulmonary metastases include intrathoracic distant metastases of the pleura and extrathoracic distant metastases affecting abdominal sites, the brain or other less common metastatic sites (breast, skin, sternum), as well as CTCs propagated as CTC-derived xenotransplant models, which represent cells that spread to the bloodstream with the potential to seed distant metastases. In patients with highly metastatic disease we furthermore assessed whether, based on radiological images, tumour sites sampled throughout therapy were pre-existing at the time of first diagnosis or before treatment, and whether these sites were exposed to any given therapy (chemotherapy, radiation or immune checkpoint blockade). Furthermore, we assessed whether any samples were taken from a newly formed metastatic site which, according to radiological imaging, was not pre-existing at the time of first diagnosis or before any other treatment exposure. For CTC-derived models, because we had no information regarding whether the tumour site may have shed cells to the bloodstream, we classified any CTC-derived sample as tumour cells that may have been exposed to any given treatment. A schematic overview of the acquired samples and affected organ sites is depicted for each patient in the Supplementary Appendix.

Clinical characteristics

The clinical characteristics of the patients in our cohort are in line with those typically found in SCLC (Fig. 1b, Extended Data Table 1 and Supplementary Table 1). Median age at the time of first diagnosis was 64 years, and patients were predominantly male (n = 43 of 65, 66%) with a history of heavy smoking and a median number of 40 pack years (smoking history was known for 89% of patients, n = 58 of 65; the number of pack years was determined for 85% of patients, n = 55 of 65). For clinical correlations the following categories were defined: age groups of 65 years or more and under 65. Smoking status was classified as ‘current smoker’, ‘former smoker’ or ‘never smoker’.

The majority of the patients presented with a highly metastatic tumour classified as stages III and IV (n = 57 of 65, 88%; additional information on tumour, node and metastasis staging is provided in Supplementary Table 1). Seven patients were diagnosed with limited-stage disease or with tumour stage I, II or IIIA, and were therefore amenable to surgical lung resection.

Although one patient declined further therapy, all other patients in our cohort received systemic treatment with platinum-based chemotherapy. The majority of patients were treated with a combination of cisplatin/carboplatin and etoposide (n = 61 of 65; 94%); with regard to recent changes in the treatment of SCLC2, additional PD-L1 inhibition was administered to five of these patients. Due to the initial diagnosis with histological components of non-SCLC (adenocarcinoma or LCNEC), two patients were treated with cisplatin/carboplatin combined with vinorelbine (patients S02814 and S02917). Furthermore, one patient received only monotherapy with carboplatin. Throughout the course of treatment 72% of patients (n = 47 of 65) received additional radiation, mainly of the chest/lung/mediastinum (n = 35 of 47) or brain (n = 38 of 47); four patients underwent stereotactic surgery of brain metastases.

The clinical response to treatment was assessed by radiological imaging and classified as either complete response (CR), partial response (PR), stable disease (SD), progressive disease (PD) or mixed response (PR/PD). The clinical response to systemic first-line platinum-based chemotherapy was analysed for n = 55 patients; these patients received treatment with only systemic chemotherapy and were therefore considered for subsequent correlations of genomic and molecular phenotypes with clinical response. Genomic and molecular correlations with clinical response to chemotherapy were not considered for n = 10 patients in our cohort, because these patients were either lost to follow-up (n = 2), declined further treatment (n = 1) or received a lung resection resulting in differences in the dynamics of disease progression (n = 7).

Of the 55 patients who received only first-line systemic platinum-based chemotherapy, 60% (n = 33 of 55) responded to treatment with PR (n = 32) or CR (n = 1), 9% had stable disease (n = 5 of 55), 11% showed mixed response (n = 6 of 55) and 20% (n = 11 of 55) experienced progressive disease, of whom three succumbed to the disease during first-line treatment. Following treatment, two patients experienced fatal sepsis (patient S02608 while receiving treatment and experiencing disease progression; and patient S02658 following completion of chemotherapy; Supplementary Table 1); both patients were consequently censored when performing correlations with relapse-free survival, and the therapy response of patient S02608 was not evaluated. Median progression-free survival was 6.3 months. In addition we determined the chemotherapy-free interval (CTFI) as an independent measure of sensitivity and duration of response to first-line platinum-based chemotherapy; median CTFI was 88 days. Fifty-three per cent of patients (n = 28 of 53) either did not respond, relapsed or succumbed to the tumour disease within 90 days following completion of first-line chemotherapy and, following the guidelines of NCCN14, were thus clinically classified as either chemorefractory or chemoresistant. Of the remaining patients who, based on NCCN guidelines, were considered as ‘platinum-sensitive’, 30% (n = 16 of 53) relapsed within 6 months following completion of chemotherapy and 17% were relapse-free for more than 6 months (n = 9 of 53). At relapse, 83% of patients (n = 44 of 53) received second-line systemic therapies that included treatment with anti-PD-1 and/or anti-CTLA-4 immune checkpoint inhibitors (n = 27) or other chemotherapeutics, including topotecan (n = 8), rechallenge with carboplatin and etoposide (n = 2) or combinations of adriamycin, cyclophosphamide and vincristine (n = 7) (Fig. 1 and Supplementary Table 1).
Following tumour progression, ten patients were amenable to additional lines of therapy including immune checkpoint inhibitors (n = 6) or chemotherapeutics (n = 4).

For interpatient comparisons, the analysis of multiregional and longitudinal tumour sites from 65 patients with SCLC focused on tumour pairs (‘Analysis of clonal architecture from multiregional and longitudinal tumour samples’; Fig. 1c) in distinct clinical scenarios: (1) analysis of tumour samples from spatially distinct sites obtained from treatment-naive patients at the time of first diagnosis (n = 16); (2) analysis of temporally distinct tumour sites referring to samples acquired before treatment and during therapy, including those from patients undergoing neo-adjuvant treatment (n = 5); and (3) samples acquired before treatment and at relapse following completion of first-line platinum-based chemotherapy (that is, either following an initial response or disease progression despite treatment, n = 42). The analysis further focused on (4) spatially, but not temporally, separate tumours analysed solely at the time of relapse (n = 14), and (5) tumour sites acquired at the time of relapse from platinum-based therapy and following subsequent lines of treatment with immune checkpoint inhibitors (pre- and post-treatment with ICI, n = 7). We thus performed in total n = 84 paired analyses of tumour sites in 65 patients with SCLC (Supplementary Table 4).

In addition we performed clinical correlations in an independent cohort of patients with SCLC, who all received first-line systemic treatment with platinum-based chemotherapy; we performed whole-exome sequencing of the tumour samples and identified key genome alterations (n = 64 patients; Supplementary Table 12). This cohort was analysed to validate findings described in Fig. 5b; at least 56 samples are required to validate the findings at a significance level of 5% and a power of 80%; thus, we validated our findings at a power of greater than 80%.

DNA and RNA extraction

Nucleic acids were extracted from fresh-frozen blood or tissue or from formalin-fixed, paraffin-embedded (FFPE) tissue specimens (Supplementary Table 3). Tumour tissues were analysed by haematoxylin and eosin staining and nucleic acids were extracted from regions with a tumour content of at least 70%. All tumour samples derived from murine xenotransplant models showed a tumour content of at least 95% with no discernible innervation of murine cells, which was similarly observed in previous studies12,13. Fresh-frozen samples were processed by preparation of tissue sections, each of 20 μm thickness, on a cryostat (Leica) while maintaining a temperature of −20 °C. In the case of FFPE samples, sections of 20 μm thickness were prepared on slides on a microtome. DNA was extracted from both fresh-frozen tissues and EDTA blood with the Gentra Puregene DNA extraction kit (Qiagen) according to the protocol of the manufacturer.

To allow for high-quality sequencing data of FFPE material we applied ultrasonic acoustic energy, using the adaptive focused acoustics technology from Covaris and following the protocol of the manufacturer. DNA isolation was then performed with a bead-based approach (AMPure XP Beads, Beckman) and any fractions containing paraffin material were excluded from subsequent DNA isolation steps.

For samples with limited tumour material we further adjusted protocols, which included repeated rounds of protein and nucleic acid precipitation, to increase the DNA yield for subsequent sequencing studies. All DNA isolates were hydrated in TE buffer and molecular weight was assessed using the Agilent TapeStation system (Genomic DNA ScreenTape no. 5067-5365, Agilent Technologies). DNA isolates from fresh-frozen samples were confirmed as being of high molecular weight (above 10 kb), and samples with evident signs of degradation were excluded from further sequencing studies.

For RNA extraction, tissue sections were first lysed and homogenized with the TissueLyser (Qiagen). Subsequent RNA extraction was performed with the Qiagen RNeasy Mini Kit according to the instructions of the manufacturer. Alternatively we used the RNeasy Micro Kit to extract RNA from small tissue biopsies. RNA quality was assessed with RNA ScreenTape (no. 5067-5576, Agilent Technologies) at the TapeStation. Samples with an RNA integrity number above 7 were further analysed by RNA sequencing (RNA-seq).

Next-generation sequencing

All sequencing reactions were performed on either the Illumina HiSeq or NovaSeq sequencing platform. Details on genome sequencing data and quality metrics are provided in Supplementary Table 3. Sequencing data are deposited in the European Genome-Phenome Archive (accession no. EGAS50000000169).

Whole-exome sequencing

We performed whole-exome sequencing (WES) for all patient samples with the SureSelect Human All Exon V6 Kit (Agilent) following the protocol of the manufacturer. Exon-enriched libraries were subjected to paired-end sequencing on either the Illumina NovaSeq or Illumina HiSeq platform. For the former, libraries were prepared to reach a mean insert size of 200 base pairs (bp) for sequencing with a read length of 2× 100 bp. For the latter, DNA was prepared with a mean insert size of 160 bp for 2× 75 bp paired-end sequencing. Both tumour and normal DNA material were sequenced aiming for a coverage of at least 150× which, following filtering of PCR-duplicated reads and alignment to the annotated human genome (hg19), resulted in an average coverage of 127×. Tumour samples showed a median purity of 88% (interquartile range 78–96%), thus minimizing problems in the assessment of tumour-specific mutations. This allowed for sufficient sequencing depth for reliable analysis of allelic fractions and clonality, as described below. Median genome ploidy was determined at 2.5 (interquartile range 1.9–3.2; Supplementary Table 3).


Whole-genome sequencing

Whole-genome sequencing (WGS) was performed for samples with sufficient DNA material and quality, additionally providing information on genomic rearrangements not identified by WES. Short-insert DNA libraries from fresh-frozen samples were prepared with the TruSeq DNA Nano PCRfree sample preparation kit (Illumina), and FFPE samples were prepared with the Accel-NGS 2S Plus DNA Library Kit. Paired-end sequencing at a minimum read length of 2× 150 bp was performed, and human DNA libraries were sequenced to an average coverage of 31× for both tumour and matched normal tissue (Supplementary Table 3).


Transcriptome sequencing

Whole-transcriptome sequencing was performed to determine expression profiles for SCLC tumours in this cohort. RNA-seq was performed with RNA extracted from fresh-frozen human tumour tissue samples. Complementary DNA libraries were prepared from poly-A-selected RNA, applying the Illumina TruSeq protocol for messenger RNA. Libraries were then sequenced with a 2× 100 bp paired-end protocol, generating 50 million reads and thus accounting for a minimum mean coverage of 30× of the annotated transcriptome. Samples analysed by transcriptome sequencing are shown in Supplementary Table 2.

Dideoxynucleotide sequencing for validation of somatic alterations

If available, transcriptome or additional genome sequencing data were used to validate somatic mutations determined by genome sequencing. In cases without additional sequencing data, dideoxynucleotide chain termination sequencing (Sanger sequencing) was performed to validate key mutations, genomic rearrangements and chimeric fusion transcripts. Specifically, shared clonal mutations of key mutated genome alterations were confirmed by Sanger sequencing as being present in all tumour samples from a patient. For genomic rearrangements determined by WGS in a subset of samples per patient (Supplementary Tables 2 and 3), PCR reactions were performed and the genomic breakpoint was probed and analysed in that subset of samples. Complex genome alterations affecting TP53, RB1 and TP73 were thus confirmed in all samples of the respective patient (annotation provided in Extended Data Fig. 4). Clonal assessment of genomic rearrangement affecting key genes was determined with SVclone40 (see below). For subclonal and private mutations of key gene alterations, Sanger sequencing was performed to confirm both the mutation call and absence of these alterations in matching tumour samples. Primer pairs were designed to amplify the target region encompassing the somatic alteration. PCR reactions were performed with either genomic DNA, whole-genome-amplified DNA or cDNA. Amplified products were subjected to Sanger sequencing and the respective electropherogram was analysed with Geneious v.8 (

Data processing of transcriptome sequencing data

As previously described5,41, transcriptome sequencing data were processed with TRUP (tumour-specimen suited RNA-seq unified pipeline). Paired-end reads were mapped to the human reference genome (GRCh37/hg19). Samples obtained from patient-derived xenotransplant models were mapped to a combined human and murine reference genome (GRCh37/hg19 and GRCm38/mm10). Expression levels were determined for uniquely mapped paired-end reads using Cufflinks referring to the human reference genome, and expression levels were quantified as fragments per kilobase of exon per million mapped reads (FPKM; Supplementary Table 10).
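
The expression unit named above can be illustrated with the textbook definition of fragments per kilobase of exon per million mapped fragments. This is only a minimal sketch: Cufflinks derives its estimates from a statistical model rather than from this direct ratio, and the function and parameter names are hypothetical.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Textbook FPKM: fragments per kilobase of exon per million mapped fragments.

    Illustrative only; Cufflinks estimates abundances with its own model.
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on 2,000 bp of exon in a 50-million-fragment library
example_value = fpkm(500, 2_000, 50_000_000)
```

Normalizing by both exon length and library size is what makes values comparable between genes of different length and between samples sequenced to different depths.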

Data processing of genome sequencing data

Raw sequencing reads were processed as previously described5,6,15. Reads were aligned to the human reference genome (GRCh37/hg19). Our cohort additionally included patient tumours expanded in immune-compromised mice (n = 31 samples; Fig. 1a and Supplementary Table 2). In these cases, sequencing reads of all samples from a given patient (including the normal reference sample and tumour samples obtained directly from the patient and derived from murine xenotransplant models) were aligned to a combined human and murine reference genome (GRCh37/hg19 and GRCm38/mm10), to exclude sequencing reads from murine cells and to allow for uniform processing of all samples from a given patient. Concordant read-pairs were identified as potential PCR duplicates and were subsequently masked in the alignment file and annotated as the number of masked reads. The quality of the sequencing data is summarized in Supplementary Table 3.

Human sequencing reads (mapped to the human reference genome) were analysed for tumour purity, tumour ploidy, somatic mutations and copy number alterations15. In addition, WGS data were analysed for genomic rearrangements with the previously described analysis pipeline5,6,15,42. Mutation calling was performed as previously described5,6,43. In brief, variant counts were assessed for tumour and matching normal samples, corrected for sequencing noise and compared with a database of 300 whole-exome and genome sequenced normal samples to filter and determine somatic mutation calls. Variants at low allelic fractions are prone to being sequencing artefacts, which arise from sequencing noise in high-coverage WES owing to either fragmented DNA from FFPE material or low-level contamination with murine reads in tumours derived from murine xenograft models. We therefore implemented strict filtering criteria for mutations occurring at allelic fractions of less than 0.2. Mutations were then filtered out if (1) the forward–reverse score was below 0.2 (the forward–reverse score is 1.0 if variant reads are split evenly between the forward and reverse orientations, and 0 if all variant reads are in one orientation); and (2) the allelic fraction of the variant, $v_i$, in consideration of the minimal coverage of the normal or matching tumour sample at position $i$, $C_i^{\min(\mathrm{tumour/normal})}$, did not exceed the read count threshold $rc$, which has a default value of 10. This criterion was evaluated as $C_i^{\min(\mathrm{tumour/normal})} \times v_i < rc$. We thus introduced a decision boundary that filters out mutations at relatively low allelic fractions and low sequencing coverage; mutations with low allelic fractions but high coverage were retained for further analyses. In addition we adjusted the stringency of this cut-off for individual samples.
Although this stringent cut-off limits the identification of subclonal mutations, we have thus controlled for potential sequencing noise and false-positive mutation calls. As described below, multiregional studies may suggest mutations at very low allele fractions in one tumour that might be more abundant at another tumour site. In this instance, truly subclonal mutations at low allelic fractions that were filtered out in one sample at this step of the analysis were reintroduced as somatic mutation calls if the same mutation passed all stringent filtering criteria in another matched tumour sample.
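
The two low-allelic-fraction criteria described above can be sketched as follows. Several points are assumptions made for illustration: the text fixes only the end points of the forward–reverse score (1.0 for an even strand split, 0 for a single orientation), so the linear form `2 * min / total` is a guess; treating either criterion as sufficient for removal is likewise an assumption; and all function names are hypothetical.

```python
def forward_reverse_score(forward_reads: int, reverse_reads: int) -> float:
    """Strand balance of variant reads: 1.0 for an even split, 0.0 for one strand only.

    Assumed functional form; the source states only the two end points.
    """
    total = forward_reads + reverse_reads
    if total == 0:
        return 0.0
    return 2 * min(forward_reads, reverse_reads) / total


def has_too_few_supporting_reads(allelic_fraction: float, tumour_coverage: int,
                                 normal_coverage: int, rc: int = 10) -> bool:
    """True if min(coverage) * allelic fraction falls below the read count threshold rc."""
    return min(tumour_coverage, normal_coverage) * allelic_fraction < rc


def is_filtered(allelic_fraction: float, forward_reads: int, reverse_reads: int,
                tumour_coverage: int, normal_coverage: int,
                af_cutoff: float = 0.2, min_fr_score: float = 0.2, rc: int = 10) -> bool:
    """Apply the strict criteria only to mutations below the allelic fraction cut-off."""
    if allelic_fraction >= af_cutoff:
        return False
    strand_biased = forward_reverse_score(forward_reads, reverse_reads) < min_fr_score
    underpowered = has_too_few_supporting_reads(
        allelic_fraction, tumour_coverage, normal_coverage, rc)
    # Assumption: either criterion alone is enough to remove the call.
    return strand_biased or underpowered
```

With these definitions a variant at an allelic fraction of 0.1 is retained at 300× minimal coverage but removed at 50×, which mirrors the decision boundary described in the text.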

Analysis of clonal architecture from multiregional and longitudinal tumour samples

We have developed a computational approach to identify individual clones from tumour sequencing data by applying a model that assigns an expected allelic fraction to each mutation under the assumption of clonality (that is, all tumour cells carry this mutation). The expected allelic fraction is corrected for tumour purity, average tumour ploidy and copy number state at the genomic coordinates of that mutation. Relating the observed to the expected allelic fraction yields an estimated cancer cell fraction (CCF), a metric specific to each mutation15,44,45. Subsequent clustering of CCFs enabled identification of cell clones represented by subsets of individual mutations. The CCFs and associated clones present in a given tumour thus define the overall clonal composition at the time point of sampling. Through a one-dimensional approach to CCF clustering, we determined for each single tumour its clonal composition (one-dimensional mutation clustering15), a method benchmarked in pan-cancer studies for tumour heterogeneity44,45.
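
The relation between observed and expected allelic fraction can be made concrete with a widely used formulation, shown here as a minimal sketch rather than the authors' exact model. It assumes a diploid normal genome, uses only the local tumour copy number (the average-ploidy correction mentioned above is omitted), and takes the number of mutated copies as an input; all names are hypothetical.

```python
def expected_allelic_fraction(purity: float, tumour_copy_number: float,
                              mutated_copies: int = 1,
                              normal_copy_number: float = 2.0) -> float:
    """Expected allelic fraction of a mutation carried by every tumour cell."""
    # Mutated alleles come only from tumour cells; all alleles come from the mixture.
    mutated_alleles = purity * mutated_copies
    all_alleles = purity * tumour_copy_number + (1.0 - purity) * normal_copy_number
    return mutated_alleles / all_alleles


def cancer_cell_fraction(observed_allelic_fraction: float, purity: float,
                         tumour_copy_number: float, mutated_copies: int = 1) -> float:
    """CCF as the ratio of observed to expected allelic fraction."""
    expected = expected_allelic_fraction(purity, tumour_copy_number, mutated_copies)
    return observed_allelic_fraction / expected


# Hypothetical mutation: observed at 0.22 in a copy-neutral region of an 88% pure tumour
example_ccf = cancer_cell_fraction(0.22, purity=0.88, tumour_copy_number=2)
```

In this example the expected clonal allelic fraction is 0.44, so an observed value of 0.22 corresponds to a CCF of about 0.5, that is, a mutation present in roughly half of the tumour cells.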

To study tumour evolution from multiregional or longitudinal tumour samples from a given patient, we further developed a two-dimensional approach to analysing pairs of samples from the same patient (two-dimensional clustering) and thus to the reconstruction of clonal dynamics15,43 (manuscript in preparation). Information on tumour phylogenies, subclonal mutations, subclones and clonal composition of sites is summarized in Fig. 2 and detailed information is provided in Supplementary Table 4. In addition, tumour phylogenies determined for each patient are provided in the Supplementary Appendix.

The sequencing data of tumour samples in our cohort showed an average purity of 85% (Supplementary Table 3). Thus WES at an average coverage of 127× provided the required sequencing depth to determine subclones in our data. The analysis of tumour subclones focused on mutation calls as determined by exome sequencing in each sample to track individual tumour clones.

The computational method for tumour phylogeny reconstruction starts by executing an extensive set of comparisons and quality controls of copy number states, and a set of mutations and their respective CCFs for each sample. Rather than working with mutation calls and copy number states assessed individually for each tumour sample, we first performed comparisons and adjustments across all samples of a given patient. This included generating a unified copy number segmentation for all samples, which is critical for assigning allele-specific mutation calls within each chromosomal segment and subsequently for computing CCFs for each mutation. We furthermore created for each patient a unified list of all somatic single-nucleotide mutations (SNMs) determined from each sample, and in all samples we reprobed the presence of somatic mutations of the unified list with relaxed filter criteria for calling somatic mutations at low allelic fractions from sequencing data. This approach allowed us to confirm whether high-confidence mutation calls from one sample were either private events or also present in other patient-matched samples but occurring at lower allelic fractions. Following the refined assessment of copy number states and somatic mutation calls, we determined for each mutation both the observed and expected allele frequency under the assumption of clonality (that is, a cancer cell fraction of 1), so that the CCF of the mutation can be calculated as the ratio between observed and expected allele frequency15. We applied additional filter criteria to mark somatic mutation calls occurring near telomeres (that is, located in the tails determined by 1.5% of chromosome length) or centromeric regions on the chromosome, where copy number estimations are frequently error prone and therefore lead to a potentially incorrect calculation of CCF.

Somatic insertions and deletions (indel calls) can lead to additional false-positive calls of SNMs as a consequence of improper mapping of reads with inserted or deleted bases. To reduce the number of false-positive SNM calls resulting from indels, we filtered out all SNMs in close proximity (less than 10 bp away) to any mutation call for insertions and deletions.
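
A proximity filter of this kind might look like the sketch below. It is illustrative only: indels are reduced to a single coordinate (real indel calls span an interval), variants are represented as hypothetical `(chromosome, position)` tuples, and the function name is invented for this example.

```python
from bisect import bisect_left
from collections import defaultdict


def filter_snms_near_indels(snms, indels, min_distance=10):
    """Keep only SNMs that lie at least min_distance bp from every indel call."""
    indel_positions = defaultdict(list)
    for chrom, pos in indels:
        indel_positions[chrom].append(pos)
    for positions in indel_positions.values():
        positions.sort()

    kept = []
    for chrom, pos in snms:
        positions = indel_positions.get(chrom, [])
        i = bisect_left(positions, pos)
        # Only the closest indel on each side can violate the distance limit.
        neighbours = positions[max(i - 1, 0):i + 1]
        if all(abs(pos - p) >= min_distance for p in neighbours):
            kept.append((chrom, pos))
    return kept


snms = [("chr1", 100), ("chr1", 105), ("chr1", 119), ("chr1", 200), ("chr2", 105)]
indels = [("chr1", 109)]
kept_snms = filter_snms_near_indels(snms, indels)
```

Sorting the indel positions once per chromosome and checking only the two nearest neighbours keeps the lookup logarithmic per SNM, which matters little for exomes but avoids a quadratic scan on genome-wide call sets.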

We applied filtering criteria for mutation calls present on chromosomal areas and which, in multiregional analyses, were found to undergo loss of heterozygosity (LOH) in at least one, but not all, of the samples of a given patient45,46,47. Samples with LOH may not harbour certain mutational calls due to the LOH event, whereas patient-matched samples without LOH may show those mutations. Consequently observed private, or almost private (CCF < 0.2), mutations in one sample lacking LOH events (whereas other patient-matched samples show the LOH event) may indicate a shared clone that undergoes copy number losses, and argue against the subclonal private acquisition of these mutations in this chromosomal area. A clear phylogenetic reconstruction in these cases is not straightforward: due to the inherent uncertainty if the mutations were not present in the other sample (that is, truly private) or lost via the LOH event, these mutations were excluded from phylogenetic tumour clone reconstructions. Following the same criteria, mutations in areas with subclonal copy number events in which one of the copy number clones was hit by an LOH event were also filtered out to avoid further uncertainty in the reconstruction of tumour phylogenies. As previously described15, our method also considered subclonal copy number changes in single-tumour samples. In consideration of copy number status and the observed allele frequency, the number of mutated copies was estimated and the CCF of the mutation determined. Somatic mutations that were found as clonal and that were the subject of subclonal copy number changes within single samples were filtered out.

In addition, we used the mapping qualities of the aligner (bwa mem, v.0.7.13-r1126) to filter out mutations in regions where more than 10% of uniquely mapped reads had a mapping quality below 10 (that is, less than 90% probability of having identified the correct mapping position).
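
The relation between mapping quality and mapping confidence follows the Phred convention, in which a quality Q corresponds to an error probability of 10^(-Q/10). The sketch below illustrates that conversion and the 10% region rule; the function names and the treatment of regions without reads are assumptions for this example, not part of the described pipeline.

```python
def probability_correctly_mapped(mapq: float) -> float:
    """Phred-scaled mapping quality to probability that the position is correct."""
    return 1.0 - 10 ** (-mapq / 10)


def region_is_unreliable(mapping_qualities, min_mapq=10, max_low_fraction=0.10):
    """True if more than max_low_fraction of reads have mapping quality below min_mapq."""
    if not mapping_qualities:
        # Assumption: a region without reads cannot be trusted either.
        return True
    low = sum(1 for q in mapping_qualities if q < min_mapq)
    return low / len(mapping_qualities) > max_low_fraction
```

For instance, a mapping quality of 10 corresponds to a 90% probability of correct placement, and a region in which 2 of 10 reads fall below that quality would be flagged.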

With regard to potentially shared mutations, we also performed a power analysis to compare the CCFs of a given mutation between two samples with regard to their sequencing depth: we calculated a score per sample to consider the contribution of a single mutated read to the CCF. Per sample, the distribution of these scores could be estimated by a log-normal distribution whose 2.5% tails (z-score = 1.96) were cut off to filter out subsets of over- and underpowered mutations.

Last, to check further whether mutations observed as being private to one of the samples were truly private or simply not detected in the other sample (for example, due to insufficient coverage), we applied the following statistical test. Under the null hypothesis, the mutation is shared with an allelic fraction at least as high as that observed in one of the samples, and the probability (P value) of not detecting it within the given number of sequencing reads can be estimated using a binomial model. If the null hypothesis is rejected, the mutation is considered truly private; otherwise it is filtered out. To determine which null hypotheses are rejected, we controlled the false discovery rate at 5% by Benjamini–Hochberg correction.
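
A minimal sketch of such a test is given below, under simplifying assumptions: "not detecting" a mutation is taken to mean observing at most `max_variant_reads` variant reads (zero by default), the allelic fraction from the other sample is used without adjustment for differences in purity or copy number, and the function names are hypothetical. The Benjamini–Hochberg step is the standard step-up procedure.

```python
from math import comb


def p_not_detected(depth: int, allelic_fraction: float, max_variant_reads: int = 0) -> float:
    """Binomial probability of seeing at most max_variant_reads variant reads at this depth."""
    return sum(
        comb(depth, k) * allelic_fraction ** k * (1.0 - allelic_fraction) ** (depth - k)
        for k in range(max_variant_reads + 1)
    )


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank that satisfies the step-up condition
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Two hypothetical private calls seen at an allelic fraction of 0.1 in one sample,
# with 100 and 10 reads covering the position in the other sample.
p_values = [p_not_detected(100, 0.1), p_not_detected(10, 0.1)]
truly_private = benjamini_hochberg(p_values)
```

At a depth of 100 the absence of variant reads is very unlikely under the null hypothesis, so the call is accepted as private; at a depth of 10 it is not, and the mutation would be filtered out as undetermined.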

Subsequent two-dimensional cluster analyses were performed with the set of mutations that passed all filters. This set was binned into a two-dimensional histogram of CCFs representing the observed data, which were modelled as a surface using two-dimensional smoothing splines with a common smoothing parameter. Based on an error estimate of the samples’ CCFs, this method deconvolutes part of the sequencing noise from the data. Subsequently the peaks of the surface were identified and interpreted as cluster centres (marked as red triangles in the cluster images for each patient; Supplementary Appendix), and all mutations were assigned to their nearest cluster centre by Euclidean distance. During the assignment procedure we required that shared mutations are assigned only to shared clusters whereas private mutations (that is, those exclusively called in one of the two samples) are assigned only to private clusters. Moreover, we set a minimum threshold of four mutations per cluster and disregarded surface peaks with fewer mutations. Considering the cluster centre’s CCF as being representative of the corresponding cell clone, we applied the infinite sites hypothesis, assuming that mutations appear once in the evolutionary history, and then applied the CCF sum rule46,47 to infer the most probable phylogenetic tree and, in particular, the clonal composition per sample at the time of sampling. In the rare event that tumour phylogenetic rules allow for multiple solutions of tumour phylogeny, we assume maximum parsimony and prefer linear evolution over branched evolution within one sample.
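
The assignment step alone (not the spline smoothing or peak finding) can be sketched as follows. The representation is an assumption made for illustration: each mutation and each cluster centre is a hypothetical `(CCF in sample A, CCF in sample B)` tuple, a point counts as private when one of its coordinates is exactly 0, and mutations belonging to undersized clusters are simply dropped.

```python
from collections import defaultdict
from math import dist


def is_private(point) -> bool:
    """A point is private if it is absent (CCF of 0) from one of the two samples."""
    return point[0] == 0.0 or point[1] == 0.0


def assign_to_centres(mutations, centres, min_cluster_size=4):
    """Assign each mutation to the nearest centre of the same kind (shared or private)."""
    clusters = defaultdict(list)
    for mutation in mutations:
        candidates = [c for c in centres if is_private(c) == is_private(mutation)]
        if not candidates:
            continue  # no compatible centre; leave the mutation unassigned
        nearest = min(candidates, key=lambda c: dist(c, mutation))
        clusters[nearest].append(mutation)
    # Disregard peaks supported by fewer than min_cluster_size mutations.
    return {c: ms for c, ms in clusters.items() if len(ms) >= min_cluster_size}


centres = [(1.0, 1.0), (0.4, 0.0)]
mutations = [(0.95, 1.0), (1.0, 0.9), (0.9, 0.95), (1.0, 1.0), (0.5, 0.0), (0.35, 0.0)]
clusters = assign_to_centres(mutations, centres)
```

Here the four shared mutations form a clonal cluster at (1.0, 1.0), whereas the private centre at (0.4, 0.0) attracts only two mutations and is therefore discarded.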

In the case of CCF clusters that conflicted with phylogenetic rules we reanalysed somatic mutation calls initially computed with expected allele frequencies under the assumption of clonality. However, chromosomal segments with polyploidy allow for multiple values of absolute numbers of mutated copies (the so-called mutation multiplicity of each mutation call15,48). We therefore accounted for all potential solutions for mutation multiplicity of a given somatic mutation call and computed CCFs that rejected the assumption of clonality (null hypothesis) within the sample and which, in subsequent paired two-dimensional cluster analyses, resolved conflicts in phylogenetic tumour clone reconstruction.

Analysis of tumour phylogenies

Our approach thus enabled us to assign tumour phylogenies for all 65 patients, and to track individual clones from multiregional and longitudinal data. We assigned mutations to the most recent common ancestor (C0) if they were shared and found to be clonal across all tumour sites sampled (that is, having CCFs of approximately 1.0). Alterations with lower CCFs, or those found to be private to single-tumour sites, were determined as subclones. Clusters of at least n = 5 subclonal mutations were defined and labelled as subclone C1, C2 or C3, and derivates of these subclones were assigned accordingly (Fig. 2a). The resulting tumour phylogenies for all 65 patients are provided in the Supplementary Appendix, detailing all spatially and temporally distinct sites analysed and depicting the clinical treatment history for each patient. Additional information is provided in Supplementary Table 4.

To study patterns of tumour evolution we assigned tumour phylogenetic trees to the following classes (Fig. 2a): class A if no subclones were identified; class B if one subclone was identified, allowing only for linear evolution of this subclone; class C if at least two subclones were found with linear phylogenies; class D, phylogenies with one branching event from C1 subclones; class E, phylogenies with one branching event from the most recent common ancestor clone C0; and class F, tumour phylogenies showing two or more branching events.
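
The class definitions can be expressed as a small decision procedure. The following Python sketch is illustrative only. It assumes that the six classes run from A to F in the order listed, that a tree is given as a child-to-parent mapping with root `C0`, that any clone with two or more children counts as one branching event, and that a single branching event at any subclone (not only C1) maps to class D. The name `classify_phylogeny` is hypothetical.

```python
from collections import Counter

def classify_phylogeny(parent_of, root="C0"):
    """Map a clone tree (child -> parent, root -> None) to a class A-F."""
    children = Counter(p for p in parent_of.values() if p is not None)
    n_subclones = sum(1 for clone in parent_of if clone != root)
    branching = [clone for clone, n in children.items() if n >= 2]
    if n_subclones == 0:
        return "A"                      # no subclones
    if not branching:
        return "B" if n_subclones == 1 else "C"   # linear evolution
    if len(branching) >= 2:
        return "F"                      # two or more branching events
    return "E" if branching[0] == root else "D"

linear_two = {"C0": None, "C1": "C0", "C2": "C1"}
branch_at_root = {"C0": None, "C1": "C0", "C2": "C0"}
branch_at_c1 = {"C0": None, "C1": "C0", "C1a": "C1", "C1b": "C1"}
```

For example, `classify_phylogeny(linear_two)` falls into the linear class with two subclones, whereas the other two trees each contain exactly one branching event, at C0 and at C1 respectively.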

In this regard, increasing the number of tumour samples per patient will enhance the ability to determine subclonal mutations and subclones16. Because we analysed various numbers of samples for each patient (in 37% of cases, more than two samples per patient) we additionally downscaled our analyses to only two samples per patient to permit interpatient comparisons (Fig. 2d); we thus performed a total of n = 84 paired analyses (Fig. 1c and Supplementary Table 4). In the paired analysis for each patient we chose as representative the analysis showing the highest level of subclonal complexity, defined by the number of subclones and subclonal mutations identified. Downscaling the number of tumour samples per patient did not show any significant change in the absolute number of subclonal mutations but led to reduced numbers of assigned subclones with phylogenetic complexity of classes A–E only (Extended Data Fig. 1a,b). Downscaling the analysis to two samples per patient for interpatient comparison enabled the study of distinct scenarios throughout the clinical course of the patients (Fig. 1b). To study the full complexity of a patient’s tumour, all available samples were taken into consideration (Supplementary Appendix).

Analysis of cancer cell fractions for structural rearrangements

The analysis of the clonal architecture from multiregional and longitudinal tumour samples focused on the study of CCFs assigned to SNMs. In addition, to assess the clonality of structural rearrangements we applied SVclone (with default settings) to the whole-genome sequencing data of cases harbouring genomic rearrangements in key genes including RB1, TP53, TP73 and CREBBP/EP300. We first performed local remapping to the human genome for genomic rearrangements identified by our in-house pipeline42 and assigned CCFs for both chromosomal pairs of a given rearrangement with SVclone40 (Supplementary Table 7). The data are presented in Extended Data Fig. 5c; the gene alterations identified were found to be part of the clonal proportion of the respective sample.

Analysis of mutational signatures

We analysed our data for the activity of single-base substitution (SBS) mutational signatures available in COSMIC (COSMIC_v3.3_SBS_GRCh37_exome17).

Mutational signatures were analysed for the following categories: (1) the clonal proportion of all treatment-naive tumours, (2) the subclonal proportion of all treatment-naive tumours and (3) the subclonal proportion of all post-treatment tumours acquired following first-line platinum-based chemotherapy (Fig. 3a). The analysis of treatment-naive tumours refers to all naive samples available in this cohort (n = 58); signatures assigned to post-treatment tumours included all patients who received first-line platinum-based chemotherapy (n = 45), and we further distinguished whether tumour sites were exposed to chemotherapy alone (n = 20) or were potentially exposed to additional ionizing radiation (n = 25; Supplementary Table 2). Due to the high tumour mutational burden, signature assignments to clonal mutations were performed in cases with a median of over 300 mutations. To avoid overfitting and noise, assignments for subclonal mutations were performed only for cases with more than 20 mutations (n > 20).

To fit mutational signatures to our samples we applied SigProfilerAssignment (that is, Analyze.cosmic_fit function17,49) to identify a representative subset of signatures. We initially fitted SBS mutational signatures to the mutation catalogue of each sample assigned to the categories. Selecting mutational signatures found in at least n = 5 cases, we thus identified the most prevalent subset of signatures in the clonal and subclonal proportions of treatment-naive and post-treatment tumours (SBS1, SBS2, SBS3, SBS4, SBS5, SBS13, SBS15, SBS16, SBS24, SBS29, SBS39, SBS40 and SBS92), to which all mutations were then fitted. Post-treatment samples additionally showed platinum-based signatures (SBS31 and SBS35), which were therefore included for the assignment of signatures for the subclonal proportion of post-treatment tumours. In addition we applied the in-house-developed computational tool CaMuS50 to confirm signature assignments. With CaMuS we first linearly fitted the COSMIC signatures to all mutations for each sample (including clonal and subclonal mutations) using a backward selection procedure. We next selected only those signatures that markedly reduced the cost of the model calculated over the whole dataset. Both tools generated similar results. The results of SigProfilerAssignment are provided in Fig. 3 and Extended Data Figs. 2 and 3. Comparisons with CaMuS are provided in Extended Data Fig. 2h and the data are summarized in Supplementary Table 5.
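
The prevalence filter used to choose the representative subset of signatures can be sketched as follows. This is an illustration rather than the study's code: the input layout (a mapping from sample to per-signature mutation counts), the function name `prevalent_signatures` and the rule that a signature counts as found when its assigned activity is above zero are assumptions.

```python
from collections import Counter

def prevalent_signatures(activities, min_cases):
    """Return signatures with non-zero activity in at least min_cases samples."""
    seen = Counter()
    for per_sample in activities.values():
        # Count each signature once per sample in which it is active.
        seen.update(sig for sig, n in per_sample.items() if n > 0)
    return sorted(sig for sig, k in seen.items() if k >= min_cases)

# Toy activities: number of mutations attributed to each signature
activities = {
    "S1": {"SBS4": 120, "SBS5": 30, "SBS31": 0},
    "S2": {"SBS4": 80, "SBS13": 12},
    "S3": {"SBS4": 60, "SBS5": 25},
}
subset = prevalent_signatures(activities, min_cases=2)
```

In the toy data only SBS4 and SBS5 are active in at least two samples; the selected subset would then be used for a second fitting round restricted to those signatures.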

To track the dynamic activity of mutational signatures in patient-matched tumour samples over the course of the disease, we specifically assigned the subset of signatures identified with SigProfilerAssignment to patient-matched clonal and subclonal mutations pre- and post-treatment, including SBS31 and SBS35 (both related to platinum chemotherapy treatment) for all assignments of signatures. We thus confirmed the presence of platinum-based signatures only in post-treatment subclonal mutations of tumour samples but not in the patient-matched treatment-naive clonal or subclonal proportion of the tumour. In addition we analysed tumour samples from a cohort of patients undergoing subsequent second- or third-line treatment with immune checkpoint inhibition (n = 7). Tumour samples acquired before treatment with ICI were analysed in the categories above (corresponding to samples acquired at the time of relapse following first-line platinum-based chemotherapy). Samples pre- and post-treatment with ICI were analysed with the subset described above (Supplementary Table 5).

We furthermore tested our whole-genome and whole-exome sequencing data for mutational processes related to ionizing radiation. Following previous studies in this field25, we determined the ratio of insertions and deletions (indels) versus substitution burden and the ratio of deletions versus insertions based on exome- and genome-wide data (Extended Data Fig. 2f).
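
The two ratios can be computed directly from per-sample mutation counts. The helper below is a minimal sketch with a hypothetical name (`radiation_ratios`); returning NaN when a denominator is zero is an assumption, not something specified in the text.

```python
def radiation_ratios(n_insertions, n_deletions, n_substitutions):
    """Return (indel-to-substitution ratio, deletion-to-insertion ratio)."""
    nan = float("nan")
    indels = n_insertions + n_deletions
    indel_to_sub = indels / n_substitutions if n_substitutions else nan
    del_to_ins = n_deletions / n_insertions if n_insertions else nan
    return indel_to_sub, del_to_ins

# Toy counts for one sample: 10 insertions, 30 deletions, 400 substitutions
indel_to_sub, del_to_ins = radiation_ratios(10, 30, 400)
```

For the toy counts the indel burden is one tenth of the substitution burden and deletions outnumber insertions threefold.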

Analysis of significant mutations, copy number alterations and genome ploidy

To assess the relevance of key gene alterations in our cohort we referred to our previous study of significant gene alterations determined for 110 human SCLC samples5 (Supplementary Table 8). In addition we expanded this analysis to our present cohort of 65 patients. We determined the mutational landscape for each patient by creating the union of all mutations identified in multiple samples—this refers to the sum of mutually inclusive and private events (Supplementary Table 6). We combined the data from our current cohort of 65 patients with mutational data for 110 human SCLC samples5 (n = 175 patients) and determined significant gene alterations at a significance threshold of Q < 0.05 following our previously described method5. In brief, our approach estimates the background mutation rate for each gene and corrects for both synonymous mutations and the expression in human SCLC, referring to the transcriptional data of human SCLC5. The analysis included genes with fragments per kilobase exon per million mapped reads values of over 1 in at least 50 samples. Furthermore we analysed the data for significant mutational hotspots and significant enrichment of gene-damaging mutations. Mutations that significantly cluster within a gene were determined at Q < 0.05 (mutational hotspots). The analysis of gene-damaging mutations refers to (1) nonsense mutations resulting in early stop codons, (2) splice site mutations resulting in aberrant splicing, intron retention or in-frame losses of larger regions within the protein product and (3) frameshift mutations leading to early stop codons and thus resulting in greater changes in the gene and encoded transcript, presumably leading to either no protein product, to proteins with larger deletions within the protein structure or to truncated proteins. The enrichment of gene-damaging alterations was determined at Q < 0.05. 
We focused our studies on genes recurrently mutated in at least 8% of cases (affecting at least n = 14 patients in the combined analysis of this cohort and the previous cohort5); this allowed us to perform interpatient comparisons and to study a sufficient number of cases in our present cohort of n = 65 patients. To complement our analytical approach we also used other computational tools to study significant gene alterations, including MutSig2CV51, dNdSCV52 and OncodriveFML53. In brief, MutSig2CV and dNdSCV were run using their default configuration; for OncodriveFML we used the ‘complement’ method for the signature and ‘amean’ as statistics. Taking into account different levels of stringency, all computational models showed a high degree of overlap. All relevant and significant gene alterations are listed in Supplementary Table 8. In addition we studied gene alterations previously reported for targeted sequencing data from larger cohorts of patients with SCLC4; we scored the frequency and significance level of reported alterations for the samples in our cohort. Comparison of these data is provided in Extended Data Fig. 4e and Supplementary Table 8.

With regard to frequent alterations affecting TP53, RB1 and TP73 (Supplementary Table 8), which also included larger genomic rearrangements of these genes (Supplementary Table 7), we further analysed the gene-damaging effect of alterations. The impact of any genome alterations was evaluated in combination with the transcriptome sequencing data of these tumours, thus further informing on the presumed damage to the gene transcript and resulting protein product (Supplementary Table 11).

Significant copy number alterations were determined from uncorrected unsegmented copy number signals obtained from whole-exome sequencing data by applying the method CGARS54. We performed the analysis separately for pre- and post-treatment tumour samples, referring to one sample per patient case in both scenarios. Significant amplifications were determined with the upper quantiles 0.30, 0.10 and 0.05; deletions were computed in reference to lower quantiles 0.30, 0.15 and 0.05. The significance threshold was set at Q = 0.05. Significant copy number alterations are listed in Supplementary Table 9.

Overall genome ploidy was assigned for all patient tumours (Supplementary Table 3), with a threshold of 2.8 or above set to define those with higher genome ploidy33. Higher ploidy in cancer genomes can result either from multiple successive and independent copy number gains or through events of whole-genome doubling. To further determine events of genome duplication (or whole-genome doubling), tumours found to undergo ploidy changes were further analysed for the fraction of the genome with LOH to assign an event of genome doubling45 (Extended Data Fig. 5g,h).

Clinical correlations with chemotherapy relapse-free survival

We studied correlations of genomic subsets with relapse-free survival in patients receiving first-line systemic treatment with platinum-based chemotherapy. The analysis focused on the study of n = 55 patients for whom the clinical response to first-line platinum-based chemotherapy was determined. Ten patients from our cohort were not considered for this analysis because they were lost to follow-up (n = 2), declined further treatment and were no longer in clinical care (n = 1), or had received a lung resection, which resulted in longer disease-free survival and differences in the dynamics of disease progression (n = 7). We determined relapse-free survival by referring to CTFI, defined as the time between the end of chemotherapy and tumour recurrence, including for patients with disease progression resulting in death. Two patients in our cohort were reported with sepsis-related mortality and were censored in the analysis for recurrence-free survival, leaving a final total of n = 53 patients. All survival analyses were performed with SPSS. Survival distributions were plotted as Kaplan–Meier curves, with P values determined by log-rank test (Extended Data Fig. 8a). Hazard ratios with a 95% confidence interval and P values were further derived from Cox proportional hazard models. We performed correlations with key genomic parameters referring to significant gene mutations identified in Extended Data Fig. 4 and, in addition, we stratified patients according to genome ploidy (information available for n = 53 patients). We included in our analysis as clinical characteristics information on sex, age and tumour stage. We performed additional analyses on both smoking status and pack years of patients (available for n = 50 and n = 47 patients, respectively). Furthermore we included in our analyses the gene expression of key lineage transcription factors ASCL1, NEUROD1 and POU2F3 (available for n = 45 patients).
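
The survival analyses in the study were run in SPSS. Purely to illustrate how a Kaplan–Meier curve handles censored observations, the following self-contained Python sketch implements the product-limit estimator; the function name `kaplan_meier` and the toy follow-up times are hypothetical and do not come from the cohort.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time per patient (for example, CTFI in months)
    events: 1 if relapse was observed, 0 if the patient was censored
    Returns [(time, survival probability)] at each time with an event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        n_events = sum(1 for tt, e in data if tt == t and e == 1)
        if n_events:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t
        i += n_at_t
    return curve

# Toy data: five patients, two of them censored
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

Censored patients leave the risk set without lowering the curve, which is why the estimate drops only at the three observed relapse times in the toy data.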

We checked the assumption of proportional hazards using log-minus-log survival plots and by adding time-dependent covariates to the models. We also assessed the predictors for multicollinearity. We identified relevant gene alterations by performing regressions with backward elimination of insignificant predictors (backwards Wald, at a retention threshold of P < 0.05). The results of the Cox proportional hazard model are shown as forest plots.

Clinical correlations of genomic alterations with relapse-free survival were additionally analysed in an independent cohort of patients with SCLC (n = 64) who all received first-line systemic treatment with platinum-based chemotherapy. Note that we used WES and WGS to determine the full spectrum of alterations in key genes in our discovery cohort. By contrast, data for the independent cohort refer to WES data, which limits the detection of complex gene rearrangements that frequently affect CREBBP, EP300, TP73 and, to some extent, TP53 (ref. 5) (Extended Data Fig. 4a). The somatic alteration status for TP53, TP73, CREBBP, EP300 and FMN2 as determined by WES is provided in Supplementary Table 12.

Immunoblot analysis

Immunoblots were performed to probe tumour cell lysates for the expression of p53 (Extended Data Fig. 9a). Tissue samples from this cohort containing sufficient material were processed to 5 μm sections on a cryostat maintained at −20 °C. The non-SCLC cell line A549 served as control for the expression of wild-type p53 (ref. 55); we confirmed the identity of this cell line by STR profiling and performed tests to ensure no contamination with mycoplasma. Between 40 and 50 tissue sections per sample were sonicated for 3× 10 min and incubated for an additional 30 min in RIPA buffer supplemented with protease inhibitors (cOmplete Mini Protease Inhibitor Cocktail, Roche) and nuclease (benzonase, Millipore) at 4 °C. A549 cells were incubated in RIPA for 30 min at 4 °C. Supernatants were collected following centrifugation at 4 °C for 10 min at 20,000g and protein concentrations determined by bicinchoninic acid assay (Pierce). Either 15 μg (tissue samples) or 90 μg (A549) of protein in 3× Laemmli buffer was separated on 4–12% Tris-glycine SDS–polyacrylamide gel electrophoresis gels (Thermo Fisher Scientific) and transferred to polyvinylidene difluoride membranes (Millipore). PageRuler 10–180 kDa (Thermo Scientific) served as the protein ladder for size determination. Membranes were blocked with Tris buffered saline with 5% milk powder for 1 h at room temperature and incubated overnight with a 1:1,000 dilution of anti-p53 (clone D07, mouse monoclonal antibody, abcam, no. ab80644) and anti-HSP90 (clone C45G5, rabbit monoclonal antibody, Cell Signaling, no. 4877) at 4 °C, washed in Tris buffered saline with Tween 20 and incubated for 1 h with a 1:10,000 dilution of fluorescence-labelled secondary anti-mouse (IRDye 800CW goat anti-mouse, LI-COR, no. 926-32210) and anti-rabbit (IRDye 800CW goat anti-rabbit, LI-COR, no. 926-32211) antibodies. Blots were analysed with the Odyssey CLx imaging system (LI-COR).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
