Whole-genome doubling drives oncogenic loss of chromatin segregation - Nature

Cell culture

Table of Contents

hTERT-RPE-1 WT and hTERT RPE-1 TP53^−/− (46, XX)²⁷ cells were a gift from J. Korbel. The cells were grown in DMEM/F-12, GlutaMAX (10565018) supplemented with 10% FBS (Thermo Fisher Scientific, 10270106) and 1% antibiotic–antimycotic (Thermo Fisher Scientific, 15240062). CP-A (KR-42421) (47, XY) cells were purchased from the American Type Culture Collection (CRL-4027). CP-A TP53^−/− cells were generated in this study using a CRISPR–Cas9 approach. The cells were grown in MCDB-153 medium (Sigma-Aldrich, M7403) supplemented with 20 mg l^–1 adenine (Sigma-Aldrich, A2786), 400 µg l^–1 hydrocortisone (Sigma-Aldrich, H0135), 50 mg l^–1 bovine pituitary extract (Thermo Fisher Scientific, 13028014), 1× insulin-transferrin-sodium selenite media supplement (Sigma-Aldrich, I1884), 8.4 µg l^–1 cholera toxin (Sigma-Aldrich, C8052), 4 mM glutamine (Sigma-Aldrich, G7513), 5% FBS and 1% antibiotic–antimycotic. K562 (67, XX) cells were purchased from DSMZ (ACC 10) and cultured in RPMI medium (Thermo Fisher Scientific, 11875093) supplemented with 10% FBS and 1% penicillin–streptomycin (Thermo Fisher Scientific, 15140122). All cell lines were grown in a sterile, humidified incubator at 37 °C with 5% CO₂ and passaged every 3–5 days, depending on the cell line, to maintain appropriate cell densities.

Mice

All animals used in the study were NOD SCID gamma (NSG) female mice maintained at the EPFL animal facilities. Mice were kept in a 12 h-light 12 h-dark cycle, at 18–23 °C with 40–60% humidity, as recommended and in accordance with the regulations of the Animal Welfare Act (SR 455) and Animal Welfare Ordinance (SR 455.1). Mice were subcutaneously injected with 5 million cells in a 2:1 ratio of cell mixture to Matrigel basement membrane matrix (Corning, 354234), and tumour growth was monitored. Animal experiments were performed in accordance with the Swiss Federal Veterinary Office guidelines and as authorized by the Cantonal Veterinary Office (animal licence VD2932.1). Animals were sacrificed if the tumour volume was ≥1 cm³.

Tissue dissociation

Subcutaneous tumours from mice were dissociated using a human tumour dissociation kit (Miltenyi Biotec, 130-095-929) with an enzyme cocktail and a gentleMACS dissociator with heaters (Miltenyi Biotec). The cell suspension was then strained through a 40 µm cell strainer (Corning, 352340). Samples were treated with 1× Red Blood Cell Lysis solution (Miltenyi Biotec, 130-094-183) for 10 min at 4 °C, and then spun down at 300g for 5 min and resuspended in 0.5% BSA in PBS. Last, mouse cells were removed from the sample using a Mouse Cell Depletion kit (Miltenyi Biotec, 130-104-694) following the manufacturer’s protocol.

CRISPR cloning

The sgRNA sequences targeting TP53 (ref. ²⁷) were cloned into the pSpCas9(BB)-2A-GFP vector (PX458), which was a gift from F. Zhang (Addgene plasmid 48138; http://n2t.net/addgene:48138; RRID:Addgene_48138). In brief, 10 μM final concentration of each forward and reverse oligonucleotide were annealed in 1× T4 ligation buffer (New England Biolabs, B0202S) at 37 °C for 30 min, heated up at 95 °C for 5 min and ramped down by 0.1 °C s^–1 to room temperature. In parallel, 10 μg of PX458 vector was digested with 10 U of BsmBI (New England Biolabs, R0580L) in 1× NEBuffer 3.1 at 55 °C for 1 h. Digested plasmid was run on a 1% agarose gel, extracted and purified using NucleoSpin Gel and PCR Clean-up (Macherey-Nagel, 740609) following the manufacturer’s instructions. Annealed CRISPRs and digested plasmid were ligated using 5 U T4 DNA ligase (Thermo Fisher Scientific, EL0011) in 1× T4 DNA ligase buffer for 10 min at room temperature. The plasmid was then added to DH5α chemically competent bacteria and kept for 30 min on ice, followed by heat shock at 42 °C for 45 s. The bacteria were cooled down on ice and recovered in SOC medium for 1 h at 37 °C. Transformed bacteria were grown on ampicillin-containing growth medium at 37 °C overnight. Bacterial colonies were picked and expanded in LB broth supplemented with ampicillin for 12 h at 37 °C. Plasmid DNA was extracted using a Plasmid Plus Midi kit (Qiagen, 12945) according to the manufacturer’s protocol. The plasmids were verified by Sanger sequencing (Microsynth) with hU6 primers.

CP-A TP53
^−/− cell line generation

CP-A WT cells were grown to 60–70% confluency in 10 cm plates. Next, 5 μg of PX458 plasmid containing TP53-targeting sgRNAs were diluted into 200 µl Opti-MEM I reduced serum medium (Thermo Fisher Scientific, 31985062). Then 15 µl FuGENE HD transfection reagent (for a 3:1 transfection reagent:DNA ratio) (Promega, E2312) was added to the DNA and incubated at room temperature for 15 min. The mixture was then added to a plate drop-by-drop and mixed by shaking. Cells were incubated for 48 h at 37 °C in a humidified incubator. Transfected cells, expressing GFP, were single-cell sorted on a BD FACSAria Fusion instrument (BD Biosciences). Clones were allowed to expand, then individually tested by immunoblotting for TP53 protein levels following 24 h of treatment with 3 µM doxorubicin (Cayman Chemical, 15007).

WGD induction

Cells were seeded to 60–70% density. For mitotic slippage induction, 0.1 μg ml^–1 nocodazole (Sigma-Aldrich, M1404) was added to the growth medium and CP-A and K562 cells were incubated for 72 and 48 h, respectively. For cells with an elongated G1 phase after tetraploidization, WGD in CP-A TP53^−/− cells was induced with 0.1 μg ml^–1 nocodazole for 72 h and treated with 0.5 μM of the CDK4/6i palbociclib (Sigma-Aldrich, PZ0383) for the last 16 h of the WGD induction protocol. For cytokinesis failure inductions, RPE cells were incubated for 24 h with 0.1 μg ml^–1 nocodazole-containing medium. Following nocodazole treatment, the cells were exposed for an additional 24 h to 4 μM dihydrocytochalasin B (Cayman Chemical, 20845). The treatment was removed and cells were allowed to recover for 48 h to allow transition from a binucleated to a mononucleated state. Alternatively, WGD was induced through cytokinesis failure in RPE TP53^−/− cells by incubation for 24 h with 9 μM of the CDK1 inhibitor RO-3306 (Sigma-Aldrich, SML0569) for G2 synchronization. The compound was washed off and cells were then treated with 4 μM dihydrocytochalasin B for 24 h. Cells were allowed to recover for 48 h, and tetraploid cells were sorted on the basis of cell cycle staining with 1 μg ml^–1 Hoechst 33342 (Thermo Fisher Scientific, H1399).

Isolation of spontaneous high-ploidy cells

CP-A TP53^−/− cells were stained with 1 μg ml^–1 Hoechst 33342 for cell cycle profiling. Dividing cells with high ploidy (high Hoechst 33342 signal, >4N peak) were bulk sorted. Cells were allowed to recover overnight and then fixed for downstream analyses.

CIN induction

CIN was induced in RPE TP53^−/− cells using a modified protocol described previously⁵¹. In brief, the cells were synchronized at the G1/S border with 5 mM thymidine (Sigma-Aldrich, T9250) for 24 h. Six hours after thymidine block release, the cells were treated with 500 nM of the MPS1 inhibitor reversine⁵² (Sigma-Aldrich, R3904) for 12 h. Before processing for downstream analyses, cells were allowed to recover for 6 h.

Cell cycle staining

Cells were collected and washed with PBS (Thermo Fisher Scientific, 10010023). Permeabilization was performed in 0.01% Triton X-100 (AppliChem, A1388) in PBS for 30–60 min at 4 °C. Following PBS washes, the cells were fixed and stained with FxCycle PI/RNase staining solution (Thermo Fisher Scientific, F10797) overnight at 4 °C in the absence of light. Propidium iodide intensity for cell cycle detection was measured using Guava easyCyte (Luminex) and Galios (Beckman Coulter) cytometers and analysed using FlowJo (v.10.8) (BD).

Karyotyping

Cells were treated with 20 ng ml^–1 KaryoMAX colcemid solution (Thermo Fisher Scientific, 15212012) for 2 h at 37 °C in a humidified incubator. Cells were collected in 0.8% sodium citrate solution (Sigma-Aldrich, S4641) and maintained at 37 °C for 30 min. The cell suspension was fixed with 3:1 methanol:acetic acid (Chemie Brunschwig, M/4000/17; FSHA/0406/PB08) added drop-by-drop, washed twice in the fixative solution and incubated overnight at −20 °C. Cells were dropped onto a glass slide (Thermo Fisher Scientific, J1800AMNZ). Slides were incubated for 2 min in a humidified chamber at 65 °C and air-dried at room temperature for 30 min. Slides were mounted and DAPI-stained concomitantly with ProLong Diamond antifade mountant with DAPI (Thermo Fisher Scientific,. P36962) according to the manufacturer’s instructions. Metaphases were imaged at ×100 resolution on a Zeiss Axioplan upright microscope. Images were analysed using Fiji (v.2.9.0)⁵³.

Immunoblotting

For non-histone proteins, cells were incubated in RIPA buffer consisting of 50 mM Tris-HCl pH 8.0, 1 mM EDTA, 1% Triton X-100, 0.5% sodium deoxycholate (Sigma-Aldrich, D6750), 0.1% SDS and 150 mM NaCl, for 30 min on ice for protein extraction. For histone extraction, cells were initially incubated with PBS lysis buffer consisting of 1% Triton X-100, 1 mM DTT (AppliChem, A2948), 1× protease inhibitor cocktail for 15 min at 4 °C and spun down at 12,000g. The resulting pellet was incubated overnight with 0.2 N hydrochloric acid (AppliChem, A5634). Lysates were then centrifuged at 12,000g for 10 min at 4 °C. Supernatant containing the protein fraction was isolated and mixed with 6× Laemmli sample buffer (12% SDS w/v, 60 mM Tris-HCl, pH 6.8, 50% glycerol (Fisher Scientific, G/0650), 600 mM DTT and 0.06% bromophenol blue (Sigma-Aldrich, B5525)) at 96 °C for 5 min. Mid-molecular weight proteins and histones were then separated on 12% or 15% SDS–PAGE gels, respectively, whereas high-molecular weight proteins were separated on a 7.5% Mini-PROTEAN TGX precast protein gel (Bio-Rad, 4561023). All gels were transferred onto 0.2 µm nitrocellulose membranes (Bio-Rad, 1704270) using a Trans-Blot Turbo transfer system (Bio-Rad, 1704150) according to the manufacturer’s specifications. The membranes were blocked in a solution containing 5% milk (AppliChem, A0830) and 0.1% Tween-20 (Fisher Bioreagents, 10113103) in PBS for 30 min at room temperature. Blots were incubated in the same milk solution at either 4 °C overnight with primary antibodies against TP53 (Santa Cruz Biotechnology, sc-126; 1:500), β-actin (Cell Signaling Technology, 4967; 1:5,000), CTCF (Active Motif, 61311; 1:1,000), RAD21 (Abcam, ab992; 1:5,000) and α-actinin (Cell Signaling Technology, 6487; 1:1,000), or for 1 h at room temperature with primary antibodies against trimethyl-histone H3 (Lys9) (Cell Signaling Technology, 13969; 1:1,000), acetyl-histone H3 (Lys27) (Cell Signaling Technology, 8173; 1:1,000), trimethyl-histone H3 (Lys27) (Cell Signaling Technology, 9733; 1:1,000), and histone H3 (Cell Signaling Technology, 4499; 1:5,000). The membranes were incubated with fluorescent labelled goat anti-mouse (LI-COR Biosciences, 926-68070; 1:10,000) or goat anti-rabbit (LI-COR Biosciences, 926-32211; 1:10,000) for 2 h at room temperature and imaged using an Odyssey CLx imaging system (LI-COR Biosciences). Alternatively, the membranes were incubated with HRP-conjugated goat anti-mouse antibody (Merck, AP308P; 1:5,000) or goat anti-rabbit antibody (Merck, AP307P) for 1 h at room temperature. Blots were incubated with Amersham ECL western blotting detection reagent (GE Healthcare, RPN2232) according to the manufacturer’s instructions, and captured using a Fusion FX6 Edge imaging system (Witec). Images were analysed using Fiji (v.2.9.0)⁵³.

Immunofluorescence

Cells were cultured on coverslips coated with poly-d-lysine (Sigma-Aldrich, P7280) and incubated in standard conditions. Cells on coverslips were fixed with ice-cold methanol at 4 °C for 30 min. Cells were washed with PBS multiple times and incubated with 5% BSA (Sigma-Aldrich, A7906) at room temperature for 30 min. Next coverslips were incubated with primary antibodies at the indicated concentrations against pericentrin (0.1 μg ml^–1; Abcam, ab4448) and α-tubulin (0.5 μg ml^–1; Sigma-Aldrich, T6074) diluted in 1% BSA for 1 h at room temperature in a humidified chamber. Coverslips were washed with PBS and incubated with the fluorescent secondary antibodies anti-mouse IgG-Alexa Fluor 594 (2 μg ml^–1; Thermo Fisher Scientific, A-11005) and anti-rabbit IgG-Alexa Fluor 488 (2 μg ml^–1; Thermo Fisher Scientific, A-11034) diluted in 1% BSA for 1 h at room temperature in the dark. Coverslips were washed in PBS followed by mounting and counterstaining with DAPI with ProLong Diamond antifade mountant. Cell images were captured at ×63 resolution on a Zeiss Axioplan upright microscope. Images were analysed using Fiji (v.2.9.0)⁵³.

Soft-agar assay

For each condition, 100,000 cells resuspended in complete MDCB-153 medium were mixed in a 1:1 ratio with 0.7% sterile noble agar (Thermo Fisher Scientific, J10907). Cells were plated on Costar ultralow attachment plates (Corning, 3473) on top of a mixture of 1:1 MCDB-153 medium and 1.4% sterile noble agar. The mixture was allowed to solidify in a humidified atmosphere at 37 °C overnight, then fresh complete MCDB-153 medium was added on top of the layers of agar. Samples were incubated for up to 10 weeks in normal conditions, and the medium was replaced twice a week. Individual colonies were picked from the agar layer and cultured for downstream analysis. Last, the plates were stained with a solution of 0.5% crystal violet (Thermo Fisher Scientific, 405830250) in 20% ethanol for 30 min, washed with PBS and imaged.

Hi-C library preparation and analysis

Hi-C library preparation

Bulk Hi-C library preparation was performed as previously described^11,22, with minor modifications. Around 1–2 million cells were collected and fixed with 2% formaldehyde (Thermo Fisher Scientific, 11483217). The reaction was quenched with 200 mM glycine (VWR, 101194M) final concentration, cells were washed with PBS (Thermo Fisher Scientific, 10010023) and lysed in a solution containing 10 mM Tris-HCl pH 8.0 (Thermo Fisher Scientific, 15568025), 10 mM NaCl (Sigma-Aldrich, S6546), 0.2% IGEPAL CA-630 (Sigma-Aldrich, I8896) and 1× proteinase inhibitor cocktail (Roche, 11697498001) at 4 °C for 30 min. Resulting nuclei were resuspended in 1× NEB3.1 buffer (New England Biolabs, B7203S). The suspension of nuclei was incubated with 0.11% SDS (Carl Roth, CN30) final concentration for 10 min at 65 °C. The reaction was quenched with 1% Triton X-100 (AppliChem, A1388), and nuclei were digested with 100 U MboI restriction enzyme (New England Biolabs, R0147) at 37 °C overnight. The restriction enzyme was inactivated according to the manufacturer’s specifications, and digested nuclei were washed and resuspended in 1× NEB3.1. Digested ends were then marked with biotin through incubation in 0.03 mM biotin-14-dATP (Thermo Fisher Scientific, 19524016), 0.03 mM dCTP, 0.03 mM dGTP, 0.03 mM dTTP (Promega, U1420) and 50 U Klenow DNA polymerase I (New England Biolabs, M0210M) for 4 h at room temperature. Resulting blunt-ends were proximally ligated with 50 U T4 DNA ligase (Thermo Fisher Scientific, EL0011), 1× T4 DNA ligase buffer (Thermo Fisher Scientific, B69), 5% PEG, 1% Triton X-100 and 0.1 mg ml^–1 BSA (New England Biolabs, B9000S) for 4 h at room temperature. Crosslink reversal was performed on the proximity-ligated chromatin through incubation with 300 mM NaCl and 1% SDS overnight at 68 °C. The sample was then treated with 50 μg ml^–1 RNase A (Thermo Fisher Scientific, EN0531) for 30 min at 37 °C, followed by 400 μg ml^–1 proteinase K (Promega, V3021) at 65 °C for 1 h. DNA was purified by precipitation with 1.6 volumes of pure ethanol and 0.1 volumes of sodium acetate, pH 5.2 (Thermo Fisher Scientific, R1181) at −80 °C. DNA was eluted and then fragmented by sonication at 80 V peak incidence power, 10% duty factor, 200 cycles per burst for 60–80 s with an E220 focused-ultrasonicator (Covaris). Sheared DNA was size-selected for library preparation using AMPure XP beads (Beckman Coulter, A63881). Next, biotin-marked fragments were isolated using Dynabeads MyOne Streptavidin C1 (Thermo Fisher Scientific, 65001), and all subsequent steps were performed on the bead-bound DNA fraction. Hi-C library preparation continued with an end polishing reaction, which involved incubation of DNA with 1× T4 ligase buffer (New England Biolabs, B0202S), 2.5 mM each dNTP, 50 U T4 polynucleotide kinase (New England Biolabs, M0201), 12 U T4 DNA polymerase (New England Biolabs, M0203) and 5 U Klenow DNA polymerase I, at room temperature for 30 min. PolyA tail was added by incubating the DNA sample in 1× NEBuffer 2 (New England Biolabs, B7002S) with 0.5 mM dATP and 25 U Klenow fragment (3′→5′ exonuclease) (New England Biolabs, M0212) at 37 °C for 30 min. DNA fragment ends were then ligated to Illumina TruSeq unique dual indexes (Integrated DNA Technologies) in 1× T4 ligation buffer with 5% PEG and 15 U T4 DNA ligase for 2 h at room temperature or overnight at 16 °C. Last, libraries were PCR amplified using Illumina forward (AATGATACGGCGACCACCGAGATCTACAC) and reverse (CAAGCAGAAGACGGCATACGAGAT) primers and KAPA HiFi HotStart ReadyMix (Roche, KK2602) for 6–10 cycles. Resulting fragments were size-selected using AMPure XP beads. Libraries were sequenced in a PE150 configuration on HiSeq X, NovaSeq 6000 or HiSeq 2500 systems (Illumina).

Generation of Hi-C contact maps

For each library replicate, reads were mapped to the human hg19 reference genome using bwa mem (v.0.7.17)⁵⁴ and processed using the Juicer pipeline (v.1.6)⁵⁵. For each sample, Hi-C maps were generated at the following resolutions: 10 kb, 20 kb, 25 kb, 50 kb, 100 kb, 250 kb, 500 kb, 1 Mb and 10 Mb. Once the concordance between replicates of the same biological condition was assessed, Hi-C maps of the same condition were merged using the mega.sh script provided in the Juicer pipeline. All Hi-C maps were normalized using the Knight–Ruiz method (KR)⁵⁶ implemented in the Juicer pipeline.

Definition of Hi-C compartments

Compartments were called using the Calder pipeline²⁹ on KR-normalized Hi-C maps at 50 kb resolution. Calder returns a segmentation of the genome in compartments where each segment is assigned both a compartment rank (a real number between 0 and 1) and a compartment label (B.2.2, B.2.1, …, A.1.2, A.1.1), which is a discretization of the rank in eight different categories. Compartment ranks and labels correlate with the chromatin state of the DNA region, with values close to 0 being more B-like compartments and values close to 1 being more A-like compartments.

Assessment of similarity between Hi-C contact maps

Pairwise comparisons between intrachromosomal contact maps were based on the following metrics: a correlation measure between the contacts, stratified by the distance between the interacting loci; the conservation of compartment domains and their boundaries; the correlation at the level of boundary insulation; and the correlation at the level of Calder compartment rank.

Replicates of the same biological condition (control versus control, WGD versus WGD) and samples of different conditions (control versus WGD, control vs 20-weeks post-WGD tumours) were compared, as well as samples of a different cell line (control versus GM12878 from ref. ¹²). Inter-replicate comparisons and intercell line comparisons gave a reference baseline of random fluctuations and extensive chromatin changes, respectively, for each score.

Correlation of contacts (stratum-adjusted correlation coefficient)

The stratum-adjusted correlation coefficient⁵⁷ was used as implemented in the HiCRep.py package⁵⁸. The maximum genomic distance to test was set to 10 Mb, and Hi-C maps were binned at 100 kb and smoothed with a window of H = 3 bins. For each comparison, a stratum-adjusted correlation coefficient value was computed for each chromosome.

Conservation of compartment domains

Compartment domains were called at 50 kb resolution using the Calder pipeline on the KR-normalized Hi-C matrices.

Given two compartment domain sets identified on the same chromosome in two samples, the measure of concordance⁵⁹ was calculated, which was previously defined to compare two clustering assignments. The measure of concordance is a real number bounded between 0 and 1, with 1 representing identical chromosome segmentation and 0 maximum discordance.

Conservation of insulating boundaries

Hi-C insulation was computed as previously described³⁰. Insulation scores for each chromosome were calculated using the FANC library⁶⁰, specifically, the InsulationScores.from_hic function on the KR-normalized intrachromosomal Hi-C matrices at 50 kb resolution using a sliding window of 1 Mb. Sliding windows with more than 20% of missing values were not considered. Scores were normalized by the geometric mean chromosome-wise and finally log₂-scaled. The final score is therefore centred at 0, with local minima representing putatively TAD boundaries.

Comparisons between samples were performed by computing the Spearman correlation coefficient of the insulation scores for each chromosome.

Hi-C compartment similarity

The compartment segmentation given by Calder was split in bins of 50 kb, assigning to each bin the compartment rank of the segment it belongs to. The similarity between two samples was then computed separately for each chromosome as the Spearman correlation of the two binned rank vectors.

Hi-C interchromosomal similarity

The interchromosomal interactions for each pair of Hi-C maps were compared by considering separately the interactions between each pair of different chromosomes in the two samples. The Spearman correlation coefficient of the raw interaction counts was computed between the two samples for each chromosome pair. For each Hi-C comparison, therefore, a correlation value for each pair of chromosomes was obtained.

Analysis of Hi-C interchromosomal interactions

To determine interaction biases between pairs of chromosomes, Hi-C interactions were aggregated between each pair of chromosomes, obtaining a 23 × 23 interaction matrix I. The matrix was balanced using iterative correction⁶¹ to remove interaction biases due to the length of the chromosomes (such that the marginal sum of each chromosome is 1). This resulted in a normalized matrix I_ICE. This normalization is similar to the one presented in ref. ¹¹, with the advantage of ensuring constant marginals. When compared, both normalizations produced comparable results.

Chromosomes were then divided into two clusters on the basis of their interaction profile in I_ICE: chromosomes from 1 to 14 and X were categorized as long, whereas chromosomes from 15 to 22 were categorized as short.

To compare control and WGD interchromosomal interaction matrices, their ratio R = log₂[I_ICE(WGD)/I_ICE(Control)] was computed. Chromosome interactions were then split into three categories on the basis of the chromosome cluster of their ends: long–long, long–short and short–short. Chromosome interaction categories were compared by computing a Mann–Whitney test P value between R values of each pair of categories.

Hi-C interchromosomal map balancing at 10 Mb resolution

To visualize interchromosomal Hi-C maps at 10 Mb resolution, Iterative Correction using the Cooler package⁶² was performed. Counts were normalized such that each bin had total number of interchromosomal interactions equal to 1.

Analysis of Hi-C intercompartmental interactions

The genomic segmentation in eight classes given by Calder (B.2.2 to A.1.1) was considered. Each 50 kb genomic bin was then associated to the compartment level it belongs. For each chromosome, its intrachromosomal contacts were extracted at 1 Mb resolution and then normalized by genomic distance¹² using the FANC package⁶⁰. These interactions were then upscaled to 50 kb resolution by assigning to each 50 × 50 kb pixel the value of the 1 × 1-Mb superpixel it belonged to. This procedure was performed to smooth the normalized interaction values and to ensure enough coverage for each genomic distance. For each 50 kb genomic bin b, the sum of the normalized interactions between that bin and the bins belonging to the eight compartment level classes was computed separately, thus obtaining one value s_b(comp) for each compartment level comp. These values were then divided by the total sum of interactions of that bin T_b. To consider the bias induced by the amount of chromosome covered by each compartment level, these values were further divided by the percentage of bins belonging to each compartment level B_comp, thus obtaining z_b(comp) = s_b(comp)/T_bB_comp. The obtained value was finally divided by their sum to obtain for each bin f_b(comp) = z_b(comp)/Σ_cz_b(c), which is a number between 0 and 1 for each compartment level representing the level of segregation of each compartment level for that bin. For each bin b, it was defined as CScore_b the segregation level of the compartment level to which the bin belongs to. This definition is an adaptation of the compartment score computed in ref. ⁶³, but applied at the bin level.

Given two conditions, for example, WGD and control, the difference for each pair of subcompartments comp₁, comp₂ was computed as follows:

$$\begin{array}{l}{\sigma }_{{{\rm{comp}}}_{1}}({{\rm{comp}}}_{2})=\frac{{\rm{No.\; times}}\,{f}_{b}^{{\rm{WGD}}}({{\rm{comp}}}_{2}) < {f}_{b}^{{\rm{Control}}}({{\rm{comp}}}_{2})}{{\rm{No.\; times}}\,{f}_{b}^{{\rm{WGD}}}({{\rm{comp}}}_{2})\ge {f}_{b}^{{\rm{Control}}}({{\rm{comp}}}_{2})},\\ \,\,\,\,{\rm{for}}\,b\in {{\rm{comp}}}_{1}\end{array}$$

and

$$\sigma \left({{\rm{comp}}}_{1},{{\rm{comp}}}_{2}\right)=-{\log }_{2}\left(\frac{1}{2}\left({\sigma }_{{{\rm{comp}}}_{1}}\left({{\rm{comp}}}_{2}\right)+{\sigma }_{{{\rm{comp}}}_{2}}\left({{\rm{comp}}}_{1}\right)\right)\right).$$

${\sigma }_{{{\rm{comp}}}_{1}}\left({{\rm{comp}}}_{2}\right)$ represents the ratio between the number of bins in comp₁, which lose compartment segregation with comp₂, and the number of bins in comp₁, which gain compartment segregation with comp₂.

$\sigma \left({{\rm{comp}}}_{1},{{\rm{comp}}}_{2}\right)$ is simply the average of ${\sigma }_{{{\rm{comp}}}_{1}}\left({{\rm{comp}}}_{2}\right)$ and ${\sigma }_{{{\rm{comp}}}_{2}}\left({{\rm{comp}}}_{1}\right)$, which makes it a symmetric measurement of average segregation changes between compartment levels comp₁ and comp₂. The –log₂ of this number was computed for representation purposes, with positive and negative log₂ ratios indicating gain and loss of contacts, respectively, between the two compartment levels.

To specifically assess the extent of loss of segregation for a specific compartment level comp, a similar strategy was adopted:

$${\sigma }_{{\rm{comp}}}=\frac{{\rm{No.\; times}}\,C{{\rm{Score}}}_{b}^{{\rm{WGD}}} < C{{\rm{Score}}}_{b}^{{\rm{Control}}}}{{\rm{No.\; times}}\,C{{\rm{Score}}}_{b}^{{\rm{WGD}}}\ge C{{\rm{Score}}}_{b}^{{\rm{Control}}}},{\rm{for}}\,b\in {\rm{comp}}$$

was computed, which represents the ratio between the number of bins in comp losing segregation and the number of bins in comp gaining segregation. Values higher than 1 indicate loss of segregation, whereas values below 1 indicate gain of segregation.

Boundary insulation analysis

TAD boundaries in control and WGD samples were determined from insulation scores using the fanc.Boundaries.from_insulation_score function from the FANC package⁶⁰, looking at local minima of the score in the 400 kb region around the bin. Each boundary was assigned the insulation score corresponding to its position. Lower values of the score signify higher insulation capability of the boundary. The boundaries shared between the two samples (±50 kb) were then extracted. The top 300 insulating boundaries were selected as follows: control and WGD boundaries were separately ranked on the basis of their insulation scores. For each condition, the top 300 ranked boundaries were selected and their maximum insulation score (corresponding to the weaker boundary in the set) was determined, which was called ${I}_{{\rm{top}}300}^{{\rm{Control}}}$ and ${I}_{{\rm{top}}300}^{{\rm{WGD}}}$, respectively. An insulation threshold ${I}_{{\rm{top}}300}={\rm{\max }}\left({I}_{{\rm{top}}300}^{{\rm{Control}}},{I}_{{\rm{top}}300}^{{\rm{WGD}}}\right)$ was defined. Finally, shared boundaries between control and WGD having insulation scores smaller than ${I}_{{\rm{top}}300}$ were selected. It should be noted that this approach does not ensure that the final number of selected boundaries is exactly 300.

Independence of LCS measurement from Hi-C resolution and coverage per haploid copy

The aggregated map of RPE TP53^−/− WGD cells (218 million reads) was compared with one of the control replicates maps (108 million reads). Conversely, one replicate of the control (108 million reads) was compared with the aggregated map of the same control (221 million reads).

Detection of regions of significant CoREs

To determine significant CoREs, we developed an algorithm to identify contiguous genomic regions with consistently different compartment ranks computed using Calder. We refer to this method as DiffComp. A segmentation algorithm was designed as follows. Given two Hi-C experiments X and Y, the genomic segmentations of both in compartment domains was determined using Calder on the 50 kb resolution KR-normalized Hi-C matrices. Both segmentations were then binned in 50 kb bins, assigning to each bin its relative compartment rank. Thus, for each chromosome, compartmentalization in the two samples is represented as two numerical vectors C_X, C_Y.

The pairwise rank difference for each genomic bin were computed as ΔR_XY= C_X – C_Y. This vector represents the differential rank between the two experiments, with positive values indicating a shift towards active compartments and negative values indicating a shift towards inactive compartments.

The genome was segmented based on ΔR_XY using a recursive strategy. Given σ*, which represents the maximum allowed standard deviation in the signal that a segment can have before being split into subsegments, the procedure involves the following process.

Each chromosome is initially considered a single whole segment and then

(1)

The standard deviation of the segment σ(s) and its average value mean(s) were calculated.
(2)

If σ(s) < σ*, then the procedure stops and the segment is assigned mean(s) as value, which represents its subcompartment repositioning score.
(3)

Otherwise, the segment is split into subsegments depending on whether they are above or below the mean(s) value.
(4)

For each of the subsegments, the procedure is repeated from point (1).

The expected distribution of compartment changes can be computed using technical or biological replicates of the same experiment. An expected differential vector ΔR_E = C_R1 – C_R2 was computed using two replicates of RPE TP53^−/− control.

In this analysis, σ* = 0.1 was fixed, which is 1.3-times the standard deviation of ΔR_E. For each detected compartment repositioning segment s, an empirical P value as P value(s) = P(max{|ΔR_XY(s)|} < |E|) was computed, where E∈ΔR_E. This P value depends both on the average value of the segment and on its length, for which longer segments have higher statistical power.

The output of this method is a list of CoRE regions together with their average compartment repositioning score (which can vary from −1 and 1) and their computed empirical P value.

For each comparison studied, CoRE regions having an absolute average value above 0.1, an empirical P value below 0.01 and a segment length above 300 kb were considered.

CoRE overlap in CP-A TP53
^−/− colonies

To assess the amount of overlap between two sets of CoREs C₁, C₂ belonging to different sample comparisons, the two sets were divided in activations and inactivations on the basis of the sign of the compartment repositioning score (${C}_{1}^{{\rm{A}}},{C}_{1}^{{\rm{I}}},\,{C}_{2}^{{\rm{A}}},{C}_{2}^{{\rm{I}}}$). The CoREs of the same type coming from both sample comparisons (${C}_{12}^{{\rm{A}}},{C}_{12}^{{\rm{I}}}$) were merged together by stacking overlapping regions together (using the bedtools merge command from bedtools (v.2.30.0)^64,65), finally creating a consensus set of CoREs concatenating the two sets (${C}_{12}=[{C}_{12}^{{\rm{A}}},{C}_{12}^{{\rm{I}}}\,]$). Each consensus CoRE was checked for overlapping with a CoRE of the same type in C₁ and C₂. Consistent CoREs were considered overlapping when two CoREs of the same type were overlapping and at least one of the two was a consistent CoRE.

Tracing compartment repositioning from WGD to tumour time points

For each of the tumours, the CoRE regions that passed the previously defined statistical filters were considered. For each of these regions s, the corresponding Calder segmentation in the WGD and control time points were extracted. The average compartment rank of the CoRE region in the two previous time points were then computed, which were defined as r_WGD(s) and r_Control(s), respectively. The compartment rank of the CoRE region in the tumour were defined as r_Tumour(s) = r_Control(s) + mean(s), where mean(s) is given by the CoRE detection algorithm. A parameter ε was then defined, which is the minimum absolute rank difference between WGD and control, namely |r_WGD(s) – r_Control(s)|, to classify the CoRE region as activating or inactivating in WGD with respect to control.

Given ε, CoRE regions in tumours can be discriminated on the basis of the type of change in tumours (activating or inactivating) as well as the type of change at WGD (unchanged, activating or inactivating). The number of CoRE regions belonging to each of the six combinations was counted.

A CoRE region was defined as ‘consistent’ when it belongs to the activation–activation or the inactivation–inactivation class. The percentage of consistent CoRE regions were counted with different choices of parameter ε. Observing the steepness of the curves in the three tumour samples, a shared parameter to ε = 0.05 was fixed.

Comparing different segmentation algorithms for CoRE detection

The segmentation strategy in DiffComp was compared to the circular binary segmentation (CBS) algorithm, which was previously developed for the segmentation of copy number changes⁶⁶. CBS was applied to the Calder differential rank vector ΔR_XY = C_X – C_Y, which was computed as explained above. Segments detected using CBS were annotated with their compartment repositioning score and P values as described above. CoREs were then filtered on the basis of the repositioning score, P value and size as defined above. The CoREs detected using DiffComp and the ones detected with CBS were compared using as the benchmark the RPE TP53^−/− 20 week post-WGD tumour 1 versus RPE TP53^−/− control comparison. We then compared the breakpoint positions between the two segmentation strategies, the corresponding sets of significant CoREs and the traceability of these events to subcompartment changes occurring in WGD cells.

Hi-C phasing in RPE TP53
^−/− 20-week post-WGD tumours

Integrated phasing was performed using Hi-C reads from the pooled RPE Hi-C replicates (control). Single-nucleotide variants (SNVs) were first identified from the Hi-C reads using Freebayes⁶⁷ (version v1.3.2-46-g2c1e395-dirty). SNVs were phased into two haplotypes, namely Hap1 and Hap2, using a previously described integrated phasing strategy⁶⁸. In brief, population-based phasing was first conducted using SHAPEIT2 (ref. ⁶⁹; version v2.904.3.10.0-693.11.6.el7.x86_64) with hg19 1000 Genomes project phase 3 as a reference panel. Pseudo-reads generated from the population haplotype likelihood were then combined with the Hi-C reads as input to HapCUT2 (ref. ⁷⁰) for the second round of phasing. This approach returned several phasing blocks for each chromosome. Phasing information was retained only from the most variants phased block, which harbours the majority of input SNVs (>90%). Only Hi-C interactions for which anchors mapped strictly to one of the two haplotypes were retained for analysis, thus obtaining three sets of interaction types: Hap1–Hap1, Hap1–Hap2 and Hap2–Hap2.

Analysis of contacts between homologous chromosomes after WGD

After WGD, the rates both in cis and in trans contacts are expected to increase. In detail, putative in cis contacts should increase by a factor of 3 following WGD, and in trans contacts should increase by a factor of 4. Hence, it is expected that the ratio of in trans (T) versus in cis (C) contacts increases after WGD as described below:

$$\begin{array}{ccc}{r}_{{\rm{Control}}}=\frac{{\rm{T}}}{2{\rm{C}}}{\rm{;}} & {r}_{{\rm{WGD}}}=\frac{4{\rm{T}}}{6{\rm{C}}}{\rm{;}} & \frac{{r}_{{\rm{WGD}}}}{{r}_{{\rm{Control}}}}=\frac{4}{3}=1.33\end{array}$$

To verify this prediction, in cis interactions were defined as all the Hi-C-phased interactions of the type Hap1–Hap1 and Hap2–Hap2, and in trans interactions all the Hi-C-phased interactions of type Hap1–Hap2. The following was computed:

$${r}_{{\rm{C}}{\rm{o}}{\rm{n}}{\rm{t}}{\rm{r}}{\rm{o}}{\rm{l}}}=\frac{{\rm{No.}}\,{({{\rm{H}}{\rm{a}}{\rm{p}}}_{1}-{{\rm{H}}{\rm{a}}{\rm{p}}}_{2})}^{{\rm{C}}{\rm{o}}{\rm{n}}{\rm{t}}{\rm{r}}{\rm{o}}{\rm{l}}}}{{\rm{No.}}\,{({\rm{H}}{\rm{a}}{{\rm{p}}}_{1}-{{\rm{H}}{\rm{a}}{\rm{p}}}_{1})}^{{\rm{C}}{\rm{o}}{\rm{n}}{\rm{t}}{\rm{r}}{\rm{o}}{\rm{l}}}+{\rm{No.}}\,{({{\rm{H}}{\rm{a}}{\rm{p}}}_{2}-{{\rm{H}}{\rm{a}}{\rm{p}}}_{2})}^{{\rm{C}}{\rm{o}}{\rm{n}}{\rm{t}}{\rm{r}}{\rm{o}}{\rm{l}}}}$$

$${r}_{{\rm{W}}{\rm{G}}{\rm{D}}}=\frac{{\rm{No.}}\,{({{\rm{H}}{\rm{a}}{\rm{p}}}_{1}-{{\rm{H}}{\rm{a}}{\rm{p}}}_{2})}^{{\rm{W}}{\rm{G}}{\rm{D}}}}{{\rm{No.}}\,{({\rm{H}}{\rm{a}}{{\rm{p}}}_{1}-{{\rm{H}}{\rm{a}}{\rm{p}}}_{1})}^{{\rm{W}}{\rm{G}}{\rm{D}}}+{\rm{No.}}\,{({{\rm{H}}{\rm{a}}{\rm{p}}}_{2}-{{\rm{H}}{\rm{a}}{\rm{p}}}_{2})}^{{\rm{W}}{\rm{G}}{\rm{D}}}}$$

and $\frac{{r}_{{\rm{WGD}}}}{{r}_{{\rm{Control}}}}$ separately for each chromosome, and for the genome-wide average ratio. Finally, $\frac{{r}_{{\rm{WGD}}}}{{r}_{{\rm{Control}}}}=1.25$ was obtained, close to the predicted value.

Calling copy number alterations from bulk and phased Hi-C reads

A strategy to impute broad CNVs from Hi-C data was designed as follows:

(1)

For each bin b, its coverage n_b was computed.
(2)

Bins overlapping genomic gaps and bins having ${n}_{b} < \bar{R}-\gamma M$ were excluded by the analysis, with $\bar{R}$ being the genome-wide coverage median, M being the median genome-wide absolute deviation of the coverage and $\gamma \in {\mathbb{N}}$ begin a defined parameter.
(3)

n_b was normalized by the median chromosome coverage ($\bar{{R}_{{\rm{C}}}}$), obtaining $\widetilde{{n}_{b}}={n}_{b}/\bar{{R}_{{\rm{C}}}}$. This step enables to identify copy number changes at the subchromosomal level.
(4)

The CBS algorithm⁶⁶ was run on $\widetilde{{n}_{b}}$. If a chromosome has no breakpoints, the entire chromosome is defined as a segment.
(5)

For each segment s, the median value of its genome-wide normalized coverage, ${w}_{s}={\rm{median}}{({n}_{b}/\bar{R})}_{b\in s}$ was computed.
(6)

CNVs were defined as follows: all segments or chromosomes having $\left|{w}_{s}-1\right|\ge t$, with t being a defined threshold representing the minimum absolute difference from the genome-wide median coverage a segment has to have to be defined as a CNV.

For bulk Hi-C data of the CP-A TP53^−/− colonies, a bin size of 2 Mb was used, with t = 0.4 and γ = 7. For phased Hi-C data, γ = 4 was used.

Detecting significant interactions in RPE TP53
^−/− control cells and post-WGD tumours

HiC-DC⁷¹ was used to compute the statistical significance of chromatin interactions at the bin level (20 kb resolution). The degree of freedom in the hurdle negative binomial regression model was set as 6. The sample size parameter was determined by trying 20 values in the (0.5,1) range with equal distance, then choosing the maximum value that did not result in optimization failure in R. Other parameters of HiC-DC were set as default.

RNA-seq protocol and analysis

RNA-seq library preparation

RNA was extracted from RPE TP53^−/− control cells and WGD cells using a RNeasy Mini kit (Qiagen, 74104) following the manufacturer’s protocol. Resulting RNA was processed for sequencing using a TruSeq Stranded mRNA kit (Illumina, 20020594) according to the supplier’s recommendations. Libraries were then sequenced on an Illumina NovaSeq 6000 platform in a PE150 configuration.

RNA-seq data processing and analysis

RNA-seq fastq files were analysed using the nfcore/rnaseq pipeline (v.3.8; https://nf-co.re/rnaseq) using as the aligner star_rsem (ref. ⁷²), mapping the reads to the hg19 genome. Differentially expressed genes between WGD and control were determined using DESeq2 (ref. ⁷³). Genes having an absolute log₂(FC) above 0.1 and a P value of <0.01 were considered significantly differentially expressed. Gene set enrichment analysis was performed using Enrichr⁷⁴.

Relationship between gene expression changes and LCS at WGD

Each gene was associated to the 50 kb bin containing its transcription start site. Each gene bin was then associated to the compartment rank computed by Calder in RPE TP53^−/− control and WGD and computed the difference (Δcompartment). To check the association with boundary insulation changes after WGD, the genes for which the transcription start site was in proximity of an insulation boundary in RPE TP53^−/− control (±50 kb) having an insulation score below −0.1 were considered. The percentage of upregulated and downregulated genes in proximity of boundaries gaining and losing insulation and the fold changes against the percentages in the total set of genes were computed.

scHi-C protocol and analysis

scHi-C library preparation

Single-cell Hi-C was performed using a modified protocol described previously⁷⁵. Fixation of the nuclei with formaldehyde, MboI digestion and biotin fill-in was performed in a pool of 1 million cells following the same procedure as described for bulk processing. Next in-nucleus proximity ligation was done with 50 U T4 DNA ligase, 1× T4 DNA ligase buffer, 5% PEG, 1% Triton X-100 and 0.1 mg ml^–1 BSA at 16 °C overnight, with light mixing. Only pools of nuclei with at least 75% of the population showing an integral nuclear membrane were considered for further processing. Nuclei were strained multiple times through a 10 µm nylon net filter (Merck-Millipore, NY1009000). Sample preparation was done using DispenKit (SEED Biosciences), and single nuclei were dispensed in skirted Eppendorf twin.tec PCR plate 96-wells (Eppendorf, 0030128648) containing 50 µl of NEBuffer 3.1 (New England Biolabs, B7203S) using the single cell isolator DispenCell (SEED Biosciences) following the manufacturer’s instructions. A nucleus passing through the tip of the DispenCell, which acts as a Coulter counter⁷⁶, leaves an electrical impedance change mark, which is proportional to the volume of the nucleus. For RPE TP53^−/− cells, a minimum impedance change of 75 Ω was detected for the diploid nucleus and 200 Ω for the tetraploid nucleus. A lower impedance change was associated with debris or unsuccessful induction of genome doubling in the case of the WGD condition; thus, such nuclei were not considered for further processing. To avoid processing of nuclei aggregates, dispensed single nuclei associated with a threshold higher than 400 Ω and 800 Ω for diploid and tetraploid nuclei, respectively, were also discarded. These ranges were set from quality metrics of a test scHi-C batch. Specifically, an unreasonable number of unique interactions of each fragment end (for example, >2 for diploid loci) would indicate a nuclei aggregate rather than a single nucleus. Following dispensing, the single nuclei were de-crosslinked by incubation at 65 °C overnight. Next each selected nucleus was mixed with 25 µl Dynabeads MyOne Streptavidin C1 and transferred to a 1.5 ml tube and incubated at room temperature for 1 h on a rotating wheel. The bead-bound fragments were digested with 10 U of AluI restriction enzyme (New England Biolabs, R0137) in 1× rCutSmart buffer (New England Biolabs, B6004) at 37 °C for 2 h. A-tailing reaction and adapter ligation were performed for each single cell as for the bulk Hi-C processing. Similarly, PCR amplification of the single cell libraries was performed in the same master mix as described above for bulk Hi-C, for 27–30 cycles. Libraries were cleaned using AMPure XP beads and then ran on a 2% agarose gel at 100 V for 50–60 min. Successful libraries presented a 300–700 bp smear, which was cut from the gel. DNA purification from the agarose was performed using NucleoSpin Gel and PCR clean-up (Macherey-Nagel, 740609) following the manufacturer’s instructions. An additional AMPure XP bead size selection was generally necessary to remove any primer dimer contamination. Libraries were sequenced on a NextSeq550 platform (Illumina) in a PE75 configuration.

scHi-C contact map generation and quality filtering

For each cell, paired-end R1 and R2 fastq files were separately aligned to the hg19 reference genome using bwa mem (v.0.7.17). The scHiCExplorer pipeline⁷⁷ was used to generate Hi-C contact maps in Cooler format⁶². Quality control of scHi-C interactions was performed as previously reported^75,78. Specifically, as it was previously reported that end-pairs covered by only one read are probably results of alignment or pairing errors of the sequencing machine⁷⁸, they were removed from the analysis. Cells for which the percentage of singleton interactions was above 75% were also removed from the analysis. Additionally, cells with fewer than 100,000 unfiltered interactions were removed. Finally, for the remaining end-pairs, which were supported by at least two duplicated reads, duplicates were removed. One cell was removed from the analysis because of the absence of interchromosomal interactions involving chromosome 1, which indicated the occurrence of technical issues during the library preparation.

Analysis of single-cell Hi-C inter-chromosomal interactions

Similar to bulk Hi-C, for each cell x passing the previously defined filters, inter-chromosomal interactions between each pair of chromosomes were aggregated and the ICE-balanced interaction matrix I_ICE(x) was obtained. A loss of chromosomal segregation score was then defined for each cell LCS(x) = LS(x)/(LL(x) + SS(x)), where: LS(x) = the number of balanced interactions between long and short chromosomes; LL(x) = the number of balanced interactions between long and long chromosomes; and SS(x) = the number of balanced interactions between short and short chromosomes.

The higher the LCS score, the higher the loss of chromosomal segregation in the analysed cell. The scores were compared between the control and WGD RPE TP53^−/− populations and a Wilcoxon two-tailed P value was computed.

Chromosome pairs having the highest interaction enrichment or depletion in the WGD population were identified by sorting chromosome pairs for each cell by their number of balanced interactions and then computing the average rank of each pair in the control and WGD populations.

Definition of scHi-C compartment segregation

A simplified strategy for compartment imputation was adopted. For each cell, the intrachromosomal contact matrices of each chromosome at 1 Mb resolution were extracted. Bins having zero marginal counts were removed from the analysis. The observed over expected matrix was then calculated as previously described¹¹, whereby the contact decay profile was computed in logarithmically increasing bins using the cooltools package⁶². The normalized matrix was centred around 0 by subtracting 1. Next, the Pearson correlation of each pair of bins was computed and, finally, the first two principal component analysis (PCA) components for each bin of the matrix were extracted. A and B compartments were assigned to each scHi-C bin associating the relative compartment in the bulk Hi-C by aggregating Calder segmentations of RPE TP53^−/− control samples into A and B regions. Segregation scores for single cells were then computed as the silhouette score between A and B clusters⁷⁹ for each chromosome, using the two previously determined PCA components as point coordinates in the two-dimensional cartesian plane.

Definition of single-cell compartment consistency across conditions

scHi-C bins were clustered on the basis of the two PCA components using the KMeans algorithm⁸⁰, with the number of clusters fixed to 2. Next the adjusted Rand index⁸¹ between the K-means clusters and the bulk A and B clusters was computed. Only intrachromosomal maps with a score above 0 were considered for analysis. The two K-means clusters were renamed into A and B, such that the correlation with the bulk compartments was maximized. The consistency of compartment calls across cells of the same biological condition (control or WGD) was calculated for each 1 Mb bin as follows: if A and B are respectively the number of cells in which the bin was called as the A or B compartment, the consistency of the bin is max(A, B)/(A + B).

Pseudo-bulk scHi-C analysis

Interchromosomal pseudo-bulk Hi-C values were derived from individual cell interchromosomal counts by summing all interactions for each chromosome pair, thus obtaining a 23 × 23 chromosome interaction matrix I. Observed and expected (O/E) chromosome level interactions were derived as described in ref. ¹¹. In brief, the number of observed interactions between a pair of chromosomes I_ij was divided by the number of possible interactions between the two chromosomes E_ij. E_ij was empirically estimated as the product between the total number of interchromosomal interactions of the first chromosome (C_i) and the second chromosome (C_j), divided by double the total number of interchromosomal interactions (N):

$${E}_{ij}=\frac{{{\rm{C}}}_{i}\times {{\rm{C}}}_{j}}{2N}$$

A chromosome-level interaction enrichment matrix OE = I/E was obtained.

To remove noise, a correlation was computed for each chromosome pair (i,j) as the Spearman correlation of the vectors corresponding to the rows of the two chromosomes in the OE matrix (OE_i, OE_j). A 23 × 23 correlation matrix ρ was obtained such that ρ_ij = Spearman (OE_i, OE_j).

Differences in correlation between control and WGD in pseudo-bulk Hi-C were calculated as the log₂ ratio of the correlations shifted by 1:

$${\sigma }_{i,j}={{\rm{\log }}}_{2}\left(\frac{{\rho }_{{ij}}^{{\rm{WGD}}}+1}{{\rho }_{{ij}}^{{\rm{Control}}}+1}\right)$$

Calling copy number alterations from scHi-C

The same procedure as defined for bulk and phased Hi-C data was performed, using as bin size of 5 Mb and a maximum absolute deviation threshold t = 0.4.

scRNA-seq and analysis

Sequencing

scRNA-seq was performed using a Chromium Next GEM Single Cell 3′ kit v.3.1 (10x Genomics) following the manufacturer’s protocol. The number of cells targeted for each condition was 3,000. Resulting libraries were sequenced on a NovaSeq 6000 or HiSeq 4000 system (Illumina).

scRNA-seq read alignment and data processing

The sequencing reads for all the samples (RPE TP53^−/− control, RPE TP53^−/− 6-weeks post-WGD, RPE TP53^−/− 20-weeks post-WGD, RPE TP53^−/− 20-weeks post-WGD tumours T1–T3) were aligned using the human reference transcriptome (hg19, Ensembl-87 build) with the 10x Genomics Cell Ranger pipeline (v.3.1.1)⁸² with default parameters. The Seurat R package (v.3.1.5)⁸³ was utilized for data processing. Raw unique molecular identifier (UMI) read count data for each sample were read as Seurat data objects by keeping genes expressed in at least one cell. The following number of cells were acquired and retained after filtering for each sample: RPE TP53^−/− control, 3,996 cells acquired and 2,180 cells retained; RPE TP53^−/− 6-weeks post-WGD, 3,716 acquired and 3,475 cells retained; RPE TP53^−/− 20-weeks post-WGD, 2,727 cells acquired and 1,976 cells retained; RPE TP53^−/− 20-weeks post-WGD T1, 3,359 cells acquired and 2,851 cells retained; RPE TP53^−/− 20-weeks post-WGD T2, 3,467 cells acquired and 2,721 cells retained; and RPE TP53^−/− 20-weeks post-WGD, 3,790 cells acquired and 3,078 cells retained. On the intersection of genes across the dataset, 17,187 genes were found. The 17,187 genes were kept in all datasets and the merge function from the library was applied to merge the data. The dataset consisted of 17,187 genes and 21,055 cells in total. Cells having fewer than 200 and more than 10,000 genes expressed were removed from the analysis. Cells that had more than 8% of mitochondrial UMI genes expressed were also removed. The final dataset consisted of 17,187 genes and 16,281 cells. At this stage, the library depths were standardized using the NormalizeData function from Seurat, in which UMI counts for each cell were divided by the total UMI counts for that cell with a scaling factor set at 10,000. The expression matrix thus obtained was natural log-transformed. RunPCA function was run with default parameters and 2,000 highly variable genes (found using the FindVariableFeatures function). The RunUMAP function was run to obtain the uniform manifold approximation and projections, keeping default parameters and the number of PCA components, that is, ndim=1:12 was used.

scRNA-seq copy number changes using InferCNV

The InferCNV R package (v.1.1.0)³⁹ was used to call copy number changes. The UMI counts encompassing 17,187 genes and 21,055 cells were used as input to infer copy number changes. RPE control cells were considered as control samples. An inferCNV object was created using the CreateInfercnvObject object with raw UMI counts and hg19 genomic annotations as input. Run function parameters were set as follows: cutoff=0.1, cluster_by_groups=FALSE, denoise=TRUE, tumour_subcluster_partition_method = “qnorm”, HMM = TRUE, HMM_type = “i6”, analysis_mode = “subcluster”, HMM_report_by =“subcluster”. The residuals matrix generated from InferCNV containing the copy number status was used to perform hierarchical clustering using the fastcluster (v.1.2.3) Python package⁸⁴. The Elbow method was used to determine the optimal number of clusters⁸⁵, in which the distance between the clusters was plotted against each threshold. Matplotlib (v.3.4.2) library⁸⁶ was used for data visualization.

Differential expression analysis using scRNA-seq data

Differentially expressed genes between RPE TP53^−/− control and 20-week post-WGD tumours were detected using MAST⁸⁷ from the Seurat package. All the genes with an adjusted P value below 0.001 and absolute log₂(FC) greater than 0.3 were considered as differentially expressed.

Transcriptional regulator scores were then computed following the SCENIC workflow^88,89, using the pyscenic package v.0.11.2 as follows: the gene regulatory network was generated using the grn command, then the regulons (transcription factors and their target genes) were identified with using the ctx command using the motif list motifs-v9-nr.hgnc-m0.001-o0.0.tbl (downloaded from cisTarget database: https://resources.aertslab.org/cistarget/). The regulons for each single cell were scored using the aucell command.

Enrichment of differentially expressed genes on CoREs

The percentage of upregulated and downregulated genes overlapping with activating and inactivating CoREs were computed separately and used as background the complete set of genes. Effect sizes were then computed as log₂(FC) for the four possible scenarios ${\sigma }_{{\rm{Up}}}^{{\rm{Activating}}},{\sigma }_{{\rm{Down}}}^{{\rm{Activating}}},$ ${\sigma }_{{\rm{Up}}}^{{\rm{Inactivating}}}\,{\rm{and}}\,{\sigma }_{{\rm{Down}}}^{{\rm{Inactivating}}}$ where, for example,

$${\sigma }_{{\rm{Up}}}^{{\rm{Activating}}}={\log }_{2}\frac{ \% \,{\rm{upregulated}}\,{\rm{genes}}\,{\rm{overlapping}}\,{\rm{an}}\,{\rm{activating}}\,{\rm{CoRE}}}{ \% \,{\rm{all}}\,{\rm{genes}}\,{\rm{overlapping}}\,{\rm{an}}\,{\rm{activating}}\,{\rm{CoRE}}}$$

and the other effect sizes were similarly computed.

Statistical significance was assessed by randomly selecting 100,000 times a set of genes with the same number of genes as the total number of differentially expressed genes and computing the percentage of those genes overlapping a CoRE. An empirical P value was calculated as the probability that a random set of genes has a percentage of overlapping genes higher than the real set of differentially expressed genes.

WGS

Library preparation

DNA was extracted from cells using a DNeasy Blood & Tissue kit (Qiagen, 69504) following the manufacturer’s protocol. Library preparation was performed using TruSeq DNA PCR-Free (Illumina, 2001596) or TruSeq DNA Nano (Illumina, 2001596) kits. Libraries were sequenced on a NovaSeq 6000 system (Illumina) with a PE150 configuration.

Mapping and processing

Paired-end fastq files for each sample were aligned jointly to human_g1k_hs37d5 from the 1000 Genomes Phase 3 using bwa mem (v.0.7.17) and sorted with samtools (v.1.10)⁹⁰ using the sort command. Mismatches in read pairing were fixed using the fixmate command. Duplicate reads were identified using the genome analysis toolkit (GATK)^91,92 with the MarkDuplicatesSpark command. Base quality scores were corrected using BaseRecalibrator and ApplyBQSR from the GATK suite.

CNV calling

CNVs were identified using the Control-FREEC software⁹³ (v.11.6). CNVs for each sample were called with respect to the RPE TP53^−/− control sample by taking as input the sample and the control bam files that were previously generated. The software was run using the following parameters:

ploidy = 2,3,4;

breakPointThreshold = .08;

intercept = 0;

window = 10000;

mateOrientation = FR;

sex = XX.

All other parameters were kept as default.

Statistical significance for each detected CNV was then computed using the assess_significance.R script provided with the Control-FREEC software.

Finding a consensus CNV set in RPE TP53
^−/− 20-weeks post-WGD tumours

The CNVs obtained using Control-FREEC having both a Kolmogorov–Smirnov P value and a Wilcoxcon P value below 0.01 were selected. Copy number changes of the same type (gain or loss) were merged on the basis of their overlap (±50 kb) across the three tumour samples using the merge function of bedtools (v.2.30.0)^64,65.

Mutation calling

SNVs were called using the Mutect2 (ref. ⁹¹) algorithm from GATK. Blacklisted regions were derived from ref. ⁹⁴ and excluded using the -XL option. F1R1 counts calculated in the pre-processing phase were provided with the option –f1r2-tar-gz. Germline variants were filtered out using the –germline-resource option with the gnomAD database (b37 version)⁹⁵. Filtering of the mutations was performed using the FilterMutectCalls command setting –contamination-estimate to 0 and using the read orientation model computed at pre-processing with the –ob-priors option. VCF files were then converted to MAF using vcf2maf⁹⁶ and assembled into a single file. Only single-nucleotide polymorphism variants were considered in the analysis.

Mutations having the following FILTER values were removed: normal_artifact, germline, multiallelic and clustered_events. Then, all mutations having a gnomAD_AF value above 0.01 or were already reported in dbSNP (dbSNP_RS == ‘novel’) were removed. The VAF in the tumour sample and in the control sample, respectively, were calculated as t_vaf = t_alt_count/t_depth and n_vaf = n_alt_count/n_depth. Mutations such that t_vaf < 2n_vaf were removed. For the remaining mutations, the overlap between samples was checked (n_samples). Mutations found in only one sample and having FILTER != ‘PASS’ or t_depth < 6 were removed. Conversely, if a mutation is found in only one sample but FILTER == ‘PASS’ and t_depth >= 6, it was kept.

Oncogenicity of the variants was assessed using OncoKB⁹⁷.

To study how many mutations were gained after WGD and to compare them to the ones accumulated in the same time frame without WGD induction, the number of mutations found in 6 weeks post-WGD and 6 weeks control samples were counted, with the following exclusion criteria: mutations already detected in the TP53^−/− RPE control sample; mutations shared between 6-weeks post-WGD and 6-weeks control samples, as these mutations were most likely already present in the TP53^−/− RPE control sample and went undetected; mutations shared between 6-weeks control and WGD samples for the same reason.

ChIP–seq and analysis

ChIP–seq

ChIP was performed using a SimpleChIP Enzymatic Chromatin IP kit (Cell Signaling Technology, 91820S) following the manufacturer’s protocol. In brief, 2–4 million cells per condition were fixed in 1% formaldehyde for 10 min at room temperature. The reaction was quenched with 1× glycine and the cell pellet was washed with PBS. Cells were lysed, nuclei were digested with 1,000 U micrococcal nuclease for 20 min at 37 °C and briefly sonicated at 80 V peak incidence power, 200 cycles per burst and 5% duty factor for 90 s on an E220 focused-ultrasonicator (Covaris) to disrupt the nuclear membrane. Next 1–5 μg digested chromatin, with or without 0.1–0.5 μg digested mouse chromatin for spike-in normalization, was incubated at 4 °C overnight with one of the following antibodies, at the recommended dilutions: anti-acetyl-histone H3 (Lys27) (Cell Signaling Technology, 8173), anti-trimethyl-histone H3 (Lys9) (Cell Signaling Technology, 13969), anti-CTCF (active motif, 61311), anti-trimethyl-histone H3 (Lys4) (Cell Signaling Technology, 9751) or trimethyl-histone H3 (Lys27) (Cell Signaling Technology, 9733). Antibody-bound chromatin was precipitated using protein G magnetic beads and crosslink reversal was performed using NaCl and proteinase K. Resulting DNA was purified using spin columns. Library preparation for sequencing from the chromatin immunoprecipitated DNA was performed using a NEBNext Ultra II DNA Library Prep kit for Illumina (New England BioLabs, E7645) following the manufacturer’s protocol. DNA fragments were processed for end repair, followed by stubby adaptor ligation. DNA was size-selected using AMPure XP beads (Beckman-Coulter, A63881). Adaptor-ligated DNA was PCR-amplified using indexed NEBNext Multiplex Oligos for Illumina (New England Biolabs, E7335). The resulting libraries were sequenced on a NextSeq 500 system (Illumina) in a PE37/38 configuration.

ChIP–seq data analysis

Fastq files were processed using the nfcore/chipseq pipeline (v.1.2.2)^98,99 (https://nf-co.re/chipseq/) with default parameters, aligning the reads to the hg19 genome. For each sample, the fold change against the input experiment was then computed using the bamComapre command from the deepTools package (v.3.5.1)¹⁰⁰ setting –scaleFactorsMethod readCount, –extendReads, –operation ratio and –binSize 100). MACS3 (ref. ¹⁰¹) was used for peak calling. Peaks with a q value lower than 0.1 and a minimum fold change of 1.5 were retained. A consensus set of peaks for CTCF and H3K9me3 was then created from control and WGD samples from the RPE TP53^−/− and CP-A TP53^−/− cell lines separately by aggregating peaks that were closer than 10 kb for H3K9me3 and that directly overlapped for CTCF in the two samples. The maximum ChIP–seq signal (fold change over input) in control and WGD was associated for each consensus peak in the collection.

Spike-in normalization

Spike-in normalization on the CTCF signal was performed as previously described¹⁰².

CTCF peaks at Hi-C boundaries

Insulation boundaries shared between control and WGD samples from RPE TP53^−/− samples identified from Hi-C were associated to CTCF peaks by taking all the peaks lying in on the genomic bin of the boundary (±10 kb). Boundaries were then associated to a CTCF score representing the sum of all the CTCF peak signals at the boundary.

Comparing ChIP–seq signal differences to Hi-C compartment differences

Fold change signals computed at the previous step were scaled for each 10/50 bp bin (having value x) computing the log₂(x + 1) value. Scaled values were then binned at 50 kb resolution, taking the average signal in the bin. Finally, the 50 kb binned signal was normalized by dividing each bin by the chromosome median value. For each RPE TP53^−/− 20-weeks post-WGD tumour Hi-C sample, the binned Calder ranks at 50 kb were matched to the binned ChIP–seq values for each histone modification. Finally, differences in Calder rank (Δrank) were compared with differences in ChIP–seq signal (Δhm₁, Δhm₂, …) for each tumour against the RPE TP53^−/− control sample.

Correlation between CoRE regions and differences in ChIP–seq signal

To study the relationship between subcompartment repositioning events and epigenetic changes in RPE TP53^−/− post-WGD tumours, the average ChIP–seq signal difference for each histone mark was assigned to each CoRE (average(Δhm₁), average(Δhm₂), …), where $\Delta {{\rm{hm}}}_{1}={{\rm{hm}}}_{1}^{{\rm{Tumour}}}-{{\rm{hm}}}_{1}^{{\rm{Control}}}$ for each 50 kb bin in the CoRE region. The Spearman correlation was computed between subcompartment repositioning scores and the average epigenetic differences of the CoREs. To estimate the significance of the observed correlations, an empirical P value was computed by randomly sampling a number of regions of the same size of the observed CoREs across the genome 1,000 times and recomputing the correlation value. A P value was obtained, corresponding to the number of times the expected correlation was greater or equal in absolute value to the one observed, divided by the total number of random trials.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link

ByAUTHOR