Autonomous transposons tune their sequences to ensure somatic suppression - Nature

Cell culture and generation of stable cell lines

Flp-In T-REx HEK293 (Thermo Fisher Scientific, catalogue no. R78007) cells were maintained according to the manufacturer’s recommendations. Cells were cultured in DMEM with glutamax supplemented by Na-Pyruvate and High Glucose (Thermo Fisher Scientific, catalogue no. 31966-021) in the presence of 10% fetal bovine serum (FBS; Thermo Fisher Scientific, catalogue no. 10270106) and penicillin/streptomycin (Thermo Fisher Scientific, catalogue no. 15140-122). Before introduction of the transgene, cells were cultured with zeocin at a final concentration of 100 µg ml⁻¹ (Thermo Fisher Scientific, catalogue no. R250-01) and blasticidin at 15 µg ml⁻¹ (Thermo Fisher Scientific, catalogue no. A1113903). For generation of stable cell lines, pOG44 (Thermo Fisher Scientific, catalogue no. V600520) was cotransfected with pcDNA5/FRT/TO (Thermo Fisher Scientific, catalogue no. V652020) containing the gene of interest at a 9:1 ratio. Cells were transfected with Lipofectamine 2000 (Thermo Fisher Scientific, catalogue no. 11668019) on a six-well-plate format with 1 µg of DNA (that is, 900 ng of pOG44 and 100 ng of pcDNA5/FRT/TO + GOI) according to the transfection protocol provided by the manufacturer. All transgenes were cloned with an N-terminal His₆-biotinylation sequence-His₆ tandem (HBH) tag that allows rapid and ultraclean purification without the use of antibodies. We also added a 3× FLAG tag immediately before the HBH tag to increase the versatility of the construct, which we refer to as the 3FHBH tag. Twenty-four hours following transfection, cells were split among three wells of a six-well plate at dilution ratios of 1:6, 2:6 and 3:6 to allow efficient selection with hygromycin B (Thermo Fisher Scientific, catalogue no. 10687010). Hygromycin selection was started 48 h following the transfection time point, with a final concentration of 150 µg ml⁻¹, and refreshed every 3–4 days until non-transfected control cells on a separate plate were completely dead.
Induction of the transgene was performed overnight at a final concentration of 0.1 µg ml⁻¹ doxycycline (DOX). Cells were validated by immunoblotting of whole-cell lysates.
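The 9:1 cotransfection ratio above can be checked with a few lines of arithmetic. This is a minimal sketch for illustration only; the function name `split_dna` is hypothetical and not part of the published protocol.

```python
def split_dna(total_ng, ratio=(9, 1)):
    """Split a total DNA mass (ng) between plasmids according to a ratio."""
    parts = sum(ratio)
    return tuple(total_ng * r / parts for r in ratio)


# 1 µg of total DNA split 9:1 between pOG44 and pcDNA5/FRT/TO + GOI
pog44_ng, goi_ng = split_dna(1000)
print(pog44_ng, goi_ng)  # 900.0 100.0
```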

An endogenous biotin acceptor peptide affinity tag and a FLAG tag were inserted into the Safb gene locus for mouse and fly cell lines using CRISPaint. The mouse Flp-In 3T3 cell line was purchased from Thermo Fisher Scientific (catalogue no. R76107) and cultured according to the manufacturer’s instructions. Cells were cultured in DMEM (Thermo Fisher Scientific, catalogue no. 31966-021) in the presence of 10% FBS (Thermo Fisher Scientific, catalogue no. 10270106) and penicillin/streptomycin (Thermo Fisher Scientific, catalogue no. 15140-122). The Drosophila S2R+ -MT::Cas9 cell line was purchased from DGRC (DGRC stock no. 268) and cultured in S2 medium (Thermo Fisher Scientific, catalogue no. 21720024) in the presence of 10% FBS (Thermo Fisher Scientific, catalogue no. 10270106). For CRISPaint⁵⁶ constructs (see Supplementary Table 2 for a list of single-guide RNAs), cells were cotransfected with three plasmids according to the CRISPaint protocol on the six-well-plate format using FuGene HD (Promega, catalogue no. E2311). Twenty-four hours following transfection, cells were expanded on 10 cm culture plates to facilitate efficient selection with puromycin (Thermo Fisher Scientific, catalogue no. A1113803). Puromycin resistance is provided by the tag construct and is driven by expression from the gene locus (in this case, either the mouse or fly Safb1 gene locus). Puromycin selection was started 48 h following transfection, at 1 µg ml⁻¹ final concentration, and was refreshed every 2 days until all untransfected 3T3 or S2 cells were dead. Cells were validated by immunoblotting of whole-cell lysates.

The HeLa cell line (ACC57) was purchased from Deutsche Sammlung von Mikroorganismen und Zellkulturen and maintained in the same medium as the Flp-In 3T3 cell line, but with the addition of non-essential amino acids (Thermo Fisher Scientific, catalogue no. 11140050).

Mouse N2A cells were maintained in DMEM, and lines stably expressing 3× FLAG-Cas9 or 3× FLAG-SAFB1 (Extended Data Fig. 10g) were created by cotransfection of cells with plasmids expressing the protein of interest (Cas9, SAFB1 or control) under the EF1alpha promoter flanked by PiggyBac inverted repeats, together with a plasmid expressing PiggyBac transposase. In this design, because neomycin resistance was coupled to transgene expression via an IRES element, cells were selected with 1 mg ml⁻¹ geneticin until no cells remained in the control transfection.

Cell lines (human Flp-In T-REx HEK293, human HeLa, human HCT116, mouse Flp-In 3T3, mouse N2A and fly S2R+) were all purchased from vendors or repositories or provided by colleagues (as described above), and no further authentication of cell lines was performed following purchase. Routine mycoplasma contamination tests were performed on all cell lines using the Jena Biosciences Mycoplasma (PCR-based) detection kit (Jena Biosciences, no. PP-401).

FLASH

Cells on 15 cm dishes were washed with 6 ml of ice-cold PBS and UV-crosslinked with 0.199 mJ cm⁻² UV-C light, after which they were pelleted, snap-frozen in liquid nitrogen and stored at −80 °C until use. Pellets were resuspended in 600 µl of 1× native lysis buffer (NLB) with protease inhibitors and briefly sonicated in a Bioruptor water bath sonicator (30 s on, 30 s off, five cycles at 4 °C). Lysates were then centrifuged at 20,000 relative centrifugal force (rcf) for 10 min at 4 °C to remove insoluble material. Supernatant was transferred to a fresh tube with 25 µl of MyONE C1 streptavidin beads (Thermo) and incubated in a cold room with end-to-end rotation for 1 h. Beads were washed once with high-salt buffer (HSB), once with non-denaturing buffer (NDB), treated with 0.02 U µl⁻¹ RNase I (Thermo) in 100 µl of NDB for 3 min at 37 °C and immediately placed on ice to stop the reaction. Beads were then washed once each with HSB and NDB. RNA ends were repaired with T4 polynucleotide kinase, after which barcoded s-oligos were ligated with T4 RNA ligase 1 for 90 min at 25 °C. The 3′ phosphate at the 3′ end of each s-oligo was removed with recombinant shrimp alkaline phosphatase (NEB, M0371) and beads were washed once each with lithium dodecyl sulfate buffer, protein lysis buffer and HSB, and finally with NDB. RNA was released by treatment with proteinase K and purified using Oligo Clean and Concentrator columns (Zymo). Reverse transcription was carried out with SuperScript III and samples then treated with RNase H (NEB) to phosphorylate the 5′-end of the cDNA molecule. Following a final round of purification with Oligo Clean and Concentrator columns (Zymo), cDNA was circularized with CircLigaseII (Lucigen) and amplified with Q5 polymerase (NEB). PCR products were purified with solid-phase reversible immobilization beads, quality controlled with Bioanalyzer and subjected to high-throughput sequencing.

FLASH data processing

Paired-end reads were merged with bbmerge.sh v.38.72 using the following command: bbmerge.sh in1={R1.fastq.gz} in2={R2.fastq.gz} out={merged.fastq.gz} outu1={unmerged.R1.fastq.gz} outu2={unmerged.R2.fastq.gz} ihist={histogram.txt} adapter1=AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCAACAATCTC adapter2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGG --mininsert=1. Short inserts (below 20 nt, following removal of the unique molecular identifier (UMI) and internal index) were removed with bbduk.sh v.38.72: bbduk.sh in={infile} out={out} minlen=34. The UMI was removed from reads and written to the header with UMI-tools v.1.0.0: umi_tools extract --bc-pattern=NNNXXXXXXNNNNN -I {IN.fastq.gz} --3prime --stdout={OUT.fastq.gz}, followed by separation of replicates with flexbar v.3.5.0: flexbar -r INPUT.fastq.gz -b barcodes.fa --barcode-trim-end RTAIL --barcode-error-rate 0.2 --zip-output GZ. Reads were aligned first to abundant RNAs such as transfer RNA, small nuclear RNA, small nucleolar RNA and ribonuclear RNA, then to the genome with bowtie2 v.2.3.5: bowtie2 --no-unal --un-gz -L 16 --very-sensitive-local -x bt2_index -U fastq_in.fastq.gz -o bam_out.bam. Unaligned reads were remapped to the genome with bbmap.sh v.38.72 to capture spliced reads: bbmap.sh -Xmx50G in={fastq_in} out={bam_out} outu={unmapped_out} ref={reference.fa} sam=1.3 mappedonly=t mdtag=t trimreaddescriptions=t nodisk. Finally, PCR duplicates were removed using UMI-tools: umi_tools dedup -I in_bam -S out_bam --spliced-is-unique --soft-clip-threshold 3 --output-stats={stats}. Coverage files were generated with bamCoverage v.3.3.1: bamCoverage -b bam --filterRNAstrand [forward | reverse] --binSize 1 --normalizeUsing CPM --exactScaling -o out_file.
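The relationship between the barcode pattern and the minlen=34 threshold can be illustrated with a short sketch. It assumes the usual UMI-tools convention that N positions form the UMI and X positions (here the internal index) stay attached to the read. It is a simplified mimic for illustration, not the UMI-tools implementation, and the function names and example read are hypothetical.

```python
BC_PATTERN = "NNNXXXXXXNNNNN"  # pattern passed to umi_tools extract above
MIN_INSERT = 20                # shortest insert retained, in nt


def min_read_length(pattern=BC_PATTERN, min_insert=MIN_INSERT):
    """Length threshold applied before UMI and index removal."""
    return len(pattern) + min_insert


def extract_3prime(read, pattern=BC_PATTERN):
    """Split a read into (read with index kept at its 3' end, UMI)."""
    if len(read) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    body, tail = read[:-len(pattern)], read[-len(pattern):]
    umi = "".join(b for b, p in zip(tail, pattern) if p == "N")
    index = "".join(b for b, p in zip(tail, pattern) if p == "X")
    return body + index, umi


print(min_read_length())  # 34
# Hypothetical read: 20-nt insert + 3-nt UMI part + 6-nt index + 5-nt UMI part
read = "ACGTACGTACGTACGTACGT" + "AAA" + "CCCCCC" + "GGGGG"
print(extract_3prime(read))  # ('ACGTACGTACGTACGTACGTCCCCCC', 'AAAGGGGG')
```

A 14-nt pattern plus a 20-nt minimum insert gives the 34-nt cut-off, and the retained 6-nt index at the 3′ end is what the later barcode-splitting step reads from the tail of each read.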

UMAP of FLASH data

For construction of the UMAP, peak calling was carried out on all profiles using HOMER: findPeaks {tag_directory} -style factor -strand separate -o {peaks.txt} -i {background_tag_directory}. Peaks from all profiles were then merged with: mergePeaks -strand -d given -matrix {peaks1.txt peaks2.txt …} > merged.peaks.txt. A count matrix, using all alignments from all profiles against merged peaks, was then created with featureCounts v.2.0.1: featureCounts -F SAF -Q 10 --primary -s 1 -T 12 -a {merged_peaks} -o {merged_peaks.counts.txt} {all_bam_files}. The count matrix was imported into a Jupyter notebook with pandas: peaks = pd.read_csv("merged_peaks.counts.txt", sep="\t", index_col="Geneid"), scaled with sklearn.preprocessing.StandardScaler: peaks_scaled = StandardScaler().fit_transform(peaks), which was then used to create the UMAP: peaks_scaled_mapper = umap.UMAP(n_neighbors=15, random_state=42).fit(peaks_scaled), and plotted using the umap.plot.points function. Clusters were called with HDBSCAN: clusterable_embedding = umap.UMAP(n_neighbors=30, min_dist=0.0, n_components=14, random_state=42).fit_transform(peaks_scaled), then hdbscan_labels = hdbscan.HDBSCAN(min_samples=100, min_cluster_size=600, core_dist_n_jobs=1).fit_predict(clusterable_embedding).
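As an illustration of the scaling step, the following dependency-free sketch reproduces what StandardScaler computes by default: each column is centred on its mean and divided by its population standard deviation. The function name and the toy count matrix are hypothetical, and the analysis itself used scikit-learn rather than this code.

```python
from statistics import fmean, pstdev


def standard_scale(rows):
    """Column-wise z-score using the population standard deviation.

    Columns with zero variance are only centred (their scale is set to 1).
    """
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical count matrix: rows are merged peaks, columns are profiles
counts = [[10, 0], [20, 5], [30, 10]]
scaled = standard_scale(counts)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, every column has mean 0 and unit variance, so profiles with very different sequencing depths contribute on a comparable scale to the neighbour graph that UMAP builds.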

Sample and library preparation for RNA-seq

Flp-In T-REx HEK293 and HeLa ACC57 cells were transfected at a final concentration of 5 nM each (in the case of triple knockdown, total siRNA concentration became 15 nM and hence single-knockdown transfections were increased to 15 nM with the addition of 10 nM negative control siRNA) using Silencer Select siRNAs (Thermo Fisher Scientific, catalogue no. 4427037 for 1 nM scale) and RNAiMAX (Thermo Fisher Scientific, catalogue no. 13778030) on six-well plates (around 200,000 cells were used per replicate). Silencer Select siRNAs are 21 nt long, chemically modified (the exact modification is proprietary; Thermo Fisher) and reduce overall off-target effects by up to 90% without compromising potency. This modification also exaggerates strand bias, which correlates with better knockdown, and therefore they are 5- to 100-fold more potent than other siRNAs. The siRNA ID for human SAFB1 is s12452, for SAFB2 is s18599 and for SLTM is s36384. Cells were harvested on the second day of knockdown.
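The balancing of total siRNA concentration across single and triple knockdowns can be written out as a small calculation. This is a sketch under the concentrations stated above (5 nM per target, 15 nM total); the function name `sirna_mix` is hypothetical.

```python
def sirna_mix(targets, per_target_nm=5, total_nm=15):
    """Return nM of each siRNA, padded with negative control to total_nm."""
    used = per_target_nm * len(targets)
    if used > total_nm:
        raise ValueError("targets exceed the total siRNA concentration")
    mix = {t: per_target_nm for t in targets}
    if total_nm - used:
        mix["negative control"] = total_nm - used
    return mix


print(sirna_mix(["SAFB1"]))                   # {'SAFB1': 5, 'negative control': 10}
print(sirna_mix(["SAFB1", "SAFB2", "SLTM"]))  # {'SAFB1': 5, 'SAFB2': 5, 'SLTM': 5}
```

Keeping the total at 15 nM in every condition means that differences between single and triple knockdowns cannot be attributed to the overall siRNA load.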

The Silencer Select siRNAs used were s29362 for MPP8 and s23449 for TASOR.

Flp-In 3T3 cells were first reverse transfected (roughly 100,000 cells per replicate) with 5 nM siRNA, boosted with the same amount by forward transfection 24 h later and harvested on the third day following initial transfection. The siRNA ID for mouse Safb1 is s104978, for Safb2 is s104977 and, because the human SLTM siRNA also targets mouse mRNA, the same siRNA was used.

Drosophila S2R+ cells (DGRC no. 150) were transfected with control dsRNA against GFP or Saf-B using FuGENE HD (Promega) for 3 days, after which cells were harvested for RNA isolation.

Total RNA from human, mouse or Drosophila cells was extracted with the Quick-RNA MicroPrep kit (Zymo). Polyadenylated RNA was isolated from total RNA with the Dynabeads mRNA Purification Kit (Thermo). Purification was carried out twice to enrich poly(A)+ RNA. Sequencing libraries were generated using the KAPA Stranded RNA-Seq Library Preparation Kit (Roche).

Isolation of nuclear and cytoplasmic RNA for RNA-seq

Forty-eight hours following siRNA transfection (control or SAFB1 + SAFB2 + SLTM, 5 nM each), approximately 1 million Flp-In T-REx HEK293 cells per replicate were trypsinized and either used directly for RNA isolation (total sample) or resuspended with a buffer containing 0.5% Igepal CA-630 to separate nuclear and cytoplasmic fractions, as described in ref. ⁵⁷. Nuclear and cytoplasmic RNAs were isolated with the Quick-RNA MicroPrep kit (Zymo). Ribo-depleted RNA-seq samples were prepared using the KAPA RNA HyperPrep Kit with RiboErase (HMR) (no. KK8560, Roche).

Transient transfections in rescue experiments and sample preparation for qPCR detection

SAFB triple knockdown was performed on Flp-In T-REx HEK293 cells as described above. Six hours following transfection of siRNAs, the medium was refreshed and cells were forward transfected, using FuGENE HD, with WT or truncation mutants as shown in Extended Data Fig. 7f. Transgenes were induced on day 1 of knockdown with 0.1 µg ml⁻¹ DOX for 24 h. On day 2 of knockdown, total RNA extracts were prepared with the Zymo Quick-RNA Kit and first-strand cDNA synthesis was carried out with PrimeScript RT Master Mix (TaKaRa, no. RR036A). Quantitative real-time PCR was performed using the oligos listed in Supplementary Table 1 with the Blue S’Green qPCR Kit (Biozym, no. 331416).

ONT direct RNA-seq

Isolation of polyA-enriched mRNA from Flp-In T-REx HEK293 cells treated with either control siRNA or siRNAs against SAFB1, SAFB2 and SLTM (5 nM each) for 2 days was carried out using the Dynabeads mRNA DIRECT purification kit (Thermo Fisher Scientific) following the manufacturer’s instructions, with minor modifications. In brief, approximately 4 × 10⁶ cells were subjected to the standard protocol and hybridization of the beads/mRNA complex was carried out for 10 min on a Mini Rotator (Grant-bio). DNA-containing supernatant was removed and the beads were resuspended with 2 × 2 ml of buffer A, followed by a second wash step with 2 × 1 ml of buffer B. Purified RNA was eluted with 10 µl of preheated elution buffer (10 mM Tris-HCl pH 7.5) for 5 min at 80 °C. Quantification of isolated mRNA was performed using a Qubit Fluorometer together with the RNA HS Assay kit (Thermo Fisher Scientific). For direct RNA-seq, 700 ng of freshly isolated polyA-enriched mRNA was processed according to the manufacturer’s protocol (no. SQK-RNA002). Final sequencing libraries were then loaded on R9.4 flow cells and sequenced on MinION and PromethION sequencers.

Retrotransposition assay

The transfection and experimental timeline for the retrotransposition assay was followed as described in ref. ¹⁸. Initially around 200,000 HeLa cells were transfected, with the same siRNAs and under the conditions listed above, on a six-well plate with 5 nM final concentration each of negative control, SAFB1, SAFB2 and SLTM siRNAs. The following day, knockdown HeLa cells were transfected with 200 ng of plasmids pYX015 (based on JM111, which has a point mutation in ORF1p) for background control and pYX017 (pCAG-driven L1RP) for L1 activity, in triplicate, using Lipofectamine 2000 on a 48-well plate. Twenty-four hours following reporter construct transfection, 2.5 µg ml⁻¹ puromycin selection was started and maintained for 3 days (that is, day 5 of knockdown). Cells were washed with PBS before lysing with 40 µl of passive lysis buffer from the Dual-Luciferase Reporter Assay System (Promega, catalogue no. E1960). Half of the lysate was transferred to a 96-well, reading-compatible plate and measured using an Omega Lumistar machine.

RNA–FISH

FISH was carried out in HCT116 cells transfected with control versus siRNAs against SAFB1 and SLTM (no SAFB2 expression was detected in HCT116 cells) for 48 h using the Stellaris RNA–FISH kit (https://www.biosearchtech.com/assets/bti_stellaris_protocol_adherent_cell.pdf). Probes against L1Hs were synthesized by LGC Biosearch Technologies (see Supplementary Table 2 for sequences). Probes against GAPDH were sourced from LGC Biosearch Technologies (SMF-2026-1), provided by M. Bothe. Probes were used at a concentration of 125 nM and hybridized for 16 h at 37 °C. Samples were imaged using a Leica Stellaris 8 confocal microscope.

EMSA with recombinant Halo, Halo-SAFB1^RRM and Halo-TRA2B^RRM

The RNA-binding domain of TRA2B (residues 111:201) and SAFB1 (residues 386:485) were cloned into a plasmid encoding 10× His-TEV-Halo. Three constructs (Halo only, Halo-TRA2B^RRM and Halo-SAFB1^RRM) were then expressed using BL21-CodonPlus(DE3)-RIL bacteria, which were induced when an optical density of roughly 0.6 was reached, with 0.2 mM isopropyl-ß-d-thiogalactopyranoside for 4 h at 37 °C, then collected by centrifugation. Bacteria were resuspended with lysis buffer (50 mM HEPES pH 8.0, 300 mM NaCl, 5 mM imidazole and 0.05% Igepal CA-630) and disrupted with a Branson sonifier, clarified by centrifugation and filtered through a 0.45 µm membrane. Cleared lysates were incubated with cOmplete His-Tag Purification Resin (Roche), washed extensively with lysis buffer and incubated with 0.5 µM OregonGreen (Promega) on beads in lysis buffer at room temperature for 30 min for fluorescent labelling of proteins. Beads were first washed extensively with lysis buffer, then with high-salt wash buffer (50 mM Tris.Cl pH 8.0, 1 M NaCl, 5 mM imidazole) and lastly with lysis buffer. Proteins were eluted with elution buffer (50 mM Tris.Cl pH 8.0, 100 mM KCl, 200 mM imidazole). Eluates were pooled, dithiothreitol (DTT, 1 mM final concentration) and TEV protease (home-made, 6× His-tagged, approximately 1:100) were added and samples dialysed against 25 mM Tris.Cl pH 7.4, 50 mM KCl, 5% glycerol and 1 mM DTT overnight in a cold room (about 8 °C). Dialysed eluates were then incubated with cOmplete His-Tag Purification Resin (Roche) for removal of TEV protease and undigested proteins, and flowthrough was centrifuged at 23,000 rcf for 30 min and filtered through a 0.22 µm membrane to remove particulate matter. The UV spectra showed no significant absorption at 260 nm and were used to quantify purified proteins, which were then normalized and their quality checked with PAGE and Coomassie staining (Fig. 4a). 
Concentrations used in EMSAs were: Halo-TRA2B^RRM (lanes 2–6: 3.6, 7.2, 14.4, 57.6 and 102.4 µM, respectively); Halo-SAFB^RRM (lanes 7–11: 3.6, 7.2, 14.4, 57.6 and 102.4 µM, respectively); and Halo (lane 12: 102.4 µM). Lane 1 contained probe only, with no added protein.

The RNA probes were prepared by in vitro transcription. Briefly, a plasmid containing the relevant sequence TAATACGACTCACTATAGGGAAGAAGAAGAAGAAGAAGAAGAT^ATC, in which the T7 promoter sequence is underlined, was digested with EcoRV (the site of digestion, which marks the last nucleotide of the final RNA, is indicated by ^), purified and in vitro transcribed using a HighYield T7 RNA Synthesis Kit (Jena Biosciences, no. RNT-101) with either 1 mM (final) CTP/UTP/GTP/ATP or 1 mM CTP/UTP/GTP and 1 mM N6-Methyl-ATP (Jena Biosciences, no. RNT-112-S), completely replacing ATP. RNA was cleaned up using SPRI beads to remove the plasmid and other potential high-molecular-weight products, then with the OCC-5 kit (Zymo). RNA was then oxidized using freshly prepared sodium periodate (250 mM in water, final concentration 10 mM; Sigma, no. 311448) in 60 mM NaOAc pH 5.5 for 1 h on ice, with tubes kept in the dark. After a further clean-up with OCC-5, RNA was then labelled with CF 647 Hydrazide (Sigma, no. SCJ4600046; 10 mM in water, 0.8 mM final concentration in approximately 120 mM NaOAc, pH 5.5) at room temperature overnight. RNA was purified with OCC-5, eluted in water and normalized to 5 µM. EMSAs were carried out in 25 mM Tris.Cl pH 7.4, 50 mM KCl, 5% glycerol and 1 mM DTT with an RNA probe of around 100 nM and the indicated concentration of the protein of interest. Following incubation of RNA and proteins on ice for 30 min, mixtures were loaded directly on a native 8% polyacrylamide gel cast with 0.5× Tris-borate-EDTA (final) and run in 0.5× Tris-borate-EDTA in a cold room for 45 min at 100 V (gels were prerun at 100 V for 15 min). Proteins and RNA were sequentially visualized on the same gel using a Typhoon Scanner with appropriate excitation lasers and emission filters.

In vitro unmethylated and methylated RNA-binding assay

Nuclei were isolated from wt-HCT116 cells using a buffer containing 0.5% Igepal CA-630, following Lubelsky and Ulitsky⁵⁷, and snap-frozen in liquid nitrogen until use. Nuclei were resuspended with 500 µl of 25 mM Tris.Cl pH 7.4, 150 mM KCl, 2 mM MgCl₂, 0.5% Igepal CA-630, 5% glycerol, 5 mM β-mercaptoethanol, 1× protease inhibitors and 1× PhosSTOP and sonicated with a Branson sonifier. Next, 15 µl of TURBO-DNase was added followed by incubation at 25 °C for 20 min and then by the slow addition to the lysate of 1.5 ml of base buffer (25 mM Tris.Cl pH 7.4, 50 mM KCl, 5% glycerol) to bring the KCl concentration to 75 mM and Igepal CA-630 concentration to 0.125% (final). Lysate was incubated with 50 µl of Pierce Control Agarose Resin (no. 26150) for 20 min, with rotation in a cold room, and spun down at full speed for 10 min at 4 °C to remove insoluble material. A 949 bp fragment of L1 ORF2 was amplified from pYX017 using primers AATAATACGACTCACTATAGCGTATCACCACCGATCCCACAG (T7 promoter underlined) and GGCTGAGACGATGGGGTTTT and in vitro transcribed using a HighYield T7 RNA Synthesis Kit (Jena Biosciences, no. RNT-101) with either 1 mM (final) CTP/UTP/GTP/ATP or 1 mM CTP/UTP/GTP and 1 mM N6-methyl-ATP (Jena Biosciences, no. RNT-112-S), completely replacing ATP. RQ1 DNase (Promega) was added to each reaction with incubation for 20 min at 37 °C, after which RNA was cleaned up using RCC-25 (Zymo) and oxidized with freshly prepared sodium periodate (250 mM in water, final concentration 10 mM; Sigma, no. 311448) in 60 mM NaOAc pH 5.5 for 1 h on ice, with tubes kept in the dark. After a further clean-up with RCC-25, RNA was then labelled with biotin Hydrazide (Sigma, no. 87639; 50 mM in DMSO, 2 mM final concentration in approximately 120 mM NaOAc, pH 5.5) at room temperature overnight.
RNA was purified with RCC-25, eluted in water and quantified with Nanodrop, then 5 µg of each RNA or buffer was incubated with 25 µl of MyONE C1 streptavidin beads in base buffer + 0.1% Igepal CA-630 for 1 h at room temperature and washed twice with base buffer + 0.1% Igepal CA-630. The nuclear lysate was incubated with these beads for 1 h at 16 °C, with shaking at 1,100 rpm. Beads were washed and transferred to fresh tubes with base buffer + 0.1% Igepal CA-630. Proteins bound to the beads were eluted with base buffer + 0.1% Igepal CA-630 + 2 µl of RNaseA + T1 (no. EN0551, Thermo Fisher Scientific) for 30 min at 30 °C and detected by immunoblotting.
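The dilution of the nuclear lysate with base buffer can be verified with a simple mixing calculation. The sketch below takes the base-buffer volume as 1.5 ml (1,500 µl), which is the volume that reproduces the stated final concentrations, and ignores the 15 µl of TURBO-DNase; the function name `mix` is hypothetical.

```python
def mix(v1_ul, c1, v2_ul, c2):
    """Concentration after combining two volumes (c1 and c2 in the same unit)."""
    return (v1_ul * c1 + v2_ul * c2) / (v1_ul + v2_ul)


# 500 µl lysate (150 mM KCl, 0.5% Igepal CA-630) + 1,500 µl base buffer
# (50 mM KCl, no Igepal CA-630)
kcl_mm = mix(500, 150, 1500, 50)
igepal_pct = mix(500, 0.5, 1500, 0.0)
print(kcl_mm, igepal_pct)  # 75.0 0.125
```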

RNA blotting

HCT116 cells were transfected with 5 nM siRNA (as indicated in Fig. 2h) then, 48 h later, were either transfected with a plasmid encoding L1Hs and driven by a minimal EF1alpha (without an intron) promoter or mock transfected. Twenty-four hours later (72 h post siRNA transfection), cells were trypsinized and resuspended with a buffer containing 0.5% Igepal CA-630, essentially as described in ref. ⁵⁷. The cytoplasmic fraction was purified with RNA Clean & Concentrator columns (Zymo), 2 µg of which was loaded onto 1.2% agarose gel and electroblotted to a nylon membrane. DIG-labelled probes against ORF2 were prepared with in vitro transcription (see Supplementary Table 2 for primers) and probe hybridization, washes and immunodetection were carried out as described in the manual of the DIG Northern Starter Kit (Roche, no. 12 039 672 910).

p-SR (1H4) and DHX9 FLASH in SAFB-depleted cells

Flp-In T-REx HEK293 cells were transfected with either control siRNA or siRNAs against SAFB1, SAFB2 and SLTM and, 48 h following transfection, were washed with PBS and UV-crosslinked with 0.2 mJ cm⁻² UV-C light on ice. Nuclei were isolated as described in ref. ⁵⁷, resuspended in 1× NLB + 5 mM MgCl₂ with protease and phosphatase inhibitors and sonicated using a Branson sonifier. Following centrifugation to remove insoluble material, the supernatant was incubated with an agarose resin (Pierce, no. 26150) for 20 min in a cold room followed by further incubation with Dynabeads Protein G beads prebound to p-SR antibody (10 µl per IP; 1H4, Santa Cruz, no. sc-13509) for 90 min in a cold room. The supernatant from 1H4 IP was used for DHX9 IP (2.5 µl per IP; abcam, no. ab26271). The FLASH protocol was identical to that described above, except that all HSB washes were replaced with NLB and s-oligos were pre-dephosphorylated to skip the recombinant shrimp alkaline phosphatase treatment that could dephosphorylate SR proteins on the beads, potentially leading to their elution.

RIP–qPCR

Flp-In T-REx HEK293 cells were crosslinked with 0.2% formaldehyde for 10 min at room temperature, extensively washed with PBS, resuspended with 1× NLB and sonicated using a Branson sonifier. The lysate was centrifuged at 23,000 rcf for 10 min at 4 °C to remove insoluble material and the supernatant then incubated with an agarose resin (Pierce, no. 26150) for 30 min in a cold room. Following brief centrifugation, the supernatant was used for IP with Dynabeads Protein G beads coupled to either an antibody against SAFB1 (10 µl per IP; Santa Cruz, no. sc-393403) or control IgG (Santa Cruz, no. sc-2025) overnight in a cold room. Beads were washed with 1× NLB and bead-bound RNA was eluted with proteinase K, as described above, purified using RCC-5 (Zymo) and utilized for RT–qPCR.

Generation of the Dnmt3c-null allele

Dnmt3C knockout animals were generated as described in ref. ⁵⁸. For specific abolition of enzymatic activity we designed a sgRNA against the methyltransferase domain of Dnmt3C targeted to exon 15 with the following protospacer sequence: 5′-GGACATCTCACGATTCCTGG-3′. P0 animals were genotyped using Sanger sequencing following PCR with primers 5′-CTGGCCGGCTCTTCTTTGAG-3′ and 5′-GGAAATCATTCCCACCTGTCAGC-3′. The founding animal was chosen based on a 31 bp deletion, which resulted not only in a frameshift mutation beginning at codon 598 but also in simultaneous removal of a PfoI restriction enzyme digestion site, allowing straightforward genotyping. The founder mutation was subsequently backcrossed into the C57BL/6 J background. Homozygous knockout males were validated as infertile, with significantly smaller and disordered testes by P42, as reported previously⁵¹. The generation of these experimental animals was regulated following ethical review by Yale University Institutional Animal Care and Use Committee (protocol no. 2020-20357) and was performed according to governmental and public health service requirements. No sample size selection, randomization or blinding was performed.
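Why a 31 bp deletion shifts the reading frame follows from codon arithmetic: a deletion preserves the frame only if its length is a multiple of three. A minimal illustrative check, with a hypothetical function name:

```python
def is_frameshift(indel_bp):
    """An insertion or deletion shifts the frame unless its length is divisible by 3."""
    return indel_bp % 3 != 0


print(is_frameshift(31))  # True: 31 = 10 codons + 1 nt
print(is_frameshift(30))  # False: exactly 10 codons removed
```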

Direct antibody labelling

The Mix-n-Stain CF488 A Antibody Labelling Kit (Biotium, no. 92253) and Mix-n-Stain CF555 Antibody Labelling Kit (Biotium, no. 92254) were used to label rabbit antihuman SAFB1/SAFB antibody (LSBio, LS-C286411) and rabbit anti-LINE-1-ORF1p antibody (abcam, no. ab216324), respectively. The standard protocol listed on the product website was followed, including the ultrafiltration protocol, with minor modifications. In brief, 25–35 μg of antibody was placed in the ultrafiltration vial provided and centrifuged at 14,000g for 2 min to remove all liquid. Depending on the initial amount of antibody, antibodies were eluted in 1× PBS to a final concentration of 0.75 ng μl⁻¹ and the appropriate volume of 10X Mix-n-Stain Reaction Buffer added. The entire solution was transferred to the vial containing the dye and the labelling reaction allowed to proceed at room temperature (22–23 °C) in the dark for 30 min. Finally, 150 μl of storage buffer was added to each reaction with storage in aliquots of 50 μl at −20 °C until use.

Testis sectioning and immunofluorescence microscopy

Testes from P25 Dnmt3C homozygous and heterozygous mutant males were dissected and embedded in O.C.T. compound (Tissue-Tek). Using cryosectioning, 8 μm sections were obtained with a Leica CM3050S and spotted onto Fisherbrand Superfrost Plus Microscope Slides (Fisher Scientific, no. 12-550-15) and stored at −80 °C until use. For immunofluorescence detection, slides were thawed at room temperature for over 10 min before fixing in 4% paraformaldehyde for 8 min. Permeabilization and blocking were performed at room temperature for 1 h with blocking buffer (5% bovine serum albumin (BSA), 0.2% Triton X-100 and PBS). Sections were incubated with directly labelled antibodies overnight at 4 °C, followed by three 5 min washes in 1× PBS and mounting with VECTASHIELD PLUS Antifade Mounting Medium and DAPI (Vector Laboratories, no. H-2000). Images were acquired using a Leica THUNDER Imaging System at ×40 magnification.

Mass spectrometry

Flp-In T-REx HEK293 cells stably expressing SAFB1, SAFB2 or SLTM (same cell lines used for FLASH) were induced with 0.1 µg ml⁻¹ DOX for 16 h in triplicate, lightly crosslinked with formaldehyde (0.016% final) at room temperature for 10 min, extensively washed with PBS, resuspended with HMGT-K200 buffer (25 mM HEPES-KOH pH 7.4, 10 mM MgCl₂, 10% glycerol, 0.2% Tween-20) and homogenized using a water bath sonicator. Following centrifugation, supernatants were then incubated with MyONE C1 streptavidin beads to pull down proteins of interest. Beads were washed with HMGT-K200, 20 mM Tris-Cl pH 7.4 and 1 M NaCl and finally with 20 mM Tris-Cl pH 7.4 and 50 mM NaCl, then submitted to the in-house MS-facility for further processing. Silver gel staining was performed using a SilverQuest Silver Staining Kit (Thermo Fisher Scientific, no. LC6070) for SAFB1 to ensure that conditions were sufficiently stringent in comparison with GFP pulldown (Extended Data Fig. 7b).

On-beads digest and mass spectrometry analysis

Twelve samples were boiled at 95 °C and 500 rpm for 10 min, followed by tryptic digest including reduction and alkylation of cysteines. The reduction was performed by the addition of tris(2-carboxyethyl)phosphine at a final concentration of 5.5 mM at 37 °C on a rocking platform (500 rpm) for 30 min. To perform alkylation, chloroacetamide was added at a final concentration of 24 mM at room temperature on a rocking platform (500 rpm) for 30 min. Proteins were then digested with 200 ng of trypsin (Roche) per sample, shaking at 800 rpm and 37 °C for 18 h. Samples were acidified by the addition of 1.3 µl of 100% formic acid (2% final concentration), centrifuged and placed on a magnetic rack. Supernatants containing the digested peptides were transferred to a new low-protein-binding tube. Peptide desalting was performed on self-packed C18 columns in a Tip. Eluates were lyophilized and reconstituted in 19 µl of 5% acetonitrile and 2% formic acid in water, briefly vortexed and sonicated in a water bath for 30 s before injection into nano-liquid chromatography–tandem mass spectrometry (nano-LC–MS/MS).

LC–MS/MS instrument settings for shotgun proteome profiling and data analysis

LC–MS/MS was carried out by nanoflow reverse-phase liquid chromatography (Dionex Ultimate 3000, Thermo Scientific) coupled online to a Q-Exactive HF Orbitrap mass spectrometer (Thermo Scientific), as reported previously⁵⁹. In brief, LC separation was performed using a PicoFrit analytical column (75 μm internal diameter × 50 cm length, 15 µm Tip ID; New Objectives) and packed in house with 3 µm of C18 resin (Reprosil-AQ Pur, Dr Maisch). Peptides were eluted using a gradient from 3.8 to 38% solvent B in solvent A over 120 min at a flow rate of 266 nl min⁻¹. Solvent A was 0.1% formic acid and solvent B comprised 79.9% acetonitrile, 20% H₂O and 0.1% formic acid. Nanoelectrospray was generated by the application of 3.5 kV. A cycle of one full Fourier transformation scan mass spectrum (300–1,750 m/z, resolution 60,000 at m/z 200, automatic gain control target 1 × 10⁶) was followed by 12 data-dependent MS/MS scans (resolution of 30,000, automatic gain control target 5 × 10⁵) with a normalized collision energy of 25 eV. To avoid repeated sequencing of the same peptides, a dynamic exclusion window of 30 s was used.

Raw MS data were processed with MaxQuant software (v.1.6.17.0) and searched against the human proteome database UniProtKB UP000005640 (containing 75,074 protein entries, released May 2020). The parameters of MaxQuant database searching were a false discovery rate of 0.01 for proteins and peptides, a minimum peptide length of seven amino acids, a first-search mass tolerance for peptides of 20 ppm and a main search tolerance of 4.5 ppm. A maximum of two missed cleavages was allowed for the tryptic digest. Cysteine carbamidomethylation was set as a fixed modification whereas N-terminal acetylation and methionine oxidation were set as variable modifications. The MaxQuant-processed output files can be found in Supplementary Table 3, showing peptide and protein identification, accession numbers, percentage sequence coverage of the protein and q-values.
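The two peptide mass tolerances above are relative (ppm) windows. The following minimal Python sketch shows how such a window translates into absolute bounds; the function names and example masses are hypothetical illustrations and are not part of MaxQuant.

```python
# Illustrative only: converts a ppm tolerance into an absolute mass window.
# Function names are hypothetical; MaxQuant applies its tolerances internally.

FIRST_SEARCH_PPM = 20.0  # first-search peptide tolerance stated in the text
MAIN_SEARCH_PPM = 4.5    # main-search peptide tolerance stated in the text


def ppm_window(mass: float, tol_ppm: float) -> tuple:
    """Return the (low, high) bounds of a symmetric ppm window around mass."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


def within_tolerance(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the relative mass error (in ppm) is within the tolerance."""
    error_ppm = (observed - theoretical) / theoretical * 1e6
    return abs(error_ppm) <= tol_ppm


if __name__ == "__main__":
    low, high = ppm_window(1000.0, FIRST_SEARCH_PPM)
    print(f"20 ppm window at mass 1000: {low:.4f} to {high:.4f}")
    print(within_tolerance(500.002, 500.0, MAIN_SEARCH_PPM))
```

Because the window scales with mass, the same 4.5 ppm setting permits a larger absolute error for heavier peptides.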

IP

Native whole-cell extracts prepared using 0.5× NLB were incubated with ProtG Dynabeads (Life Technologies, no. 10004D) coupled to 1 μg of either SAFB antibody (‘Antibodies’) or IgG (mouse; Santa Cruz, no. sc-2025) in a cold room for 150 min. Beads were washed twice in 0.5× NLB for 5 min then once with NDB. RNase-treated samples were resuspended in 90 µl of NDB to which 10 µl of RNaseA + T1 mix (Thermo Scientific, no. EN0551) was added. Samples were then incubated at 20 °C for 15 min and washed twice with 0.5× NLB. Elution from the beads was performed in 1× protein-loading dye by incubation for 5 min at 95 °C with shaking. Interaction partners were detected using the antibodies against proteins shown in Extended Data Fig. 7 (‘Antibodies’).

Immunofluorescence

Cells were crosslinked with 4% methanol-free formaldehyde in PBS at room temperature for 10 min, permeabilized with 0.5% Triton X for 10 min then blocked with 5% BSA in PBS for 30 min at room temperature. Primary antibodies (further details in ‘Antibodies’) were diluted in PBS with 0.1% Triton X and 1% BSA and incubated with fixed cells at 4 °C for about 16 h. Fluorescently labelled secondary antibodies with the appropriate serotype were used to visualize target proteins. Hoechst 33342 was used to stain DNA.

Antibodies

The following antibodies were used: SAFB1 (Santa Cruz, no. sc-393403), SAFB2 (Santa Cruz, no. sc-514963), SAFB1/2 (HET) (human: Merck/Sigma-Aldrich, no. sc05-588; mouse: LSBio, no. LS-C2886411), SLTM (Invitrogen, no. PA5-59154), ORF1p (human: abcam, no. ab230966; mouse: abcam, no. ab216324), TASOR (Sigma-Aldrich, no. HPA006735), 1H4 (p-SR) (Merck/Sigma-Aldrich, no. MABE50), RBM12B (Bethyl, no. A305-871A-T), RBMX (Cell Signaling Technology, no. 14794 S), NCOA5 (Bethyl, no. A300-790A-T), ZNF638 (Sigma-Aldrich, no. ZRB1186), ZNF326 (Santa Cruz, no. sc-390606), TRA2B (Bethyl, no. A305-011A-M), U2AF2 (U2AF65; Santa Cruz, no. sc-53942), TUBULIN (Santa Cruz, no. sc-32293), SRRM1 (abcam, no. ab221061), SRRM2 (SC35) (Sigma-Aldrich, no. S4045), SON (Sigma-Aldrich, no. HPA023535), DHX9 (abcam, no. ab183731), U1-70K (SySy, no. 203011), PRP8 (Santa Cruz, no. sc-55533), RNAPII (Creative Biolabs, no. CBMAB-XB0938-YC), IgG normal mouse (Santa Cruz, no. sc-2025), SRSF1 (Santa Cruz, no. sc-33652), SRSF2 (abcam, no. ab204916), SRSF3 (Elabscience, no. E-AB-32966), SRSF7 (MBL, no. RN079PW), RB1 (Cell Signaling Technology, no. 9309 S), TRA2B (Santa Cruz, no. sc-166829) and YTHDC1 (Proteintech, no. 14392-1-AP).

TE expression analysis

RNA-seq data from human (HEK293, HeLa, HCT116), mouse (3T3) and Drosophila (S2) cells were mapped to their respective genomes (hg38, mm10 and dm6, respectively) using the snakePipes non-coding-RNA-seq pipeline⁶⁰. Internally, this pipeline uses TEtranscripts²³, which estimates both gene and TE transcript abundance in RNA-seq data and conducts differential expression analysis on the resultant count tables using DESeq2 (ref. ⁶¹). The outputs of this analysis can be found in Supplementary Tables 4–11.

SAFB peak annotation and TE enrichment

Overlapping SAFB1, SAFB2 and SLTM regions called by HOMER on FLASH data were merged using the function IRanges::reduce(), resulting in a single set of 29,806 SAFB-bound genomic intervals (SAFB peaks), 23,136 of which were located inside GENCODE-annotated genes (within-gene SAFB peaks). All GENCODE v.29 genes located on standard chromosomes were used as a control set (n = 58,721). RepeatMasker annotation was downloaded from the UCSC genome browser, and the fraction of total length contributed by different transposable elements was calculated for 23,136 SAFB peaks and 58,721 GENCODE-annotated genes, separately for TEs inserted in sense and antisense orientation. Enrichment was calculated for a subset of sense and antisense TEs by dividing the TE fraction in peaks (that is, observed TE fraction) by that in whole genes (that is, fraction expected if SAFB peaks were distributed randomly on transcripts), followed by log₂-transformation of values.
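The enrichment described above is a log₂ ratio of two length fractions. The Python sketch below illustrates the arithmetic only; the function names and base-pair totals are hypothetical and are not values from this study.

```python
import math

# Illustrative only: log2(observed / expected) TE enrichment.
# All numbers passed in below are made-up placeholders.


def te_fraction(te_bp: int, total_bp: int) -> float:
    """Fraction of total length contributed by one TE family."""
    return te_bp / total_bp


def log2_enrichment(te_bp_in_peaks: int, peak_bp: int,
                    te_bp_in_genes: int, gene_bp: int) -> float:
    """log2 of the TE fraction in peaks over the TE fraction in whole genes."""
    observed = te_fraction(te_bp_in_peaks, peak_bp)
    expected = te_fraction(te_bp_in_genes, gene_bp)
    return math.log2(observed / expected)


if __name__ == "__main__":
    # Hypothetical TE family covering 20% of peak length and 10% of gene length.
    print(log2_enrichment(200, 1000, 1000, 10000))
```

A positive value indicates that the TE family covers a larger share of SAFB peaks than of host genes overall; the calculation would be repeated separately for sense and antisense insertions.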

Short-read RNA-seq data analysis

Raw RNA-seq reads were subject to adaptor and quality trimming using cutadapt 4.1. Default options were used, except for -q 16 --trim-n -m 25 -a AGATCGGAAGAGC -A AGATCGGAAGAGC.

Trimmed reads from human and mouse cell lines were mapped to human GRCh38 (HEK293, HeLa and HCT116 cell lines) and mouse GRCm38 (3T3 cell line) genomes using the STAR 2.7.9a aligner⁶². To improve the sensitivity of spliced read detection and quantification, mapping was done in two passes. In the first pass, all reads were mapped simultaneously to the STAR genome index built with GENCODE gene models (v.29 for human, v.19 for mouse) using default options, with the exception of --outFilterMismatchNoverReadLmax 0.05 --outSAMtype None. In the second pass, each sequenced library was mapped to a genome index with GENCODE gene models extended with new splice junctions detected in the first pass (--sjdbFileChrStartEnd pass1.SJ.out.tab). Other non-default STAR options used included --outFilterMismatchNoverReadLmax 0.05 --quantMode GeneCounts --alignIntronMax 1000000 --alignMatesGapMax 2000000 --sjdbOverhang 100 --limitSjdbInsertNsj 2000000.
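As an illustration of the two-pass scheme, the Python sketch below assembles STAR argument lists from the options named above. It does not reproduce the commands actually run: the file paths, output prefixes and helper names are hypothetical, and it assumes that junctions are inserted on the fly at the mapping stage.

```python
import shlex

# Illustrative only: builds STAR argument lists for a two-pass alignment.
# Paths, prefixes and function names are hypothetical placeholders.


def star_first_pass(genome_dir, read_files, prefix):
    """Pass 1: discover splice junctions only; no alignments are written."""
    return ["STAR", "--genomeDir", genome_dir,
            "--readFilesIn", *read_files,
            "--outFilterMismatchNoverReadLmax", "0.05",
            "--outSAMtype", "None",
            "--outFileNamePrefix", prefix]


def star_second_pass(genome_dir, read_files, sj_table, prefix):
    """Pass 2: map one library with first-pass junctions added to the index."""
    return ["STAR", "--genomeDir", genome_dir,
            "--readFilesIn", *read_files,
            "--sjdbFileChrStartEnd", sj_table,
            "--outFilterMismatchNoverReadLmax", "0.05",
            "--quantMode", "GeneCounts",
            "--alignIntronMax", "1000000",
            "--alignMatesGapMax", "2000000",
            "--sjdbOverhang", "100",
            "--limitSjdbInsertNsj", "2000000",
            "--outFileNamePrefix", prefix]


if __name__ == "__main__":
    reads = ["sample_R1.fastq", "sample_R2.fastq"]  # hypothetical file names
    print(shlex.join(star_first_pass("star_index", reads, "pass1.")))
    print(shlex.join(star_second_pass("star_index", reads,
                                      "pass1.SJ.out.tab", "sample.")))
```

The printed commands can be inspected before being handed to a job scheduler or to subprocess.run().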

Trimmed reads from the fruitfly S2 cell line were mapped to the dm6 genome assembly using STAR 2.7.4a, and reads were counted using featureCounts (subread package v.2.0.0).

Differential gene expression

Differential gene expression analysis was performed using the DESeq2 package⁶¹ on reverse-stranded gene counts from the STAR alignment step. Genes with fewer than ten mapped reads were discarded; lfcThreshold = 1 and alpha = 0.05 were used for calling of differentially expressed genes, and results were shrunk using lfcShrink(…, type = "ashr").

Differential exon usage

To avoid assignment of exonic reads to SAFB peaks, within-gene SAFB peak fragments or entire peaks overlapping GENCODE v.29-annotated exons were masked and ignored in exon usage analysis. The 22,129 peaks remaining (intronic SAFB peaks) were assigned to their host genes and RNA-seq reads were counted on both annotated exons and intronic SAFB peaks using the function Rsubread::featureCounts() with default arguments, except for countMultiMappingReads = FALSE, strandSpecific = 2, juncCounts = TRUE, and isPairedEnd = TRUE. Differentially expressed SAFB peaks were identified using the DEXSeq R package⁶³ and, for each gene, the peak with the lowest DEXSeq P value was used as a reference for gene fragmentation. In total, 5,394 affected genes were fragmented into pre- and post-peak parts. Exonic read counts were aggregated separately for pre- and post-peak fragments and their differential expression measured using DESeq. Genes hosting SAFB peaks with DEXSeq P_adjusted < 0.05 and log-fold change above 2 were classified as (genes with) upregulated peaks (n = 878) whereas those hosting peaks with DEXSeq P_adjusted > 0.05 and log-fold change between −0.5 and 0.5 were used as the control set (n = 1,457).
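The thresholds used to define the two gene sets can be expressed as a simple rule. The Python sketch below is illustrative only: the function name and labels are hypothetical, and the inputs stand for the DEXSeq adjusted P value and log-fold change of each gene's reference peak.

```python
# Illustrative only: assigns a gene to the 'upregulated' or 'control' set
# from the statistics of its reference SAFB peak. Names are hypothetical.


def classify_host_gene(p_adjusted: float, log_fc: float) -> str:
    """Apply the cut-offs stated in the text to one reference peak."""
    if p_adjusted < 0.05 and log_fc > 2:
        return "upregulated"
    if p_adjusted > 0.05 and -0.5 < log_fc < 0.5:
        return "control"
    # Everything else (including NaN inputs) belongs to neither set.
    return "unclassified"


if __name__ == "__main__":
    print(classify_host_gene(0.001, 3.1))
    print(classify_host_gene(0.6, 0.1))
    print(classify_host_gene(0.01, 1.0))
```

Genes falling between the two definitions, for example those with a significant but modest change, are deliberately left out of both sets.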

Differential splice junction usage

The number of RNA-seq reads supporting each splice junction was counted in the second STAR alignment pass (SJ.out.tab file). Splice junctions that could not be unambiguously assigned a host gene, or that were supported by fewer than ten reads in total across all treatments and replicates in a given cell line, were ignored. Differentially used splice junctions were identified using DEXSeq, with default settings; splice junctions were treated as feature IDs and host genes as group IDs.
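The read-support filter can be sketched in Python as follows. The sketch parses the standard STAR SJ.out.tab layout (column 7 holds uniquely mapped reads crossing the junction) and sums counts across libraries; whether unique or multi-mapping reads were counted is not stated above, so using column 7 alone is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative only: sums per-junction read support across libraries and
# keeps junctions with at least ten reads in total.
# Assumption: only uniquely mapped reads (SJ.out.tab column 7) are counted.


def read_sj_counts(lines):
    """Parse SJ.out.tab lines into {(chrom, start, end, strand): reads}."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or empty lines
        key = (fields[0], int(fields[1]), int(fields[2]), fields[3])
        counts[key] = int(fields[6])
    return counts


def supported_junctions(per_library_counts, min_total=10):
    """Return junctions whose summed support reaches min_total reads."""
    totals = defaultdict(int)
    for counts in per_library_counts:
        for junction, n in counts.items():
            totals[junction] += n
    return {junction for junction, n in totals.items() if n >= min_total}
```

Assignment of junctions to host genes, the other filter described above, is not shown here.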

Splice site strength quantification

For each gene in the human genome, the probability of each nucleotide acting as a splice donor or acceptor was estimated using SpliceAI²⁶, with default options. SpliceAI scores were matched to splice junctions detected and quantified by STAR.

Splice site to TE distance measurement

Distances between splice sites and nearest upstream or downstream TEs were calculated for a set of ten repeat families (L1, L2, Alu, SVA, ERVL, ERV1, TcMar-Tigger, MIR, Simple_repeat, hAT_Charlie) as follows: (1) all GENCODE genes were flattened using the function IRanges::reduce() in R; (2) STAR-detected splice junctions and repetitive elements outside annotated genes were dropped; and (3) for each remaining splice donor and acceptor, the distance (in nucleotides) to the nearest sense or antisense TE within the same flattened gene was measured separately for each of the ten repeat families. Donors and acceptors within TEs were assigned the distance of 0 nt.
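Step (3) can be sketched in Python as a nearest-interval lookup. The sketch assumes that the TEs of one family, orientation and flattened gene have already been collected as sorted, non-overlapping, inclusive (start, end) intervals; the function name and coordinates are hypothetical.

```python
import bisect

# Illustrative only: distance (nt) from a splice site to the nearest TE.
# Assumption: 'tes' is sorted by start, non-overlapping, inclusive coordinates.


def nearest_te_distance(site, tes):
    """Return 0 if the site lies within a TE, else the distance to the nearest
    TE boundary; return None if the gene contains no TE of this family."""
    if not tes:
        return None
    starts = [start for start, _ in tes]
    i = bisect.bisect_right(starts, site)
    best = None
    if i > 0:
        _, end = tes[i - 1]          # closest TE starting at or before the site
        if site <= end:
            return 0                 # splice site falls inside this TE
        best = site - end
    if i < len(tes):
        to_next = tes[i][0] - site   # closest TE starting after the site
        best = to_next if best is None else min(best, to_next)
    return best


if __name__ == "__main__":
    alu_sense = [(100, 200), (500, 600)]  # hypothetical TE coordinates
    for position in (150, 250, 450, 700):
        print(position, nearest_te_distance(position, alu_sense))
```

Running the lookup once per repeat family and orientation gives the per-family distances described above.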

New splice acceptors within SAFB peaks in human tissues

The number of reads supporting splice junctions in the GTEx consortium tissue data was extracted using the recount3 R package⁶⁴. Tissues with fewer than 1 billion spliced reads were excluded from further analysis. Alternative splicing was quantified in an intron-centric manner; that is, the splicing index was calculated separately for each splice donor and acceptor. We extracted all splice junctions located within an annotated human gene, with splice donor annotated in GENCODE v.29 and splice acceptor sited within a fully intronic SAFB peak (n_peaks = 16,929). These 21,693 splice junctions were further filtered for junctions in which the donor participated in multiple events, had a splicing index above 1% in at least one tissue and was supported by at least 500 reads in all 27 tissues (that is, used ubiquitously), yielding a highly stringent set of 1,104 splice junctions.
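A donor-centric splicing index can be sketched in Python as below. The exact formula is not given above, so the sketch assumes a common definition: the reads supporting one junction divided by the reads supporting all junctions that share its donor. It also assumes that the 500-read threshold applies to the donor total. Function names are hypothetical.

```python
from collections import defaultdict

# Illustrative only. Assumed definition of the donor-centric splicing index:
# reads(donor, acceptor) / sum of reads over all junctions using that donor.


def donor_splicing_index(junction_reads):
    """junction_reads: {(donor, acceptor): reads} for a single tissue."""
    donor_totals = defaultdict(int)
    for (donor, _acceptor), n in junction_reads.items():
        donor_totals[donor] += n
    return {junction: n / donor_totals[junction[0]]
            for junction, n in junction_reads.items()
            if donor_totals[junction[0]] > 0}


def passes_filters(index_by_tissue, donor_reads_by_tissue,
                   min_index=0.01, min_reads=500):
    """Index above 1% in at least one tissue and at least 500 reads in all."""
    return (max(index_by_tissue) > min_index
            and min(donor_reads_by_tissue) >= min_reads)


if __name__ == "__main__":
    tissue = {("d1", "a1"): 90, ("d1", "a2"): 10, ("d2", "a3"): 5}
    print(donor_splicing_index(tissue))
```

The acceptor-centric index would be computed in the same way with the roles of donor and acceptor swapped.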

p-SR and DHX9 FLASH analysis

FLASH reads uniquely mapping to the hg38 genome were counted using featureCounts on two custom gene annotation reference sets. The first of these contained exons and SAFB peaks, with exons prioritized over SAFB peaks in the case of overlaps. SAFB peaks were assigned to their host genes and treated as exons for read counting. The second reference contained genes fully fragmented into exons, repetitive elements and introns, with exons prioritized over repeats and introns, and repeats prioritized over introns where their genomic coordinates were overlapping. Whereas the first reference allows for increased sensitivity when quantifying FLASH signal on known SAFB-binding regions, the latter sacrifices sensitivity (because it contains many short genomic fragments) for the power of recognizing regions of increased binding outside of SAFB peaks, or in SAFB peaks not called by the peak-calling software. DEXSeq analysis was performed separately on exon/peak and exon/repeat/intron counts. Regions with adjusted P < 0.05 were considered differentially bound.

Alternative polyadenylation sites

Aligned ONT direct RNA-seq reads from control and triple KD samples were screened for their end coordinates, under the assumption that these lie in close proximity to a polyadenylation site. Genomic coordinates of this collection of almost 1.5 million single-nucleotide-resolution read end sites were extended by 50 nt upstream and downstream, and overlapping intervals were collapsed into a total of 274,330 putative polyadenylation regions. The number of control and triple KD reads ending in each of these regions was counted and, for each gene, the fraction of ONT reads ending in each of its polyadenylation regions was calculated separately for control and triple KD libraries. Genes supported by at least 20 reads in which the contribution of at least one polyA isoform was changed by at least 20 percentage points between triple KD and control were considered differentially polyadenylated. In total, 14,148 genes (4,433 of genomic length over 50 kb) were supported by 20 or more reads, and 247 (231 longer than 50 kb) showed differential polyA site usage.
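The region-building and calling steps can be sketched in Python as below, for read ends on a single chromosome and strand. The sketch makes two assumptions that the text leaves open: the 20-read minimum is applied to the combined control and triple KD reads of a gene, and intervals that merely touch are merged. Function names are hypothetical.

```python
# Illustrative only: builds putative polyadenylation regions from read end
# positions and calls differential usage for one gene.
# Assumption: the 20-read minimum applies to control + KD reads combined.


def merge_end_sites(positions, flank=50):
    """Extend each read end by +/- flank nt and collapse overlaps."""
    intervals = sorted((p - flank, p + flank) for p in positions)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def usage_fractions(counts):
    """Fraction of a gene's reads ending in each polyadenylation region."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def is_differentially_polyadenylated(ctrl_counts, kd_counts,
                                     min_reads=20, min_shift=0.20):
    """True if any region's usage shifts by >= 20 percentage points."""
    if sum(ctrl_counts) + sum(kd_counts) < min_reads:
        return False
    ctrl = usage_fractions(ctrl_counts)
    kd = usage_fractions(kd_counts)
    return any(abs(k - c) >= min_shift for c, k in zip(ctrl, kd))


if __name__ == "__main__":
    print(merge_end_sites([100, 120, 400]))
    print(is_differentially_polyadenylated([8, 2], [5, 5]))
```

In this toy example, the usage of the first region drops from 80% to 50%, a shift of 30 percentage points, so the gene would be called differentially polyadenylated.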

Locus-specific L1 quantification

Raw reads from HEK293 fractionation RNA-seq libraries were aligned to the hg38 genome using bwa aln, and alignments further processed with L1EM⁶⁵, both with default options. L1EM counts from categories ‘only’, ‘3prunon’ and ‘passive_sense’ were summed. These total read counts were combined with read counts on individual genes (GENCODE v.29 annotation), and DESeq2 differential gene expression analysis was performed together on gene and L1 counts, treating L1 elements as independent genes.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
