Strange IndiaStrange India

Critical reagents

RNA linker

The RNA linker is a single-stranded chimeric oligonucleotide with 17 DNA nucleotides at its 5′ end (ssDNA; Extended Data Fig. 2e) and 10 RNA nucleotides at its 3′ end (ssRNA; Extended Data Fig. 2e), denoted as 5′-/5OH/CGAGGAGCGCTTNNNNNrArUrArGrCrArUrUrGrC/3OH/-3′, where A, C, G, T and N denote DNA nucleotides, rA, rC, rG and rT denote RNA nucleotides and NNNNN denotes five randomized DNA nucleotides that serve as a UMI. The RNA linker was synthesized by Integrated DNA Technologies (IDT).

The RNA linker is designed for efficient ligation with (1) RNA through the RNA linker ssRNA, and (2) the first set of cell barcodes through the RNA linker ssDNA, which is complementary to the seven nucleotide (nt) overhang in the first set of cell barcodes.

DNA linker

The DNA linker is a hybridized product of two DNA strands, with the top strand being 5′-/5Phos/CTAGACACTGTGCGTATCTNBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/3OH/-3′, where N denotes a random base and B denotes any base except A, and with the bottom strand being 5′-/5OH/CGAGGAGNNNNNACAACGCACAGTGTCTAGT/3OH/-3′, where NNNNN denotes five randomized DNA bases that serve as a UMI. Following hybridization, the DNA linker contains 15 bp double-stranded DNA and 36 nt (top ssDNA) and 15 nt unhybridized ssDNA (bottom ssDNA), and a single base (T) overhang at the bottom strand (Extended Data Fig. 2g). The 36 nt top ssDNA is reverse complementary to 10X barcodes in the Chromium Next GEM Single Cell 3′ Reagent kit (PN-1000268). The two strands of the DNA linker were synthesized by IDT.

The DNA linker is designed for (1) efficient ligation with fragmented chromosomal DNA through the 1 nt overhang of the DNA linker by a sticky-end ligation, (2) efficient ligation with cell barcodes through the 15 nt bottom ssDNA of the DNA linker, and (3) efficient hybridation with 10X barcodes through the polyA sequence within the top ssDNA.

Cell barcodes

The cell barcodes contain three sets of barcodes referred to as the first, second and third sets. Every set of cell barcodes has three components, namely a 7 nt top-strand overhang, a 14 bp dsDNA region and a bottom-strand overhang (7 nt for the first and second sets, 11 nt for the third). The 14 bp dsDNA region contains a unique sequence to every cell barcode (double-stranded N14; Extended Data Fig. 2d,i). Every set of cell barcodes contains 96 unique barcodes, each being unique in this 14 bp dsDNA (Supplementary Table 2).

In the current version of MUSIC (v.1.0), each set of cell barcodes contains 96 unique barcodes based on their dsDNA regions, resulting in a total of 884,736 unique sequence combinations. The three sets of cell barcodes are designed to maximize ligation efficiency for sequential ligation of the first set of cell barcodes with the RNA and DNA linkers, the second set with the first set, and the third set with the second set. Optimal ligation efficiency is achieved by the complementarity of the overhang sequences (Extended Data Fig. 2d,i). Out-of-order ligations, such as that of the third set with the first set of cell barcodes, are minimized because the overhang of the third set does not complement with that of the first set.

In addition, the third set of cell barcodes is also designed to complement the 22 nt sequence at the 3′ side of the index adaptors.

The first and second sets of cell barcodes were synthesized by Sigma-Aldrich and the third by IDT.

10X barcodes

The 10X barcodes are included in the Chromium Next GEM Single Cell 3′ Reagent kit (PN-1000268) from 10X Genomics. Each 10X barcode is an 82 nt oligonucleotide with a partial (22 nt) Illumina Read1 sequence (Read1), a 16 nt unique barcode sequence (N16, 10X GEM barcode), a 12 nt UMI (N12, 10X UMI), a 30 nt polyT sequence, a V (A, C or G) and an N (any base) (10X barcode; Extended Data Fig. 2j, k). The 10X GEM barcode is shared among the barcodes of the same GEM; the 10X UMI is unique to every 10X barcode.

Index adaptors

The index adaptors contain three segments, namely the 24 nt Illumina P7 sequence, an 8 nt unique identifier sequence called I7 and the 34 nt Illumina Read2 sequence (Supplementary Table 2). In this release, MUSIC v.1.0 uses eight distinct I7 barcodes providing a total of approximately 83.5 million complex barcodes (8 (I7 barcodes) × 3.5 million (10X barcodes)). Eight index adaptors are used for each library construction, each of which has a unique I7 sequence. We call these eight index adaptors a set of index adaptors, each of which can hybridize with the complementary read2 sequence in the third set of cell barcodes to initiate a PCR reaction.

Meanwhile these serve as sample barcodes. We designed a total of three sets of index adaptors to allow for the construction of three libraries from three input samples and sequencing them together (Supplementary Table 2). These three sets of index adaptors share P7 and read2 sequences and differ by their I7 sequences; they also serve as a sample index to differentiate the three samples. The index adaptors were synthesized by IDT.

Universal adaptor

The universal adaptor contains an Illumina P5 sequence and an Illumina read1 sequence (Supplementary Table 2). The universal adaptor can hybridize with 10X barcodes through their complementary read1 sequence to initiate a PCR reaction. The universal adaptor was synthesized by IDT.

Cell culture

H1 human embryonic stem cells and E14 mouse embryonic stem cells were obtained from the 4D Nucleome Consortium and cultured according to 4D Nucleome Consortium-approved protocols ( In brief, H1 cells were grown at 37 °C under 5% CO2 on Matrigel (Corning, 354277)-coated dishes. Cells were maintained in complete mTeSR medium prepared from basal medium (Corning, 85851) with 5× supplement (Corning, 85852). Medium was replaced daily. Cell passage numbers were kept below P10. E14 cells were cultured on plates coated with 0.1% gelatin (EMD, SF008) in serum-free 2i/LIF medium; this medium was made from base medium (1:1 mixture of NeuroBasal medium (Gibco, 21103-049) and DMEM/F12 medium (Gibco, 11320-033) supplemented with 0.5× N2 supplement (Gibco, 17502-048), 0.5× B27 supplement (Gibco, 17504-044) and 0.05% bovine serum albumin (BSA) fraction V (Gibco, 15260-037)), supplemented with 1 µM PD0325901 (Reprocel, 04-0006-02C), 3 µM CHIR99021 (Reprocell, 04-0004-02C), 0.15 mM monothioglycerol (Sigma, M6145-25ML) and 1,000 U ml−1 LIF (Cell Guidance Systems, GFM200). Medium was replaced daily, and cell passage number was kept below P10.

Crosslinking and nuclei isolation for cell lines

After cells had become confluent in a 10 cm dish, medium was removed and washed once with PBS. Accutase (1 ml; EMD, SF006) was added, with incubation for 3 min at 37 °C to dissociate cells. PBS (10 ml) was used to generate a single-cell suspension by pipetting. Cell pellets were formed by centrifugation at 330g for 3 min. Next, 10 ml of 2 mM disuccinimidyl glutarate (DSG) dissolved in PBS was added to crosslink and resuspend the cells in a LoBind tube, with incubation at room temperature for 45 min under gentle rotation. Following incubation, cells were collected by centrifugation at 1,000g for 4 min to remove DSG solution, washed once with PBS and then centrifuged again at 1,000g for 4 min to remove supernatant. Following washing, cells were thoroughly resuspended in 10 ml of PBS containing 3% formaldehyde and incubated for 10 min with gentle rotation. The crosslinking reaction was stopped by the addition of 3 ml of 2.5 M glycine per 10 ml of 3% formaldehyde, with incubation for 5 min with rotation. Cells were then centrifuged at 1,000g for 4 min to remove supernatant. Next, cells were washed twice with ice-cold PBS containing 0.5% BSA (w/v) and centrifuged at 1,000g for 4 min. Following this wash, cells were resuspended in PBS with 0.5% BSA (w/v), each cell aliquot then containing 5 million cells in a 1.5 ml tube. Cell pellets were obtained by centrifugation at 1,000g for 5 min, snap-frozen in liquid nitrogen and stored at −80 °C.

Frozen cells were thawed on ice and resuspended in 1.4 ml of cell lysis buffer A for every 5 million cells, as previously described12 (50 mM HEPES pH 7.4, 1 mM EDTA pH 8.0, 1 mM EGTA pH 8.0, 140 mM NaCl, 0.25% Triton X-100, 0.5% IGEPAL CA-630, 10% glycerol,  proteinase inhibitor cocktail). For the mixed-species experiment, equal amounts of human H1 (2.5 million) and mouse E14 (2.5 million) cells were resuspended together. Following 10 min incubation on ice, cell pellets were collected by centrifugation at 900g for 4 min at 4 °C. Cell pellets were then resuspended in 1.4 ml of cell lysis buffer B (10 mM Tris-HCl pH 8.0, 1.5 mM EDTA, 1.5 mM EGTA, 200 mM NaCl and proteinase inhibitor cocktail) and kept on ice for 10 min. The nuclei thus isolated were collected at 900g for 5 min at 4 °C. Two hundred microlitres of rCutSmart buffer (NEB, B7204S) containing 0.25% SDS was used to thoroughly resuspend and permeabilize the nuclei, with incubation at 62 °C for 10 min with Eppendorf Thermomixer C. Following incubation, 60 µl of rCutSmart buffer containing 10% Triton X-100 (w/v) was mixed with the SDS solution and the reaction incubated at 37 °C for 15 min while shaking at 800 rpm. Treated nuclei were centrifuged at 900g for 2 min at 4 °C to remove supernatant and washed once with rCutSmart buffer.

Crosslinking and nuclei isolation from postmortem brain

Each 50 mg of postmortem human brain frontal cortex sample was kept on ice in a 1.5 ml LoBind tube and chopped into smaller pieces by pestle. Brain samples were transferred into a 15 ml LoBind tube and incubated at room temperature for 45 min with gentle rotation in 10 ml of 2 mM DSG dissolved in PBS. Following incubation, tissue samples were centrifuged at 1,000g for 4 min and washed once with PBS to remove DSG solution. Following washing, samples were thoroughly resuspended in 10 ml PBS containing 3% formaldehyde and incubated for 10 min with gentle rotation. The crosslinking reaction was stopped by the addition of 4 ml of 1.25 M glycine, followed by incubation for 5 min with rotation. Samples were then centrifuged at 1,000g for 4 min and washed twice with ice-cold PBS containing 0.3% BSA (w/v).

A Chromium Nuclei Isolation kit (10X Genomics, 1000494) was used to isolate nuclei from crosslinked cortex samples according to the manufacturer’s user guide). Specifically, 50 mg of frozen tissue was placed in a prechilled sample-dissociation tube, then 400 µl of the lysis buffer provided was added to the tube and tissues were dissociated until homogeneous using a plastic pestle. Next, 600 µl of lysis buffer was added to the tube and the contents mixed ten times by pipetting. Following 10 min incubation on ice, the solution was evenly loaded into two nuclei isolation columns and centrifuged at 16,000g for 20 s at 4 °C. The flowthrough in the collection tube containing the nuclei was vortexed for 10 s at 3,200 rpm to resuspend nuclei. The collection tube was then centrifuged for 3 min at 500g and 4 °C to pellet the nuclei and the supernatant was removed. The nuclei were resuspended in 500 µl of debris removal buffer provided with the kit by pipetting 15 times, then centrifuged at 700g for 10 min at 4 °C and the supernatant removed. The nuclei were resuspended twice in 1 ml of wash and resuspension buffer. The supernatant was removed following centrifugation at 500g for 5 min at 4 °C, which left a purified pellet of isolated nuclei (Supplementary Fig. 1a,b). All the following steps were identical for both cell lines and human cortex samples.

Ligation of the RNA linker with RNA

Nuclei were resuspended in 250 µl of 5′ phosphorylation master mix (T4 PNK buffer, 500 U ml−1 T4PNK, 1 mM ATP, 1 U µl−1 RNAse inhibitor (Roche, 3335399001)) with incubation at 37 °C while rotating at 800 rpm for 1 h to phosphorylate the 5′ ends of RNA. Nuclei were washed once with PBS wash buffer 1 (PBS, 1 mM EDTA, 1 mM EGTA and 0.1% Triton X-100) and three times with PBS wash buffer 2 (PBS, 0.5% BSA (w/v) and 0.1% Triton X-100). The RNA linker is a single-stranded chimeric oligo with the DNA 5′ hydroxyl group end and the RNA 3′ hydroxyl group (5′-OH-CGAGGAGCGCTTNNNNNrArUrArGrCrArUrUrGrC-OH-3′). A RNA ligation mix was made with 4 µM RNA linker, T4 RNA ligation buffer, 400 U ml−1 T4 RNA ligase 1, 15% PEG 8000, 1 mM ATP and 1 U µl−1 RNAse inhibitor. Isolated nuclei were thoroughly mixed with 250 µl of the RNA ligation mix to ligate the RNA linker with nuclear RNA. The mixture was incubated at 25 °C for 2 h then at 16 °C overnight, with an intermittent mixing at 800 rpm (30 s on and 270 off). Following ligation, nuclei were washed once with PBS wash buffer 1 and three times with PBS wash buffer 2.

Chromatin digestion

All washed nuclei were resuspended in a digestion master mix (300 µl of rCutSmart buffer containing 30 µl of 5,000 U ml−1 HpyCH4V with 1 U µl−1 RNAse inhibitor). This master mix was kept for 3 h at 37 °C while rotating at 800 rpm. Nuclei were collected at 900g for 2 min with the supernatant removed. Nuclei were further washed once with 900 µl of PBS wash buffer 1 and three times with 900 µl of PBS wash buffer 2.

Ligation of the DNA linker with DNA

To create the sticky end for DNA linker ligation, the nuclei was suspended in 250 µl of dA-tailing reaction master mix (NEBNext dA-tailing reaction buffer, 200 U ml−1 Klenow fragment, 1 U µl−1 RNAse inhibitor) and incubated at 37 °C while rotating at 800 rpm for 1.5 h. Next, nuclei were washed once with PBS wash buffer 1 and three times with PBS wash buffer 2. The DNA linker is a hybridized product of two DNA strands, with the top strand being 5′-Phos-CTAGACACTGTGCGTATCTNBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA-OH-3′ and the bottom-strand 5′-OH-CGAGGAGNNNNNACAACGCACAGTGTCTAGT-OH-3′. The DNA linker contains 14 bp dsDNA, 36 nt top ss DNA and 15 nt bottom ssDNA (Extended Data Fig. 2g,h). A DNA ligation master mix comprised 4.5 µM DNA linker, 0.2× of 2× Instant Sticky-end Ligase Master Mix (NEB, M0370), 0.8× of 5× Quick Ligase Buffer (NEB, B6058S), 6% (v/v) 1,2-propanediol (Sigma-Aldrich, 398039) and 1 U µl−1 RNAse inhibitor). To ligate the DNA linker with the sticky-end DNA, all nuclei were thoroughly mixed with 250 µl of DNA ligation master mix. The ligation reaction was carried out at 20 °C for 6 h with an intermittent mixing at 1,600 rpm (30 s on and 270 s off).

Ligation of cell barcodes

To phosphorylate the 5′ end of the linker, nuclei were resuspended in 250 µl of 5′ phosphorylation master mix and incubated at 37 °C while rotating at 800 rpm for 1 h. Nuclei were then washed once with PBS wash buffer 1 and three times with PBS wash buffer 2. Nuclei were resuspended in 900 µl of PBS wash buffer 2 with 0.2 U µl−1 RNase inhibitor and filtered through a 10 µM cell strainer (pluriStrainer, 43-10010-50). Six microlitres of the nuclear suspension was stained with 6 µl of ethidium homodimer-1, and nuclei counted using a Countess II Automated Cell Counter (ThermoFisher). A total of 288 barcodes were taken from Hawkins et al.56 and split into sets 1, 2 and 3. Each barcode takes the form of 7 nt_overhang-dsDNA-7 nt_overhang (Extended Data Fig. 2d,i and Supplementary Table 2). Those 288 barcodes collected were then termed the cell barcodes. Three ligation master mixes were prepared, each containing one set of barcodes (5.4 µM), 0.2× of 2× Instant Sticky-end Ligase Master Mix, 0.8× of 5× Quick Ligase Buffer, 6% (v/v) 1,2-propanediol and 0.8 U µl−1 RNAse inhibitor. The ligation master mixes were named mixes 1, 2 and 3, corresponding to barcode sets 1, 2 and 3, respectively.

First round of split-pooling

Up to 100,000 nuclei were collected for split-pooling to ensure that the majority were labelled with unique cell barcode combinations. The nuclear suspension was made up to 1,144 µl with PBS wash buffer 2 and 24 µl of RNAse inhibitor and subsequently split into 96 wells. To ligate barcode set 1 with the RNA and DNA linkers, nuclei in each well are incubated overnight in ligation master mix 1 at 20 °C with an intermittent mixing at 1,600 rpm (30 s on and 270 off). Following overnight incubation the reaction was quenched by the addition of 60 µl of quenching buffer (PBS, 50 mM EDTA, 50 mM EGTA, 0.1% Triton X-100) and incubated for 10 min at 20 °C. Nuclear solutions from the 96 wells were pooled into a 15 ml LoBind tube; 95 µl of quenching buffer was then added to each well to rinse and collect any remaining nuclei, with pooling into the same 15 ml tube. Nuclei were centrifuged at 900g for 4 min and then transferred to a 1.5 ml tube with 0.5 ml of the remaining supernatant. PBS wash buffer 2 (500  µl) was used to rinse the 15 ml tube and collect residual nuclei into the same 1.5 ml tube. Nuclei were washed three times with 900 µl of PBS wash buffer 2 by centrifugation at 900g for 2 min.

Second and third rounds of split-pooling

Pooled nuclei were subjected to the same split-pooling procedure as in the first round, except that ligation master mixes 2 and 3 were used for the second and third rounds, respectively, replacing ligation master mix 1.

Addition of complex barcodes

We used a combination of two sets of barcodes to jointly differentiate individual molecular complexes, referred to as the complex barcodes. The first set of barcodes are those 3.5 million oligos provided in the Chromium Next GEM Single Cell 3′ Reagent kit (PN-1000268; 10X barcodes). Each oligo is an 82-base oligonucleotide with a 16 nt barcode and 12 nt UMI (10X BC + UMI; Extended Data Fig. 2j,k). The second set of barcodes is composed of eight barcodes (index barcodes), each being 8 nt (I7 in Extended Data Fig. 1h and Supplementary Table 2).

The nuclei were resuspended in 250 µl of 3′ dephosphorylation buffer (PNK buffer, 0.5 U µl−1 T4PNK, 1 U µl−1 RNAse inhibitor) and incubated at 37 °C for 1 h, with rotation at 800 rpm, to convert any 2′, 3′ cyclic phosphate on RNA to 3′-OH. Nuclei were washed once with PBS wash buffer 1, three times with PBS wash buffer 2 and centrifuged at 900g for 2 min. To add polyA sequences to all RNA molecules, nuclei were resuspended in polyA tailing buffer (E. coli poly(A) polymerase reaction buffer, 0.08 U µl−1 E. coli poly(A) polymerase, 1 mM ATP, 1 U µl−1 RNase inhibitor). The mixture was kept at 37 °C for 10 min while rotating at 800 rpm. Following the addition of polyA tails, nuclei were thoroughly resuspended in PBS with 0.04% BSA (w/v) and filtered through a 10 µM cell strainer (pluriStrainer, 43-10010-50) into a 1.5 ml tube to obtain isolated nuclei. Six microlitres of the nuclei-containing solution was stained with 6 µl of ethidium homodimer-1, and nuclei counted using a Countess II Automated Cell Counter. Five thousand single nuclei were transferred to a Covaris microtube-15, which was then filled to 15 µl with PBS and 0.04% BSA (w/v). Nuclei were sonicated using a Covaris M220 Focused-ultrasonicator (water temperature 6 °C, incident power 50 W, duty factor 5) for 5 min to release chromatin complexes.

To add 10X barcodes to polyadenylated RNA and the top ss end of the DNA linker, sonicated nuclei complexes were transferred into a 1.5 ml LoBind tube and mixed with 25 µl of water, 18.8 µl of reverse transcription (RT) reagent B, 2 µl of reducing agent B and 8.7 µl of RT enzyme C. The mixture was transferred to one well in chromatin immunoprecipitation sequencing G, which was then loaded onto the 10X Chromium controller according to steps 1.1–1.5 in the protocol of the Chromium Next GEM Single Cell 3′ Reagent kit. The retrieved droplets were transferred to a PCR tube for complementary DNA synthesis according to the 10X protocol. The droplets were dispersed, with the aqueous phase obtained according to step 2.1 in the 10X protocol.

The aqueous phase containing nucleic acids was filled to 200 µl with nuclease-free water and split into eight aliquots in LoBind 1.5 ml tubes; next, 25 µl of 2× reverse crosslinking buffer (400 mM NaCl, 0.4% SDS, 50 mM EDTA, 50 mM EGTA, 0.04 U µl−1 proteinase K) was added to each tube and the ensuing reverse crosslinking reaction was incubated at 50 °C for 2 h, then at 55 °C overnight, with shaking at 800 rpm. In each aliquot the reverse crosslinked nucleic acids were purified using the Monarch RNA purification kit (NEB, 76307-460) with elution into 21 µl of nuclease-free water. Eluted DNA and RNA molecules were incubated at 55 °C for 15 min with isothermal amplification buffer II, 0.32 U µl−1 Bst 3.0 DNA polymerase, an additional 6 mM MgSO4, 1.4 mM dNTP Mix and 0.5 U µl−1 RNasin. The product was purified with 1.8× RNA clean Ampure beads (Beckman Coulter Life Science, A63881) and eluted into 20 µl of nuclease-free water. PCR was performed for each aliquot with 2.5 µl of 10 µM shared Universal Adaptor (P5 and read1 in Extended Data Fig. 1h) and 2.5 µl of 10 µM aliquot-specific primer in 25 µl of NEBNext Ultra II Q5 Master Mix. The aliquot-specific primers are the eight index adaptors (‘Critical reagents’). PCR was carried out in 13–14 cycles. Amplified DNA was purified with 1.2× Ampure beads and eluted into 12.5 µl of nuclease-free water. Purified DNA solutions from the eight aliquots were combined and loaded into five lanes of 4% E-gel (Invitrogen, G401004). DNA bands between 300 bp and 1.2 kb were excised. DNA was extracted using the NEB Monarch gel purification kit (NEB, T1020S) with two columns and eluted in 30 µl of elution buffer.


The molarity of the sequencing library was measured using a Qubit 4.0 Fluorometer (Invitrogen, Q33238) and Qubit dsDNA HS assay kit (Invitrogen, Q33231). Fragment size distribution was assessed using an Agilent bioanalyser with high-sensitivity DNA chromatin immunoprecipitation sequencing. The library was sequenced by UC San Diego IGM Genomics Center using an Illumina NovaSeq 6000. The sequencer was set to read a 28 bp sequence next to the universal adaptor as Read1, an 8 bp index sequence from the I7 region inside the index adaptor and a 150 bp sequence next to the index adaptor as Read2.

Computational analysis

The MUSIC-docker data-processing pipeline

We developed MUSIC-docker to process MUSIC sequencing data using Docker to encapsulate a Snakemake57 pipeline, ensuring cross-platform execution. This handles I7 index-split, paired-end fastq files, processes them separately into RNA and DNA sequences, adds cell and complex barcodes and maps to the genome, removing PCR duplicates and deriving processed files. Detailed documentation is available at and Supplementary Note 14.

The raw sequencing output (.bcl) was converted to FASTQ files with bcl2fastq, producing eight FASTQ files with 28 bp read1 and 150 bp read2. Read1 includes the 10X barcode and 10X UMI, with read2 containing cell barcodes, the RNA and DNA linkers and insert sequences.

The demultiplexing step extracts cell and complex barcode information and fragment identity information, creating separate FASTQ files for RNA and DNA reads individually in which the read sequence will be the insert and the read name will be the fragment identity information.

To address potential artificial sequences introduced by sequencing errors and experimental design, we remove consecutive As or Gs from the 3′ end of DNA inserts if their length is greater than 20 bp. For RNA reads we detect the ssDNA region of the RNA linker sequence CGAGGAGCGCTT and remove any sequence following it. We use cutadaptor (v.2.8)58 with the parameter ‘-q 15 -m 20’. DNA and RNA inserts are mapped to the genome using bowtie2 (v.5.4.0)59 with the parameters ‘bowtie2 -p 10 -t –phred33 -x’, and bwa (v.0.7.17)60 mem with parameters ‘-SP5M’, respectively. Uniquely mapped reads are selected for downstream analysis.

Following reads mapping, PCR duplicates, identified by shared 10X UMI and mapped coordinates, are removed using customized script. This script sorts the BAM file, scans it once and flags duplicates if they meet specific criteria: (1) it maps to a location within 8 bp of the previous read; (2) it shares the same cell and molecular barcodes with the previous read; and (3) its UMI exhibits a Levenshtein distance of less than 2 bp from the UMI of the previous read. All identified PCR duplicates are subsequently removed to ensure the integrity of downstream analyses.

Finally, deduplicated BAM files from each I7 index library are merged into a comprehensive, sorted BAM file capturing essential information for both DNA and RNA, including cell and molecular barcodes (10X and I7 index) and insert mapping location.

Mixed-species experiment

Based on a previously published method12, we assigned a cell to a species if 95% of its DNA reads could map to a single species. Cells with uniquely mapped, non-duplicated DNA reads of less than 1,000 were classified as ambient cells. To calculate single-cluster mixed-species rate we assigned a cluster to a single species if more than 99% of its uniquely mapped non-duplicated DNA reads came from that species.

Parsing multiplex clusters into pairwise interactions with normalization

Each cluster is a collection of reads that share the same cell, 10X and I7 barcodes (CB + GEM + I7). A cluster of two reads (cluster size 2) corresponds to a pairwise interaction and a cluster of three or more reads corresponds to a multiplex cluster. Each multiplex cluster is decomposed into pairwise interactions with a normalization procedure that adjusts for the total number of combinations of pairwise interactions from a multiplex cluster, as previously described12,22. For homotypical clusters, which are clusters containing exclusively DNA or RNA reads, each homotypical cluster of size N is first decomposed into all non-overlapping pairs and then each pair is normalized by a factor of 1/N. For heterotypic clusters, which are clusters containing both DNA and RNA reads, each heterotypic cluster is first decomposed into all DNA–RNA pairs and then each pair is normalized by a factor of 1/(M + N), where M and N are the numbers of DNA and RNA reads, respectively. This normalization process removes the difference in the number of pairwise decompositions from different-sized clusters, thus ensuring that larger clusters are not inflated in regard to the number of decomposed pairwise interactions12,22.

To generate a two-dimensional contact map of DNA–DNA interactions we first define the size of each unit, typically represented as genomic bins, for rows and columns. The weight assigned to bin [i, j] is determined by the total number of clusters containing DNA reads mapped to both the ith and jth bin, and this is normalized by cluster size as previously described. To compare DNA–DNA contact differences within a specific genomic region, cluster sizes are calculated using only DNA reads mapped to that region, considering small, medium and large clusters. If the DNA–DNA contact map is generated for multiple cells, the weighted sum of all clusters is calculated. Similar methods are applied to derive RNA–DNA two-dimensional contact maps, with the calculation of the weight of each interaction adjusted accordingly.

To determine RAL at specific genomic bins, representing one-dimensional RNA–chromatin contacts for a particular RNA of interest, we calculate the total weighted RNA–DNA interactions involving the RNA of interest and DNA ends mapped to the desired genomic bins. For ensemble maps, RAL values from individual cells are summed.

Finally, for visualization of the two-dimensional contact maps of DNA–DNA or RNA–DNA interactions, the raw contacts obtained following previously mentioned procedures are scaled up using a linear factor, typically 100 or 1,000. This amplification step prevents the application of logarithmic transformation to decimal values. Following amplification, a logarithmic transformation is performed to enhance visualization of the contact maps.

Comparison of MUSIC DNA–DNA contacts with Micro-C data

To achieve consistency in the scale of total contacts between MUSIC and Micro-C, contact maps obtained from both methods underwent a standardized transformation. Initially the contact maps derived from Micro-C or MUSIC were logarithmically scaled; these were then normalized to their respective maximum values, resulting in all contacts being constrained within the range of [0, 1]. The Micro-C data used in this study were obtained from the 4DN portal61 (4DNFI2TK7L2F)34. To extract raw contacts from the .hic file, the straw62 tool was employed.

To calculate the compartment score (PC1 score) for both Micro-C and MUSIC DNA–DNA contact matrices we first computed the expected contact matrix. Subsequently we determined the PC1 score from the correlation matrix of the observed/expected ratio matrices, in which the expected matrix was derived by calculating average contact frequencies as a function of genomic distances. For comparison of PC1 correlations between Micro-C and MUSIC, the calculations were performed individually on each chromosome. The reported correlation represents the median of these correlations across all chromosomes.

nsaRNA and pre-mRNA RAL

Pre-mRNAs are RNA reads that exhibit an overlap of at least 15 bp with gene introns and are classified as protein-coding RNAs. The calculation of RNA–DNA cluster weight involves the inclusion of all nsaRNAs and pre-mRNAs, along with their associated DNA reads. RNA–DNA clusters of size not exceeding 1,000 were selected for the computation of RAL. The H1 A/B compartment data used in this study were obtained from the 4DN data portal (4DNFID162B9J)2.

Calculation of genomic distance versus contact frequency curve from MUSIC data

The relationship between contact frequency and genomic distances within chromosomal arms was systematically examined. Initially the genomic distance range (from 10 bp to 150 megabases) was divided into 500,000 equally sized bins. Subsequently, for DNA reads originating from DNA–DNA or RNA–DNA clusters, the genomic distances of all intrachromosomal pairwise interactions were determined and the frequencies for each genomic distance bin were computed. These frequencies were then normalized based on the weight assigned to each interaction. Normalized frequencies were calculated for each genomic bin in every cluster and single cell. Finally, genomic distances versus contact frequencies were aggregated across all clusters from all single cells.

We used cooltools (v.0.5.4)63 to generate the curve in Micro-C data representing genomic distance versus contact frequency. We downloaded the .mcool file for H1 Micro-C from the 4DN data portal (4DNFI9GMP2J8)34.

Preprocessing and filtering of single brain cells

For each brain sample we applied standard MUSIC_docker pipelines to obtain valid RNA and DNA reads information for each single cell. To select high-quality single cells for robust analysis and interpretation of our data, we removed cells with fewer than 100 RNA reads or fewer than 5,000 DNA reads for downstream analysis on brain samples.

Transcriptome merging of brain samples

We first constructed the single-cell RNA expression count matrix by calculating the number of RNA reads mapped to each human gene (GENCODE64, v.36; chrM genes are excluded) for all single cells from all 14 brain samples. We then constructed one Seurat object (Seurat v.4.3.0)65,66 for each brain sample with the parameters ‘min.cells = 2 and min.features = 200’ (filtering out genes expressed in no more than two cells and filtering out cells with no more than 200 expressed genes). The count matrix from all brains was then integrated using RunHarmony from the harmony R package (v.0.1.1)60 based on STransform processed data, and regressed out on factors including individual library and experimental batches.

Expression of brain genes

For comparison of gene expression in brain cells we used the LogNormalize method from the Seurat R package. Raw reads counts were first normalized by library size and then log-transformed.

Single-cell clustering and cell type identification

The integrated brain object was then subjected to dimensionality reduction by UMAP methods based on the first 20 principal components from PCA using the Seurat R package. All cells are then clustered in an unsupervised method using a shared nearest-neighbour graph based on k-nearest neighbours (k = 20) calculated from the top two coordinates of UMAP. Clusters were then derived by optimization of the modularity function using the function FindClusters with parameter resolution at 0.05. Excitatory and inhibitory neurons were clustered by first extracting the subset of cells and reclustering at a resolution of 1.

We assigned cell types to each cluster by the known cell-type-specific marker gene expression level. Cell-type-specific marker genes are based on previous publications: for major brain cell types38, subclusters (Azimuth)67 and vascular cells68. Each cluster is assigned to one cell type (A) if two times the average expression of all marker genes of cell type A plus the proportion of cells in the cluster expressing marker genes for cell type A are higher than any other cell types. We designed this score that takes both cell type marker gene expression level and proportion of cells expressing the genes into consideration, but with a greater emphasis on expression level.

Single-cell LCS-erosion score and transcriptomic age calculation

We calculated the LCS-erosion score for each cell by determining the middle point of the genomic distances of the top ten most frequently contacting genomic bins. To derive the contact frequency versus genomic distances heatmap for all frontal cortex cells we first generated 149 genomic bins spanning from 5,000 bp to 150 megabase pairs, with the nth bin spanning the genomic region \(({2}^{{\log }_{2}(5,000)+\frac{\{{{\rm{l}}{\rm{o}}{\rm{g}}}_{2}(1.5\times 1{0}^{8}\}-{{\rm{l}}{\rm{o}}{\rm{g}}}_{2}(5,000))}{150}\times (n-1)}\), \({2}^{{\log }_{2}(5,000)+\frac{\{{{\rm{l}}{\rm{o}}{\rm{g}}}_{2}(1.5\times 1{0}^{8}\}-{{\rm{l}}{\rm{o}}{\rm{g}}}_{2}(5,000))}{150}\times n})\), where n = 1,2,3,..,150. Subsequently, for DNA reads originating from DNA–DNA or RNA–DNA clusters, the genomic distances of all intrachromosomal pairwise interactions were determined and the frequencies for each genomic distance bin calculated. These frequencies were then normalized against bin size. For each single cell the total contact frequency was normalized against total frequency within each cell before plotting the heatmap. Cells exhibiting an LCS-erosion score exceeding 3 × 105 were classified as LCS-eroded cells, with the remainder considered LCS-preserved.

For assessment of the transcriptomic-based biological age of cells we implemented the SCALE methodology described in a previous study43. This approach involves using a list of human ageing-associated genes obtained from Initially we determined the direction (either +1 or −1) of each marker gene based on its correlation with the chronological age of the sample. This direction depended on whether gene expression was positively (+1) or negatively (−1) correlated with age. Subsequently, for each marker gene, we assigned a weight computed as the proportion of cells expressing that gene multiplied by its directional value. Finally we calculated the transcriptomic age for each cell using the dot product of gene expression z-scores and corresponding gene weights.

LCS-erosion score-associated transcriptome functions

To identify genes whose expression is significantly associated with chromatin LCS-erosion score, we applied analysis of variance (ANOVA) between the expression of each gene and LCS-erosion score across all cells. Genes with P < 0.01, F > 1 and the absolute value of Spearman correlation between LCS-erosion score and gene expression greater than 0.1 were considered significant. To identify cell-type-specific LCS-erosion score-associated genes we applied ANOVA individually within each cell type. We used the R package gprofiler2 (ref. 69) for pathway enrichment analysis. Enriched wikipathway, KEGG and REACtom pathways are shown.

eQTL analysis

All brain cell-type-specific eQTL data were downloaded from ref. 45. We compiled a combined dataset by selecting eQTLs with nominal P < 1 × 10−4 and removing those in endothelial cells and pericytes. MUSIC DNA–DNA contacts between eQTL and their target genes are read pairs where one end is mapped (one base overlap) over the eQTL and the other over the gene promoter (2.5 kb flanking region from the transcriptional start site). The global chi-square test was performed on a 6 × 6 table to test the overall association of cell-type-specific DNA–DNA contacts with cell-type-specific eQTL–gene pairs. The 6 × 6 table was then reduced to six 2 × 2 tables to test that association in each cell type (chi-square test). The 95% confidence interval of the odds ratio was calculated as exp(log(odds ratio) ± 1.96 × SELOR), where SELOR is the standard error of log(odds ratio).

Analysis of XIST–chromatin interactions

To derive the one-dimensional XIST-genome RAL we extracted all clusters containing at least one XIST RNA. For each cluster with M XIST RNA reads and N DNA reads we derived M × N paired XIST–DNA interactions, each of which was then normalized according to its cluster size 1/(M + N). We then binned the whole genome at 1 Mb resolution. The XIST RAL for each bin is the total weight of all XIST–DNA interactions with DNA ends overlapped with that bin.

To derive the two-dimensional RNA–DNA contact map for chromosome X we extracted all RNA and DNA reads that can map to chromosome X. Next, for any matched molecular complex barcode with M chromosome X RNA reads and N chromosome X DNA reads, we again first derived all combinations of RNA–DNA interactions, with adjusted weight indicating the reverse of cluster size (1/M + N). We binned chromosome X at 1 Mb resolution; M[i, j] then represents the sum of weighted interactions whose RNA ends mapped to the ith bin and DNA ends mapped to the jth column.

XAL stratification and corresponding genomic distance versus contact frequency

XIST–chromosome X association levels are numerically represented as the number of XIST-attached chromosome X DNA bins; each individual cell will have one XAL value. To assess the differences in chromatin organization between XIST+ and XIST clusters, we stratified all brain cells into four groups based on their XAL value: group 1, with zero XAL, which includes all cells with no detectable XIST–chromosome X association; and groups 2–4, with increasing XAL values, which include all cells with an XIST–chromosome X association, split equally between these three groups. To derive genomic distances and contact frequency relationships we follow the methods introduced in ‘Calculation of genomic distance versus contact frequency curve from MUSIC data’.

Human sample acquisition

The acquisition of postmortem brain tissue was conducted at Banner Sun Health Research Institute with Institutional Review Board approval (study 1132516, investigator T. Beach). Informed consent was obtained from all tissue donors.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link


Leave a Reply

Your email address will not be published. Required fields are marked *