Strange India All Strange Things About India and world

PDMS chip fabrication

Moulds were 3D-printed by Proto Labs with WaterShed XC 11122 at high resolution (Supplementary Fig. 1). Reusable moulds were assembled by taping 3D-printed molds to glass slides (5″ × 4″; Ted Pella). Sylgard 184 PDMS monomer and curing agent (4019862, Ellsworth Adhesives) were mixed at a 10:1 (w/w) ratio. The mixture was degassed using a desiccator connected to a vacuum pump, poured over the mold and degassed again until there were no air bubbles. The mold was incubated for at least 16 h at 50 °C. Individual PDMS chips were cut along the lines that form the outer rectangle on the design in Extended Data Fig. 1a. The 5-mm diameter elution well and trailing electrolyte and leading electrolyte reservoirs were made with a biopsy punch (Extended Data Fig. 1b). Before the plasma treatment, glass slides (4″ × 3″; Ted Pella) and the feature side of the PDMS slabs were thoroughly cleaned with tape to remove any dust particles. PDMS chips and glass slides were plasma cleaned with a 115 V Expanded Plasma Cleaner (Harrick Plasma) connected to a Dry Scroll Pump (Agilent) for 2 min at high radio frequency level. The plasma-treated surfaces of the glass and PDMS slabs were immediately brought together to form a covalent bond. Bonded chips were heated at 80 °C on a heat block for at least 2 h to enhance bonding.

PDMS chip preparation for Ribo-ITP experiments

To ensure clean, RNAse-free chips, we pre-treated the channels and reservoirs of the Ribo-ITP chip by sequential treatment with the following solutions: RNaseZap (100% concentrate), nuclease-free water, 1 M NaOH, nuclease-free water, 1 M HCl, nuclease-free water, 10% (w/v) benzophenone in acetone (for 10 min, replenishing channels as needed to avoid bubble accumulation), methanol and 0.1% Triton X-100. The channel was completely dried after final treatment by fully vacuuming out any remaining liquid in the channel. After securing the chips to a ProteinSimple 302/365nm UV Transilluminator with tape, we added 10% polyacrylamide prepolymer mix (Supplementary Table 6) to the size-selection channel through the elution well. Similarly, 5% polyacrylamide prepolymer mix was loaded into the extraction channel through branch channel 2. To catalyse the polymerization of polyacrylamide on chip, we used a photoactivatable azo-initiator, 2,2′-azobis[2-methyl-N-(2-hydroxyethyl)propionamide] (VA-086, Wako Chemicals), at 0.5% final concentration in the prepolymer mixes. UV-driven polymerization (wavelength of 365 nm) was performed for 1 min followed by a 30-s break. This on–off UV cycle was repeated two more times for a total UV exposure time of 3 min. UV intensity was measured as approximately 8.9 mW cm2 using a G&R Labs Model 200 UV Light meter with a 365-nm probe. To avoid dehydration of the polyacrylamide gels after polymerization, we filled any open channels and reservoirs with storage buffer (Supplementary Table 6) until use. The chips were protected from light and used within 6 h of preparation.

ITP setup

The prepared ITP chip was placed on a Dark Reader blue light transilluminator (Clare Chemical) and secured with tape. Storage buffer was removed from the channels and reservoirs using a vacuum. Leading electrolyte pluronic solution (LEp) and MOPS trailing electrolyte pluronic solution (TEp) (Supplementary Table 6) were kept on ice throughout the loading procedure. Pipet tips (200 µl) were kept at −20 °C until the time of the experiment to facilitate manipulation of the pluronic-containing LEp and TEp solutions, which solidify within a minute above 4 °C. Of LEp, 80 µl was loaded in leading electrolyte reservoir 3, filling the reservoir to the top as well as the small section of the channel between the elution well and leading electrolyte reservoir 3 (Extended Data Fig. 1b). Leading electrolyte reservoir 2 was filled with 30 µl LEp, ensuring contact with the polyacrylamide gel present in branch channel 2. The elution well was filled with 20 µl of running buffer (RB). Fluorescent marker oligonucleotides containing a 5′ ATTO fluorophore and 3′ ddC blocking modification (Supplementary Table 7) were added to the sample followed by dilution with sample dilution buffer. The mixture was loaded into the lysate channel through leading electrolyte reservoir 1. Finally, leading electrolyte reservoir 1 was filled with 30 µl LEp and 70 µl TEp was added to the trailing electrolyte reservoir. The negative electrode was placed in the trailing electrolyte reservoir and the positive electrode in the leading electrolyte reservoir. Positive and negative electrodes were placed in leading electrolyte reservoir 3 and the trailing electrolyte reservoir, respectively. A constant current of 300 mA with a maximum voltage of 1.1 kV (Keithley 2410 Sourcemeter) was applied to the channel. Once the trailing end of the fluorescent markers entered the 5% polyacrylamide gel, the branch channel electrode—with a lower current output due to a 510 kΩ (Xikon) resistor on a custom circuit board—was manually applied in leading electrolyte reservoir 1 for approximately 10 s. When the leading edge of the shorter fluorescent marker reached the end of the size-selection channel, the current was suspended. The elution reservoir was thoroughly washed twice with 30 µl nuclease-free water and refilled with 10 µl dephosphorylation buffer (Supplementary Table 6). Current was applied again until the longer fluorescent marker began to enter the elution well. Finally, the purified sample with a 10-µl volume was collected from the elution well into a low-bind PCR tube and immediately stored at −80 °C.

PAGE and conventional extraction of RNA

Control inputs were prepared as a master mix then aliquoted. For gel extraction samples, input RNA was first processed using Qiagen miRNeasy Micro Kit per the manufacturer’s instructions. RNAs were separated by electrophoresis using 15% TBE-urea polyacrylamide gel (EC6885BOX, Invitrogen). Gel slices were excised and crushed using sterile pestles, followed by soaking in gel extraction buffer57 (Supplementary Table 6) on dry ice for 30 min. Samples were then incubated overnight at room temperature, gently transferred on a tabletop shaker and protected from light. Residual gel pieces were removed by centrifugation for 1 min at 21,130g through a Corning 0.22-μm sterile filter tube. The recovered eluate was precipitated overnight at −20 °C (300 mM sodium acetate (pH 5.2), 5 mM MgCl2, 1.5 µl Glycoblue and 75% ethanol). Samples were pelleted by centrifugation at 4 °C for 1 h at 21,130g.

Gel imaging and quantification

To quantify yield, samples were run on a 15% TBE-urea polyacrylamide gel and visualized using the fluorescent marker oligonucleotides or by SYBR gold staining. Specifically, gels were imaged using Typhoon FLA 9500 (GE Healthcare) with a 473-nm excitation wavelength and low pass band filter compatible with ATTO 488 fluorophore and SYBR gold stain. For high-resolution imaging, pixel size was minimized (10–25 µm) and photomultiplier tube (PMT) settings were optimized by using the scanning feature of Typhoon to avoid image oversaturation, typically resulting in a value between 250 and 500 V. The images were analysed using ImageJ software v.1.52 (NIH). The raw integrated density (RID) for background signal (RIDbackground) was measured by quantifying average RIDs from representative blank areas. RIDbackground was normalized to account for the ratio of the target (Asample) to the background area (Abackground) such that

$${{\rm{Background}}}_{{\rm{normalized}}}={{\rm{RID}}}_{{\rm{background}}}\times ({A}_{{\rm{sample}}})/({A}_{{\rm{background}}})$$

The normalized background value was subtracted from all samples to quantify normalized sample RID values. The percent yield was defined as the ratio of the normalized RID values to the mean of background-normalized input samples. For display purposes only, the contrast and brightness of some images were adjusted in ImageJ and exported as tiff files for figures. The full scans are displayed in Supplementary Fig. 1.

Yield comparison between on-chip method and conventional RNA extraction

Input controls and experimental samples were prepared with a final total amount of 40 ng, 20 ng, 2 ng, 400 pg or 40 pg of ZR small RNA ladder (R1090, Zymo Research) including 17, 21, 25 and 29-nt RNA oligonucleotides. Ribo-ITP was performed as described, with a final elution in 12 µl RB. Samples for gel extraction were first processed with the miRNeasy Micro kit (Qiagen), followed by extraction using the crush-and-soak approach57. Only the 25-nt and 29-nt bands were extracted. For 40 ng and 20 ng samples, fluorescent marker oligonucleotides were spiked into each sample and a final 15% TBE-urea polyacrylamide gel was run as described above. Only the 25-nt and 29-nt bands were quantified to determine the final yield.

To quantify yield for the ultra-low input samples (2 ng, 400 pg and 40 pg inputs), all experimental and input control samples were brought to 16 µl with nuclease-free water. Subsequently, 2 µl T4 polynucleotide kinase (PNK) buffer, 1 µl T4 PNK (NEB) and 1 µl ATP [γ-32P]-3000 Ci mmol1 (10 mCi ml1) (NEG002A500UC, Perkin Elmer) were added and incubated for 30 min at 37 °C. After incubation, unincorporated nucleotides were removed with the RNA Clean and Concentrator-5 kit (R1013, Zymo Research) according to the manufacturer’s instructions. RNA was eluted with 14 µl nuclease-free water, mixed with 2× denaturing gel loading dye (Supplementary Table 6) and denatured for 90 s at 80 °C. The samples were electrophoresed, then the gel was incubated in nuclease-free water for 5 min followed by a 30 min incubation in a 30% methanol and 5% glycerol solution. Both incubations were done on a rocking platform at room temperature. After the incubations, the gel was placed between pre-wetted cellophane sheets (1651779, Bio-Rad) and dried for 2 h in a GelAir drying system (Bio-Rad). The dried gel in cellophane was exposed for at least 12 h to a BAS-IP MS phosphor screen (28956475, GE Healthcare). The phosphor screen was imaged with a Typhoon FLA 9500 (GE Healthcare) using 500 V PMT at 50-µm resolution. The image was visualized and quantified using ImageJ software; only the 25-nt band was quantified as described above. All samples were processed in quadruplicate, with the exception of the Ribo-ITP sample with an RNA ladder input of 20 ng (n = 3).

Cell culture

Human K562 cells were grown in RPMI 1640 medium (Gibco) supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin–streptomycin (Gibco) and incubated at 37 °C with 5% CO2 to a density of approximately 2.5 × 105 cells per ml. Cells were regularly tested for mycoplasma contamination. The identity of the K562 cell line was authenticated using short tandem repeat profiling from the American Type Culture Collection.

Size selection of purified RNA

To demonstrate the size-selection capacity of our on-chip approach, we prepared an MNase-digested RNA sample from K562 cells. In brief, 3 µl MNase (M0247S, NEB) was added to a clarified K562 lysate from approximately 5 million cells and digested for 30 min at 37 °C, followed by RNA extraction with the miRNeasy Micro kit (Qiagen) per the manufacturer’s instructions. Ribo-ITP inputs contained 100 ng of the digested, purified RNA. Ribo-ITP was performed as described, with modifications to the collection method. Once the fluorescent marker band reached the interface of the 5% and 10% polyacrylamide gels, the current was suspended and RB was replaced with 12 µl of fresh RB. Ribo-ITP continued until the first fluorescent marker reached the edge of the elution well. The 12 µl of RB in the elution well was collected as fraction 1. The well was washed twice with RB then refilled with 12 µl RB. Current was applied again until the front edge of the trailing fluorescent marker began to enter the elution well, and the 12 µl RB elution was collected as fraction 2. The elution well was refilled with 12 µl RB and Ribo-ITP was continued for 2 min. The final 12 µl elution was collected as fraction 3. Control inputs were prepared with the same amounts of bulk RNA and fluorescent markers, then brought to 12 µl with RB. Gel electrophoresis, imaging and quantification were performed as described.

Ribosome profiling sample preparation and monosome isolation

Approximately 10 million K562 cells were pelleted, washed twice with PBS and immediately flash-frozen in liquid nitrogen. Cells were lysed in 400 µl cold lysis buffer (Supplementary Table 6) for 10 min on ice and pipetted to homogenize. The lysates were clarified by centrifugation at 1,300g for 10 min at 4 °C. Clarified supernatants were digested with 5 µl MNase (M0247S, NEB) and incubated for 30 min at 37 °C. Digestions were stopped with 20 mM ribonucleoside vanadyl complex (S1402S, NEB). The samples were then loaded onto 20–50% sucrose gradients and ultracentrifuged in a SW41 Ti swinging-bucket rotor (331362, Beckman) at 38,000 rpm for 2.5 h at 4 °C. The samples were fractionated using a Biocomp gradient fractionator. RNA was extracted from the monosome fractions with the miRNeasy Micro kit (Qiagen). One-third of the eluate was electrophoresed through a 15% TBE-urea polyacrylamide gel. The ribosome footprints of approximately 17–35 nt were gel extracted using the crush-and-soak method as described. Final sample resuspension after ethanol precipitation was in 18 µl of nuclease-free water. The purified RNA was dephosphorylated with 1 µl of T4 polynucleotide kinase (NEB) in 1× T4 PNK buffer for 1 h at 37 °C. Dephosphorylated ribosome footprints were then ethanol precipitated (300 mM sodium acetate, 2.5 volumes of ethanol and 1.5 µl of GlycoBlue) overnight at −20 °C. Precipitated RNA was eluted in 10 µl nuclease-free water. The RNA was normalized to 350 ng in 6 µl of nuclease-free water before library preparation.

For 100-cell ribosome profiling experiments, K562 cells were pelleted, washed twice with PBS and diluted to 100 cells in 5 µl cold lysis buffer containing cycloheximide. MNase stock (2,000 gel units per microlitre; NEB) was diluted 1:50 and 1 µl of the dilution was added to the samples. Digestion was performed for 30 min at 37 °C in a thermal cycler with a heated lid. EGTA (1 µl) was added to a final concentration of 10 mM to inhibit further digestion. Samples were placed on ice until processing through Ribo-ITP. Three replicates each were prepared for conventional ribosome profiling and Ribo-ITP.

For single-cell Ribo-ITP experiments, cells were pelleted at 300g for 5 min. The cells were washed with 1× PBS and resuspended to achieve approximately 1 × 106 cells in 1 ml of PBS containing 0.1% BSA (Sigma) with 1 µg ml−1 DAPI. The cells were passed through a strained cap to attain a single-cell suspension and sorted with the Sony MA9000 Cell Sorter or BD FACSAria Fusion Flow Cytometer into Eppendorf LoBind 96-well plates containing 5 µl cold lysis buffer with cycloheximide. Singlet cells were defined by gating on FSC-A/SSC-A, SSA-H/SSC-W, FSC-A/FSC-H and FSC-A/histogram. Live cells were selected using DAPI-negative gating. The plates were sealed and flash frozen in liquid nitrogen immediately after the sort was completed. The lysate was incubated at 37 °C with 1 µl of a 1:150 dilution of MNase stock (2,000 gel units per microlitre; NEB) for 30 min or 1 µl of a 1:300 dilution of RNase I (100 U µl−1; Ambion) for 15 min. The MNase digestion was stopped by adding EGTA to a final concentration of 10 mM. Sodium dodecyl sulfate was added to a final concentration of 0.1% to samples digested with RNase I. The lysates were held on ice until processing through Ribo-ITP.

Mouse oocyte isolation

All experiments using mice by the Mouse Genetic Engineering Facility were approved by the Institutional Animal Care and Use Committee at the University of Texas at Austin (protocol ID: AUP-2022-00114). Mice were housed at 22 °C (range 20–24 °C) under 12 h of light–dark cycles. The humidity was not controlled. Oocytes were collected from superovulated C57BL/6J female mice (approximately 8 weeks old) as previously described18. One hour after human chorionic gonadotropin (hCG) injection, the ovaries were placed in a 3-cm dish containing FHM medium (F1114, Cytospring), and germinal vesicle-stage oocytes were released by scraping the surface of the ovaries with #5 Dumont forceps (Roboz). MII-stage oocytes were isolated from the oviducts approximately 14 h after hCG injection. Cumulus cells were removed from the oocytes by treatment with 1 mg ml−1 hyaluronidase (H3884, Sigma) in FHM medium. Both germinal vesicle-stage and MII-stage oocytes were rinsed through three drops of FHM medium and then through three drops of 20 mg ml−1 BSA (A3311, Sigma) in PBS (SH30028.02, Hyclone). The oocytes were placed individually in 0.2-ml PCR tubes using a finely pulled glass pipette under a stereomicroscope and flash frozen in liquid nitrogen. The liquid volume transferred with the oocytes was less than 0.5 µl. No statistical analyses were used to determine sample size. Given the observational nature of the study, no randomization or blinding was used.

In vitro fertilization using CAST/EiJ sperm

Sperm was frozen from CAST/EiJ male mice as previously described58 and stored in liquid nitrogen. For in vitro fertilization, oocytes were isolated from C57BL/6J female mice approximately 15 h after hCG injection, and in vitro fertilization was performed using thawed CAST/EiJ sperm59. One-cell, two-cell, four-cell and eight-cell embryos were collected 21.5, 39, 62 and 69 h after hCG injection, respectively. Fertilized oocytes were cultured overnight to the two-cell stage in a 150 µl drop of HTF medium (mH0113, Cytospring). For development to the four-cell and eight-cell stages, two-cell embryos were cultured in KSOM medium (K0114, Cytospring). Embryos were placed individually into 0.2-ml PCR tubes and flash frozen in liquid nitrogen. All samples were processed with Ribo-ITP within 48 h of collection.

A working lysis buffer solution was prepared by adding 1 µl of the MNase (NEB) (1:50 dilution) per 5 µl lysis buffer. To lyse the mouse samples, 6 µl of working lysis buffer was added directly to the frozen cell-containing droplet. Digestion was immediately performed for 30 min at 37 °C in a thermal cycler with a heated lid. EGTA (1 µl) was added to a final concentration of 10 mM to inhibit further digestion. Samples were placed on ice until processing through Ribo-ITP.

Ribosome profiling library preparation and sequencing

Conventional ribosome footprint libraries following monosome isolation (that is, 350 ng RNA samples in 6 µl nuclease-free water) were generated using the Clontech SMARTer smRNA-seq kit using eight PCR cycles (Takara Bio). Of the PCR, 30 µl was purified with AMPure XP beads (A63880, Beckman Coulter) according to the manufacturer’s instructions and eluted with 30 µl nuclease-free water. The final size selection was performed with the BluePippin system (Sage Science) using 3% dye-free agarose cassettes (BDQ3010, Sage Science).

For Ribo-ITP experiments with human K562 cells and mouse samples, the D-Plex Small RNA-seq kit (C05030001, Diagenode) with minor modifications was used as detailed below. The dephosphorylation reaction was supplemented with 0.5 µl T4 PNK (NEB) and the reaction was incubated for 25 min. For reverse transcription, the template switching oligo was diluted 1:2 in nuclease-free water. All 100-cell human samples and three of the MII-stage oocytes were processed using the single index module; whereas the other mouse samples were processed using the unique dual index module. Half of the complementary DNA (cDNA) was amplified for 17 PCR cycles and a 1:4 dilution of the resulting library was assessed by the Agilent Bioanalyzer High Sensitivity DNA kit. The concentrations of the target peaks were used to pool samples with approximately equimolar representation. AMPure XP bead cleanup (1.8×) was performed followed by size selection using 3% agarose, dye-free gel cassettes with internal standards (BDQ3010, Sage Science) on the BluePippin platform. Tight parameter settings of the 173–207-bp range were used for samples prepared with the single index module. Tight parameter settings of the 183–217-bp range were used for samples prepared with the unique dual index module. For the RNase I-digested single-cell libraries, final size selection was performed by PAGE purification of 200-bp products. Samples were sequenced on an Illumina NovaSeq 6000. For mouse samples, five, five, five, three, three and four biological replicates were used for germinal vesicle, MII, one-cell, two-cell, four-cell and eight-cell stages, respectively.

Single-cell and single-embryo RNA-seq

Total RNA-seq libraries were prepared with Smart-seq3 V.3 (ref. 60), with modifications. Unfertilized mouse samples (germinal vesicle and MII) and in vitro-fertilized mouse samples (one-cell, two-cell, four-cell and eight-cell stage) were lysed and reverse transcribed as described. cDNA was pre-amplified with 13 PCR cycles and bead purified with AMPure XP (1.8×) with a final elution in 5 µl nuclease-free water. Of pre-amplified cDNA, 1 µl was assessed by the Bioanalyzer High Sensitivity DNA kit to confirm successful pre-amplification and proper size profile. Another 1 µl was assessed on Qubit using the double-stranded DNA high sensitivity (HS) assay to quantify the pre-amplified cDNA. Samples were diluted with nuclease-free water and normalized to 600 pg inputs (100 pg µl−1) and subjected to tagmentation and post-tagmentation PCR. The tagmentation and subsequent PCR were scaled up 6×: precisely, 600 pg pre-amplified cDNA was tagmented with 6 µl of tagmentation mix, 9 µl of Nextera Index primers were added and 18 µl of tagmentation PCR mix was used. Sixteen PCR cycles were performed followed by equivolume sample pooling (12 µl of each PCR product) and AMPureXP purification at a 1× ratio. The final library size distribution and concentration were assessed with the HS DNA Bioanalyzer. Sequencing was performed with Nova Seq 6000 with paired-end reads (using 100 cycle kits: 60 + 40). For germinal vesicle, MII, one-cell, two-cell, four-cell and eight-cell stages, four, four, four, four, two and four biological replicates were sequenced, respectively.

Computational processing of ribosome profiling data

Ribosome profiling data were processed using RiboFlow61. We extracted the first 12 nt from the 5′ end of the reads using UMI-tools62 version 1.1.1 with the following parameters: “umi_tools extract -p “^(?P.{12})(?P.{4}).+$”–extract-method=regex”. The 4 nt downstream of the unique molecular indexes (UMIs) are discarded as they are incorporated during the reverse transcription step. Conventional ribosome profiling samples did not include UMIs.

Next, we clipped the 3′ adapter AAAAAAAAAACAAAAAAAAAA, from the Ribo-ITP data, using cutadapt63 version 1.18 with the parameters “-a AAAAAAAAAACAAAAAAAAAA–overlap=4–trimmed-only”. For conventional ribosome profiling data, we removed the poly(A) tails and the first 3 nt of the reads using “cutadapt -u 3 -a AAAAAAAAAA–overlap=4–trimmed-only”.

After UMI extraction and adapter trimming, reads were aligned to ribosomal and transfer RNAs using Bowtie2 (ref. 64) version The unaligned reads were mapped to a manually curated transcriptome. We retained alignments with mapping quality greater than 2 followed by deduplication using UMI-tools when applicable. In deduplication of external libraries without UMIs, a set of reads with the same length that were mapped to an identical nucleotide position were collapsed into a single read. As the last step, .ribo files were created using RiboPy61 version 0.0.1. All subsequent analyses used ribosome footprints that were 29–35 nt in length.

For analyses involving nucleotide-resolution data, we determined the A-site offset for each ribosome footprint length using translation stop site metagene plots. Specifically, for each read length, we identified the highest peak upstream of the translation stop site and used the distance to the annotated stop site as the offset.

To assign ribosome footprints to coding reading frames (0, 1 and 2), we first calculated the distance between the 5′ end of the footprint and the first nucleotide of the coding sequence and took modulo 3 of the distance. Next, ribosome footprints were partitioned by their length and the 2 nt upstream and 1 nt downstream of the 3′ end of the footprint. For each group, we determined the total number of reads, assigned to each reading frame, giving us three numbers (S0, S1 and S2) where Si is the total number of footprints in the frame i. We cyclically shifted these numbers so that the maximum number was assigned to the first component. After cyclic shifts, we aggregated all triplets component-wise. The resulting triplet (T0, T1 and T2) provides the adjusted reading frames where Ti is the corrected number of footprints in the frame i. We compared the resulting reading frame distribution (T0, T1 and T2) to the randomly distributed frames, where the expected value is (T0 + T1 + T2)/3 for each frame (Chi-squared statistic, P < 2.2 × 10−16 for all experiments).

Computational processing of RNA-seq data

The 5′ adapter sequence ‘ATTGCGCAATG’ was clipped from the first read in the pair using cutadapt63 version 1.18. Clipped reads shorter than 8 nt were removed using: ‘cutadapt -j 4–trimmed-only -m 8 -g ATTGCGCAATG’. We then extracted the next 8 nt corresponding to the UMIs from the first read in the pair and appended them to the headers (of FASTQ files) of both read pairs using UMI-tools with the following parameters: ‘umi_tools extract–bc-pattern NNNNNNNN’. After UMI extraction, we used the second read in the pair (40 nt) for all subsequent analyses.

After filtering out reads aligning to a reference of rRNAs and tRNAs, the remaining reads were aligned to a transcriptome reference in which SNPs were masked with Ns (see the next section for details); thereafter, we retained only the alignments with mapping quality greater than 2. We then collapsed reads that aligned to the same transcript using their respective UMIs: ‘umi_tools dedup–per-contig–per-gene’. For each transcript, we counted the number of reads aligning to the coding sequence. We used Bowtie2 (ref. 64) for all alignments and SAMtools65 version 1.11 for processing BAM files.

Comparison with polysome profiling

The transcripts with validated changes in polysomal association between germinal vesicle-stage and MII-stage oocytes were obtained from Supplementary Figs. 2 and 3 of Chen et al.18. Of the 29 genes with quantitative reverse transcription PCR-validated changes in polysomal association, 28 had the reported direction of effect when comparing the mean of the centred log ratio (clr)66 across the replicates. Specifically, let M be the geometric mean of all the genes with non-zero counts and let g be the raw counts for a specific gene. Then, clr of g is computed as \({\rm{c}}{\rm{l}}{\rm{r}}(\,g)={\rm{l}}{\rm{n}}\frac{g}{M}\).

The relationship between RNA expression and poly(A) tail length

Previously, poly(A) tail length in germinal vesicle-stage mouse oocytes was measured using both short-read sequencing (TAIL-seq)67 and PacBio sequencing (PAIso-seq)68. The processed data were obtained from and For each gene, we averaged the poly(A) tail measurements across replicates and transcripts.

To determine the effect of oligo(dT) priming on our RNA expression measurements, we reanalysed the only publicly available data from mouse zygotes that did not use poly(A) selection in its RNA measurements (SUPeR-seq)69. We downloaded the gene-level expression data from GSE53386 and calculated the mean fragments per kilobase per million mapped reads across the five replicate experiments from wild-type mouse zygotes. We quantified the difference in RNA expression between these measurements and ours using the log2 ratio of the normalized values.

We found a very weak correlation between measured poly(A) tail length and our RNA expression measurements from germinal vesicle-stage oocytes (Spearman correlation of −0.04 for Tail-seq and 0.07 for PAIso-seq; Extended Data Fig. 10a,b). To rule out the possibility that the observed weak correlation may be due to poor reliability of the poly(A) tail measurements, we compared PAIso-seq and Tail-seq measurements and found that they were moderately correlated with each other for the set of transcripts that were measured more than once (Spearman correlation coefficient of 0.42; P < 2.2 × 10−16).

Even though poly(A) tail length does not systematically confound measured RNA expression, the abundance of transcripts with extremely short poly(A) tails can still be underestimated. Indeed, the subset of genes with the shortest average poly(A) tail length (less than 35 nt corresponding to the lowest 1% in TAIL-seq and the lowest 3.7% in PAIso-seq) had significantly lower RNA expression measurements (Extended Data Fig. 10c; Wilcoxon rank sum test P < 2.2 × 10−16; Extended Data Fig. 10c).

The observed lower expression of transcripts with the shortest poly(A) tails could stem from a technical artefact of using poly(A) selection. Alternatively, mRNAs with the shortest poly(A) tails may have intrinsically lower expression. To differentiate these two alternatives, we compared our measurements with SUPeR-seq, a method that does not rely on poly(A) selection69. As expected, SUPeR-seq measurements were highly correlated with our own measurements (Spearman correlation coefficient of 0.82; P < 2.2 × 10−16; Extended Data Fig. 10d). More importantly, when comparing the difference in measured RNA abundance between the two methods, there was only a minimal association as a function of poly(A) tail length (Extended Data Fig. 10e; Kruskal–Wallis rank sum test; P value of 0.016).

Together, these results suggest that there is not a systematic bias in Smart-seq3-based RNA expression measurements as a function of poly(A) tail length. However, the expression of genes with the shortest poly(A) may be slightly underestimated.

The relationship between translation efficiency and poly(A) tail length

Mouse embryos from early one-cell, two-cell and eight-cell stages were previously used to determine poly(A) tail length using an improved version of TAIL-seq ( This HDF5 file contained tag counts aggregated by poly(A) tail lengths. We calculated the mean poly(A) tail length of each gene using the instructions by the authors and rhdf5 package version 2.42.0 (

For each gene and embryonic stage, we first calculated the density of ribosome footprints and RNA-seq reads across the coding region. These values were then normalized using the centred log ratio66 and were averaged across replicates. Translation efficiency of a gene in a given embryonic stage was defined as the ratio of the normalized ribosome occupancy to RNA expression. The bootstrap confidence interval for translation efficiency was calculated by sampling with the replacement of the replicate Ribo-ITP and RNA-seq experiments and repeating the described calculation.

Allele-specific ribosome occupancy and RNA expression analysis

Given that mouse embryos were obtained by crossing the strains C57BL/6J (maternal) and CAST/EiJ (paternal), we used known strain-specific SNPs to determine the parental origin of the RNA molecules. This allowed us to determine whether the ribosome occupancy or RNA expression of a gene exhibits a maternal or paternal (that is, allele-specific) bias as detailed below.

A list of strain-specific SNPs was obtained in VCF format from We extracted 210,004 distinct SNPs that overlapped with transcript annotations. To avoid alignment biases, we modified our transcriptome reference sequences by masking SNP positions with Ns. Mouse sequencing data were aligned to this masked transcriptome reference.

For allele-specific analyses, we considered the 85,339 SNPs within the coding sequences of transcripts. Given that transcripts in oocytes should solely contain maternal SNPs, we used the data from the MII-stage oocytes to construct a simple error correction model. Specifically, 2.67% and 0.40% of reads contained non-maternal sequences in ribosome profiling and RNA-seq experiments, respectively. These values were used as estimates of the sequencing error percentage (error).

We define the paternal ratio as


For one-cell to eight-cell embryos, we then calculated the error-corrected paternal ratio, \({{\rm{paternal}}}_{{\rm{corrected}}}\) as:

$${{\rm{paternal}}}_{{\rm{corrected}}}=\frac{300\times {{\rm{paternal}}}_{{\rm{observed}}}-{\rm{error}}\times 100}{300-4\times {\rm{error}}}$$

where, \({{\rm{paternal}}}_{\mathrm{observed}}\) is the uncorrected percentage. We derived this equation from the below model under the assumption that sequencing errors are random:

$${{\rm{paternal}}}_{{\rm{observed}}}={{\rm{paternal}}}_{{\rm{corrected}}}\times \frac{100-{\rm{error}}}{100}+(100-{{\rm{paternal}}}_{{\rm{corrected}}})\times \frac{100-{\rm{error}}}{3\times 100}.$$

For each embryonic stage, to identify the transcripts whose paternal ratios are significantly different in ribosome profiling compared with RNA-seq, we first aggregated all SNP-containing reads for each transcript across replicates. We retained transcripts with more than ten reads in both ribosome profiling and RNA-seq experiments including at least three maternal and paternal reads. We used a two-sample test for the equality of proportions with continuity correction (prop.test in R; see chapter 3 of ref. 70 for details).

Transcripts with 95% confidence intervals of difference in paternal ratios (derived from the test for the equality of proportions), overlapping with the interval (−0.05, 0.05), were filtered out. After adjusting the P values using the false discovery rate method, we retained the transcripts with adjusted P < 0.2. We further removed transcripts with paternal reads in the MII stage as these probably indicate positions that are prone to alignment errors. As the final step, we applied bootstrapping to establish robustness of the conclusions. Specifically, we randomly sampled replicates with replacement and repeated the statistical testing procedure described above. Twenty-four transcripts with a false discovery rate less than 0.2 in at least 66 out of 100 bootstrap samples were deemed as having differential allelic ratios. There were a total of 187 coding region SNPs differentiating the two alleles among this set of 24 of these transcripts. A list of these SNPs is provided in Supplementary Table 6.

For the analyses described in Fig. 6 and Extended Data Fig. 9, we considered RNA expression data from four-cell and eight-cell embryonic stages. We discarded the genes with fewer than ten parent-of-origin differentiating reads across all replicates of the given embryonic stage. To define genes that display allele-specific bias in expression, we used a bootstrapping approach. For each sample, we randomly selected two or four replicates with replacement from the four-cell and eight-cell RNA expression data, respectively. Reads were then combined across replicates and SNP positions. A gene was deemed paternally or maternally biased if the ratio of the paternal-to-maternal allele supporting reads were greater than 70% in at least 800 out of 1,000 bootstrap samples. The remaining genes were considered as biallelic (or unbiased). In total, we identified 2,239 and 3,707 biallelic, 191 and 334 maternally biased and 103 and 253 paternally biased genes in four-cell and eight-cell stages, respectively. Furthermore, 37 and 47 genes had expression from only one of the alleles in the four-cell and eight-cell stages, respectively. This subset of allele-specific genes was defined as monoallelic. Finally, to define a more high-confidence set of allele-biased genes, we required support from multiple SNPs such that the ratio of SNPs with \(\frac{{\rm{Number}}\;{\rm{of}}\;{\rm{reads}}\;{\rm{from}}\;{\rm{paternal}}\;{\rm{alleles}}+1}{{\rm{Number}}\;{\rm{of}}\;{\rm{reads}}\;{\rm{from}}\;{\rm{paternal}}\;{\rm{alleles}}+1+{\rm{number}}\;{\rm{of}}\;{\rm{reads}}\;{\rm{from}}\;{\rm{maternal}}\;{\rm{alleles}}+1}\) > 0.5 was at least 60%. We found that 195 and 323 genes were supported by multiple SNPs in four-cell and eight-cell stages, respectively. To compare the translation efficiency distribution of different gene groups, we used the non-parametric Wilcoxon rank sum test. To estimate the magnitude of the effect size, we report fold changes defined as the ratio of the median translation efficiency of the allele-specific genes to that of biallelic genes.

Differential expression and translation efficiency analysis

Reads that aligned to coding regions were extracted for all experiments. To determine transcripts with the highest variability in ribosome occupancy across developmental stages, a variance-stabilizing transformation (VST), as described in ref. 71, was applied to centred log ratio of ribosome occupancies (‘FindVariableFeatures’ function with the selection method ‘vst’ in the Seurat package v4 (ref. 72)). Using the threshold ‘vst.variance.standardized’ > 4.8, we obtained 50 genes.

For every pair of consecutive developmental stages, differential RNA expression and translation efficiency was determined using DESeq2 (ref. 73). For calculation of differential translation efficiency, we used the interaction term between the developmental stage and the measurement modality (ribosome profiling or RNA-seq). Default parameters were used for read count normalization and estimation of gene-specific dispersion. Effect size moderation was carried out using the approximate posterior estimation for a generalized linear model74. The adjusted P value cut-off was set to 0.01 to determine a set of transcripts with significant changes in RNA expression and translation efficiency. Gene set enrichment analyses for gene ontology terms were carried out using FuncAssociate ( with default settings75.

Proteomics data and comparison with RNA-seq and ribosome profiling

Tandem Mass Tag-labelling-based proteomics abundance data for one-cell to morula-stage embryos were obtained from Gao et al.6. Measurements in all three modalities were available for 3,287 proteins and were used in further analysis. Ribosome occupancy and RNA expression were converted to read density by dividing the read counts by the length of the coding region of each transcript. These values were normalized using a centred log-ratio transformation as implemented in Seurat v4 (ref. 72). The translation efficiency was defined as described above. The similarity between RNA expression, ribosome occupancy, translation efficiency and protein abundance was measured using rank correlation with Spearman’s correction76,77. The measurement reliability for each modality was estimated using replicate to replicate correlation coefficients (0.53 for translation efficiency, 0.71 for ribosome profiling, 0.79 for RNA-seq and 0.8 for mass spectrometry6).

Weighted transcript region length distribution

For the transcript regions, 5′ UTR, CDS and 3′ UTR, the distribution of weighted region lengths was calculated as follows. First, for each transcript, we determined the ratios of region lengths to the transcript length. Next, we multiplied these ratios with the number of ribosome occupancies in the transcript, giving us weighted ratios of the regions. Then, for each region, we calculated the sum of their weighted ratios across transcripts. Finally, let w5′ UTR, wCDS and w3′ UTR be the weighted (w) sums of the regions 5′ UTR, CDS and 3′ UTR, respectively. For a region r, the weighted length percentage of r is \(\frac{{w}_{r}}{{w}_{{5}^{{\prime} }{\rm{UTR}}}+{w}_{{\rm{CDS}}}+{w}_{{3}^{{\prime} }{\rm{UTR}}}}\times 100.\)

Characterization of SNP effects on allele-specific ribosome occupancy

All SNPs differentiating the paternal allele from the maternal allele were extracted for the set of transcripts with evidence of differential allele-specific ribosome occupancy (Fig. 4c). These were annotated by their position within the transcript (5′ UTR, CDS and 3′ UTR) and various functional classes as detailed below using bedtools version v2.29.2 (ref. 78) with the following options: ‘bedtools intersect -wa -wb’.

SNPs within 5′ UTRs were annotated as candidates for generating translation initiation sequences by matching a 9-nt sequence centred around the SNP to the regular expression ‘[ATCG]+[AG]{1}[ATCG]{2}[ACG]{1}TG[AG]{1}[ATCG]+’ with R base::grepl. SNPs in which either the maternal or paternal allele matched the regular expression were selected. A mouse PD-31 FACS-seq dataset reporting efficiency of non-canonical translation initiation sequences for −4 to +4 (ref. 79) was used to score the efficiency of candidate 5′ UTR initiation sites.

Analyses of RBP motifs

Enrichment or depletion of RBP heptamer motifs was determined by the Transite v1.16.0 k-mer transcript set motif analysis method55 (Benjamini–Hochberg P < 0.0001; Supplementary Table 5) and annotated with oRNAment56 RBPs with consensus sequences matching the heptamer. RBP synonyms in the oRNAment database were replaced with their standard gene names (DAZ3: DAZL; SF2: SRSF1; B52: SRSF6; Fusip1: SRSF10; and ZNF326: ZFP326).

SNPs were intersected with the BED file of oRNAment motifs using bedtools78 v2.29.2 after subtracting one from the BED start coordinate to ensure that sequences had the same length as the oRNAment position weight matrices. We filtered 102 out of 1,403 SNPs that intersect RBP motifs due to multiple SNPs being in close proximity (less than 5 nt). The maternal and paternal sequences were scored using the oRNAment matrix similarity score in R v4.0.4. For RBPs with more than one position weight matrix, the maximum of the absolute difference in scores was computed to identify the consensus motif most impacted by a given SNP. Then, RBPs sharing the same consensus motif and overlapping the same SNP were collapsed into a single annotation by computing the median difference in score. Robust standardization of the median difference in score was performed (centre to median, divided by the interquartile range). SNPs predicted to alter RBP binding were selected using the 95th percentile of the absolute, standardized score difference. Finally, RBPs were discarded if they had no mouse homologue or had no detectable expression during the stages of development analysed (A1CF, BOLL, ELAVL4, MSI1, KHDRBS3, PABPC5, RBM4B, TIA1, EIF4G2, RBFOX3, BRUNOL6, RBM23 and SRSF8).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link


Leave a Reply

Your email address will not be published. Required fields are marked *