Strange IndiaStrange India


Sample acquisition

De-identified tissue samples were collected with previous consent in strict observance of the legal and institutional ethical regulations. Protocols were approved by the Human Gamete, Embryo, and Stem Cell Research Committee (institutional review board) at the University of California, San Francisco. Two sets of samples included twins: GW20_31 and GW20_34; GW22 and GW22T.

Single-cell RNA sequencing capture and processing

Brain dissections were performed under a stereoscope with regards to major sulci to identify cortical regions. Of note, all dissections were performed by the same individual (T.J.N.) to enable reproducibility and comparison between samples. Tissue was incubated in 4 ml of papain/DNAse solution (Worthington) for 20 min at 37 °C, after which it was carefully triturated with a glass pipette, filtered through a 40-µm cell strainer and washed with HBSS. The GW22 and GW25 samples were additionally passed through an ovomucoid gradient (Worthington) in order to minimize myelin debris in the captures. The final single-cell suspension was loaded onto a droplet-based library prep platform Chromium (10X Genomics) according to the manufacturer’s instructions. Version 2 was used for all samples except for GW19_2, GW16, and GW18_2 for which version 3 chemistry was used. cDNA libraries were quantified using an Agilent 2100 Bioanalyzer and sequenced with an Illumina NovaSeq S4.

Quality control and filtering

We filtered cells using highly stringent quality control (QC) metrics. In brief, we discarded potential doublets using the R package scrublet29 for each individual capture lane, then required at least 750 genes per cell and removed cells with high levels (>10%) of mitochondrial gene content. These strict metrics for quality control preserved no more than 40% of cells for downstream analysis, and re-analysis of the data for specific brain structures or cell types may benefit from less stringent QC for additional discovery. Our goal was to obtain clean populations with a high validation rate for a better understanding of arealization signatures. The resulting ~700,000 cells passing all thresholds were used in downstream analyses.

Clustering strategy

We used a recursive clustering workflow to understand the cell types present in our dataset. In order to minimize potential batch effects and to increase detection sensitivity of potential rare cell populations, we performed Louvain–Jaccard clustering on each individual sample first. After initial cell type classification, we sub-clustered all the cells belonging to a cell type to generate the most granular cell subtypes possible. We then correlated subtypes between individuals based upon the gene scores in all marker genes to bridge any batch effects, and iteratively combined clusters across all individuals and cell types. For this study, we combined the clusters within a single cell type across all individuals once, and again with all clusters from all individuals and cell types, resulting in two iterative combinations. The annotations at each step are preserved in the supplementary tables to enable reconstruction at any point in the pipeline.

Hierarchical clustering of clusters

Cluster hierarchies are generated from matrices correlating all clusters to one another using Pearson’s correlation in the space of gene scores for all marker genes across all groups. Hierarchical clustering is performed within Morpheus (https://software.broadinstitute.org/morpheus) across all rows and columns using one minus the Pearson correlation for the distance metric.

Constellation plots

To visualize and quantify the global relationships and connectedness between cell types, cell type subclusters, or cell type-area groups, we implemented the constellation plots described in ref. 1, by adapting the code made available at https://github.com/AllenInstitute/scrattch.hicat/. In brief, we represented each group of cells as a node, whose size is proportional to the number of cells contained within it. Each node is positioned at the centroid of the UMAP coordinates of its cells. Edges represent relationships between nodes, and were calculated by obtaining the 15 nearest neighbours for each cell in principal component analysis space (principal components 1:50), then determining, at each cluster, the fraction of neighbours belonging to a different cluster. An edge is drawn between 2 nodes if >5% of nearest neighbours belong to the opposite cluster in at least one of them. An edge’s width at the node is proportional to the fraction of nearest neighbours belonging to the opposite node, with the maximum fraction of out-of-node neighbours across all clusters represented as an edge width of 100% and equal to node width. The full code adaptation and implementation of this analysis is described in the function buildConstellationPlot in this paper’s associated Github repository.

Quantification of constellation plots

Constellation plots were quantified by using a summary of the input values described above. For each cell type or area connection, the number of edges between two groups was multiplied by the average fraction of cells meeting the threshold for a connection within the group. This resulting number was called the connectivity index.

Module eigengene calculations

Module eigengenes were calculated for numerous gene sets using the the R package WGCNA30. Scores were generated for each set of up to 10,000 randomly subsetted cells from the group using the function moduleEigengene function, Scores were calculated based on the intersection of the gene set of interest and genes expressed in the subset of cells. For the area-specific signatures, differential expression was performed as described above, and the gene signatures from late stage neurons across all areas were used to calculate module eigengenes for the radial glia and IPC populations.

Area-specific markers and gene score calculations

The expression profiles of cells from each subcluster or cortical area were compared to those of all other cells using the two-sided Wilcoxon rank-sum test for differential gene expression implemented by the function FindAllMarkers in the R package Seurat and selected based on an adjusted P-value cut-off of 0.05. Adjusted P-values were based on Bonferroni correction using all features in the dataset. We performed this step separately for each cell type and each individual, since we observed that gene specificity was highly dynamic throughout the developmental process. We then combined the individual gene lists of each cell type and area, and annotated the stage(s) at which each gene appeared to be specific. We binned individuals into three stages: early (GW14, GW16 and GW17), middle (GW18, GW19 and GW20) and late (GW22 and GW25). We ranked upregulated genes by specificity by calculating their gene score, which we defined as the result of a gene’s average log fold-change × enrichment ratio, in turn defined as the percentage of cells expressing the gene in the cluster of interest divide by the percentage of cells expressing in the complement of all cells. Dot plots used to visualize the expression of distinct marker genes across cell types and/or cortical areas were generated the custom function makeDotPlot available in our code repository, which makes use of the Seurat function DotPlot. In brief, for each gene, the average expression value of all non-zero cells from each group (cortical area) is scaled using the base R function scale(), yielding z-scores. Scaling is done to enable the visualization of genes across vastly different expression ranges on the same colour scale.

Transcription factor annotation

Areally enriched marker genes obtained as described above were annotated against a comprehensive list of 1,632 human transcription factors described in31 and downloaded from the transcription factor database AnimalTFDB332, available at http://bioinfo.life.hust.edu.cn/static/AnimalTFDB3/download/Homo_sapiens_TF.

Gene signature overlap and Sankey diagrams

To quantify the degree of (dis)similarity of molecular signatures across distinct cell types, cortical areas, and/or developmental stages, we calculated the overlap between all sets of cell type and area-specific gene markers at each stage, and visualized these comparisons using Sankey diagrams using the function ggSankey from the ggvis R package. We then calculated the proportion of genes for each node shared with every other node, and clustered nodes hierarchically by calculating their euclidean distances based on their proportions of shared genes. The code used to construct the overlap matrices, create the plots and quantify the results is described in the functions buildSankey and buildHeatmap in our Github repository.

RNA velocity

Velocity estimates were calculated using the Python 3 packages Velocyto v0.1722 and scVelo v0.2.223. Reads that passed quality control after clustering were used as input for the Velocyto command line implementation. The human expressed repeat annotation file was retrieved from the UCSC genome browser. The genome annotation file used was provided by CellRanger. The output loom files were merged and used in scVelo to estimate velocity. For the combined cortical analysis, cells underwent randomized subsampling (fraction = 0.5), and were filtered based on the following parameters: minimum total counts = 200, minimum spliced counts = 20 and minimum unspliced counts = 10. The final processed object generated a new UMI count matrix of 18,970 genes across 195,775 cells, for which the velocity embedding was estimated using the stochastic model. The embedding was visualized using UMAP of dimension reduction. The velocity genes were matched by cortical area and were estimated using the rank velocity genes function in scVelo. Computational analysis of the transcriptomic data described in detail above were performed using R 4.024 and Python 3, the R packages Seurat (version 2 and version 3)25,26, googleVis27, dplyr and ggplot228, the Python packages Velocyto v0.1722 and scVelo v0.2.223 as well as the custom-built R functions described. Our reproducible code is available in the Github repository associated with this manuscript.

Validation marker gene selection

Marker genes for validation with the spatial omics platform were chosen first by identify useful cell type markers within the dataset. SOX2 was chosen to mark radial glia, EOMES was chosen to mark IPCs, and BCL11B and SATB2 were chosen to marker excitatory neuronal populations with previously validated changing co-expression patterns. POLR2A was used as a positive control for the technology. The remaining genes were selected based upon their status as a specific marker gene for excitatory neuron clusters of interest.

Rebus Esper spatial ‘omics platform

Samples for spatial transcriptomics were dissected from primary tissue as described above. Samples were flash frozen in OCT following the protocol described in the osmFISH protocol33. Samples were then mounted to APS-coated coverslips, and fixed for 10 min in 4% PFA. Samples were then washed with PBS, and processed for spatial analysis. The spatially resolved, multiplexed in situ RNA detection and analysis was performed using the automated Rebus Esper spatial omics platform (Rebus Biosystems). The system integrates synthetic aperture optics (SAO) microscopy34, fluidics and image processing software and was used in conjunction with smFISH chemistry. Individual transcripts from target genes were automatically detected, counted, and assigned to individual cells, generating a cell × feature matrix that contains gene-expression and spatial location data for each individual cell, as well as registered imaging data, as follows.

Rebus Biosystems proprietary software was used to design primary target probes (22–96 oligonucleotides) and corresponding unique readout probes (assigned and labelled with Atto dyes) for each gene. The oligonucleotides were purchased from Integrated DNA Technologies and resuspended at 100 µm in TE buffer. Coverslips (24 x 60 mm, no. 1.5, catalogue (cat.) no. 1152460, Azer Scientific) were functionalized as previously published33. Fresh frozen brain tissue sections (10 µm) were cut on a cryostat, mounted on the treated coverslips and fixed for 10 min with 4% paraformaldehyde (Alfa Aesar, catalogue no.) in PBS at room temperature, rinsed twice with PBS at room temperature and stored in 70% ethanol at 4 °C before use. The sample section on the coverslip was assembled into a flow cell, which was then loaded onto the instrument. The hybridization cycles and imaging were done automatically under the instrumental control software. In brief, primary probes for all target genes were initially hybridized for 6 h and probes not specifically bound were washed away. Readout probes labelled with Atto532, Atto594 and Atto647N dyes for the first 3 genes were then hybridized, washed, counterstained with DAPI and then imaged with an Andor sCMOS camera (Zyla 4.2 Plus, Oxford Instruments) through a 20×, 0.45 NA dry lens (CFI S Plan Fluor ELWD, Nikon) with a 365-nm LED for DAPI and 532-nm, 595-nm and 647-nm lasers configured for SAO imaging. Multiple fields of view (FOVs) were imaged for each channel within the region of interest (ROI). Single z-planes with 2.8 µm depth of field were acquired for each FOV. After imaging, the first three readout probes were stripped and the readout probes for the next three genes were then hybridized, imaged and stripped. This process was repeated until readout was completed for all genes.

Using the Rebus Esper image processing software, the raw images were reconstructed to generate high-resolution images (equivalent or better than images obtained with a 100× oil immersion lens). RNA spots were automatically detected to generate high fidelity RNA spot tables containing xy positions and signal intensities. Nuclei segmentation software based on StarDist35 identified individual cells by finding nuclear boundaries from DAPI images. The detected RNA spots were then assigned to each cell using maximum distance thresholds. The resulting cell × feature matrix contains gene counts per cell along with annotations for cell location and nuclear size.

Kernel density estimation plots

Kernel density estimation plots were created from individual gene spot location maps retrieved from the spatial transcriptomics pipeline. They were created using the seaborn kdeplot function in Python with shading and cmap colouring. They were merged together for Fig. 4 with the Adobe Illustrator overlay and darken features, using 50% opacity.

Spatial co-expression analysis

Using the cell × feature matrices, we eliminated all spots with less than ten counts for signal. Pearson’s correlations were then performed across the genes within each dataset and filtered for self-correlation. Positive control (POL2RA) and non-excitatory neuron cell type markers (SOX2, EOMES and DLX6) were removed from the analysis. Interactions of 0.05 or more were preserved and visualized with Cytoscape v3.8.2 using a force-directed biolayout. Individual nodes were coloured by their colour in the merged image file in Fig. 4b.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.



Source link

By AUTHOR

Leave a Reply

Your email address will not be published. Required fields are marked *