Strange IndiaStrange India


Retrospective cohort design

The study cohort comprised decidua tissue from archival formalin-fixed, paraffin embedded (FFPE) blocks, sampled after elective pregnancy terminations from an outpatient clinic located within a large public hospital affiliated with an academic medical centre. Patients at this clinic reflect a diverse population. Although the patient population is predominantly low-income, women of all economic backgrounds are cared for at the clinic.

In the clinic, an ultrasound examination is performed to estimate GA, and a medical history is taken and logged as an electronic medical record (electronic clinical works) or handwritten forms. A board-certified gynaecologist reviewed medical records and specifically extracted the following details: age, ethnicity, body–mass index, gravidity, parity, previous terminations, smoking status, medications, HIV status, history of preeclampsia, chronic hypertension, diabetes mellitus, renal disease, autoimmune disease, multifetal pregnancy, and congenital anomalies (Supplementary Table 1). For procedures occurring at <14 weeks GA, suction aspiration is routinely used. For procedures at >14 weeks GA, a combination of suction aspiration and grasping forceps is used. After the procedure, tissue samples are routinely sent to pathology.

TMA construction

Whole tissue sections from individuals who underwent elective termination at 6–20 weeks of gestation were first reviewed by H&E staining to identify samples containing decidual tissue and spiral arteries. These regions were manually demarcated and assessed for suitability. Blocks containing decidua with vessels were selected, cored with a bore needle and assembled into the TMA used in this study. Archival tissue blocks from 74 individuals were initially selected by a board-certified perinatal pathologist (G.R.) to be included in the TMAs. The first TMA consisted of 205 cores (including 3 tonsil cores, 1 endometrium core and 1 myometrium core) of 1 mm in diameter and the second contained 86 cores of 1.5 mm in diameter). Unfortunately, cores from eight individuals did not end up containing decidua, and there was not sufficient tissue in the block for additional re-coring. We therefore had to exclude these samples from the analysis. The final cohort included 66 individuals, an exhaustive list of which is provided in Supplementary Table 1. Images from samples from six individuals did not have arteries and therefore were not included in analyses related to spiral arteries. Information on the histological characteristics of the blocks retrieved, including the presence of cell column anchoring villi, is in Supplementary Table 1. High-resolution scans of each core were uploaded to the Stanford Tissue Microarray Database (http://tma.im/cgi-bin/home.pl), a collaborative internal platform for designing, viewing, scoring and analysing TMAs. Sequential recuts of the main experiment were stained with H&E to aid in choosing the imaging ROIs and analysing data.

Antibody preparation

Antibody staining was validated as previously described11,58. In brief, each reagent was first tested using single-plex chromogenic immunohistochemistry (IHC) using multiple positive and negative FFPE tissue controls before metal conjugation. Antibodies were then conjugated to isotopic metal reporters as previously described11,22,23,24,58 with the exception of biotin-conjugated anti-PD-L1, for which a metal-conjugated secondary antibody was used. The performance of metal-conjugated antibody reagents were then tested within the complete MIBI-TOF staining panel under conditions identical to those in the main study and compared with representative single-plex chromogenic IHC to confirm equivalent performance. Representative stains and information for each marker is provided in the Supplementary Information and in Supplementary Table 7, respectively. After conjugation, antibodies were diluted in Candor PBS Antibody Stabilization solution (Candor Bioscience). Antibodies were either stored at 4 °C or lyophilized in 100 mM d-(+)-trehalose dehydrate (Sigma Aldrich) with ultrapure distilled H2O for storage at −20 °C. Before staining, lyophilized antibodies were reconstituted in a buffer of Tris (Thermo Fisher Scientific), sodium azide (Sigma Aldrich), ultrapure water (Thermo Fisher Scientific) and antibody stabilizer (Candor Bioscience) to a concentration of 0.05 mg ml–1. Information on the antibodies, metal reporters and staining concentrations is in Supplementary Table 7.

Tissue staining

Tissues were sectioned (4 μm in thickness) from tissue blocks on gold and tantalum-sputtered microscope slides. Slides were baked at 70 °C for 20 min followed by deparaffinization and rehydration with washes in xylene (3 times), 100% ethanol (2 times), 95% ethanol (2 times), 80% ethanol (once), 70% ethanol (once) and ddH2O with a Leica ST4020 Linear Stainer (Leica Biosystems). Tissues next underwent antigen retrieval, which was carried out by submerging sides in 3-in-1 Target Retrieval solution (pH 9, Dako Agilent) and incubating them at 97 °C for 40 min in a Lab Vision PT Module (Thermo Fisher Scientific). After cooling to room temperature, slides were washed in 1× PBS IHC washer buffer with Tween 20 (Cell Marque) with 0.1% (w/v) BSA (Thermo Fisher). Next, all tissue samples underwent two rounds of blocking, the first to block endogenous biotin and avidin with an Avidin/Biotin Blocking kit (BioLegend). Tissue samples were then washed with wash buffer and blocked for 1 h at room temperature with 1× TBS IHC wash buffer with Tween 20 and 3% (v/v) normal donkey serum (Sigma-Aldrich), 0.1% (v/v) cold fish skin gelatin (Sigma Aldrich), 0.1% (v/v) Triton X-100 and 0.05% (v/v) sodium azide. The first antibody cocktail was prepared in 1× TBS IHC wash buffer with Tween 20 and 3% (v/v) normal donkey serum (Sigma-Aldrich) and filtered through a 0.1 μm centrifugal filter (Millipore) before incubation with tissue overnight at 4 °C in a humidity chamber. After overnight incubation, slides were washed for 2 min in wash buffer. The next day, the antibody cocktail was prepared as described (Supplementary Table 7) and incubated with the tissues for 1 h at 4 °C in a humidity chamber. After staining, slides were washed twice for 5 min in wash buffer and fixed in a solution of 2% glutaraldehyde (Electron Microscopy Sciences) solution in low-barium PBS for 5 min. Slides were washed in low-barium PBS for 20 s then, using a linear stainer, through 0.1 M Tris at pH 8.5 (3 times), ddH2O (2 times) and then dehydrated by washing in 70% ethanol (once), 80% ethanol (once), 95% ethanol (2 times) and 100% ethanol (2 times). Slides were dried under vacuum before imaging.

MIBI-TOF imaging

Imaging was performed using a custom MIBI-TOF instrument with a Xe+ primary ion source, as previously described22,58. In total, 222 808 × 808 µm FOVs were acquired at approximately 600 nm resolution using an ion dose of 7 nA × h mm–2. After excluding 11 FOVs that contained necrotic or non-decidual tissue, or consisted of duplicate tissue regions, the final dataset consisted of 211 FOVs from 66 individuals.

Low-level image processing

Multiplexed image sets were extracted, slide background-subtracted, denoised and aggregate filtered as previously described22,23,24,58,59. For several markers, a background channel consisting of signal from the mass 128 channel was used. All parameters used as inputs for low-level processing are listed in Supplementary Table 7.

Feature annotation

Large tissue features were manually annotated in collaboration with a perinatal pathologist. Pseudo-coloured MIBI images stained with H3 to identify cell nuclei, VIM for decidual stromal cells, SMA and CD31 for vessels, cytokeratin 7 (CK7) for glands and the fetal cell columns, and HLA-G for EVTs were used to guide annotation. Serial H&E sections, and a H&E recut of the entire block, if necessary, were additionally used to supplement annotation. Labelling was performed in ImageJ and the annotated features were exported as binary TIF masks.

Single-cell segmentation

The Mesmer segmentation algorithm60 was adapted specifically to segment the cells in our dataset. First, training data were generated using a subset of 15 images out of 211 in our cohort, in addition to 10 decidua MIBI-TOF images from titration data. In total, 1,024 × 1,024 pixel crops were selected to encompass the range of different cell morphologies present. The markers H3, VIM, HLA-G, CD3, CD14 and CD56 were used to capture the major cell lineages present. Subsequently, a team of annotators parsed these images to identify the location of each unique cell using DeepCell Label, custom annotation software specifically developed for this task60 (https://github.com/vanvalenlab/deepcell-label). The manually annotated images were used to generate partially overlapping crops of 256 × 256 pixels from each image. In total, training data included 1,600 distinct crops with 93,000 cells. This dataset was used to retrain the Mesmer segmentation model, modifying the architecture to accept six distinct channels of input. The output from the network was then post-processed using the default model settings (Supplementary Information).

Segmentation post-processing

Examination of the images revealed that glandular cells and chorionic villus trophoblasts did not express any markers included in the training data; namely these cells were predominantly CK7+. This resulted in effectively nuclear-only segmentation being predicted by the convolutional neural network within these features. To account for this, segmented cells that overlapped with the gland mask were expanded radially by 5 pixels, and those in the cell column mask by 2 pixels. The number of pixels used for expansion was optimized to approximate the observed cell size, which was based on a systematic inspection of three images per GA. Objects <100 pixels in area were deemed noncellular and excluded from subsequent analyses. The final number of segmented events per FOV is provided in Supplementary Table 8.

Single-cell phenotyping and composition

Single-cell expression data were extracted for all cell objects and area-normalized. Single-cell data were linearly scaled with a scaling factor of 100 and ArcSinh-transformed with a co-factor of 5. All mass channels were normalized to the 99th percentile. To assign decidual cell populations (≥70% cell area in decidua) to a lineage, the clustering algorithm FlowSOM (Bioconductor FlowSOM package in R)29 was used, which separated cells into 100 clusters based on the expression of 19 canonical lineage-defining markers (Supplementary Information). Clusters were further classified into 21 cell populations, with proper lineage assignments ensured by manual examination of overlayed FlowSOM cluster identity with lineage-specific markers. Clusters containing non-biologically meaningful or distinct signals were assigned the label ‘other’. Treg cells were identified by thresholding T cells (FlowSOM clusters 43, 53 and 63) with the CD3 signal ≥ the mean CD3 expression of CD4+ T cells and >0.5 the normalized expression of FOXP3. Mast cells were identified as cells for which normalized expression of tryptase was >0.9. Mac2b (CD11c+) cells were identified as macrophages with >0.5 normalized expression of CD11c. Placental macrophages (Hofbauer cells) were defined as CD14+ >0.5 cells located within the cell column. Cells from FlowSOM clusters 4, 5 and 15 ubiquitously and predominantly expressed CK7 and were reassigned to the EVT2 subset if located within the cell column feature mask or as glandular cells otherwise (Supplementary Information). These thresholds were selected based on the distribution of lineage marker expression (Supplementary Information) and on systematic examination of the images by eye as expression patterns varied significantly between markers. For a comprehensive list of all single cells, their morphological features, markers expression, lineage classification, among others, see the Data availability section.

Definition of thresholds for functional marker positivity

Cells were considered positive for a functional marker if their scaled expression level was greater than or equal to a set threshold, as previously described22. Thresholds for individual functional markers were determined on the basis of examining the images by eye, as expression patterns varied significantly between markers (Supplementary Table 9 and Supplementary Information). To set the per-marker thresholds, five images for each functional marker were reviewed, and increasing threshold values were examined using custom software. Subsequently, cells defined as negative for a marker based on the determined threshold value were re-examined to ensure that the thresholds were representative. For Ki67 positivity, only cells that had a nucleus in the image were considered. Ki67 values were not normalized to the cell size because the Ki67 signal is exclusive to nuclei.

Two-colour IHC

Before staining, FFPE sections were incubated at 70 °C for 1 h. After deparaffination and antigen retrieval (Dako, S2367) was performed, endogenous horseradish peroxidase and alkaline phosphatase were blocked using BLOXALL (Vector Laboratories, SP-6000-100) for 30 min, followed by blocking buffer solution (95% 1× TBS IHC wash buffer with Tween 20, 1% Triton 10%, 1% gelatin 10%, 2% horse serum and 1% sodium azide 20 mg ml–1) for 1 h at room temperature. Double staining was performed using CD57 (mouse IgG) paired with CD49a (rabbit IgG). Sections were incubated at 4 °C overnight with the antibodies CD57 (clone NK/804, Abcam, ab269771; titre, 0.5 µg ml–1) and CD49a (clone E9K2J, CST, 15574T; titre, 1:1,500). The following day, secondary antibody (ImmPRESS Duet reagent; HRP anti-rabbit IgG and AP anti-mouse IgG; Vector Laboratories) was applied for 10 min at room temperature. Antibodies were revealed with Vector Blue AP substrate (Vector Laboratories, SK-5300) for 10 min in the dark followed by DAB HRP substrate (Vector Laboratories, SK-4105) for 40 s. For subsequent analyses and colour deconvolution, single-plex staining for CD57 and CD49a were performed on one slide each. For details on the method, buffers and solutions, refer to ref. 61.

The IHC slides were scanned using a NanoZoomer Digital Pathology Scanner 2.0RS (Hamamatsu) and analysed using QuPath (v.0.4.0). To score CD57+ NK cells (NK2) for expression of the tissue-residency marker CD49a, the two colours in the IHC slides were deconvolved with QuPath using single-plex staining as colour references. CD57+ NK cells were then manually annotated, in decidual regions only, by a board-certified pathologist. These cells were than manually scored for CD49a expression and counted.

Blinded manual artery staging

Arteries were categorized into five remodelling stages based on criteria adapted from a previously proposed four-stage model21. These criteria were used to describe spiral arteries observed in H&E and single-channel IHC images and were adapted to suit multiplexed MIBI data (Fig. 3a, details in Extended Data Fig. 3a). In total, 600 arteries were categorized according to these criteria by a single reviewer using only crops of MIBI pseudocolour overlays (SMA, VIM, CD31, H3 and HLA-G), including only the artery (as defined by a feature mask) and any EVTs in the lumen. The reviewer was blinded to the rest of the image, serial H&E sections, GA and any clinical data. Twelve partially captured arteries were excluded from the final dataset of 588 arteries.

Automated digitization of artery morphological features

The same format of cropped artery MIBI images that were manually scored by the reviewer were used to calculate a set of geometric parameters for several selected features. These features described the organization and structure of the vessel wall, the continuity of the endothelium and its thickness, and the presence and structure of intravascular EVTs. To capture these features, a structure of concentric circles we termed the ‘onion’ structure was defined, with the outer circle of this structure enclosing the artery and the inner circles dividing it into layers. This structure is described below using the two-dimensional cylindrical coordinate system, with the radial axis r, azimuthal (angular) axis ø, and origin of the axis at point (x,y). Point (x,y) is the user-defined artery centre. For an artery in the binary mask M, the following algorithm was used to create the onion structure (Extended Data Fig. 3c).First, define a circle enclosing the artery, centred at point (x,y) with radius a as follows: (x,y) was taken as the user-defined artery centre point; a, the radius is defined as the maximum distance between (x,y) and the edge of M, rounded up to the nearest integer multiple of n, such that a = I × n for an integer I. n is a user-defined thickness parameter for the onion layers

Second, define the inner circles comprising the onion layers by dividing the radius a of the outer circle into I equal sections of length n, creating layers along the radial r axis. The radii of the inner circles are then defined as 0,1 × n,2 × n,…(I – 1) × n.

Third, divide the onion into k equal sectors along the ø axis. k is a user-defined integer.

Fourth, subdivide each sector into segments. The sectors are internally divided by the circles, creating parts with four corners and four sides, with the two sides being straight (sector dividers), and the two sides being arcs (parts of circle circumferences). The arcs are replaced with secants (straight line connecting the ends of the arc), turning the segment into a trapezoid. The parameters n = 10 pixels and k = 100 were used to allow for segments large enough to contain a sufficient number of pixels to average the expression over.

Geometrical and protein morphology features were then extracted for each artery onion. For geometrical features, the following parameters were defined: (1) radius, the maximum distance between any pixel within the mask and the closest pixel on the edge of the mask; (2) perimeter, the Euclidean distance between all adjacent pixels on the edge of the artery mask; and (3) area, the total number of pixels within the artery mask.

For the protein morphology features, for the markers CD31, CK7, H3, HLA-G, SMA and VIM, the following parameters were defined. (1) Average signal: the weighted average over segments of marker expression, in which the weight of a segment corresponds to the number of pixels it contains. The weighted average was used to avoid smaller inner segments having a disproportionate effect on the average.(2) Thickness: for each sector, we calculated the distance d between the inner-most segment positive for the marker and the outer-most positive segment. Positivity was measured by comparing the mean signal over pixels the segment to a user-defined threshold.The mean and standard deviation of thickness were calculated as the mean and standard deviation of d over all sectors. (3) Radial coverage: the percentage of sectors positive for marker signal. A sector was considered positive if the mean signal over sector pixels acceded a user-defined threshold. (4) Jaggedness: this feature measures the extent jaggedness of an artery outline. To do so, first, a previously described skeletonization function62 is applied to the artery mask, and this function returns a ‘skeleton’ of the artery outline. This skeleton also assigns values to the outline pixels based on their distance from the core shape. Then, two different binarization thresholds are chosen: a non-branch threshold (a high value = 60 pixels, which indicates a greater topological distance) and a ‘branch’ threshold (a low value = 5 pixels, which indicates a smaller topological distance). The ratio between the total number of non-branch and branch pixels is the jaggedness.

Calculation of continuous SAR remodelling score δ

A supervised dimensionality reduction technique based on LDA63 (https://github.com/davidrglass) was applied using the per-artery digitized morphological features and manually assigned remodelling stage labels as inputs. All artery morphology feature values were standardized (mean subtracted and divided by the standard deviation) and all arteries were used as the training data. The LDA output was as follows (Supplementary Table 3): the optimal linear combination of a subset of features that maximized the separation by manual stage between arteries in LDA space; and the coordinates of each artery in LDA space.

To define the SAR trajectory, a fourth-degree polynomial was fitted to the artery coordinates in LDA space. To determine the optimal degree of the polynomial, polynomials with degrees 1–6 were fitted, and the degree that minimized the P value for separating δ distributions between arteries grouped using the manual remodelling stage (Extended Data Fig. 4c) was selected. The polynomial fit was implemented using the MATLAB function fit and resulted in the following polynomial: f(x) = 0.0005 × x4 – 0.01227 × x3 + 0.1363 × x2 – 0.4354 × x – 0.7425. The polynomial was then numerically interpolated on a dense 104-point grid, and the distance from each artery point in LDA space to the polynomial was calculated using this grid and the MATLAB exchange function distance2curve64. δ per artery was then calculated as the line integral from the curve origin to closest point to the artery on the curve (Extended Data Fig. 4a, inset). This integral was numerically calculated using a custom MATLAB script. δ values were linearly rescaled to the range 1–5 using the MATLAB function rescale.

Cell-type frequency as a function of GA and SAR

To examine cell-type frequencies within the decidua as a function of GA and SAR (Figs. 3 and 4), per-image cell frequency tables were constructed in which cell-type frequencies were calculated as the proportion of cells in the decidua feature mask of that image. Cells located in other feature masks (artery, gland, vessel or cell column masks) were not counted, nor were cells of an unassigned type (‘other’). To focus these analyses on cell populations strictly found in the decidua, muscle and glandular cells were also excluded; these cell types occasionally extended outside their artery and gland feature masks, respectively. Cell frequency as a function of GA for a cell type was defined as the per-image proportion values for that cell type, as a function of the GAs associated with the images. Similarly, cell frequency as a function of SAR for a cell type was defined as the per-image proportions of that cell type, as a function of the mean δ values per image. For the volcano plot in Fig. 3n, we fitted a linear regression model to the two above-described functions. All linear regression models were implemented using the MATLAB function fitlm and the volcano plot only shows points for which regression R2 ≥ 0.05. R2 and P values for all δ-based and GA-based regressions are provided in Supplementary Table 13. The ratio between R2 in the two regression models was used to classify trends as GA-driven, SAR-driven or synchronized. For example, the increase in EVTs out of all cells, R_EVT, was classified as GA-driven because R2 for R_EVT as a function of δ was 0.3 but only 0.1 for R_EVT as a function of GA (Extended Data Fig. 4d and Supplementary Table 10). Another example is the increase in macrophages out of immune cells, I_sumMac: it was classified as GA-driven because R2 for I_sumMac as a function of GA was 0.6 but only 0.1 for I_sumMac as a function of δ (Extended Data Fig. 4e and Supplementary Table 10). To determine the trend sizes depicted in Fig. 3n, the following calculation was used: denote the per-image frequencies of a cell type as V, and the corresponding per image temporal stamps (either GA or mean image δ) as X. Trend size is then calculated as the difference between the first and last time point in units of the mean: \(\frac{V(\max (X))-V(\min (X))}{{\rm{mean}}(V)}\).

NanoString GeoMx DSP

The experiment was performed using NanoString Technologies according to company manuals, details are below.

Slide preparation

Serial sections of the TMAs were cut into 5 µm FFPE sections and were mounted on SuperFrost Plus slides (Fisher Scientific, 12-550-15), air dried and baked overnight at 60 °C. Slides were then processed as specified by the NanoString GeoMx DSP Slide Preparation User Manual (NanoString Technologies, MAN-100 7). In brief, slides were dewaxed, underwent antigen retrieval and treated with proteinase K (Ambion, 2546) at 1 µg ml–1 concentration. Slides were then post-fixed. For RNA probe hybridization, slides were placed in a slide rack with Kimwipes damped with 2× SSC lining the bottom. Each slide was treated with 200 µl of NanoString Technologies whole transcriptome RNA probe mix at a concentration of 4 nM per probe in 1× buffer R (NanoString Technologies). A Hybridslip (Grace Biolabs, 714022) was applied over each slide. Slides were incubated at 37 °C overnight. After hybridization, slides were dipped in a 2× SSC with 0.1% Tween 20 (Teknova, T0710) to remove the coverslips. They were then washed twice in 2× SSC and 50% formamide (ThermoFisher AM9342) at 37 °C for 25 min followed by two washes in 2× SSC for 5 min each at room temperature. Slides were blocked in buffer W (NanoString Technologies) at room temperature for 30 min, followed by the application of 200 µl morphology marker mix for 1 h. Details of the morphology markers are provided in Supplementary Table 7.

Sample collection

Sample collection was performed as indicated in the GeoMx DSP instrument user manual (MAN-10088-03). Slides were loaded into the GeoMx DSP instrument and scanned. For each tissue sample, we selected ROIs corresponding to one of the following categories: artery (13), decidua (13), interstitial EVT (5), intravascular EVT (3); in total, 34 ROIs were selected (Supplementary Table 11). Morphology markers for SMA and VIM were used in conjunction with a serial H&E section to provide tissue context and to locate arteries and decidua on the platform. Artery, decidua and intravascular EVT ROIs were selected using the geometric selection tool, and interstitial EVTs were selected using a HLA-G+ mask. Intravascular EVTs were identified as HLA-G+ cells located within arteries. Each ROI was collected into a single well in a 96-well plate.

GeoMx DSP NGS library preparation and sequencing

Each GeoMx sample or well was uniquely indexed using an i5 × i7 dual-indexing system from Illumina. In total, 4 µl of a GeoMx DSP sample was used in a PCR reaction with 1 µM of i5 primer, 1 µM i7 primer and 1× NSTG PCR master mix. For the PCR amplification reaction, each 96-well plate was placed in a thermocycler programmed with the following protocol: 37 °C for 30 min, 50 °C for 10 min, 95 °C for 3 min, 18 cycles of 95 °C for 15 s, 65 °C for 60 s, 68 °C for 30 s, and final extension of 68 °C for 5 min. PCR assays were purified with two rounds of AMPure XP beads (Beckman Coulter) at 1.2× bead-to-sample ratio. Libraries were paired-end sequenced (2 × 75) on a NextSeq550 with up to 400 million total aligned reads.

Normalization and scaling of GeoMx counts data

Raw counts from each gene in each sample were extracted from the NanoString GeoMx NGS processing pipeline (Supplementary Table 11). Quality control was done according to the NanoString data analysis manual (MAN-10154-01) with default parameters as indicated in the manual. For each EVT sample, the counts were normalized using one of the manufacturer’s recommended approaches for normalizing GeoMx data: dividing all genes in each sample by the 75th percentile of expression in that sample, followed by multiplication by an identical scaling factor for all samples: the geometric mean of all 75th percentiles. This approach eliminates differences in counts between samples due to ROI-specific properties such as size and RNA-binding efficiency. The background due to nonspecific binding per sample was approximated with the geometric mean of the 100 negative control probes included in the probe mix, as recommended by NanoString Technologies. The above-described normalization step eliminated the correlation between background and ROI size for EVT samples. For artery and decidua samples, normalization was complicated by the fact that the ROI size was tightly correlated with SAR stage and therefore biologically meaningful trends in the data. This led to the correlation between ROI size and background not being entirely eliminated by normalization. We therefore used a background subtraction correction technique before normalization as recommended in the NanoString Technologies manual for such cases. The correction was performed by subtracting the geometric mean of negative probes from gene counts on a per-sample basis and proceeding with normalization as previously described.

Gene expression in artery as a function of GA and SAR

In brief, for each gene, we performed polynomial regressions of gene expression with δ and GA as the independent variables and used regression P values to determine which genes were trending and the ratio of regression R2 values to classify the trends as detailed below.

The NanoString Technologies RNA probes panel contains probes for 18,696 transcripts. For this analysis on artery samples, only genes with background-subtracted, normalized counts ≥10 in at least two arteries were considered. This resulted in 14,471 expressed genes. Each artery sample was assigned a remodelling score δ based on the δ of the sampled artery in the MIBI data. If several arteries were sampled, the assigned δ was the average δ values of the sampled arteries. Endothelial loss and SMA loss per sample were calculated similarly based on the corresponding MIBI values (Supplementary Table 11). The following steps were then performed on artery samples.

For all expressed genes, gene expression as a function of GA was defined as the background-subtracted and normalized counts for that gene, as a function of the GAs associated with the samples. Similarly, expression as a function of SAR for a gene was defined as the per-sample background-subtracted and normalized counts of that gene, as a function of the δ values per sample. A second-degree polynomial regression model was then fitted to the two above-described functions. The reason for using a second-degree polynomial instead of linear regression was to allow the regression models to capture non-monotonic trends in gene expression. All regression models were implemented using the MATLAB function fitnlm. Expression fold change was defined as the ratio between the maximum and the minimum of expression values. The centre of mass (COM) of the expression trajectory of a gene as a function of t (t being either GA or δ) was defined as the weighted mean of t values, where the weights are the expression values at the respective t.

Genes with a P value ≤ 0.05 and fold change ≥2 for either GA or δ regression were classified as trending genes. The ratio between R2 in the two regression models was used to classify trending genes as GA-driven, SAR-driven or synchronized. Trending genes with \({\log }_{2}({R}_{\delta }^{2}/{R}_{{\rm{GA}}}^{2})\ge 1\) and \({R}_{\delta }^{2}\ge 0.05\) were classified as SAR-driven, whereas genes with \({\log }_{2}({R}_{\delta }^{2}/{R}_{{\rm{GA}}}^{2})\le 1\) and \({R}_{{\rm{GA}}}^{2}\ge 0.05\) were classified as GA-driven. Other trending genes were classified as synchronized (Supplementary Table 4).

For visualization only, two fitted expression trajectories (one as a function of GA and another as a function of δ) were calculated per gene. These fitted expression trajectories were calculated as the values of the fitted second-degree polynomial model at five evenly spaced values of GA and δ, respectively. To compare fitted expression trajectories between genes, they were normalized by Z-scoring their value per gene (Fig. 3o).

See Supplementary Information for further details about analysis of NanoString data in decidua ROIs.

Coordinated gene expression by pathways in the artery

We set out to find gene pathways with coordinated expression trends among our genes of interest: genes trending with δ in arteries. To find these coordinated pathways, we first defined the pathways and then defined temporal coordination.

To define pathways, we used the R package msigdbr to obtain the lists of genes per pathway for the Gene Ontology by Biological Process database (7,481 pathways). We then cross-referenced the list of genes for each pathway with the genes of interest and discarded pathways with an intersection of fewer than ten genes. For the remaining pathways, we examined whether the pathway genes that appeared in the gene set of interest exhibited coordination in their expression as function of δ.

A group of coordinated genes was defined as a group of genes for which the COMs were significantly closer to each other than one would expect at random (see previous section for the definition of COM). Using the spread of COMs as a measure for coordination allowed us to leverage the raw data rather than fitted gene expression trajectories while still maintaining robustness against noise.

To calculate the extent of coordination between a group of N genes, we first calculated their median COM, denoted COMmed. Then, their COM dispersal was defined as the median of the absolute deviations from COMmed for the N genes, denoted CD. To determine whether the CD for the gene group, CDgroup, is significantly smaller than expected at random, we calculated the randomly expected CD, denoted CDrand. This was done by selecting N random genes without replacement and calculating their CD, 105 times to estimate the null distribution. The random CDrand was then calculated as the median over the CDs for randomized gene sets. The coordination score for our N genes group was then defined as log2(CDrand/CDgroup). The P value for the coordination score was defined as the number of times a randomized CD was smaller than CDgroup, divided by the number of randomizations (105). (1/number of randomizations) was then added to all P values to account for the finite number of randomizations. q values were calculated using the Benjamini and Hochberg method on P values, implemented using the MATLAB function mafdr.

The CD, coordination scores, P values and q values were calculated as described above for all 7,481 pathways. Pathways with coordination score ≥ 1.5 and P value ≤ 0.05 were considered to be coordinated (Supplementary Table 4).

Ridge regression for predicting GA from immune composition

Ridge regression was implemented using the sklearn Python package (sklearn.linear_model.Ridge, RidgeCV). Per-image immune frequencies were rescaled to the range 0–1 before model fitting using the sklearn scaling function. Images with fewer than ten immune cells were excluded (n = 8). A randomly derived test–train split of 30/70 was used, and GA distribution was verified to be equally represented in the test and train sets (Extended Data Fig. 6a). Ridge regression adds a regularization penalty to the loss function to prevent over or under representation of correlated variables, such as immune cell populations. The penalty used for the test set (0.81) was selected using leave-one-out cross-validation on the training set.

Cell–cell and cell–artery spatial enrichment analysis

To identify preferential colocalization of maternal immune cells in decidua, we measured the spatial proximity enrichment for all cell-type pairs, which evaluates the spatial organization of cell types relative to each other, as previously described22,23,24. Cells located in non-decidual feature masks (artery, gland, vessel or cell column masks) were not included in this analysis. The distances in pixels between all pairs of cells were calculated in each image. The resulting per-image distance matrices were binarized with a distance threshold (100 pixels or 39 μm in our case), and pairs of cells closer than 100 pixels from each other were considered a close interaction. To evaluate the number of close interactions between two cell types, this proximity matrix was subset column-wise by cell type A and subset row-wise by cell type B. The sum of the resulting submatrix quantified the number of close interactions between the cells of types A and B. To evaluate the significance of the number of close interactions, given the total number of cells in the image, tissue architecture and composition across the cohort, and total number of cells of types A and B in the image, a bootstrapping approach was used. For each of 100 bootstrapping iterations, the location of cells of type A was randomized across all cell locations (of any type) in the image while their total number was preserved. The number of close interactions with cells of type B was calculated for each randomized iteration. Repetitions of this process approached a null distribution for the number of close interactions between cells A and B. The enrichment score for cells A around cells B in the image was then calculated as the Z-score of the measured number of close interactions between A and B when Z-scored together with the random bootstraps. This analysis was extended to incorporate enrichment of cell types around spiral arteries. For each cell, the distance to the nearest spiral artery was considered. An additional column was added to the proximity matrix described above, which thresholded distances between cells and arteries with the same 100 pixel threshold. The above-described bootstrapping approach also provided a null distribution for artery proximity. Tools for this analysis were written in Python, with the bootstrapping accelerated using Cython. An intuitive, easy-to-use Jupyter Notebook interface was created to allow for easy implementation of this algorithm. For per-image spatial enrichment scores, see the Data availability statement. The code for this analysis is available at GitHub (https://github.com/angelolab/ark-analysis).

Cell–cell and cell–artery enrichment temporal trends and trending with GA or SAR or constant

For examining cell–cell and cell–artery enrichment within the decidua as a function of GA and SAR (Extended Data Figs. 6c and 2c), per-image enrichment score matrices E were calculated as described in the previous section, in which Ei,j is the enrichment score of cell type i around cell type j in the image. Enrichment as a function of GA was defined as the per-image enrichment, as a function of the GAs associated with the images. Similarly, enrichment as a function of SAR was defined as the per-image enrichment, as a function of the mean δ values per image. We fitted a linear regression model to the two above-described functions. All linear regression models were implemented using the MATLAB function fitlm. R2 and P values for all δ-based and GA-based regressions are provided in Supplementary Table 3. The ratio between R2 in the two regression models was used to classify trends as GA-driven, SAR-driven or synchronized like in Fig. 3n. Extended Data Fig. 6c only shows points for which regression R2 ≥ 0.05, P value ≤ 0.05, maximal absolute value of linear fit ≥ 2. Trends including muscle, fibroblast, myofibroblast, glandular, other and endothelial cells were not considered in this analysis. For determining trend sizes, the following calculation was used: denote the linear fit to per-image enrichment scores as V, and the corresponding per-image temporal stamps (either GA or mean image δ) as X. Trend size is then calculated as (V(max(X)) – V(min(X))).

To determine whether two cell types were significantly enriched around each other throughout the cohort, we averaged their pairwise enrichment over all images. The pair was considered enriched if the absolute value of mean enrichment was ≥2 (Supplementary Table 3).

In Extended Data Fig. 6c, the following cell–cell enrichments were not plotted for clarity: enrichment of a cell type around itself (for example, Treg cells around Treg cells); enrichments including muscle, fibroblast, myofibroblast, glandular, other and endothelial cell types; enrichment trends that are SAR-driven.

Functional marker positivity rate per cell type as a function of GA and SAR

For examining cell-type-specific temporal trends in the expression of functional markers (Fig. 5a), 48 combinations of cell-type functional marker were selected. The selected combinations were those for which the positivity frequency Z-score exceeded 0.5 (Fig. 2a, right). For each of these combinations, the frequency of cells positive for the functional marker was calculated as the number of cells positive for the marker (see the methods section ‘Definition of thresholds for functional marker positivity’) out of the total number of cells of the same cell type in the image. All cells except those located within the cell column mask were included to focus the analysis on functional marker trends of maternal cells and EVTs that had infiltrated the decidua. For glandular cells, the location was further restricted to the glands mask. The frequency of cells positive for a functional marker as a function of GA, for a cell type, was defined as the per-image positivity proportion values as a function of the GAs associated with the images. Similarly, marker positivity frequency as a function of SAR for a cell type was defined as the per-image proportions of that cell type positive for the marker, as a function of the mean δ values per image. For the volcano plot in Fig. 5a, we fitted a linear regression model to the two above-described functions. All linear regression models were implemented using the MATLAB function fitlm, and the volcano plot only shows points for which regression R2 > 0.05. R2 and P values for all δ-based and GA-based regressions are provided in Supplementary Table 10. For determining trend sizes depicted in Fig. 5a, the following calculation was used: denote the linear fit to the per-image marker positivity proportion of a cell type as V, and the corresponding per-image temporal stamps (either GA or mean image δ) as X. Trend size is then calculated as the difference between the first and last time point in units of the mean: \(\frac{V({\rm{\max }}({\rm{X}}))-V({\rm{\min }}(X))}{{\rm{mean}}(V)}\).

Cellular microenvironments

For each cell in the dataset, we defined a ‘neighbourhood’ consisting of its 25 closest neighbouring cells, as measured by Euclidean distance between x/y centroids, excluding cells that were not in the decidua (that is, cells that overlapped with any artery, gland, anchoring villous or vessel feature masks). We clustered these cellular neighbourhoods on the basis of their composition of the 26 cell populations as identified previously using FlowSOM. For clustering, we used the scikit-learn implementation of k-means algorithm with k = 20 to identify neighbourhoods characterized by the presence of rare cell populations. Selected clusters were merged on the basis of similarity when hierarchically clustered, a threshold of 0.5 when comparing Euclidean distances between k-means cluster centroids and manual inspection of the cluster assignment when overlaid on the images. Based on these approaches, we defined 12 distinct decidual cellular microenvironments, 10 of which are shown in Fig. 5f (microenvironments characterized by predominantly stromal cell populations, fibroblasts (3 in Supplementary Table 12) and myofibroblasts (6 in Supplementary Table 12) are not shown in the heatmap).

Definition of anatomical EVT location and associated arteries

Cell column EVTs were defined as EVTs located within cell column masks, intravascular EVTs were located within artery masks, and interstitial EVTs were located in the decidua. Perivascular EVTs were defined as interstitial EVTs located within 50 pixels of the edge of an artery, as defined by the radial expansion of the artery masks (Extended Data Fig. 7b). Arteries were said to have perivascular or intravascular EVTs (Fig. 6d and Extended Data Fig. 7e,f) if the number of EVTs in the appropriate artery compartment was >5.

SMA and endothelium-loss scores

The loss scores presented in Extended Data Fig. 7e,f were based on digitized morphological features. For SMA, the average feature was used, whereas for endothelium, the radial coverage of CD31 was used (see methods section ‘Automated digitization of artery morphological features’). The values for each of the two features were then divided by their maximum across arteries and subtracted from 1 to obtain a loss score. The resulting values were then linearly rescaled to the range 0–1 using the MATLAB function rescale.

Characterization of EVTs by compartment

To further characterize EVT composition per spatial compartment (cell column, interstitial, perivascular or intravascular), we first quantified EVT-subtype frequency per compartment. One image was excluded (16_31762_20_8) owing to abnormal tissue morphology (Extended Data Fig. 8a). We then compared the distance from the nearest artery between EVT subtypes (Extended Data Fig. 8b). For this analysis, only images that contained all four EVT types were considered, and the cell-to-artery distance was measured from the cell centroid as detected by segmentation to the closest point on the border of the artery mask.

We then set out to assess the extent of similarity between EVTs by compartment in terms of expression of functional markers. For this analysis we used a LDA based method, similarly to our calculation of the continuous SAR remodelling score δ for compartment-wise analysis of EVT types. The input table for LDA consisted of MIBI-measured marker expression values per EVT. The following lineage and functional markers expressed by EVTs were included: CD56, CD57, HLA-G, CK7, PD-L1 and Ki67. EVTs were labelled by spatial compartment as cell column, interstitial, perivascular or intravascular. Marker expression values were standardized (mean subtracted and divided by the standard deviation), and cell column, interstitial and intravascular location labels per EVT were used for training the LDA model. Perivascular EVTs were withheld as a test set. Owing to the small number or features (markers), a one-dimensional LDA was calculated to produce a single coordinate, LD1. LD1 was the optimal linear combination of a subset of markers to maximize the separation by compartment between EVTs (Supplementary Table 14). LD1 values were subsequently calculated for the withheld test set of perivascular EVTs (Supplementary Table 14). The distributions of LD1 values per compartment indicated that perivascular EVTs are similar to interstitial and intravascular EVTs, with a median value between the two (Extended Data Fig. 8c). This result implies that perivascular EVTs could be an intermediate state between interstitial and intravascular, in line with the intravasation model whereby interstitial EVTs invade the artery lumen.

Origin of CD56+ EVTs in the intravascular compartment

The frequency of CD56+ EVTs was highest in the intravascular compartment (Extended Data Fig. 8a). Furthermore, the frequency of CD56+ EVTs increased with SAR both in the perivascular and intravascular compartment (Extended Data Fig. 8d,e). However, the increase in the intravascular compartment was steeper (Extended Data Fig. 8f). We set out to test whether this was compatible with intravasation whereby the source for intravascular EVTs is perivascular EVTs or whether additional sources of intravascular CD56+ EVTs were needed to account for their increase. These alternative sources could be cell proliferation or extravasation, whereby EVTs migrate in a retrograde manner after entering spiral arteries directly at the basal plate. Under the intravasation model, the steep increase in CD56+ EVTs between the perivascular and intravascular compartments would be explained by CD56 upregulation upon arterial invasion. We proposed that this should involve an intermediate state of perivascular EVTs defined by moderate levels of CD56. To test this hypothesis, we compared the average CD56 intensity of perivascular and intravascular EVT1a and EVT1b cells for each artery (for arteries that initiated remodelling: δ ≥ 2). This analysis detected a significant increase in CD56 expression between the perivascular and intravascular compartment by EVT1a and EVT1b cells (sided Wilcoxon signed rank test P value = 5 × 10−3; Extended Data Fig. 8g). An alternative explanation for the increasing frequency of CD56+ EVT1c cells within arteries could be proliferation. However, only 0.5% of intravascular EVT1c cells were Ki67+, a lower frequency than 9.6% and 1.8% for intravascular EVT1a and EVT1b cells, respectively, which suggested that proliferation is not a primary contributor (Extended Data Fig. 8h).

DEGs in EVTs

DEGs between intravascular and interstitial EVTs were identified using the Bioconductor package limma65 (linear models for microarray data) after consulting with the NanoString statistics team. Using the default parameters in limma on 75th percentile normalized counts, 131 upregulated genes and 143 downregulated genes were found (false-discovery rate cutoff = 0.1, log fold change cutoff = 2). Genes with log fold change ≥ 2.3 or ≤ 2.3 and adjusted P value ≤ 0.05 are shown in the heatmap in Fig. 6f. A complete list of DEGs is shown in Extended Data Fig. 9 and Supplementary Table 11. For IHC validation of differential expression for selected genes, see Extended Data Fig. 10a.

NicheNet analysis

We used the NicheNet R package to predict ligand–receptor interactions between intravascular EVTs and arteries. The analysis was performed by following the vignette available at GitHub (https://github.com/saeyslab/nichenetr/blob/master/vignettes/ligand_activity_geneset.md).

NicheNet requires three input gene lists to predict ligands in sender cells that are likely to interact with receptors in receiver cells and by doing so affect the expression of a set of genes of interest. These three gene lists are: genes of interest, genes expressed in sender cells and genes expressed in receiver cells.

For our analysis, we wanted to check which ligands expressed in intravascular EVTs are likely to be causing temporal gene expression trends with remodelling in arteries. To do so, we defined the genes of interest as all genes trending with remodelling in arteries (Fig. 3o). The genes expressed in receiver cells were defined as all genes expressed in arteries (see previous sections for the definition of ‘expressed’), and genes expressed in sender cells were defined as genes differentially expressed between interstitial and intravascular EVTs and higher in intravascular EVTs (Supplementary Table 11). NicheNet analysis was performed as described in the vignette to prioritize ligands and to infer corresponding receptors and downstream targets (Extended Data Fig. 10b). The inferred targets were manually classified according to their known function using the Gene Cards database (https://www.genecards.org) and survey of the literature. A list of references for all classifications are provided in Supplementary Table 5.

Statistical analyses

Throughout the paper, unless indicated otherwise, the Kruskal–Wallis test was used. It was implemented using the MATLAB function KruskalWallis. All linear regression models were implemented using the MATLAB function fitlm unless stated otherwise. The sided Wilcoxon signed-rank test for paired analysis was implemented using the MATLAB function signrank. MATLAB v.2020b was used throughout the article for statistical analysis.

Ethics statement

All human samples were acquired and all experiments were approved by Institutional Review Board protocol number 46646 “Assessing normal expression patterns of immune and non-immune markers across tissue types with multiplexed ion beam imaging” at Stanford University. Per this protocol from the Institutional Review Board at Stanford University, the consent to use archival deidentified tissue was not required. All experiments followed all relevant guidelines and regulations.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.



Source link

By AUTHOR

Leave a Reply

Your email address will not be published. Required fields are marked *