Human cohorts
Table of Contents
Covance
Details of the Covance study have been previously published54. Briefly, Covance is a multi-site cross-sectional study of health across the lifespan collected at five hospital sites in the United States in 2008. A total of 1,028 subjects were included in analyses for this study. Cohort demographic characteristics are summarized in Supplementary Table 1. Exclusion criteria for the study included uncontrolled hypertension, self-reported treatment for a malignancy other than squamous cell or basal cell carcinoma of the skin in the last two years, self-reported pregnancy, self-reported chronic infection, autoimmune condition or other inflammatory condition, self-reported chronic kidney or liver disease, chronic heart failure or diagnosed with myocardial infarction in the last three months, self-reported diabetes (HbA1c > 8% if known), self-reported acute bacterial or viral infection in the past 24 h or a temperature greater than 38 °C within 24 h of enrolment, self-reported participation in any therapeutic study within 14 days before blood sampling and taking more than 20 mg of prednisone or related drugs.
Clinical blood chemistry was performed on the same samples, including a complete blood count and comprehensive metabolic panel, lipid panel and liver function tests. Basic physical workup (blood pressure, pulse and respirations) was also collected. Lifestyle information was also collected from all participants using a survey which asked about smoking, alcohol, exercise, habits and frequency of consumption of different meats and vegetables.
LonGenity
Details of the LonGenity cohort have been previously published55,56. Briefly, LonGenity is an ongoing longitudinal study initiated in 2008 and designed to identify biological factors that contribute to healthy aging. The LonGenity study enrols older adults of Ashkenazi Jewish descent with an age range of 65–94 years at a baseline. Approximately half of the cohort consists of offspring of parents with exceptional longevity, defined as having at least one parent who survived to 95 years of age. The other half of the cohort includes offspring of parents with usual survival, defined as not having a parental history of exceptional longevity. A total of 962 subjects were included in analyses for this study. The cohort characteristics are summarized in Supplementary Table 1. LonGenity participants are thoroughly characterized demographically and phenotypically at annual visits that include collection of medical history and physical and detailed neurocognitive assessments (described in detail below). The LonGenity study was approved by the institutional review board (IRB) at the Albert Einstein College of Medicine.
Subjects in the LonGenity cohort underwent extensive cognitive examination. The Overall Cognition Composite score was determined by the relative performance of the subject in the Free and Cued Selective Reminding Test, WMS-R Logical Memory I, RBANS Figure Copy, RBANS Figure Recall, WAIS-III Digit Span, WAIS-III Digit Symbol Coding, Phonemic Fluency (FAS), Categorical Fluency, Trail Making Test A and Trail Making Test B. For each task a standardized score (z) was calculated based on the population. The z-score for each task was then combined to create the overall cognition composite.
Stanford Alzheimer’s Disease Research Center
Samples were acquired through the National Institute on Aging (NIA)-funded Stanford Alzheimer’s Disease Research Center (Stanford-ADRC). The Stanford-ADRC cohort is a longitudinal observational study of clinical dementia subjects and age-sex-matched nondemented subjects. The collection of plasma was approved by the Institutional Review Board of Stanford University and written consent was obtained from all subjects. Blood collection and processing were done according to a rigorous standardized protocol to minimize variation associated with blood draw and blood processing. Briefly, about 10 cc of whole blood was collected in a vacutainer ethylenediaminetetraacetic acid (EDTA) tube (Becton Dickinson vacutainer EDTA tube) and spun at 3,000 RPM for 10 mins to separate out plasma, leaving 1 cm of plasma above the buffy coat and taking care not to disturb the buffy coat to circumvent cell contamination. Plasma processing times averaged approximately one hour from the time of the blood draw to the time of freezing and storage. All blood draws were done in the morning to minimize the impact of circadian rhythm on protein concentrations. Plasma pTau-181 levels were measured using the fully automated Lumipulse G 1200 platform (Fujirebio US, Inc, Malvern, PA) by experimenters blind to diagnostic information, as previously described57.
All healthy control participants were deemed cognitively unimpaired during a clinical consensus conference that included board-certified neurologists and neuropsychologists. Cognitively impaired subjects underwent Clinical Dementia Rating and standardized neurological and neuropsychological assessments to determine cognitive and diagnostic status, including procedures of the National Alzheimer’s Coordinating Center (https://naccdata.org/). Cognitive status was determined in a clinical consensus conference that included neurologists and neuropsychologists. All participants were free from acute infectious diseases and in good physical condition. A total of 409 subjects were included in analyses for this study. Cohort demographics and clinical diagnostic categories are summarized in Supplementary Table 1.
Stanford Aging Memory Study
SAMS is an ongoing longitudinal study of healthy aging. Blood collection and processing were done by the same team and using the same protocol as in Stanford-ADRC. Neurological and neuropsychological assessments were performed by the same team and using the same protocol as in Stanford-ADRC. All SAMS participants had CDR = 0 and a neuropsychological test score within the normal range; all SAMS participants were deemed cognitively unimpaired during a clinical consensus conference that included neurologists and neuropsychologists. A total of 192 cognitively SAMS participants were included in the present study. The collection of plasma was approved by the Institutional Review Board of Stanford University and written consent was obtained from all subjects. Cohort demographics and clinical diagnostic categories are summarized in Supplementary Table 1.
Knight Alzheimer’s Disease Research Center
The Knight-ADRC cohort is an NIA-funded longitudinal observational study of clinical dementia subjects and age-matched controls. Research participants at the Knight-ADRC undergo longitudinal cognitive, neuropsychologic, imaging and biomarker assessments including Clinical Dementia Rating (CDR). Among individuals with CSF and plasma data, AD cases corresponded to those with a diagnosis of dementia of the Alzheimer’s type (DAT) using criteria equivalent to the National Institute of Neurological and Communication Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association for probable AD58, and AD severity was determined using the Clinical Dementia Rating (CDR)59 at the time of lumbar puncture (for CSF samples) or blood draw (for plasma samples). Controls received the same assessment as the cases but were nondemented (CDR = 0). Blood samples were collected in EDTA tubes (Becton Dickinson vacutainer purple top) at the visit time, immediately centrifuged at 1,500g for 10 min, aliquoted on two-dimensional barcoded Micronic tubes (200 ul per aliquot) and stored at −80 °C. The plasma was stored in monitored −80 °C freezer until it was pulled and sent to Somalogic for data generation. The Institutional Review Board of Washington University School of Medicine in St. Louis approved the study and research was performed in accordance with the approved protocols. A total of 3,075 participants were included in the present study. Cohort demographics and clinical diagnostic categories are summarized in Supplementary Table 1.
Proteomics data acquisition and quality control
SomaScan assay
We used the SomaLogic SomaScan assay, which uses slow off-rate modified DNA aptamers (SOMAmers) to bind target proteins with high specificity, to quantify the relative concentration of thousands of human proteins in plasma. The assay has been used in hundreds of studies and described in detail previously54,60. Two versions of the SomaScan assay were used in this study. The v.4 assay (4,979 protein targets) was applied to the Covance and LonGenity cohorts, and the v.4.1 assay (7,288 protein targets) was applied to the SAMS, Stanford-ADRC and Knight-ADRC cohorts. All v.4 targets are included in the v.4.1 assay based on SeqId, and only the v.4 targets were analysed for this study.
Somalogic normalization and quality control
Standard Somalogic normalization, calibration and quality control were performed on all samples54,61,62,63. Briefly, pooled reference standards and buffer standards are included on each plate to control for batch effects during assay quantification. Samples are normalized within and across plates using median signal intensities in reference standards to control for both within-plate and across-plate technical variation. Samples are further normalized to a pooled reference using an adaptive maximum likelihood procedure. Samples are additionally flagged by SomaLogic if signal intensities deviated significantly from the expected range and these samples were excluded from analysis. The resulting expression values are the provided data from Somalogic and are considered ‘raw’ data.
The v.4 → v.4.1 multiplication scaling factors provided by Somalogic were applied to the raw v.4 assay expression values to allow for direct comparisons across two v.4 and three v.4.1 cohorts. We discarded proteins for which the correlation was low between assay versions v.4 and v.4.1 and low estimated replicate coefficient of variation64 (Supplementary Fig. 1). This resulted in 4,778 proteins for downstream analysis. The raw data were log10 transformed before analysis, as the assay has an expected log-normal distribution.
Somalogic probe validation
Somalogic has analysed close to 1 million samples with their technology at the time of this publication, resulting in some 700 publications (https://somalogic.com/publications/). There is minimal replicate sample variability64,65 (coefficient of variation, CV). The majority of SomaScan protein measurements are stable and a subset of proteins have been validated as laboratory-developed tests (LDTs), and have been delivered out of Somalogic’s CLIA-certified laboratory to physicians and patients in the context of medical management66.
-
1.
All 7,524 probes on the assay undergo rigorous primary validation of binding and sensitivity to the target protein.
-
a.
Determination of equilibrium binding affinity dissociation constant (KD).
-
b.
Pull down assay of cognate protein from buffer.
-
c.
Demonstration of dose-responsive in the SomaScan Assay.
-
d.
Estimation of endogenous cognate protein signals in human plasma above limit of detection.
-
a.
-
2.
A total of 70% of their probes have at least one orthogonal source of validation (Supplementary Fig. 1b) from:
-
a.
Mass spectrometry: approximately 900 probes which measure mostly high and mid abundance proteins (due to sensitivity limitations of mass spectrometry), have been confirmed with either data dependent acquisition (DDA) or multiple reaction monitoring (MRM) mass spectrometry.
-
b.
Antibody: approximately 390 probe measurements correlate with antibody based measurements.
-
c.
Cis-protein quantitative trait loci (pQTL): approximately 2,860 probe measurements are associated with genetic variation in the cognate protein-encoding gene.
-
d.
Absence of binding with nearest neighbour: approximately 1,150 probes do not detect signal from the protein that is most closely related in sequence to the cognate protein.
-
e.
Correlation with RNA: approximately 1,460 probe measurements correlate with mRNA levels in cell lines.
-
a.
Identification of organ-enriched plasma proteins
We used the Gene Tissue Expression Atlas (GTEx) human tissue bulk RNA-seq database18 to identify organ-enriched genes and plasma proteins (Extended Data Fig. 1). Tissue gene expression data were normalized using the DESeq2 (ref. 67) R package. We define organ-enriched genes in accordance with the definition proposed by the Human Protein Atlas19: a gene is enriched if it is expressed at least four times higher in a single organ compared to any other organ. Within GTEx, we grouped tissues of the same organ together, such that a gene’s expression level for a given organ was the maximum gene expression value among its subtissues. For example, all GTEx brain regions were considered subtissues of the brain organ. We define the immune organ, which is not a GTEx tissue, as expression in the blood and the spleen tissues. Organ-enriched genes were mapped to the 4,979 plasma proteins quantified in the v.4 SomaScan assay.
Bootstrap aggregated LASSO aging models
To estimate biological age using the plasma proteome, we built LASSO regression-based chronological age predictors (Extended Data Figs. 2–3 and Supplementary Fig. 3) using the scikit-learn68 python package. We employed bootstrap aggregation for model training. Briefly, we resampled with replacement to generate 500 bootstrap samples of our training data (Knight-ADRC: 1,398 healthy individuals). Each bootstrap sample was the same size as the training data, 1,398. For each bootstrap sample, we trained a model on z-scored log10 normalized protein expression values with sex (F = 1, M = 0) as a covariate to predict chronological age. For model training, we performed hyperparameter tuning of the L1 regularization parameter, λ, with five-fold cross validation using the GridSearchCV function from scikit-learn. To reduce model complexity and avoid overfitting, we selected the highest λ value that retained 95% performance relative to the best model. The mean predicted age from all 500 bootstrap models was used.
We trained our models in 1,398 cognitively unimpaired participants from the Knight-ADRC cohort. We evaluated their performance in the Covance (n = 1,029), LonGenity (n = 962), SAMS (n = 192), Stanford-ADRC (n = 409) cohorts and Knight-ADRC cognitively impaired subjects (n = 1,677). Models that included sex as a covariate and models trained separately on males and females showed similar age prediction performance on both sexes, so we controlled for sex to extend the generality of the findings and reduce analytic complexity (Supplementary Fig. 3a–c). There was a correlation between age estimation accuracy and the number of proteins used as input to each model (Supplementary Fig. 3c,d). However, several models with few protein inputs, such as the adipose (five proteins) and heart models (ten proteins), predicted chronological age better than models with more protein inputs (Extended Data Fig. 3).
Age gap calculation and independent validation
To calculate each individual sample age gap for each aging model, we performed the following steps for each aging model. We fit a local regression between predicted and chronological age using the lowess function from the statsmodels69 python package with fraction parameter set to 2/3 to estimate the true population mean (Supplementary Fig. 3e). A local regression is used in place of a simple linear regression because of extensive evidence that the plasma proteome changes nonlinearly with age1, which we see replicated in all five cohorts (Supplementary Fig. 8). Individual sample age gaps were then calculated as the difference between predicted age and the lowess regression estimate of the population mean. Age gaps were calculated separately per cohort to account for cohort differences (Supplementary Fig. 3e). Age gaps were z-scored per aging model to account for the differences in model variability (Supplementary Fig. 3f). This allowed for direct comparison between organ age gaps in downstream analyses.
Phenotypic age calculation
We used the published coefficients14 to calculate the phenotypic age of participants in the Covance cohort using albumin, creatinine, glucose, c-reactive protein, % lymphocyte, mean cell volume, red cell distribution width, alkaline phosphatase, white blood cell count and age.
Statistical methods to associate organ age gaps with age-related phenotypes
Study design
A flowchart of the study design is provided in Supplementary Fig. 2. Each box in the flowchart was treated as a separate analysis for the purpose of multiple testing correction. Multiple testing correction was done using the Benjamani–Hochberg method and the significance threshold was a 5% false discovery rate. To summarize the flowchart, the age gaps from all 11 organ aging models, the organismal model and the conventional model were used in the following analyses: prediction of future mortality in the LonGenity cohort with a cox proportional hazards model (CPH) (12 of 13 tests significant after FDR), prediction of future heart disease in the LonGenity cohort with a CPH (12 of 13 tests significant after FDR), association with nine diseases of aging in a cross-cohort meta-analysis (66 of 17 tests significant after FDR) and association with 42 clinical biochemistry markers in the Covance cohort (237 of 588 tests significant after FDR, PhenoAge gap also tested for 14 × 42 tests).
The 12 cognition-optimized models (11 organs + organismal model) were tested on additional brain aging phenotypes. The CognitionBrain age gap only was tested for association with 65 MRI brain volumes and an MRI-based brain age gap (40 of 66 tests significant after FDR). The CognitionBrain age gap only was included in a multivariate CPH model of dementia progression in AD (1 of 1 tests significant, no FDR). The 12 cognition-optimized model age gaps were tested for association with AD status in the Knight-ADRC (12 of 12 tests significant after FDR), then a replication analysis was performed in Stanford-ADRC (4 of 12 tests significant at P < 0.05, no FDR). The four models which replicated CognitionBrain, CognitionOrganismal, CognitionArtery and CognitionPancreas were then tested for associations with overall cognition in healthy elderly people (LonGenity, 4 of 4 tests significant and no FDR), memory function in the Stanford-ADRC (2 of 4 tests significant, no FDR) and 15-year prediction of conversion from normal cognition to mild cognitive impairment in the Knight-ADRC with a CPH model (2 of 4 tests significant, no FDR).
Linear modelling
Estimation of chronological age is not sufficient in determining whether an organ aging model measures the age-related physiological dysfunction of an organ. To determine whether estimated organ age contains physiologically relevant information, we associated organ age gaps with various age-related phenotypes across Covance, LonGenity, SAMS, Stanford-ADRC and Knight-ADRC cohorts. Most organ age gap versus trait associations in this study (Figs. 2a–d and 3c and Extended Data Figs. 4d,e, 5c, 6b,c,7, 8c,d and 9) were assessed using linear models controlled for age and sex as follows: age gap ≈ trait + age + sex and adjusted for multiple testing burden using the Benjamini–Hochberg method when appropriate. To describe disease associations in relation to years of additional aging in the main text, we took the coefficient for the trait variable—which provides an estimate of the mean difference in z-scored age gaps between disease and control—and converted that to an estimate of mean difference in raw age gaps, using the standard deviation of raw age gaps provided in Supplementary Table 8.
Meta-analyses
Meta-analyses to compare and aggregate effect sizes and confidence intervals from multiple cohorts were performed in R using the metafor70 package with an inverse variance weighted fixed effects model.
Cox proportional hazard modelling
Cox proportional hazards models were used to assess the association between organ age gaps and future risk of mortality, congestive heart failure and increase in clinical dementia rating using the following model: event risk ≈ organ age gap + age + sex. Models were tested using the lifelines71 python package. Kaplan Meyer curves were generated at population-average covariate values in the relevant subject populations.
Extreme agers
Extreme agers were defined as individuals who had an age gap value two standard deviations above or below the mean (z-scored age gap greater than 2 or z-scored age gap less than −2) for at least one aging model. A total of 23% of the population across all cohorts were extreme agers. All extreme agers showed accelerated aging; no individuals displayed extreme youth signatures without extreme aging signature in a different organ (Extended Data Fig. 4a). To identify different groups of extreme agers with similar aging profiles, we performed k-means clustering (n = 13) of the extreme agers. Z-scored age gap values above 2 or below −2 were set to zero before clustering. The clusters showed distinct organ agers (Fig. 1e and Extended Data Fig. 4b). A multi-organ ager cluster was also identified. Individuals who were extreme agers in at least five different organs were manually set to multi-organ agers. Extreme ageotypes (clusters) were associated with major age-related diseases using logistic regression (trait ≈ e-ageotype) in a cross-cohort meta-analysis (Extended Data Fig. 4d and Supplementary Table 9)
Feature importance for biological aging
FIBA is an adaptation of permutation feature importance (PFI)72 (Extended Data Fig. 6a). PFI is traditionally used in machine learning to assess how much a model depends on a given feature for prediction accuracy of the target variable. The PFI score is defined as the decrease in a model’s performance when values from a single feature are randomized. In our case, for chronological age predictors, the PFI score would be calculated as the difference between the model’s original prediction accuracy (Pearson correlation between predicted and chronological age) and the model’s prediction accuracy after randomization of a single feature. The final PFI score is the mean PFI score from five randomizations.
FIBA builds on the concept of PFI and applies it to the field of aging to assess the importance of a feature in measuring biological age, instead of the target variable, chronological age. We assume that information about biological age lies in the model age gap and its association with an age-related trait. Thus, randomization of an important feature would reduce the association between the model age gap and the trait (in the expected direction). The FIBA score for a protein is calculated based on this logic and is defined as the difference between the model age gap’s original association with a trait and the association with that trait after randomization of a single feature.
We applied FIBA to understand aging model protein contributions to associations with cognition using the CDR-Global score. The mean FIBA score after five permutations was calculated for all 500 bootstraps for all organ aging models (Supplementary Table 15). A protein was defined as significant (FIBA+) if less than 5% (empirical single-tailed P < 0.05) of its FIBA scores across bootstraps was negative. Only proteins with nonzero coefficients in at least 100/500 bootstraps were considered. FIBA+ organ-specific proteins were used to train new cognition-optimized aging models from cognitively unimpaired individuals in the Knight-ADRC cohort.
Biological pathway enrichment and protein–protein interaction analysis
Biological pathway enrichment analyses were performed using g:Profiler73 with the all human genes set as the background distribution. Protein–protein interaction networks were generated using the STRING database74.
Single-cell RNA sequencing analysis
Preprocessed human heart52 and kidney51 scRNA-seq data were accessed from studies in the Human Cell Atlas. Preprocessed brain scRNA-seq data were accessed from ref. 53. Preprocessed human brain vasculature scRNA-seq data were accessed from ref. 42. Preprocessed human vasculature scRNA-seq data were accessed from Tabula Sapiens41. Gene expression counts data were log(CPM + 1) transformed and z-scored for visualization.
Brain tissue bulk proteomics and RNA sequencing
Differential expression statistics of proteins and RNA from AD versus control brains were accessed from ref. 39.
Brain MRI data from Stanford-ADRC and SAMS cohorts
MRI acquisition
Whole-brain MRI scans were collected from all subjects in the Stanford-ADRC and SAMS cohorts. All MRI data was collected at the Stanford Richard M. Lucas Center for Imaging. A total of 271 subjects underwent MRI scanning on a 3 T MRI scanner (GE Discovery MR750). T1-weighted SPGR scans were collected (TR/TE/TI = 8.2/3.2/900 ms, flip angle = 9, 1 × 1 × 1 mm) and used to define grey matter volumes. A total of 134 subjects underwent MRI scanning on a hybrid PET/MRI scanner (Signa 3 tesla, GE Healthcare). T1-weighted SPGR scan were collected (TR/TE/TI = 7.7/3.1/400 ms, flip angle = 11, 1.2 × 1.1 × 1.1 mm) and used to define grey matter volumes.
Structural MRI processing
Region of interest (ROI) labelling was implemented using the FreeSurfer75 software package v.7 (http://surfer.nmr.mgh.harvard.edu). In brief, structural images were bias field corrected, intensity normalized and skull stripped using a watershed algorithm. These images underwent a white matter-based segmentation, grey/white matter and pial surfaces were defined, and topology correction was applied to these reconstructed surfaces. Subcortical and cortical ROIs spanning the entire brain were defined in each subject’s native space, using the aparc+aseg atlas in FreeSurfer.
MRI brainageR algorithm
Using matched brain MRI and plasma proteomic data from n = 541 samples in SAMS and Stanford-ADRC, we compared our plasma proteomic organ clocks with established brain MRI-based clocks, brainageR16 and BARACUS Brain-Age76.
We used a pretrained machine learning algorithm (https://github.com/james-cole/brainageR) and raw T1-weighted MRI scans to estimate brain age. This software uses SPM12 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/) to perform tissue segmentation and normalization of individual scans to Montreal Neurological Institute (MNI) template space. The software relies on a model that used Gaussian process regression to predict brain age on 3,777 participants from seven publicly available datasets (mean age = 40.1, range = 18–90 years). It applies the results of this training to predict brain age in any new T1-w data, utilizing the RNifti (v.1.4.5) and kernlab (v.0.9-32) packages within R v.4.2.
We also used another pretrained algorithm, BARACUS (https://github.com/bids-apps/baracus, ref. 76) to estimate brain age from FreeSurfer v.5.3 processed T1-w scans. The vertex-wise cortical thickness and surface area values (transformed from subject space to fsaverage4 standard space), along with the subcortical volumetric statistics, were used as input to BARACUS’s linear support vector machine model. This model was trained on 1,166 participants with no objective cognitive impairment (566 female, mean age = 59.1, range = 20–80 years). It returns a ‘stacked-anatomy’ prediction among its results, which we used as the estimate of brain age for this method.
MRI regions of interest analysis
The volume of the AD signature region was calculated as the sum of the volumes of the parahippocampal gyrus, entorhinal cortex, inferior parietal lobules, hippocampus and precuneus. Following best practice, ROIs were linearly adjusted for estimated total intracranial volume to account for the differences in human size that is unrelated to cognitive function and neurodegeneration. Associations between organ age gaps and adjusted brain ROIs were tested using a linear model controlled for age and sex. Associations were performed for all ROIs in the aparc+aseg atlas.
Alzheimer’s disease polygenic risk score in the Stanford-ADRC cohort
AD polygenic risk scores (PRS) were calculated in the Stanford-ADRC cohort to compare to the CognitionBrain age gap. PRSs were determined from whole-genome sequencing. The Genome Analysis Toolkit workflow Germline short variant discovery was used to map genome sequencing data to the reference genome (GRCh38) and to produce high-confidence variant calls using joint-calling77. Six individuals were excluded from further whole-genome sequencing analysis due to discordance between their reported sex and genetic sex. APOE genotype (ε2/ ε3/ ε4) was determined using allelic combinations of single nucleotide variants rs7412 and rs429358. The independent loci identified in the largest AD GWAS to date were used to compute AD PRS. Namely, the 84 variants and their effect size available from Tables 1 and 2 in ref. 30 were used, in addition to rs7412 (odds ratio = 0.6) and rs429358 (odds ratio = 3.7). Plink1.9 (ref. 78) with the ‘—score’ flag was used to formally compute the PRS, while providing the individual genotypes and the list of variants with their effect size as input. Three individuals with pathogenic mutations PSEN1 or GBA were removed from this analysis.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.