Strange India


Human samples

Human samples came from the Milieu Intérieur cohort, which was approved by the Comité de Protection des Personnes–Ouest 6 on 13 June 2012, and by the French Agence Nationale de Sécurité du Médicament (ANSM) on 22 June 2012. The study is sponsored by Institut Pasteur (Pasteur ID-RCB Number: 2012-A00238-35) and was conducted as a single-centre interventional study without an investigational product. The original protocol was registered under ClinicalTrials.gov (study no. NCT01699893). The samples and data used in this study were formally established as the Milieu Intérieur biocollection (NCT03905993), with approvals by the Comité de Protection des Personnes–Sud Méditerranée and the Commission Nationale de l’Informatique et des Libertés (CNIL) on 11 April 2018. Donors gave written informed consent. The 1,000 donors of the Milieu Intérieur cohort were recruited by BioTrial to be composed of healthy individuals of the same genetic background (Western European) and to have 100 women and 100 men from each decade of life between 20 and 69 years of age. Donors were selected based on various inclusion and exclusion criteria that were previously described12. In brief, donors were required to have no history or evidence of severe, chronic or recurrent pathological conditions, neurological or psychiatric disorders, alcohol abuse, recent use of drugs, recent vaccine administration and recent use of immune modulatory agents. To avoid the influence of hormonal fluctuations in women, pregnant and peri-menopausal women were not included. To avoid genetic stratification in the study population, the recruitment of donors was restricted to individuals whose parents and grandparents were born in Metropolitan France. 
Additionally, we formally checked how the genetic background of the donors could affect cytokine levels by performing association tests between the first 20 genetic principal components from the PCA of the individual genotypes and each of the induced cytokines in each stimulation. Although PC1 was significantly associated with IL-10 (Benjamini–Yekutieli adjusted P value < 0.05), we found that the first 20 principal components showed no significant associations with cytokine responses at the P value threshold (Benjamini–Yekutieli adjusted P value < 0.01) we use throughout this study. To illustrate the homogeneity of the genetic structure of the 1,000 individuals of the Milieu Intérieur cohort, a PCA was performed with EIGENSTRAT41 on 261,827 independent SNPs and 1,723 individuals, comprising the 1,000 Milieu Intérieur donors together with 723 individuals from a selection of 36 populations originating from North Africa, the Near East, and western and northern Europe42, similarly to what was previously performed3. PC1 versus PC2, PC1 versus PC3 and PC2 versus PC3 are displayed, as well as a bar plot of the variance explained by the first 20 components of the PCA (Extended Data Fig. 9b). Unless otherwise stated, all displayed results were obtained on the 955 individuals of the cohort who gave consent to share their data publicly, in order to ensure easy reproducibility of the results.

TruCulture whole-blood stimulations

TruCulture whole-blood stimulations were performed in a standardized way as previously described4,43. Briefly, tubes were prepared in batch with the indicated stimulus, resuspended in a volume of 2 ml buffered medium, and maintained at −20 °C until time of use. Stimuli used in this study were LPS derived from E. coli O111:B4 (Invivogen), E. coli O111:B4 (Invivogen), C. albicans (Invivogen), vaccine-grade poly I:C (Invivogen), live Bacillus Calmette-Guerin (Immucyst, Sanofi Pasteur), live H1N1 attenuated influenza A/PR8 (IAV) (Charles River), SEB (Bernhard Nocht Institute), CD3 + CD28 (R&D Systems and Beckman Coulter), and cytokines TNF (Miltenyi Biotech), IL-1β (Peprotech) and IFNγ (Boehringer Ingelheim). One millilitre of whole blood was distributed into each of the prewarmed TruCulture tubes, inserted into a dry block incubator, and maintained at 37 °C room air for 22 h. At the end of the incubation period, tubes were opened, and a valve was inserted in order to separate the sedimented cells from the supernatant and to stop the stimulation reaction. Liquid supernatants were aliquoted and immediately frozen at −80 °C until the time of use.

Luminex multi-analyte profiling

Supernatants from TruCulture tubes were analysed by Rules Based Medicine using the Luminex xMAP technology. Samples were analysed according to the Clinical Laboratory Improvement Amendments (CLIA) guidelines. The lower limit of quantification (LLOQ) was determined as previously described43, and is the lowest concentration of an analyte in a sample that can be reliably detected and at which the total error meets CLIA requirements for laboratory accuracy. The 13 cytokines (CXCL5, CSF2, IFNγ, IL-1β, TNF, IL-2, IL-6, IL-8, IL-10, IL-12p70, IL-13, IL-17 and IL-23), which were measured in this study, were selected to best capture broad immune response variability. Among 109 analytes initially tested, these are the ones that captured the maximum variance following stimulation with the 4 stimuli (LPS, BCG, poly I:C and SEB) that showed the most distinct immune responses among 27 stimuli tested on a subset of 25 individuals of the Milieu Intérieur cohort.

Principal components analysis

The PCA in Extended Data Fig. 1 was created in R 4.2.1 using the FactoMineR 2.8 package. The data were log-transformed and, by default, scaled to unit variance; missing values were imputed by the mean of the variable.
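The preprocessing described above can be illustrated with a minimal Python sketch. The study itself used FactoMineR in R; the function name, the use of the natural logarithm, the use of None for missing values and the population standard deviation are assumptions made here for illustration only.

```python
import math
from statistics import fmean, pstdev


def preprocess_column(values):
    """Log-transform, mean-impute and standardize one variable.

    `values` holds positive concentrations; None marks a missing value.
    Assumptions (not stated in the source): natural log, population s.d.
    """
    logged = [math.log(v) if v is not None else None for v in values]
    observed = [v for v in logged if v is not None]
    mean = fmean(observed)
    # Replace each missing value by the mean of the observed values.
    filled = [mean if v is None else v for v in logged]
    sd = pstdev(filled)
    if sd == 0:
        raise ValueError("variable is constant; cannot scale to unit variance")
    # Centre and scale so the variable has mean 0 and variance 1.
    return [(v - mean) / sd for v in filled]


# Hypothetical toy values for a single cytokine across four donors.
scaled = preprocess_column([1.0, math.e, None, math.e ** 2])
print(scaled)
```

Each variable is processed independently, so a full data matrix would be handled by applying the function column by column before the decomposition step.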

Cytokine induction visualization

Cytokines were considered induced if the absolute value of their median concentration in the stimulated condition was 30%-fold of their concentration in the null condition. Standardized log mean differences were computed as follows: (mean(concentration of the cytokine in the stimulated condition) − mean(concentration of the cytokine in the null condition)) / s.d.((concentration of the cytokine in the stimulated condition) − (concentration of the cytokine in the null condition)). The corresponding heat map was generated with heatmaply 1.0.0 and dendextend 1.13.12, using the ‘complete’ clustering method and ‘euclidean’ distance, in R version 4.2.1.
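As an illustration of the standardized log mean difference defined above, here is a small Python sketch. It assumes that concentrations are paired per donor, that the logarithm is taken before averaging, and that the sample standard deviation (n − 1 denominator) is used; none of these details is spelled out in the text, and the function name is hypothetical.

```python
import math
from statistics import fmean, stdev


def standardized_log_mean_difference(stimulated, null):
    """(mean(log stim) - mean(log null)) / s.d.(log stim - log null).

    `stimulated` and `null` are per-donor concentrations in the same order.
    """
    if len(stimulated) != len(null):
        raise ValueError("both conditions need one value per donor")
    log_stim = [math.log(x) for x in stimulated]
    log_null = [math.log(x) for x in null]
    # Per-donor difference between the two conditions.
    paired_diff = [s - n for s, n in zip(log_stim, log_null)]
    return (fmean(log_stim) - fmean(log_null)) / stdev(paired_diff)


# Hypothetical toy concentrations for three donors.
stim = [math.exp(2), math.exp(3), math.exp(5)]
unstim = [math.e, math.e, math.e]
print(standardized_log_mean_difference(stim, unstim))
```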

Identification of CD3 + CD28 non-responders

Levels of the cytokines that we focused on are low to undetectable in the non-stimulated condition, and cytokine induction is generally homogeneous within this healthy population of individuals, with no clearly distinguishable groups of responders and non-responders, except for anti-CD3 + CD28 stimulation (Extended Data Fig. 2). For the anti-CD3 + CD28 stimulation, we identified through k-means clustering a group of 705 individuals that responded to the stimulation and a group of 295 individuals that did not. This lack of response in 295 individuals is explained by an FcγRIIA polymorphism (rs1801274) that was previously described as preventing response to this anti-CD3 + CD28 stimulation28 (Extended Data Fig. 9). All statistical analyses on anti-CD3 + CD28 stimulations in this study were thus performed on the responders only.
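The responder/non-responder split can be illustrated with a two-cluster k-means on a single variable. The text does not say which features or software were used for the clustering, so the sketch below is only a simplified stand-in: it assumes one log-transformed cytokine value per donor and a deterministic initialization at the minimum and maximum, and the function name is hypothetical.

```python
def two_means_1d(values, max_iter=100):
    """Lloyd's algorithm for k = 2 on one-dimensional data.

    Returns (labels, centers); label 0 is the low cluster, 1 the high one.
    """
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical; nothing to cluster")
    centers = [low, high]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centre.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members))
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical log-scale responses: three low and three high donors.
labels, centers = two_means_1d([0.1, 0.2, 0.15, 5.0, 5.2, 4.9])
print(labels, centers)
```

Donors with label 1 would be treated as responders in this toy setting; real data would likely call for several random starts and more than one feature.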

eCRF criteria association tests with induced cytokines

Variables were extracted from the eCRF filled in by the donors with the help of a physician. To limit biases in associations, categorical variables had to have at least 5% of individuals in at least half of the categorical levels to be considered for association tests. Such categorical variables, as well as numerical ones, were tested for associations with the log-transformed induced cytokine levels in each stimulation through likelihood ratio tests (LRTs), using age, sex and the technical variable batchID (corresponding to two batches of TruCulture tubes produced at different periods of time) as covariates: the LRT compared the models lm(cytokine ~ variable + age + sex + batchID) with lm(cytokine ~ age + sex + batchID), followed by Benjamini–Yekutieli multiple testing correction applied to the whole heat maps, thus taking into account the tests made for the 136 CRF variables with all the induced cytokines in a specific stimulation. For Extended Data Figs. 4 and 5, the models compared were lm(cytokine ~ age + sex + batchID) with lm(cytokine ~ sex + batchID) for age and lm(cytokine ~ sex + batchID) with lm(cytokine ~ age + batchID) for sex. P values of association tests were represented using ggplot2 3.2.1 in R 3.6.0. Adjusted P values on the box plots were computed with the wilcox.test function, correcting for multiple testing. Versions of the box plots and scatter plots made on the residuals after regression on age, sex and batchID are displayed in Extended Data Fig. 6d–f.
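The Benjamini–Yekutieli correction mentioned above can be sketched in a few lines of Python. This is an illustrative re-implementation of the standard step-up procedure, not the authors' code (the study was run in R); the function name is hypothetical.

```python
def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted P values in the input order.

    For the i-th smallest P value out of m tests, the raw adjustment is
    p * m * c(m) / i with c(m) = 1 + 1/2 + ... + 1/m; a running minimum
    taken from the largest P value downwards enforces monotonicity.
    """
    m = len(p_values)
    harmonic = sum(1.0 / k for k in range(1, m + 1))
    # Indices sorted from the largest to the smallest P value.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, idx in enumerate(order):
        rank = m - position  # rank of this P value in ascending order
        candidate = p_values[idx] * m * harmonic / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four association tests.
print(benjamini_yekutieli([0.01, 0.04, 0.03, 0.5]))
```

In a setting like the one described, all P values from one heat map (every variable against every induced cytokine in a stimulation) would be passed in a single call, so that the correction covers the whole family of tests.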

Effect size plots

Linear regression models were estimated in each stimulation using the log-transformed induced cytokine levels as outcome and age, sex, batchID and the covariates of interest (for example, smoking status) as predictor variables. Interactions with the covariates of interest were considered when indicated. The exponentials of the regression coefficient estimates and their 95% confidence intervals were plotted. When the covariate of interest is categorical, each level of the variable is shown independently, relative to the level specified as the reference. When the P value of the t-test testing whether the coefficient estimate differs from zero is < 0.01, the estimate is plotted in black; otherwise it is plotted in grey. If the LRT comparing the regressions with and without the variable of interest, using a Chi-square test, is significant with a Benjamini–Yekutieli adjusted P value < 0.01, a red star is added above the effect size value and interval.
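The comparison of nested linear models and the exponentiated effect size can be sketched as follows. This is a simplified Python illustration under several assumptions: only one numeric or binary variable is added (so the Chi-square test has 1 degree of freedom), sex and batchID are left out of the toy data, the models are fitted with plain normal equations, and all names and values are hypothetical.

```python
import math


def ols_fit(rows, y):
    """Least-squares fit with an intercept; returns (coefficients, RSS)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    rss = sum(
        (yi - sum(b * v for b, v in zip(beta, x))) ** 2 for x, yi in zip(X, y)
    )
    return beta, rss


def lrt_one_variable(full_rows, reduced_rows, y):
    """LRT for nested Gaussian linear models differing by one parameter."""
    n = len(y)
    beta_full, rss_full = ols_fit(full_rows, y)
    _, rss_reduced = ols_fit(reduced_rows, y)
    stat = n * math.log(rss_reduced / rss_full)  # 2 * (logLik1 - logLik0)
    # Chi-square survival function for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value, beta_full


# Hypothetical toy data: log cytokine level, age and smoking status (0/1).
ages = [20, 30, 40, 50, 60, 25, 35, 45, 55, 65]
smoker = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, -0.1, 0.1, -0.05, 0.05, 0.0]
log_cytokine = [1 + 0.02 * a + 0.7 * s + e for a, s, e in zip(ages, smoker, noise)]

stat, p_value, beta = lrt_one_variable(
    [[s, a] for s, a in zip(smoker, ages)], [[a] for a in ages], log_cytokine
)
effect_size = math.exp(beta[1])  # fold change associated with smoking
print(stat, p_value, effect_size)
```

Because the outcome is on the log scale, the exponential of a coefficient reads as a multiplicative change in cytokine level. Categorical variables with several levels or interaction terms add more than one parameter and would need the Chi-square distribution with the matching degrees of freedom.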

Cell subset association tests

Acquisition of flow cytometry data was detailed previously3. Association tests with log-transformed values of induced cytokines in each stimulation were performed as for the eCRF criteria association tests using log-transformed raw counts of cell subsets for each donor. P values of significance are indicated with asterisks as follows: *P < 0.05; **P < 0.01; ***P < 0.001.

DNA methylation association tests

CpG methylation profiles were generated using the Infinium MethylationEPIC BeadChip (Illumina) on genomic DNA treated with sodium bisulfite (Zymo Research) for 958 individuals of the Milieu Intérieur cohort as described19. Associations between the DNA methylation levels for the CpG sites located within 1 Mb of each cytokine gene transcription start site (TSS) and the levels of log-transformed induced cytokines in each stimulation, adjusting for age, sex, technical variable batchID and major immune cell population counts for each stimulation, were tested through LRT and identified CpG sites weakly associated with IL-17 in LPS (cg09582880), IL-2 in C. albicans (cg17850932 and cg25065535) and IL-8 in influenza (cg16468729) stimulations (FDR adjusted P value of LRT < 0.05) (Extended Data Fig. 10). These effects were mild compared with the identified associated genetic variants and the other associated variables identified in this study but are considered in the final global models (Fig. 5). CpG sites with DNA methylation levels that are directly affected by smoking have been selected as described19.

Heat maps showing effects of covariates

To test whether the levels of some covariates, such as cell subsets, plasma proteins or DNA methylation probes, could modify the observed association of a variable, such as smoking status, with the log-transformed induced levels of cytokines in each stimulation, we used an LRT for each cytokine in each stimulation. The LRT compared a model containing both the variable of interest and the covariate of interest (with interactions), plus the usual covariates age, sex and the technical covariate batchID, with a model containing all the covariates but not the variable of interest. A Benjamini–Yekutieli multiple testing adjustment was then applied to the whole heat maps. For example, for Fig. 3a, the variable of interest was smoking status and the covariate of interest was each cell subset, so we compared lm(cytokine ~ smoking status × cell subset + age + sex + batchID) with lm(cytokine ~ cell subset + age + sex + batchID). When the LRT is significant, adding the variable of interest to the model improves the fit to the cytokine levels. For BMI-related variables, these do not improve the fit to both IL-2 and CXCL5 when T cell subsets are passed as covariates, showing that our approach is powered to identify cellular associations with effects on CXCL5 levels when present.

pQTL analyses

Protocols and quality-control filters for genome-wide SNP genotyping are detailed in ref. 3. In brief, the 1,000 Milieu Intérieur donors were genotyped on both the HumanOmniExpress-24 and the HumanExome-12 BeadChips (Illumina), which include 719,665 SNPs and 245,766 exonic SNPs, respectively. Average concordance rate between the two genotyping arrays was 99.99%. The final dataset included 732,341 high-quality polymorphic SNPs. After genotype imputation and quality-control filters, 11,395,554 SNPs were further filtered for minor allele frequencies > 5%, yielding a dataset composed of 1,000 donors and 5,699,237 SNPs for pQTL mapping. pQTL analyses were performed using the MatrixEQTL44 2.2 R package. SNPs were considered as cis-acting pQTLs if they were located within 1 Mb of the TSS of the gene; otherwise they were considered as trans-pQTLs. Protein expression data of the 1,000 individuals were log-transformed prior to pQTL analysis. Bonferroni correction for multiple testing (adjusted P value < 0.05) was applied to the results. We used detection thresholds of 10^−3 for cis-pQTLs and 10^−5 for trans-pQTLs, with age, sex and the technical covariate batchID, as well as a main associated cell subset (monocytes for LPS, E. coli and C. albicans stimulations, CD4pos for SEB, CD8posEMRA for anti-CD3 + CD28, CD45pos for BCG, cDC3 for poly I:C, CD3pos for influenza, CD45pos for TNF, none for null, IL-1β and IFNγ) as covariates. SNPs associated with IFNγ in IFNγ stimulation, with IL-1β in IL-1β stimulation and with TNF in TNF stimulation were disregarded because each of these cytokines was added to the corresponding TruCulture tubes and thus does not reflect endogenous secretion. To test the novelty of our pQTL results, we searched the SomaLogic plasma protein pQTL database20 for both the cis- and trans-pQTLs listed in Table 1. This dataset allowed testing associations for CXCL5, IFNγ, IL-1β, IL-2, IL-6, IL-10 and IL-12A.
Significant associations were identified between the variants rs352045 (cis), rs2393969 (trans), rs10822168 (trans) and the protein CXCL5 (respective FDR adjusted P = 3.02 × 10^−10, P = 0.01 and P = 0.022), between rs35345753 (cis), rs62449491 (cis) and IL-6 (respective FDR adjusted P = 4.17 × 10^−3 and P = 0.017) and between rs3775291 (trans) and IL-12A (FDR adjusted P = 0.049). To test associations for SNPs in linkage disequilibrium with the SNPs originally referenced in Table 1, we used a linkage disequilibrium dataset from the Ensembl database with ancestries similar to those of the Milieu Intérieur cohort (1000GENOMES:phase_3:CEU: Utah residents with Northern and Western European ancestry). To be inclusive, SNPs with an r^2 > 0.2 were selected as associated alleles and underwent the same analysis as the one performed with the SNPs of reference. SNPs that came out as significant are those in linkage disequilibrium with the SNP referenced in Table 1 that is significantly associated with the corresponding protein. In addition, we also screened eQTL results. We compared our pQTL results with the eQTLs reported in our previous work based on NanoString transcriptomic data for common cytokines (CSF2, IFNγ, IL-1β, TNF, IL-2, IL-6, IL-8, IL-10, IL-12p70, IL-13, IL-17 and IL-23) and stimulations (E. coli, C. albicans, influenza, BCG, and SEB)4, which identified 2 main loci: the TLR1/6/10 locus and the CR1 locus. Associations of variants referenced in Table 1 were found in the GTEx consortium database for rs1518110 and IL-10 (FDR adjusted P = 4.3 × 10^−9), for rs352045 (cis) and CXCL5 (FDR adjusted P = 9.2 × 10^−23) in whole blood and for rs143060887 (cis) and IL-12A (FDR adjusted P = 0.000076). Significant associations between rs352045 and CXCL5 and between rs1518110 and IL-10 were also found in the eQTLGen catalogue.
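The minor-allele-frequency filter and the cis/trans rule used for pQTL mapping above can be sketched in Python. The function names, the inclusive 1 Mb boundary and the coordinates in the example are assumptions for illustration; the actual mapping was done with MatrixEQTL in R.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb around the transcription start site
MAF_THRESHOLD = 0.05       # keep SNPs with minor allele frequency > 5%


def passes_maf_filter(alt_allele_freq, threshold=MAF_THRESHOLD):
    """True if the less frequent of the two alleles exceeds the threshold."""
    minor = min(alt_allele_freq, 1.0 - alt_allele_freq)
    return minor > threshold


def classify_pqtl(snp_chrom, snp_pos, gene_chrom, tss_pos, window=CIS_WINDOW_BP):
    """Label a SNP-gene pair 'cis' if the SNP lies on the same chromosome
    and within `window` base pairs of the gene's TSS, otherwise 'trans'."""
    if snp_chrom == gene_chrom and abs(snp_pos - tss_pos) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates, not taken from the study.
print(passes_maf_filter(0.97))                       # minor allele at 3%
print(classify_pqtl("4", 74_000_000, "4", 73_500_000))
print(classify_pqtl("1", 74_000_000, "4", 73_500_000))
```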

Computation of variance explained

For each stimulation, all the variables associated with at least one induced cytokine were considered to compute the percentage of each induced cytokine variance explained by each associated variable (q value < 0.05) with the R package relaimpo 2.2.3 and plotted with the R package ggplot2 3.2.1. The R^2 contribution averaged over orderings among regressors was computed using the lmg type in the calc.relimp function of the relaimpo R package. For this analysis, log-transformed induced cytokine data and log-transformed raw counts of cell subsets were used, as well as data for cis- and trans-associated SNPs and methylation probes. For each stimulation, all associated cis-pQTLs (rs352045, rs143060887, rs62449491 and rs1518110 for LPS; rs352045 for anti-CD3 + CD28; rs352045, rs62449491 and rs113845942 for poly I:C; rs352045 and rs35345753 for influenza) and trans-pQTLs (rs3764613 for LPS; rs4833095 for E. coli; rs11936050 for SEB; rs1801274 for anti-CD3 + CD28; rs4833095, rs72636686 and rs10013453 for BCG; rs10779330 and rs11117956 for C. albicans; rs3775291 and rs10822168 for poly I:C), as well as methylation probes (cg09582880 for LPS; cg25065535 for C. albicans; cg17850932 for poly I:C; and cg16468729 for influenza) and a main associated cell subset (monocytes for LPS, E. coli and C. albicans stimulations; CD4pos for SEB; CD8posEMRA for anti-CD3 + CD28; CD45pos for BCG; cDC3 for poly I:C; CD3pos for influenza) were considered in the models.
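The idea behind the lmg measure, averaging each regressor's R^2 gain over all orderings in which regressors can enter the model, can be sketched in plain Python. This is an illustrative brute-force version for a handful of predictors, not the relaimpo implementation; function names and toy data are hypothetical.

```python
from itertools import permutations


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus intercept."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    if not columns:
        return 0.0
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    rss = sum(
        (yi - sum(b * v for b, v in zip(beta, x))) ** 2 for x, yi in zip(X, y)
    )
    return 1.0 - rss / tss


def lmg_shares(predictors, y):
    """Average sequential R^2 gain of each predictor over all orderings."""
    names = list(predictors)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        included, previous = [], 0.0
        for name in ordering:
            included.append(name)
            current = r_squared([predictors[k] for k in included], y)
            totals[name] += current - previous
            previous = current
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical toy data with two mutually orthogonal predictors.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
resid = [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]
y = [2 * a + b + e for a, b, e in zip(x1, x2, resid)]
shares = lmg_shares({"x1": x1, "x2": x2}, y)
print(shares)
```

The shares always add up to the R^2 of the full model, which is what allows them to be read as percentages of variance explained. The number of orderings grows factorially with the number of predictors, so this brute-force form is only practical for small models.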

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


