Jul 6, 2022

### Ethics approval

Approvals for these studies were obtained from the Institutional Review Boards at the University of Rochester or the University of Texas at Austin. Participants in all studies provided informed consent or assent.

### Study registration and efforts to curb researcher degrees of freedom

All studies are registered on the Open Science Framework (study 1: https://osf.io/tgysd; study 2: https://osf.io/hb6vs; study 3: https://osf.io/x4a63; study 4: https://osf.io/fkgru; study 5: https://osf.io/9pfha; study 6: https://osf.io/mkqgf). Detailed descriptions of open science disclosures, links to study materials, analysis plans and deviations from analysis plans appear in the Supplementary Information. Studies 1, 2 and 4 were registered before analysing the data. Studies 3, 5 and 6 were registered after analysing the data. As explained in greater detail in the Supplementary Information, researcher degrees of freedom for studies 3, 5 and 6 were constrained by following published and previously pre-registered standard operating procedures for TSST and daily diary studies29 (the focus on TPR, stroke volume and PEP in study 3 and the focus on the stressor intensity × treatment interaction in study 5), and by following the same analysis steps as the pre-registered studies (for example, the same core covariates and moderators whenever measured and the same conservative BCF modelling approach).

### Intervention overview

The intervention consisted of a single self-administered online session lasting approximately 30 min. Random assignment to the intervention or control condition occurred in real time via the web-based software Qualtrics, as participants were completing the online intervention materials. Simple random assignment was used, with equal probabilities of selection, but the actual observed proportions in treatment or control groups varied randomly across the six studies. Participants were blinded to the presence of different conditions, and teachers or others interacting with participants were blind to the intervention content and to condition assignment. Thus, the intervention experiments used a double-blind design throughout.
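
To illustrate why simple random assignment with equal probabilities yields observed group proportions that vary from study to study, the following minimal sketch assigns each participant independently with probability 0.5. It is only an illustration in Python, not the Qualtrics randomizer used in the studies; the function names, sample sizes and seeds are hypothetical.

```python
import random


def assign_conditions(n_participants, seed=None):
    """Independently assign each participant to treatment (1) or control (0)
    with equal probability. Illustrative only; not the Qualtrics randomizer."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_participants)]


def treatment_share(assignments):
    """Observed proportion of participants assigned to treatment."""
    return sum(assignments) / len(assignments)


if __name__ == "__main__":
    # Hypothetical sample sizes; the observed share drifts around 0.5.
    for seed, n in enumerate([150, 500, 3000]):
        share = treatment_share(assign_conditions(n, seed=seed))
        print(f"n = {n}: proportion treated = {share:.3f}")
```

Smaller samples tend to show larger chance departures from an even split, which is one reason the analyses adjust for covariates that could be imbalanced by chance.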

### Synergistic mindsets intervention

Second, the intervention targeted the stress-is-debilitating mindset50, which is the belief that stress is inherently negative and compromises performance, health and well-being; this mindset leads to the appraisal that a given stressor is uncontrollable and harmful. Counter to the stress-is-debilitating mindset, the intervention developed here introduced the stress-can-be-enhancing mindset50, which is the belief that stress can have beneficial effects on performance, health and well-being; this more adaptive belief system leads to the appraisal that stressors can be potentially helpful and controlled. The intervention explained that when people undergo challenges, they inevitably begin to experience stress, which can manifest in a racing heart, sweaty palms or possibly feelings of anxiety or worry. The intervention leads people to perceive those signals as information that the body is preparing to overcome the challenge; for instance, by providing more oxygenated blood to the brain and the muscles17. Thus, the stress response is framed as helpful for goal pursuit, not necessarily harmful. The intervention also argued that feelings of anxiety can be a sign that you have chosen a meaningful and ambitious set of goals to work on, and therefore can indicate a positive trajectory, not a negative one.

Notably, these two mindsets were conveyed synergistically, not independently, so that they built on one another. Participants were encouraged to view struggles as potentially positive and worth engaging with, and then they were invited to view inevitable stress coming from this engagement as a part of the body’s natural way to help them overcome the stressor.

These mindset messages were couched within a summary of scientific research on human performance and stress. Participants were not simply informed of these facts, but they were instead invited to engage with them, make them their own and plan how they could use them in the present and future. Participants heard stories from prior participants (older students in this case) who used these ideas to have success in important performance situations, and they also completed open-ended and expressive writing exercises. For instance, participants wrote about a time when they were worried about an upcoming stressor, and then later on they wrote advice for how someone else who might be undergoing a similar experience could use the two mindsets they learned about, which has been called a ‘saying-is-believing’ writing exercise51.

We defined adherence as completion of the last page of the intervention. In the studies in which participants were closely supervised by researchers (studies 3, 4 and 5), adherence was high (97% to 99%). In the studies in which the intervention was self-administered with no supervision, adherence was lower but still acceptable: 85%, 88% and 82% for studies 1, 2 and 6, respectively. Because we conducted intent-to-treat analyses, participants were retained in the analytic sample regardless of intervention completion status.

### Control group content

The control group intervention was also an online, self-administered activity lasting around 30 min. It was designed to be relatively indistinguishable from the intervention group by using similar visual layout, fonts, colours and images. The content was predominately from the control condition from a prior national growth mindset experiment4, which included basic information about the brain and human memory. It also involved open-ended writing activities and stories from older students. However, the control condition did not make any claims about the malleability of intelligence. To this standard content, we added basic information about the body’s stress response system (for example, the sympathetic and parasympathetic nervous system and the HPA axis) to control for the possibility that simply reflecting on stress and stress responses could account for the results. The latter content did not include any evaluations of whether stress responses are good or bad, or controllable or uncontrollable.

### Negative prior mindsets

At baseline, participants in all experiments except study 2 completed standard measures of negative event-focused mindsets (fixed mindset of intelligence; that is, “Your intelligence is something about you that you can’t change very much”)4 and response-focused mindsets (the stress-is-debilitating mindset21; that is, “The overall effect of stress on my life is negative”) (for both, 1 = strongly disagree, 6 = strongly agree). The items for each construct were combined into indices by taking their unweighted averages. Measures of internal consistency were all in the acceptable range (between 0.70 and 0.85). Means and standard deviations for each of the six studies are presented in Supplementary Table 6. In the primary Bayesian analyses for studies 3, 5, and 6, the two measures and their product were entered into the covariate and moderator function, and the machine-learning algorithm decided how best to use the mindset measures to optimize prediction or moderation. In the preliminary correlational analyses (Extended Data Table 1), we analysed the multiplicative term of the two, for simplicity.
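
The index construction described above can be sketched as follows. This is a minimal illustration, assuming item responses are stored as lists of integers on the 1 to 6 scale; the function and key names are hypothetical and do not come from the study's analysis code.

```python
from statistics import mean


def mindset_scores(fixed_items, stress_items, scale_min=1, scale_max=6):
    """Return the two unweighted-average indices and their product.

    fixed_items / stress_items: item responses for one participant
    (1 = strongly disagree, 6 = strongly agree).
    """
    for value in list(fixed_items) + list(stress_items):
        if not scale_min <= value <= scale_max:
            raise ValueError(f"response {value} is outside the response scale")
    fixed = mean(fixed_items)      # fixed mindset index
    stress = mean(stress_items)    # stress-is-debilitating index
    return {
        "fixed_mindset": fixed,
        "stress_debilitating": stress,
        "product": fixed * stress,  # multiplicative term of the two indices
    }


if __name__ == "__main__":
    print(mindset_scores([4, 5], [3, 3]))
```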

### Analysis strategy

For all experimental analyses, we used intention-to-treat analyses, which means that data were analysed for all individuals who were randomized to condition and who provided outcome data, regardless of their fidelity to the intervention protocol. If participants were missing data on covariates, those data were imputed. This analysis is more conservative than analyses that drop participants with low fidelity, but it also better reflects real-world effect sizes.

Our research advanced a fully Bayesian regression approach called Bayesian causal forests (BCF) and its extension, targeted smooth Bayesian causal forests (tsBCF)6,52,53, to calculate treatment effects and understand moderators of the treatment effects. A previous version of the BCF algorithm has won several open competitions for yielding honest and informative answers to questions about the complex, but systematic, ways in which a treatment’s effects are, or are not, heterogeneous, and it is designed to be quite conservative6. We used the existing single-level BCF method for studies 1, 2 and 6. The model is specified in equation (1):

$$y_{ij}=\alpha_{i}+\beta(x_{ij})+\tau(w_{ij})z_{i}+\epsilon_{ij} \tag{1}$$

In studies 3 and 4, we updated the BCF method to apply to time-series data. See equation (2):

$$y_{ij}=\alpha_{j}+\beta(x_{j},t_{ij})+\tau(w_{ij},t_{ij})z_{j}+\epsilon_{ij} \tag{2}$$

In equations (1) and (2), $y_{ij}$ is the outcome for adolescent i at time j, $\alpha_{j}$ is the random intercept for each individual, $x_{j}$ is the vector of covariates that predict the outcome and could control for chance imbalances in random assignment, $w_{ij}$ is the vector of potential treatment effect moderators, $t$ is time (the $t_{ij}$ term is omitted in all studies except studies 3 and 4), $z_{j}$ is the dichotomous treatment indicator for each individual, and $\epsilon_{ij}$ is the error term. (Study 4 involved additional updates to allow for multi-arm comparisons that accommodate the four-cell design; see the Supplementary Information.)
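
To make the roles of β(.) and τ(.) concrete, the sketch below simulates data with the additive structure of equation (1), using one observation per person (so the random intercept is absorbed into the error term). It is not BCF and does not fit any model: the functional forms of `beta` and `tau`, the single covariate and all numeric values are assumptions chosen purely for illustration.

```python
import random
from statistics import mean


def beta(x):
    """Assumed prognostic function: how the covariate predicts the outcome."""
    return 0.5 * x


def tau(w):
    """Assumed treatment-effect function: a larger (more negative) effect
    for participants with above-average values of the moderator."""
    return -0.3 - (0.2 if w > 0 else 0.0)


def simulate(n, seed=0):
    """Draw n participants from y = beta(x) + tau(w) * z + error."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # covariate (hypothetical baseline measure)
        w = x                     # the same variable also acts as moderator
        z = rng.randint(0, 1)     # simple random assignment
        y = beta(x) + tau(w) * z + rng.gauss(0.0, 1.0)
        rows.append({"x": x, "w": w, "z": z, "y": y})
    return rows


def difference_in_means(rows):
    """Unadjusted estimate of the average treatment effect."""
    treated = [r["y"] for r in rows if r["z"] == 1]
    control = [r["y"] for r in rows if r["z"] == 0]
    return mean(treated) - mean(control)


def true_average_effect(rows):
    """Average of tau(w) over the simulated sample."""
    return mean(tau(r["w"]) for r in rows)


if __name__ == "__main__":
    data = simulate(4000, seed=1)
    print("true ATE:", round(true_average_effect(data), 3))
    print("difference in means:", round(difference_in_means(data), 3))
```

In this toy setting the average of τ(w) over the sample is the ATE, and averaging τ(w) within a subgroup (for example, w > 0) gives the corresponding CATE.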

What makes BCF unique, and well-suited for this application, is that both β(.) and τ(.) are non-linear functions that take a ‘sum-of-trees’ representation, and which are estimated using standard BART machine-learning tools6,54,55. This frees researchers from making arbitrary decisions about which covariates to include, what their functional form should be and how or whether covariates should interact. Notably, BCF uses conservative prior distributions, especially for the moderator function, to shrink towards homogeneity and to simpler functions, avoiding over-fitting. The data are used once—to move from the prior to the posterior distribution—and all analyses then summarize draws from the posterior.

The BCF approach contrasts with the classical method, which involves re-fitting the model many times to estimate simple effects or to conduct robustness analyses with different specifications. The BCF approach, therefore, reduces researcher degrees of freedom, mitigating the risk of false discoveries and other spurious findings. In this research we focused on estimation of treatment effects (that is, how large the effect is) and not null hypothesis testing (that is, whether it is ‘significant’ or not) because of well-known problems with the all-or-nothing thinking inherent in the null hypothesis significance test56. Following convention57, we reported the ATEs and the CATEs with the associated 10th and 90th percentiles from the posterior distributions (see the Figures for the 2.5th and 97.5th percentiles). When the pre-analysis plan called for it (in study 4), we report the exact posterior probabilities of a difference in effects.
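
As an illustration of how posterior draws can be summarized in this way, the following minimal sketch computes the posterior mean, the 10th and 90th percentiles, and the share of draws below zero. The draws here are synthetic (generated from a normal distribution with assumed values), the function name is hypothetical, and the percentile interpolation rule is one of several reasonable choices.

```python
import random
from statistics import mean, quantiles


def summarize_posterior(draws):
    """Summarize posterior draws of a treatment effect."""
    deciles = quantiles(draws, n=10, method="inclusive")  # 9 cut points
    return {
        "mean": mean(draws),
        "p10": deciles[0],    # 10th percentile
        "p90": deciles[-1],   # 90th percentile
        # Posterior probability that the effect is negative
        "prob_below_zero": sum(d < 0 for d in draws) / len(draws),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for posterior draws of an effect in s.d. units
    synthetic_draws = [rng.gauss(-0.3, 0.1) for _ in range(2000)]
    print(summarize_posterior(synthetic_draws))
```

Because every summary is computed from the same set of draws, no model needs to be re-fitted to obtain a different interval or a subgroup probability.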

The covariates included in each study are listed in Supplementary Table 5. The core covariates and moderators were: the prior mindset measures (fixed mindset and stress-is-debilitating mindsets), sex and perceived social stress, as pre-registered (https://osf.io/tgysd). When available, other covariates were added as well: age, race or ethnicity, self-esteem, test anxiety, social class and personality. Justifications for each covariate appear in Supplementary Table 5.

### Effect size calculations

Unless otherwise noted, effects are standardized by the pooled s.d.
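
For reference, one common way to standardize a mean difference by the pooled s.d. is sketched below. The text does not specify which pooling formula was used, so the degrees-of-freedom-weighted version here is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation, weighting each sample variance
    by its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    weighted = (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    return sqrt(weighted / (n_a + n_b - 2))


def standardized_effect(treated, control):
    """Mean difference (treated minus control) in pooled s.d. units."""
    return (mean(treated) - mean(control)) / pooled_sd(treated, control)


if __name__ == "__main__":
    print(standardized_effect([2, 4, 6], [1, 3, 5]))  # 0.5
```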

### Manipulation checks (all studies)

The intervention reduced negative mindset beliefs relative to controls (four items, including “Stress stops me from learning and growing” and “The effects of stress are bad and I should avoid them”; 1 = strongly disagree, 6 = strongly agree). BCF analyses revealed lower levels of negative mindsets in the synergistic mindsets intervention condition at post-test compared to the neutral control condition, signifying a successful manipulation check: study 1: ATE = −0.28 s.d. [10th percentile: −0.43, 90th percentile: −0.16]; study 2: −0.49 s.d. [−0.73, −0.24]; study 3: −0.50 s.d. [−0.89, −0.14]; study 4: −0.54 s.d. [−0.75, −0.33]; study 5: −0.26 s.d. [−0.61, 0.03]; study 6: −0.56 s.d. [−0.71, −0.40]. The two field experiments with high schoolers (studies 1 and 5) had smaller manipulation check effects that were more imprecise than the others (studies 2, 3, 4 and 6). This was expected because the former studies were conducted in naturalistic school settings that tend to produce noisier data.

### Study 1

#### Sample size determination

Sample size was planned to have sufficient power to detect a treatment effect in a field experiment of 0.10 s.d. or greater, with 0.10 s.d. being the minimum effect size that we would interpret as meaningful for a study focused on immediate post-test self-reports. We worked with our data collection partner, the Character Lab Research Network (CLRN) (https://characterlab.org/research-network/), to recruit as close to 3,000 participants as possible in a single semester. The final sample size was determined by the logistical constraints of data collection during the COVID-19 pandemic and by CLRN’s data availability.

#### Participants

Participants were from a large, heterogeneous sample of adolescents who were evenly distributed across grades 8 to 12 in 35 public schools in the United States (13 years old: 16%; 14 years old: 20%; 15 years old: 20%; 16 years old: 21%; 17 years old: 18%; 18 years old: 5%). The schools were sampled from a stratum of large, diverse, suburban and urban public schools in the southeast United States. Forty-nine per cent of adolescents identified as male, 49% as female and 2% as gender non-binary. Participants were racially and ethnically diverse (participants could indicate multiple racial or ethnic identities so numbers exceed 100%): Black: 20%; Latinx: 39%; white: 68%; Asian: 7%. Participants were also socioeconomically diverse: 40% received free or reduced-price lunch, an indicator of low family income. Therefore, study 1 provided a test of the hypothesis that the intervention could be widely disseminated and effectively change beliefs and appraisals in a large and diverse sample of adolescents. Even so, the sample was not strictly representative because random sampling was not used to recruit the CLRN sample.

#### Procedure

Participants were recruited by CLRN (https://characterlab.org/research-network/), which administers roughly 45-min online survey experiments three times per year to a large panel of adolescents in the 6th to the 12th grade. Researchers program their studies using the Qualtrics platform and students self-administer the materials at an appointed time. Data collection continued during the modified instructional settings of autumn 2020. We note that all measures had to be short so as to keep the respondent burden low and fit within the required time limit for CLRN studies. Thus, the trade-off in study 1, when achieving scale and reaching a large adolescent population during the COVID-19 pandemic, was estimating potentially weaker effect sizes owing to greater statistical noise.

#### Measures

The end of the study also included an additional behavioural intention measure: a choice between an ‘easy review’ extra credit assignment and a ‘hard challenge’ assignment58,59. The intervention increased the rate of choosing the challenging assignment by 0.11 s.d. [0.028, 0.200]. We expected the treatment to increase engagement with stressors because it leads to the appraisal that they are opportunities for learning and growth.

### Study 2

#### Sample size determination

All students in an introductory social science course in autumn 2019 were invited to complete the intervention or control materials in return for a small amount of course credit. Sample size was set by the response rate.

#### Participants

Participants were predominately first-year college students attending a selective public university in the United States that drew from a wide range of socioeconomic status groups: 17 years old: 3%; 18 years old: 49%; 19 years old: 29%; 20 years old: 11%; 21 or older: 8%. Sixty-four per cent identified as female and the rest as male; 39% had mothers who did not have a four-year college degree or higher (an indicator of lower socioeconomic status), and 59% identified as lower class, lower middle class or middle class (versus upper middle or upper class).

#### Procedure

This experiment was conducted in a social science course in which students completed timed, challenging quizzes at the beginning of each class meeting, twice per week. In the second week of the semester, soon before the first graded quiz, students were invited to complete the intervention (or control) materials on their own time using their own computer in return for course credit, and 83% of invited students did so. The effects of the intervention were assessed through students’ appraisals of the first graded quiz of the semester one to three days later. The appraisal items were necessarily short because they were embedded at the end of the assignment and students completed them during class before the lecture. The appraisal items were then administered a second time after another quiz, which occurred three to four weeks after intervention.

#### Measures

Participants rated their agreement or disagreement with the statements “I felt like my body’s stress responses hurt my performance on today’s benchmark” (1 = strongly disagree, 5 = strongly agree) and “I felt like my body’s stress responses helped my performance on today’s benchmark” (5 = strongly disagree, 1 = strongly agree). The two ratings were averaged to provide an appraisal index, with higher values corresponding to more negative appraisals60.
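
The scoring of the appraisal index can be sketched as follows. The sketch assumes that both items are recorded on the same raw scale (1 = strongly disagree, 5 = strongly agree) and that the ‘helped’ item is then reverse-coded; how the responses were actually stored is not stated here, and the function name is hypothetical.

```python
def appraisal_index(hurt, helped, scale_max=5):
    """Average the 'hurt' item with the reverse-coded 'helped' item.

    Both inputs are assumed to be raw agreement ratings from 1 to scale_max.
    Higher values of the index mean more negative stress appraisals.
    """
    for value in (hurt, helped):
        if not 1 <= value <= scale_max:
            raise ValueError(f"rating {value} is outside 1..{scale_max}")
    helped_reversed = scale_max + 1 - helped  # 5 -> 1, 4 -> 2, ..., 1 -> 5
    return (hurt + helped_reversed) / 2


if __name__ == "__main__":
    # Agrees stress hurt (4) and disagrees that it helped (2)
    print(appraisal_index(hurt=4, helped=2))  # 4.0
```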

### Study 3

#### Sample size determination

An a priori power analysis was used to determine sample size. Previous stress research that assessed cardiovascular responses in laboratory-based stress induction paradigms produced medium to large effect sizes (for example, range: d = 0.59 to d = 1.44). Based on a standard medium effect size, at the low end of this range (d = 0.50), with a two-tailed hypothesis, G*Power indicated that 64 participants per condition (that is, 128 total participants) would be necessary to achieve a target power level of 0.80 to test for basic effects of the treatment using frequentist methods. In anticipation of potential data loss, we determined a priori that we would oversample by 20%. Data collection was terminated the week after more than 150 participants had been enrolled in the study and provided valid data.
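
The arithmetic behind this sample size can be approximated with the sketch below. It is not G*Power's procedure (which uses the noncentral t distribution); instead it uses the normal approximation for a two-group comparison plus a small-sample correction term, and it assumes a significance level of 0.05, which is not stated above. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate participants per group for a two-group mean comparison.

    Normal approximation with a small-sample correction (z_alpha**2 / 4);
    results may differ slightly from exact noncentral-t calculations.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    z_beta = z.inv_cdf(power)
    n_normal = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    return ceil(n_normal + z_alpha ** 2 / 4)


if __name__ == "__main__":
    n = n_per_group(0.50)                     # 64 per condition
    total = 2 * n                             # 128 in total
    with_oversampling = ceil(total * 1.20)    # 20% oversampling
    print(n, total, with_oversampling)
```

Under these assumptions, the 20% oversampling target works out to about 154 participants, in line with stopping enrolment once more than 150 participants had provided valid data.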

#### Daily negative self-regard

On each daily survey, students reported daily negative self-regard, an internalizing symptom, operationalized as overall positive or negative feelings about themselves (“Overall, how good or bad did you feel about yourself today?”; 1 = extremely good, 7 = extremely bad). This was a single-item measure owing to the limited respondent time.

#### Cortisol

Acute cortisol responses follow a specific time course (peak levels occur around 30 min after stress onset). However, the diary survey stressors were not calibrated to identify the timing of specific events, so the two sources of information could not be yoked. Indeed, as noted in the main text, there was no association between the intensity of stressors reported and cortisol in the control condition (unlike self-regard and stressor intensity). In addition, levels of cortisol have a diurnal cycle (peak levels at wakening, rapid declines within the first waking hours and nadir at the end of the day). Waking levels and diurnal slopes can map onto well-being, stress coping and health70. Because all sampling was conducted during the school day, waking levels and diurnal cortisol slopes could not be accurately and precisely measured. The lack of time-course specificity and diurnal cycle data means that our reported effect sizes for global cortisol levels are likely to be conservative because noise in the data attenuates effect sizes.

The research team obtained students’ transcripts from schools after credits were recorded in the spring of 2020. Credit attainment (that is, whether students passed the course) in core classes (mathematics, science, social studies and English or language arts) was coded. An ‘on-track’ index71 was computed for each student (1 = students passed all four of their core classes; 0 = they did not). In addition, following a previous growth mindset intervention study4, a STEM course on-track indicator was computed (1 = passed mathematics and science; 0 = they did not) as was a non-STEM course on-track indicator (1 = passed social studies and English or language arts; 0 = they did not).
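
The three indicators can be computed from pass/fail records as in the minimal sketch below. The subject keys and the dictionary layout are hypothetical; the coding rules (1 = passed every course in the set, 0 = otherwise) follow the description above.

```python
STEM = ("mathematics", "science")
NON_STEM = ("social_studies", "english_language_arts")
CORE = STEM + NON_STEM


def on_track_indicators(passed):
    """passed: dict mapping each core subject to True (passed) or False."""
    def all_passed(subjects):
        return int(all(passed[subject] for subject in subjects))

    return {
        "on_track": all_passed(CORE),              # passed all four core classes
        "stem_on_track": all_passed(STEM),         # passed mathematics and science
        "non_stem_on_track": all_passed(NON_STEM), # passed social studies and English
    }


if __name__ == "__main__":
    student = {
        "mathematics": True,
        "science": False,
        "social_studies": True,
        "english_language_arts": True,
    }
    print(on_track_indicators(student))
```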

### Study 6

#### Sample size determination

We recruited all students possible from an entire social science class in the spring of 2020, which, we would later learn, was a unique cohort for examining stress during the COVID-19 lockdowns. A minimum of 278 students would be needed to have a greater than 80% chance of detecting a directional effect on anxiety of 0.3 s.d. with a conventional linear model analysis, and more students than this participated.

#### Participants, procedure and measures

Data were collected during the spring semester of 2020. Participants were from the same university as study 2 and the same intervention procedures were followed. (Owing to a difference in data collection procedures relative to study 2, quiz appraisal data could not be collected in study 6.) The intervention was delivered at the end of January 2020. In March 2020, students were sent home owing to COVID-19 quarantines. In mid-April 2020, students completed the Generalized Anxiety Disorder-7 (GAD-7)38 as a part of a class activity focused on psychopathology. The GAD-7 asks “How often have you been bothered by the following over the past 2 weeks?” and offers several symptoms, including “Feeling nervous, anxious, or on edge,” “Not being able to stop or control worrying,” and “Feeling afraid as if something awful might happen.” Each symptom is rated on a scale from 0 (“Not at all”) to 3 (“Nearly every day”). The seven items were summed, producing an overall score ranging from 0 to 21, with higher values corresponding to higher levels of general anxiety symptoms.
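
The sum scoring described above can be sketched as follows; the function name and the input format (a list of seven integer ratings) are assumptions made for illustration.

```python
def gad7_total(responses):
    """Sum the seven GAD-7 item ratings (each 0 to 3) into a 0 to 21 score."""
    if len(responses) != 7:
        raise ValueError("the GAD-7 has exactly seven items")
    if any(rating not in (0, 1, 2, 3) for rating in responses):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    return sum(responses)


if __name__ == "__main__":
    print(gad7_total([1, 2, 0, 3, 1, 0, 2]))  # 9
```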

