We use two approaches to evaluate the effects of anthropogenic climate change on spring snowpack. First, we follow an attribution that uses the correlation between observed historical snowpack trends from several SWE data products and those from climate model simulations. Second, we take a data–model fusion approach in which we generate a large observation-based ensemble of historical snowpack and estimate what March SWE would have been in the absence of anthropogenically forced changes to cold-season temperature and precipitation. The former indicates forced changes to hemispheric snowpack and the latter indicates forced snow changes at hydrologically relevant scales.
Data
Table of Contents
Our ensemble of SWE observations consists of five long-term gridded datasets from the European Center for Medium-Range Weather Forecasting’s (ECMWF) ERA5-Land reanalysis45; the Japan Meteorological Agency’s JRA-55 reanalysis46; NASA’s MERRA-2 reanalysis47; the European Space Agency’s Snow-CCI, Version 2.048; and TerraClimate49. Products with a submonthly temporal resolution are averaged across all available March values. We focus on March because it is climatologically the month of maximum snow mass in the Northern Hemisphere20 and there is an extensive collection of in situ measurements taken during March against which we can benchmark our results. Because the satellite remote-sensing-based Snow-CCI product is masked over mountainous terrain, we follow the approach of ref. 20 and fill SWE values in mountainous cells with the mean value from the other four data sources. For non-mountainous grid cells, we use the unaltered Snow-CCI data. In addition, we use in situ SWE data from the Snowpack Telemetry Network (SNOTEL) network in the western USA50; the Canadian historical Snow Water Equivalent dataset (CanSWE)51; and the Northern Hemisphere Snow Water Equivalent (NH-SWE) dataset, a hemispheric dataset that converts far more abundant snow depth observations to SWE using a well validated model52. Only in situ observations with records for at least 35 years between 1981 and 2020 are retained, resulting in a set of 550 from SNOTEL, 341 from CanSWE and 2,119 from NH-SWE.
Gridded precipitation data come from the ECMWF’s ERA5 reanalysis53; the Global Precipitation Climatology Centre (GPCC)54; MERRA-247; Multi-Source Weighted-Ensemble Precipitation (MSWEP), Version 255; and TerraClimate49. Gridded temperature data come from Berkeley Earth (BEST)56; NOAA’s Climate Prediction Center (CPC) Global Unified Temperature57; ERA553; and MERRA-247. Daily gridded runoff data come from the ECMWF’s Global Flood Awareness System (GloFAS)58. Details of all datasets used in the analysis are given in Extended Data Table 1.
For the climate-model-based attribution and observation-based reconstructions, we regrid all data to 2° × 2° and 0.5° × 0.5° horizontal resolution, respectively, using conservative regridding. For all data except runoff, grid cells where March SWE is zero in more than half of all product years are masked out, as is Greenland.
We also use climate model output from 12 models that archived monthly SWE (‘snw’) data from the pre-industrial control (PIC), historical (HIST), historical-nat (HIST-NAT) and SSP2-4.5 CMIP6 experiments, as well as monthly air temperature (‘tas’) and precipitation (‘pr’) data from the HIST, HIST-NAT and SSP2-4.5 experiments27,28. All model output are regridded and masked as with the gridded observational data. Consistent with the Detection and Attribution Model Intercomparison Project (DAMIP) protocol, the HIST simulations, which end in 2014, are extended to 2020 using the SSP2-4.5 scenario59. For simplicity, ‘historical’ (HIST) will always refer to these extended time series. Model details are given in Extended Data Table 2.
To provide estimates of hydrologic quantities at decision-meaningful scales, we aggregate from the gridded to the river-basin scale using basin extents from the Global Runoff Data Center’s Major River Basins of the World database44. All empirically estimated grid-cell values of SWE, precipitation and runoff (in mm, or equivalently kg m−2) are multiplied by the grid cell area (in m2) before summing all grid cells within a basin to calculate basin-scale mass (in kg). Basin- and hemisphere-average temperatures are given by the area-weighted mean temperature of all snow-covered grid cells.
All estimates of basin population are calculated using the 2020 values from the 15 arcmin Gridded Population of the World, Version 4 (GPWv4) dataset from NASA’s Socioeconomic Data and Applications Center60.
Attributing SWE trends to anthropogenic forcing
Our hemispheric attribution approach tests whether the similarity between observed and climate-model-simulated forced SWE trends exceeds what could be possible from natural climate variability alone26,27,28,29. To evaluate the null hypothesis that the pattern of SWE trends in the HIST simulations could be the result of natural variability alone, we calculate the spatial pattern of trends in March SWE from 1981 to 2020 in each model’s HIST simulation and for every unique 40-year period from those same models’ unforced PIC simulations (for example, for a 500-year PIC simulation, we generate 461 maps of 40-year trends). All trends are calculated using the Theil–Sen estimator, a non-parametric technique for estimating a linear trend that is more robust to data that is skewed or contains outliers than trends calculated using ordinary least squares regression. Then, we calculate the Spearman (rank) correlation coefficient between the spatial maps of HIST and PIC trends to quantify the pattern similarity. The resulting empirical distribution of 78,601 correlations (background histogram on Fig. 2) represents the likelihood that the pattern in the forced historical simulations could have arisen from natural variability alone.
We quantify the similarity between the observed pattern of SWE trends and the model-estimated response to forcing by taking the Spearman spatial correlation between the map of trends from each observational product and the multimodel mean map from the HIST simulations (red symbols in Fig. 2e). For this analysis, the in situ observations are aggregated to the same 2° × 2° grid as the gridded observations and climate models by taking the mean trend of all stations within each grid cell (Fig. 2a). If the correlations between the observations and HIST simulations are greater than almost all of the correlations between the HIST and PIC simulations, we can reject the null hypothesis that the observed historical pattern could have arisen from natural variability alone and claim that a response to historical forcing is present in the observed pattern. Furthermore, if we cannot reject the null hypothesis using the correlations between the observations and HIST-NAT simulations with only solar and volcanic forcing, then it is unlikely that the observed pattern is the result of natural radiative forcing. Combined, these two lines of evidence strongly indicate that anthropogenic forcing is causing the observed patterns of SWE trends.
Observation-based snow reconstructions
As another means of attributing historical SWE change, and to better understand its patterns and drivers at scales more commensurate with the impacts of snow loss, we generate a large observation-based ensemble of historical March SWE with and without the effects of anthropogenic forcing. We do so by using the common random forest machine-learning algorithm, which fits randomized regression trees on bootstrapped samples of the data and averages their predictions together. The decision tree framework is particularly well suited to pick up nonlinear interactions, such as that between temperature and precipitation in the context of snow, as well as correlated predictors. The random forest algorithm has been applied to reconstruct a wide variety of biogeophysical variables that are shaped by temperature, precipitation and their interaction, including historical runoff61, crop yields62 and climate-induced species range shifts63. In each instance, the random forest model was found to significantly outperform both other machine-learning algorithms and more traditional approaches such as linear regression. In addition, for this particular application of reconstructing historical snowpack, the model imposes no prior assumptions about temperature thresholds for rain–snow partitioning or snowmelt, which can vary substantially in space and are themselves a contributor to uncertainty in modelled estimates of SWE42,64. We model March SWE as a function of average monthly temperature and cumulative monthly precipitation from the previous November to March:
$${{\rm{SWE}}}_{y,i}=f({T}_{y,11,i},{P}_{y,11,i},{T}_{y,12,i},{P}_{y,12,i},{T}_{y,1,i},{P}_{y,1,i},{T}_{y,2,i},{P}_{y,2,i},{T}_{y,3,i},{P}_{y,3,i})$$
(1)
where SWEy,i is average March SWE in water year (October–September) y at grid cell i, f is the random forest model, Ty,m,i is the average temperature in month m of water year y and grid cell i, and Py,m,i is the total precipitation in month m of water year y and grid cell i. We fit the model using the full spatiotemporal panel of 0.5° × 0.5° gridded data (that is, all grid-cell years from 1981 to 2020), then aggregate the predicted gridded values to the river-basin scale. We find that training a single model on the full panel of data offers two main advantages over training multiple models on more local data (for example, a model for each river basin). First is that the out-of-sample prediction skill of the full panel model is significantly higher in many highly populated mid-latitude basins of the western USA, western Europe and High Mountain Asia; local models are more skilful in fewer than 20% of basins, concentrated in sparsely populated high-latitude basins where the skill of the full panel model is already high (Extended Data Fig. 3). Second, training a single model on data from the entire hemisphere provides greater statistical stability of projections made with large perturbations to the input variables, such as adding an end-of-century climate change signal (Extended Data Fig. 8), which could exceed the support of local historical observations as records fall at an increasing rate65,66.
To adequately sample and quantify the observational uncertainty in snowpack, temperature and precipitation and create a sufficiently wide ensemble of possible SWE values, we repeat this procedure for all combinations of 6 SWE (5 gridded + in situ), 4 temperature and 5 precipitation datasets (Extended Data Table 1), providing 120 (6 × 4 × 5) estimates of basin-scale March SWE from 1981 to 2020. Our ensemble approach is motivated by two main considerations. First, it is difficult to determine what represents ‘true’ snowpack at hydrologically relevant scales. All methods of estimating spatially distributed snowpack (for example, remote sensing or reanalysis) have their intrinsic limitations that result in high levels of disagreement on snow mass, its variability and long-term trends5,6, as we show in Fig. 1. In situ measurements may represent truth at the locations at which they are collected, but are difficult to generalize, especially in complex terrain. As a result, using these point observations to adjudicate which gridded products (whose values represent averages over tens to tens of thousands of kilometres) lie closest to ‘truth’ is challenging. Given the inability to know the true state of snowpack or rigorously rule out any of its various gridded estimates, we choose to consider these observational products as equally valid estimates of truth in which we can attempt to identify shared responses. Second, the ensemble approach allows us to capture the structural uncertainty in how SWE responds to changes in temperature and precipitation, which are themselves subject to data uncertainties (Supplementary Fig. 2). Using all dataset combinations, we can sample and characterize uncertainty in SWE, temperature and precipitation and their covariance with one another. Such an approach has been used to estimate forced changes in components of the Earth system in which both the dependent and independent variables of interest are themselves uncertain32,67.
We compare the model-predicted time series generated through this process with the observational SWE product on which the model is trained, using the common R2 and RMSE metrics (Extended Data Fig. 4). In addition, as the emphasis of the analysis is on long-term trends in SWE, we compare the reconstructed trends with the observed trends over the study period and find that our models faithfully reproduce the spatial pattern and magnitude of the trends quite well, with correlations for all data products falling between 0.9 and 0.97 (Extended Data Fig. 3). Furthermore, the RMSE of the construction model predictions is comparable across the 10 coldest, 10 warmest and 20 ‘average’ years in the 1981–2020 period, indicating that the reconstructions are stable even in extreme years (Supplementary Fig. 4).
As an additional test of model skill, we use the model trained on only the gridded observational products to predict fully out-of-sample March SWE at 2,961 in situ sites from the SNOTEL, CanSWE and NH-SWE datasets. Our reconstructions are able to capture the interannual variability in in situ SWE quite well, with a median R2 across stations of 0.59 and an RMSE of around 22% (Extended Data Fig. 5). The reconstruction model predictions are similarly able to capture skillfully the long-term SWE trends at the in situ sites, with a pattern correlation of 0.72 (Extended Data Fig. 5). Finally and crucially, we confirm that there are no systematic trends in time of the bias of our reconstructions against the in situ observations (Supplementary Fig. 5), indicating that the reconstruction models are capturing the real-world rate of change of snowpack with high fidelity.
Counterfactual snowpack reconstructions
To identify where and how anthropogenic climate change has altered spring snowpack at impact-relevant scales, we combine our observation-based reconstructions, which are highly skilful at capturing historical SWE trends at impact-relevant scales, with climate model simulations that allow us to estimate forced changes to temperature and precipitation. Such a data–model fusion approach has been used to attribute anthropogenically forced changes to a wide variety of systems, both physical (for example, soil moisture8,31, wildfire30 and lake water storage32) and socioeconomic (for example, crop indemnities33 and climate damages34).
We calculate the temperature response to anthropogenic forcing as the difference between the 30-year rolling mean average temperature for each month in the HIST and HIST-NAT runs. For precipitation, we calculate the forced response as the percentage difference between 30-year rolling mean monthly precipitation in HIST versus HIST-NAT. By differencing experiments from the same model, we hope to limit the influence of model biases in climatological temperature and precipitation, as each model is benchmarked to its own climatology. Systematic biases in the model-simulated trends (for example, too rapid warming or wetting), however, could potentially lead to over- or under-estimating the forced response. To address this possibility, we evaluate model biases in the 1981–2020 trends in winter temperature and precipitation against observed trends by taking the difference between the CMIP6 HIST ensemble mean and the mean of the observational products for each quantity (Extended Data Fig. 6). To test whether the observed and modelled trends are consistent, we ask whether the observed trend falls within a plausible range of forcing plus internal variability, given as the 2.5–97.5th percentile range of the CMIP6 HIST trends. Only 1% (3%) of grid cells fall outside this range for temperature (precipitation), indicating that the climate models capture realistic historical climate trends.
Having estimated anthropogenically forced changes in gridded temperature and precipitation, we create counterfactual time series of temperature and precipitation by downscaling the output to the 0.5° × 0.5° resolution of the observational ensemble using conservative regridding and removing the forced response from each model realization from each gridded temperature and precipitation dataset. Temperature is adjusted by subtracting the forced change from the observations and precipitation is adjusted by the forced percentage change. Then, we use the reconstruction models trained on historical data (equation (1)) to predict March SWE using the counterfactual temperature and precipitation data, giving an estimate of what SWE would have been absent human-caused climate change. In addition, we isolate the effects of forced changes to temperature and precipitation individually by removing the forced response of only one or the other quantity from the observations, while leaving the other at its observed historical values. These gridded counterfactual reconstructions are then similarly aggregated to the basin scale and linear trends in SWE for these counterfactual scenarios are calculated using the Theil–Sen estimator. The effect of forced changes to temperature and precipitation individually (Fig. 3c,d) and in combination (Fig. 3e) is calculated as the difference between each historical trend and the counterfactual trends based on the same SWE–temperature–precipitation dataset combination. For each of the 120 reconstruction ensemble members, we have 101 estimates of the anthropogenic effect (one from each climate model realization; Extended Data Table 2), for a total of 12,120 estimates for each basin. Using only the first realization from each climate model, rather than all available runs, produces nearly identical results (Supplementary Fig. 6).
To further test the validity of this approach of using forced changes in temperature and precipitation to estimate counterfactual SWE, we repeat this protocol using exclusively climate model output in a ‘perfect model’ framework. For each model, we fit the empirical model described in equation (1) using SWE, temperature and precipitation data from the CMIP6 HIST simulations over the 1981–2020 period, rather than observations. Then, we use the random forest trained on these HIST data to predict counterfactual SWE using temperature and precipitation from the HIST-NAT simulations. Finally, we compare the forced (HIST minus HIST-NAT) trends calculated from the reconstruction approach to the ‘true’ forced trends calculated by using the direct SWE output from the HIST and HIST-NAT climate model experiments (Extended Data Fig. 9 and Supplementary Fig. 7). The strong similarity in the patterns of the ‘true’ and reconstructed forced responses indicates that using observations with forced changes in temperature and precipitation removed produces reasonable estimates of a forced SWE change.
Uncertainty quantification
The methods detailed above yield 12,120 estimates of the effect of climate change on March snowpack trends in each of 169 major river basins. Contributing to the spread of these estimates are four main sources of uncertainty: (1) uncertainty in the SWE data products on which the reconstructions are based; (2) uncertainty in the temperature and precipitation data products and their relationship with SWE; (3) differences in the forced response of temperature and precipitation due to structural differences between climate models; and (4) uncertainty due to internal climate variability in temperature and precipitation.
To quantify the magnitude of uncertainty introduced by each source, we calculate the standard deviation of forced SWE trends across a single dimension, holding all others at their mean. For instance, the uncertainty due to differences in model structure is given by the standard deviation of forced SWE trends across the 12 climate models (considering only the first realization from each), taking the mean across all SWE–temperature–precipitation dataset combinations.
To isolate the uncertainty from internal variability in temperature and precipitation, we use 50 pairs of HIST and HIST-NAT simulations from the MIROC6 model68, which differ in only their initial conditions. We take the standard deviation of forced SWE trends for all 50 realizations, taking the mean across all SWE, temperature and precipitation data product combinations.
Consistent with previous work in uncertainty partitioning19,41,69, we consider total uncertainty U in the forced SWE trend in basin b to be the sum of all four sources:
$${U}_{b}={S}_{b}+{{\rm{T}}{\rm{P}}}_{b}+{M}_{b}+{I}_{b}$$
(2)
where S is the uncertainty from SWE observations, TP is the uncertainty from temperature and precipitation observations, M is the uncertainty from model structure, and I is the uncertainty from internal variability. To assess which sources are the largest contributor to uncertainty in each basin, we consider the fractional uncertainty of each (for example, Sb/Ub gives the proportion of uncertainty in basin b attribution to SWE observational uncertainty). This fractional uncertainty is reported in Supplementary Fig. 12. For each source, we hatch out basins where the magnitude of uncertainty is insufficient to change the sign of the ensemble mean estimate of the forced SWE trend (that is, the signal-to-noise ratio is >1).
Temperature sensitivity of snowpack
To better understand the drivers of the heterogeneous spatial response of SWE and its potential future changes with further warming, we evaluate the temperature sensitivity of March SWE across a gradient of climatological winter temperatures in in situ observations, gridded observations, our basin-scale reconstructions and climate models. The marginal effect of an additional degree of warming, ∂SWE/∂T or β1, is calculated as the regression coefficient of March SWE on cold-season (November–March) temperature:
$${{\rm{SWE}}}_{y,i}={\beta }_{0,i}+{\beta }_{1,i}{T}_{y,i}$$
(3)
where SWEy,i is March SWE in unit i (in situ station, grid cell or river basin) in water year y and Ty,i is average cold-season temperature in that same unit. We run this regression at each in situ location, for all 20 combinations of gridded SWE and temperature products, for all 12 climate models (using the HIST simulations), and for all 120 basin-scale reconstructions. We then calculate the average and standard deviation of all of the coefficients for a given type of data (in situ, gridded observations, climate models and basin-scale reconstructions) in a rolling 5° temperature window to produce the curves in Fig. 4a. As such, the uncertainty estimate includes both parametric and data uncertainty.
Snowpack-driven runoff changes
To evaluate the differential water security implications of the human-caused snowpack declines, we quantify the spring (April–July) runoff change due to forced March SWE changes. We once again use the random forest algorithm, modelling April–July run-off as a function of March SWE and monthly temperature and precipitation from the previous November to July:
$${Q}_{y,b}=f({{\rm{SWE}}}_{y,b},{T}_{y,11,b},{P}_{y,11,b},{T}_{y,12,b},{P}_{y,12,b},\ldots ,{T}_{y,7,b},{P}_{y,7,b})$$
(4)
where Qy,b is April–July total runoff in water year (October–September) y in basin b, SWEy,b is average March SWE in water year y in basin b—unlike the SWE reconstructions, which were fit at the grid-cell level and aggregated to the basin scale, the runoff model is fit using basin-scale data—Ty,m,b is the area-weighted basin-average temperature in month m of water year y, and Py,m,b is the total basin-scale precipitation in month m of water year y. We fit this model using all 120 SWE–temperature–precipitation dataset combinations and the GloFAS runoff data (Extended Data Table 1). We evaluate model skill using the same methods as those used to validate our SWE reconstructions (Extended Data Fig. 10).
Analogous to the basin-scale March SWE attribution described above, the spring runoff change due to forced changes to snowpack is given by the difference between runoff estimated with historical SWE and runoff estimated with the effects of forced temperature and precipitation changes on SWE removed.
Future snowpack and runoff changes
To better understand the differential water-availability implications of future warming-driven SWE changes, we combine our statistical models and projections of future temperature and precipitation change to produce estimates of end-of-century (2070–2099) snowpack under the SSP2-4.5 forcing scenario. Specifically, we use a ‘delta’ method in which we adjust the observed climatology for each month by the difference between the end-of-century and historical (1981–2020) climate from the climate models. We additively adjust temperature and adjust precipitation by the percentage change between historical and future climate. We then make predictions of future climatological snowpack using the adjusted data and the model described in equation (1) trained on historical data.
Future runoff changes due to changes in SWE are calculated using equation (4), but substituting estimates of future SWE climatology for the historical, while keeping temperature and precipitation at their observed historical climatological values.
Snow dominance
To identify a priori the river basins considered to be snow dominant in Fig. 1, we use the ratio R of water year (October–September) cumulative snowfall to runoff1, calculated from ERA5-Land45. Basins where the average R is greater than 0.5 are considered to be snowmelt dominant.