Data
Table of Contents
We compiled ten datasets that describe population abundances through time^{2,3,4,5,6,7,8,9,10,11,41}. These datasets represent some of the most influential in ecology and conservation biology, forming a basis for influential reports such as the Living Planet^{15}, as well as a series of highprofile and highly cited publications (see Supplementary Table 2 for a full summary). For each dataset, we extracted the population abundance estimates, the accompanying timestamps, the scientific names of the species, the name of the site (location) where the population was sampled and any site coordinates. For datasets to be included, they had to be open access, and contain multiple abundance time series for a selection of species and locations. Although these datasets are vital in biodiversity science, many of the datasets are prone to biases (for example, lacking tropical representation, and contain few plant and invertebrate species). The datasets have been compiled from a variety of methods, realms and systems, covering a vast array of spatial, taxonomic and temporal scales. Further, there is probably some overlap in data between datasets where population time series may occur in more than one dataset. We take no action to correct or acknowledge these biases and features, as our analysis is designed to show how model choice can have a substantial influence on inference in a variety of datasets, rather than to derive a consensus trend across datasets.
To account for correlative nonindependence introduced by species’ shared evolutionary past, we extracted a phylogeny for each dataset. We used synthetic trees from the Open Tree of Life^{42,43} and estimated missing branch lengths using Grafen’s approach^{44} from the compute.brlen function in the R package ape^{45}. The Open Tree of Life identified a phylogeny for 80% of species (n = 23,871); all other species were removed from the analysis. For studies with the overall aim of assessing biodiversity change, removing species could be problematic, as the collective trend would not be representative of all species. However, in our case, in which the aim is to assess how collective trend inference changes under a variety of modelling approaches, trimming the data to species with an accompanying phylogeny has no impact on our conclusions. Regardless, in sensitivity analyses in the Supplementary Information section entitled Phylogeny, we investigated this tradeoff, and found that more than 1,000 species would have to be excluded from the data if higherquality phylogenies were used (Supplementary Table 3). Further, inference is generally consistent across the datasets regardless of phylogeny quality (Supplementary Figs. 6 and 7).
After removing species not present in the Open Tree of Life topology, we further trimmed the data to include only higherquality time series, removing the following: time series that contained zeros (which we considered extreme cases of extinctions or recolonizations) and time series with missing abundance values for a given year throughout the sampling duration (that is, we required consecutive abundance estimates). In all datasets except the two smallest ones—Atlantic reef fishes and Large carnivores—we further trimmed the datasets to keep only time series that had greater than or equal to the median number of abundance observations (that is, including the longest 50% of time series in each dataset). In some cases, this cutoff was not sufficient as the median number of observations in the time series equalled two. With only two abundance observations, trends are highly exposed to error purely driven by random fluctuations in abundance^{19}. To partially address this issue, we imposed a further cutoff on these datasets, ensuring that each time series had at least five observations. These datasets are characterized in Supplementary Table 2. With our trimmed dataset, we derived a mean abundance in each year (in cases for which there were more than one observation per year) for each time series. In some datasets, there is a possibility that some species will have overlapping populations measured at different scales (for example, a national trend and a regional trend).
Modelling
We explored which models have been used in the literature to infer abundance change. We focused on studies that characterized the average change in abundances over time, rather than studies assessing how many species are declining or increasing, as this avoids discretizing a numeric value; that is, we avoid having to define what change is necessary to be classified as a decline. To evaluate the diversity of approaches used to model abundance change over time in multispecies and/or multilocation datasets, we conducted a literature search in a selection of highprofile ecology journals over the past 13 years (Supplementary Information). Our search identified 282 research papers, 28 of which described approaches to model abundance change across multispecies and/or multilocation datasets. A further 16 methods were not detected in the literature search but were known priori to the authorship team, resulting in 44 different studies and/or methods. Models of abundance change varied in complexity, each containing their own assumptions, with no clear ‘standard’ approach for deriving the rate of change in abundance. However, across the 44 studies and/or methods that we compiled (Supplementary Table 1), five general approaches were present, as follows.

1.
Abundance average: The simplest models derive an average or total abundance across all species or sites in a given year, and then regress average abundance against time. This approach fails to recognize any of the hierarchical structures in the data.

2.
Trend average: A slightly more complex model, which estimates abundance change per population by fitting a series of loglinear modes of abundance against year; averaging over the extracted slope coefficients. This approach fails to propagate uncertainty in average rates of change of each population, and ignores the implicit spatial and taxonomic structures in the data, inducing pseudoreplication.

3.
Random intercept: Some studies partially address the aforementioned pseudoreplication (for example, certain sites or species having multiple estimates) with mixed models, regressing loglinear abundance against year across all populations, while specifying that populations belong to a site and/or species. However, often this mixed model structure extends only to random intercepts, which only acknowledge that mean abundance can differ between sites, species and locations, and assumes that the abundance trends will all remain the same. This is a particularly common approach among the indicators from population monitoring schemes that shape policy^{46}.

4.
Random slope: In the scientific literature, it is common to use more complex models, with a similar structure to the random intercept model, but now also capturing the differences in abundance trends (not only mean abundances) across populations, sites and species with random slope terms.

5.
Decomposition: This is the rarest of the approaches and deviates from the linear mixed model methods. Instead, the decomposition approach involves fitting generalized additive models through each time series to smooth abundance estimates and reduce noise. The smoothed time series is then decomposed into a time series of rates of change (or λ values), which are then averaged across species and biomes to derive estimates of the average change in abundance for each year across all of the time series.
The most common approaches were the random intercept and random slope models, used 19 and 23 times, respectively. The abundance average, trend average and decomposition approaches were rare, used just five, two and three times, respectively. Not all studies adopted just one approach, sometimes splitting their model into two steps (for example, using a random intercept model to estimate a given species trend across locations, which could then be aggregated across broader taxonomies with a random slope model). Further, all approaches regularly failed to account for temporal, spatial and phylogenetic structures (that is, closely related species are likely to have more similar trends than distant species), with only 14 of the 44 approaches accounting for temporal autocorrelation. Studies that accounted for phylogenetic or spatial covariance were comparably rarer—included in just six and three studies, respectively. Four studies attempted to account for two sources of correlative nonindependence in their models, by first deriving population trends while accounting for temporal autocorrelation of abundances in time series, and then using phylogenetic least squares to aggregate these trends. However, no study captured more than one of these covariances simultaneously (for example, spatiotemporal models). Further, no study attempted to account for all three sources of correlative nonindependence.
Given the apparent rarity of the abundance average, trend average and decomposition approaches in the literature, we focus on understanding how the dominant approaches (that is, the random intercept and random slope models) compare to our newly developed correlated effect model. Full model equations are available in the Supplementary Information.
Model 1 (random intercept)
In model 1, we fit a linear mixedeffect model between the natural logarithm of abundance and year, with five random intercepts: population (the unique time series), site (unique locations), region (broader spatial category to nest sites; flexibly determined on the basis of the spatial extent of the dataset), species (unique species) and genus (broader taxonomic category to nest species; measured as the parent node to the species tip). In the model, we do not specify any nesting of the population in the site and species random intercepts as the hierarchical structure of the data is poorly defined (for example, although populations always occur in a species and site, some species are nested in sites, and some sites are nested in species, creating a crossed random effect design). Model 1 assumes all populations, sites, regions, species and genera have the same trend in abundance.
Model 2 (random slope)
In model 2, we develop a linear mixedeffect model, in which we regress the natural logarithm of abundance against year, including population, site, region, species and genus all as random slopes. This builds on the random intercept model by allowing abundance–year slope coefficients to vary for each category in each random slope term (for example, each species can have a different slope)—not simply differing intercepts as in model 1. Unlike in model 1, we centre the year and abundances of each population time series at zero (for example, subtracting each year by the mean year in each population, and subtracting the log of each abundance by the mean log abundance value in each population). This centring fixes the y and x intercepts at zero for each slope, and is a convenient solution to account for variance captured by the random intercepts without increasing the number of parameters. To all intents and purposes, assuming that the objective is to characterize the abundance–year coefficient, the random slope model is equivalent to a model with random intercepts and slopes.
Model 3 (correlated effect)
Model 3 is structurally similar to model 2, but accounts for correlative nonindependence. For temporal nonindependence, we model the population level time series with a discrete firstorder autoregressive temporal process, which assumes that sequential abundance observations in a time series will be more similar. To capture the spatial and phylogenetic correlative nonindependence, we focus on nonindependence across timeseries trends (instead of abundance observations), assuming that trends in population abundances through time will be more similar in neighbouring sites and more closely related species. In models 1 and 2, we try to capture this nonindependence with grouping categories (genus and region). However, in the correlated effect model, we more explicitly describe shared correlations between every species and site by specifying the covariance structures of our site and species random slopes. The site covariance matrix was derived by taking each site’s coordinates and estimating the pairwise Haversine (spherical) distance between the sites (for example, how far is every site from every other site). This was then converted into a matrix, normalized between 0 and 1, with values close to 1 indicating neighbouring sites, whereas values approaching 0 indicate distant sites. The species covariance matrix was derived by converting the phylogeny into a variance–covariance matrix, in which phylogenetic branch lengths describe the evolutionary distance between species.
All models were developed using Bayesian Integrated Nested Laplace Approximation (INLA)^{47} in R v.4.0.5 (ref. ^{48}). We describe our model priors in the Supplementary Information section entitled Priors and validate our model assumptions in the Supplementary Information section entitled Assumptions (Supplementary Figs. 1–5). We also conduct additional sensitivity analyses exploring how phylogeny quality and how the addition of each correlative component (space, time or phylogeny) affect inference (see the Supplementary Information sections entitled Phylogeny and Component importance). We compiled the data using the following R packages: tidyverse^{49}, countrycode^{50}, janitor^{51}, here^{52} and arrow^{53}_{.} Figures were produced using the following R packages: ggplot2^{54}, ggtree^{55} and ape^{45}.
Outputs
Measuring nonindependence
We assess whether correlative terms capture a meaningful proportion of variance in the data, by dividing the proportion of variance captured by the correlated slopes (for example, spatial covariance) by the combination of the variance captured by the correlated and independent slopes (for example, spatial covariance + site random slope + region random slope). This was carried out separately for the spatial and phylogenetic terms. As the spatial and phylogenetic components each contain three terms (an independent species or location slope, an independent genera or region slope and a correlated species or location slope), a proportional variance captured of 0.33 would indicate that the correlative slope captures an equal proportion of variance compared to the two independent slopes. A value greater than 0.33 indicates that correlative slopes account for more variation than independent random slopes. We measure temporal nonindependence as the degree of correlation between sequential abundances (ρ).
Differing inference between the models
Using the mean and 50% credible intervals of the global trend (overall abundance–time coefficients), we display abundance projections for each model in each dataset. These projections are based on an arbitrary baseline abundance of 100, set at the first year of available data in each dataset, and this abundance would change according to the overall coefficients and credible intervals. For instance, with a 1% annual rate of change, an abundance in year zero of 100 would become 101 in year 1, and 164 in year 50. The purpose of these projections is to showcase varying abundance trajectories under different model complexities. We also report the fold change in the collective trend s.d. of the correlated effect model, relative to the random intercept and random slope models. This involved regressing fold change against category (for example, correlated effect versus random intercept) in a linear model. We report the mean fold change and 95% CIs. Model outputs are reported in Supplementary Tables 4 and 5.
Predictive performance
We assess the predictive performance of the different models by determining their ability to predict final observations in time series, and their ability to predict population trends of a given species in a given location. To test the predictive accuracy for the final observation in the time series, we removed the final observation from half of the time series in each dataset and predicted the missing values using each of the three models on the log scale. We report the percentage error (PE), a metric describing the median of the absolute percentage difference between predicted and observed values (for example, with a 5% error, an abundance on the log scale of 1 would become 1.05). This is calculated by finding the absolute difference between the true value and prediction, divided by the true value, before being multiplied by 100 and converted to an absolute error.
To test the accuracy of the population trend prediction, we conducted leaveoneout crossvalidation, removing one population time series (or trend) from each dataset, and estimating the missing trend using the random slope and correlated effect models. We solely removed population time series with a trend not overlapping zero at 95% credible intervals (that is, populations changing significantly), to test our ability to identify which populations are changing or not. We repeated this process 50 times per dataset and compare the predicted trends to trends from a simplified correlated effect model, which contains a populationlevel slope and accounts for temporal autocorrelation, but does not include the spatial and phylogenetic correlation terms or any of the hierarchical terms, which have no influence on the required populationlevel inference. We measured trendpredictive performance using the same approaches as above (PE). In the random slope model, the population trend coefficients were derived by adding the species, location, genus, region and overall coefficients together, meaning that missing population trends can still be informed by other hierarchical information. For the correlated effect model, the population trend is informed by the phylogenetic and spatial variance–covariance matrices, as well as all hierarchical information in the random slope model. Prediction accuracy for each dataset is reported in Supplementary Tables 6 and 7.
Phylogenetic and spatial distribution of abundance change
To plot abundance change across a phylogeny, we derived specieslevel rates of change in abundance from the taxonomic (species and genera) and phylogenetic random effects. We incorporate uncertainty in specieslevel trend prediction by estimating the CI threshold by which a species would be considered to have increased or decreased. For instance, a negative trend at an 80% CI threshold would be considered stronger evidence of decline than a negative trend at a 20% interval threshold. We derive four asymptotic CI thresholds (20%, 40%, 60% and 80%) using the uncertainty (s.d.) from the phylogenetic random effect and a series of zscores (0.25, 0.52, 0.84 and 1.28).
To plot abundance change across space, we focus solely on one abundant and iconic species, the American robin T. migratorius, as sitelevel trend variability is high at the community level (that is, community trends across space are rarely significant). To produce abundance change predictions for the American robin across space, we expanded the BioTIME spatial Haversine distance matrix (describing distances between each time series) by supplementing it with a gridded extent covering North America. This new grid had a latitudinal range of 20 to 60 and 1° spacing (for example, 15, 16 and so on), and longitudinal range of −130 to −60 with 1° spacing. This new matrix allows us to estimate expected covariance (similarity) in abundance trends for any pair of 1° cells across North America. We then derived the average rate of change in abundance across all hierarchical and correlative random effects, and used populationlevel trend uncertainty to derive the selection of CI thresholds described above.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.