Wildlife ecological risk assessment in the 21st century: Promising technologies to assess toxicological effects
Editor's Note: This article is part of the special series from the SETAC workshop “Wildlife Risk Assessment in the 21st Century: Integrating Advancements in Ecology, Toxicology, and Conservation.” The series presents contributions from a multidisciplinary, multistakeholder team providing examples of applications of emerging science focused on improving processes and estimates of risk for assessments of chemical exposures for terrestrial wildlife. Examples are considered relative to applications within an expanding risk assessment paradigm where improvements are suggested in decision-making and bridging various levels of biological organization.
Abstract
Despite advances in toxicity testing and the development of new approach methodologies (NAMs) for hazard assessment, the ecological risk assessment (ERA) framework for terrestrial wildlife (i.e., air-breathing amphibians, reptiles, birds, and mammals) has remained unchanged for decades. While survival, growth, and reproductive endpoints derived from whole-animal toxicity tests are central to hazard assessment, nonstandard measures of biological effects at multiple levels of biological organization (e.g., molecular, cellular, tissue, organ, organism, population, community, ecosystem) have the potential to enhance the relevance of prospective and retrospective wildlife ERAs. Other factors (e.g., indirect effects of contaminants on food supplies and infectious disease processes) are influenced by toxicants at individual, population, and community levels, and need to be factored into chemically based risk assessments to enhance the “eco” component of ERAs. Regulatory and logistical challenges often relegate such nonstandard endpoints and indirect effects to postregistration evaluations of pesticides and industrial chemicals and contaminated site evaluations. While NAMs are being developed, to date, their applications in ERAs focused on wildlife have been limited. No single magic tool or model will address all uncertainties in hazard assessment. Modernizing wildlife ERAs will likely entail combinations of laboratory- and field-derived data at multiple levels of biological organization, knowledge collection solutions (e.g., systematic review, adverse outcome pathway frameworks), and inferential methods that facilitate integrations and risk estimations focused on species, populations, interspecific extrapolations, and ecosystem services modeling, with less dependence on whole-animal data and simple hazard ratios. Integr Environ Assess Manag 2024;20:725–748. © 2023 His Majesty the King in Right of Canada and The Authors. 
Integrated Environmental Assessment and Management published by Wiley Periodicals LLC on behalf of Society of Environmental Toxicology & Chemistry (SETAC). Reproduced with the permission of the Minister of Environment and Climate Change Canada. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
INTRODUCTION
Ecological risk assessment (ERA) “is the process that evaluates the likelihood that adverse ecological effects may occur or are occurring as a result of exposure to one or more stressors” (United States Environmental Protection Agency [USEPA], 1992). In the United States, ecological risk refers to nonhuman organisms, populations, and ecosystems, while in Europe, the term environmental risk encompasses ecological considerations (Suter, 2007). The ERA process is widely used for prospective and predictive evaluations (i.e., preregistration, premarket assessments) to support decisions on chemical use and in retrospective evaluations (i.e., postmarket assessments) that focus on hazards and remediation of chemical spills or contaminated sites to protect natural resources and the environment (Suter, 2007). In addition to ERA, environmental management decisions can be based on other such tools (e.g., human health risk assessment, environmental impact assessment), often with a regulatory bias toward the protection of human health. The ERA process reflects its history (Suter, 2008), with concepts embraced by regulatory entities to address regulations and laws in the United States, Canada, Europe, and elsewhere. Central to this paradigm are characterizations of exposures and ecological effects. Exposures of biota to naturally occurring and synthetic chemicals, accidentally released or purposefully used in the environment, may evoke toxicologic responses at the molecular through organism levels that may result in adverse effects on populations, communities, and ecosystems.
Adverse effects related to survival, growth, and reproduction (endpoints at the organism level) of representative species have been used in prospective and retrospective ERAs. The choice of those endpoints was likely derived from basic demographic theory (i.e., survival, growth, reproduction; Fisher, 1930; Lotka, 1924) and population-level processes (growth, decline) (L. Barnthouse, personal communication, June 10, 2021). A founding document on the use of these toxicity endpoints was an evaluation and consensus statement developed at a 1977 Pellston workshop on toxicity test methods as predictive tools for hazard evaluation (Macek et al., 1978; G. Suter, personal communication, June 10, 2021). Using various criteria (i.e., ecological significance, scientific and legal defensibility, availability of routine methods, predictive utility, general applicability, simplicity, cost), 15 types of toxicity test systems were evaluated, with traditional hazard endpoints (i.e., acute lethality, cumulative mortality and growth, and chronic life cycle effects including reproduction) ranking the highest. Tests involving behavior, physiologic, and biochemical endpoints scored uniformly lower, and in vitro cell culture tests scored the lowest, but were viewed as being valuable in studying chemical metabolism and as a screening tool.
In the 1970s and the 1980s, toxicity tests and field trials were standardized, and exposure models were improved by the USEPA (1982, 1988). In the 1990s, a new ERA framework emphasized problem formulation and assessment endpoints with continued use of hazard quotients (e.g., daily oral exposure divided by toxicity threshold or reference value) for terrestrial wildlife (i.e., air-breathing amphibians, reptiles, birds, and mammals) (USEPA, 1992). In Europe, ERAs for wild birds and mammals were similarly based on adverse effects including reproductive endpoints and toxicity exposure ratios (European Food Safety Authority [EFSA], 2009). By the 2000s, emphasis was placed on uncertainty and on reporting risk as probabilities rather than deterministic hazard quotients, but these probabilistic approaches were limited to exposure profiles. While it has become feasible to describe toxicity with probability statements, advances in assessing effects and ecological processes at relevant geographic scales have only slowly been applied in regulatory frameworks. In contrast, refinement options for exposure assessments have been readily used in wildlife ERAs to evaluate scenarios (EFSA, 2009, 2023). The dissemination of Toxicity Testing in the 21st Century: A Vision and Strategy (National Research Council, 2007) accelerated the development and use of in vitro, in chemico, and in silico methods as more efficient and predictive ways to inform hazard and risk assessments, but the emphasis has been on human health. At that time, Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) legislation specified that vertebrate testing should be used only as a last resort, and more recently, the USEPA (2021a) announced a goal of reducing use of vertebrates for toxicity testing and related research. The need for new approach methodologies (NAMs) to replace vertebrate animals in hazard assessments has come to the forefront of ecotoxicology (Lillicrap et al., 2016).
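The contrast between a deterministic hazard quotient and a probabilistic statement of risk can be illustrated with a minimal sketch. The function names, the lognormal exposure distribution, and all numeric values below are illustrative assumptions, not values or methods taken from any regulatory guidance.

```python
import math
import random


def hazard_quotient(daily_dose, toxicity_reference_value):
    """Deterministic HQ: daily oral exposure divided by a toxicity threshold.

    Both arguments must share the same units (e.g., mg/kg body weight/day).
    """
    if toxicity_reference_value <= 0:
        raise ValueError("toxicity reference value must be positive")
    return daily_dose / toxicity_reference_value


def exceedance_probability(dose_median, dose_gsd, trv, n_draws=20_000, seed=1):
    """Monte Carlo estimate of P(HQ > 1).

    Assumption: exposure is lognormal with the given median and geometric
    standard deviation (dose_gsd > 1); the toxicity threshold is held fixed.
    """
    rng = random.Random(seed)
    mu = math.log(dose_median)
    sigma = math.log(dose_gsd)
    exceed = sum(
        1
        for _ in range(n_draws)
        if hazard_quotient(rng.lognormvariate(mu, sigma), trv) > 1.0
    )
    return exceed / n_draws


# Hypothetical inputs: median exposure of 2 and a threshold of 5 (same units).
hq = hazard_quotient(daily_dose=2.0, toxicity_reference_value=5.0)
p_exceed = exceedance_probability(dose_median=2.0, dose_gsd=2.0, trv=5.0)
print(f"Deterministic HQ = {hq:.2f}; estimated P(HQ > 1) = {p_exceed:.3f}")
```

In this sketch, only exposure is treated as uncertain, which mirrors the limitation noted above; a fuller probabilistic treatment would also place a distribution on the toxicity threshold.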
In 2021, a SETAC Technical Workshop was initiated with the objectives of reviewing scientific advancements that might improve ERAs for terrestrial wildlife and identifying and prioritizing information gaps that warrant further research. One of four workgroups addressed toxicological effects assessment, and was charged with reviewing and providing recommendations for (1) existing wildlife toxicology testing protocols; (2) NAMs and their translation to supplement or replace current data requirements; (3) incorporation of other nonstandard in vivo endpoints; (4) animal models and interspecific extrapolations; and (5) use of statistical techniques and modeling, all to support, refine, and enhance wildlife ERAs. To this end, a companion paper (Bean et al., 2023) focused on current animal guidelines and potential improvements to these testing protocols, use of nonguideline studies, knowledge gaps for some taxa, and the need for better guidance on the conduct of field studies. In the present paper, the members of the workgroup address (1) nonstandard effect endpoints at the molecular, cellular, organismal, population, and ecosystem levels; (2) knowledge collection and organization to provide evidence of potential causal relationships among those levels; and (3) inferential methods to better predict hazards and enhance the ecological relevance of findings. We then discuss the ways in which such information and technologies might be used in hazard assessment, addressing aspects of their readiness for application, reliability, relevance to ecological assessment endpoints, sources of uncertainty, challenges to regulators, and key efforts that could be undertaken to better inform ERAs. 
Ideally, incorporation of the aforementioned technologies could move wildlife ERAs beyond their current reliance on organismal-level endpoints, improve understanding of mechanisms of action, and address societal and humane interests in reducing or replacing animals in toxicological assessments while facilitating resource management decisions through increasingly efficient, predictive, and economical methods that ultimately benefit wildlife populations and their supporting ecosystems.
OVERVIEW OF SELECT NONSTANDARD ENDPOINTS
While survival, growth, and reproduction are central endpoints for hazard assessments, effects at other levels of biological organization have the potential to enhance ERAs (Figure 1) (Rohr et al., 2016). The term “nonstandard” refers to these other effect endpoints. Advancements in molecular and cellular toxicology are improving the efficiency and predictive ability of toxicity testing, and will likely reduce animal use (Mondou et al., 2021). Responses at the organ and organism levels could also support risk assessments by improving realism and ecological relevance (Ford et al., 2021; Hutchinson et al., 2000), as some of these nonstandard endpoints may predict effects of toxicants with common modes of action on survival and reproduction under field scenarios. Identifying the most relevant endpoints in toxicity pathways (i.e., series of events starting from chemical interactions with molecular receptors or processes leading to harmful effects at higher levels of biological organization) could make some aspects of animal testing unnecessary. Thus, it is important to understand how upstream events of a pathway cause downstream apical endpoints (i.e., whole-organism outcomes such as developmental disruption, reproductive failure, or death). Wildlife protection goals are often at the population level and above, and yet, the most commonly used endpoints in ERAs are at the organism level. Nonetheless, mortality in nontarget species from contaminant exposure can result in societal concern. The reliability of ERAs as decision-support tools could be enhanced by incorporating approaches that make quantitative links, including uncertainty, among exposures, response endpoints, and population-level effects (Forbes et al., 2011).
Since some effects of contaminants on wildlife populations are mediated by interactions at the community or ecosystem level (i.e., within and among species, their community, and their abiotic environment), tools to encompass such high-level processes are needed.
Molecular endpoints
Demands for increased efficiency and predictive ability of chemical risk assessment have generated interest in the development of NAMs that rely on computational, molecular, and other in vitro tools to evaluate effects at molecular to cellular scales at reduced cost and utilizing fewer animals (Ankley et al., 2010; USEPA, 2021a). This is because downstream adverse effect endpoints are triggered by upstream molecular mechanisms, and tools of systems biology can monitor the relevant pathways. A prominent example of nonstandard endpoints for chemical screening and prioritization is USEPA's Toxicity Forecaster (ToxCast), a high-throughput automated testing platform that can rapidly conduct hundreds of in vitro assays, each measuring unique molecular or cellular endpoints (Borrel et al., 2020; Richard et al., 2016; USEPA, 2021a). The ToxCast program reveals links among in vitro endpoints and toxic effects in animals for risk assessments. ToxCast has generated data for diverse chemicals for which traditional mammalian toxicity data were available. In addition to high-throughput platforms, advances such as three-dimensional (3D) culturing and organ-on-chip are complementing traditional in vitro models (Akarapipad et al., 2021; Yang et al., 2021). Immortalized and primary hepatocytes from a variety of species grown as 3D spheroids have metabolic and gene expression profiles more similar to intact livers than traditional 2D monolayer cultures (Hartung, 2018; Moreau et al., 2022). While current in vitro testing approaches are useful for screening and qualitative characterization of mechanisms, additional progress in in vitro to in vivo extrapolations can strengthen quantitative risk characterization (Lammel et al., 2019; Ramaiahgari et al., 2017; Sharin et al., 2020; Takahashi et al., 2015). 
Of course, concentrations in vitro should reflect those seen in vivo and ideally be linked to population-level effects, as effects at the molecular level are not always evident or even linked to effects at higher levels of organization.
“Omics” technologies enable simultaneous quantifications of many system components, such as genomes, epigenomes, transcriptomes, proteomes, and metabolomes, for monitoring responses of cultured cells, tissues, or whole organisms to chemical exposures. Even when whole organisms are used, omics can lessen animal stress because studies often use very early-life stages and short exposure durations. Once restricted to more qualitative contributions to risk assessment (Brockmeier et al., 2017), recent omics advances (e.g., transcriptomic dose–response [DR] modeling) facilitate quantitative hazard estimates for use in risk calculations. There is growing evidence from the human health arena that transcriptomic points of departure (i.e., the dose or concentration at which biological response is first observed; Sturla, 2018) from short-term studies are highly correlated to traditionally derived points of departure. A benchmark dose (BMD) is the estimated dose, which may be expressed as a range rather than a fixed number, that produces a predetermined change in the response rate of an adverse effect (USEPA, 2023). Transcriptomic points of departure in the form of BMDs after only 13 weeks of exposure were highly correlated to traditionally derived two-year exposure BMDs for cancer and noncancer endpoints (Thomas et al., 2012). Many studies have since made similar observations, using even shorter exposures to derive transcriptomic BMDs (Alcaraz et al., 2022; Johnson et al., 2020; Moffat et al., 2015; Pagé-Larivière et al., 2019). Transcriptomics are widely used for a small number of species (e.g., typical laboratory and domestic animals, humans), but recent advances in RNA sequencing, real-time polymerase chain reaction (PCR), and genome annotation (e.g., Larras et al., 2018) have made advanced omics applications (e.g., DR modeling) economically feasible for many wildlife studies.
For example, platforms such as EcoToxChip (Basu et al., 2019) make omics endpoints in wildlife a viable option in standard molecular biology laboratories. EcoToxChips are under development for double-crested cormorant (Nannopterum auritum) and leopard frog (Lithobates sp.), and this technology is expandable to other species. Indeed, similar, although smaller, PCR arrays have been developed for a variety of avian species (Crump et al., 2016; Porter et al., 2014; Zahaby et al., 2021). Whole-genome or transcriptome sequencing has become more feasible for wildlife, although more well-annotated reference genomes are needed. Fortunately, advanced genome annotation tools (e.g., Seq2Fun; Liu et al., 2021) are expanding the utility of RNA sequencing in nonmodel organisms lacking well-characterized genomes. Advances have been made in the development and application of high-throughput in vitro gene expression biomarkers. A common approach is to identify hazard-specific gene expression signatures in established cell culture models including, for example, genotoxicity (Liu et al., 2021) and estrogen receptor activity (Corton et al., 2022) in human TK6 and MCF-7 cells, respectively. Although comparatively few molecular and cellular transcriptomic tools have been developed for wildlife-specific applications, because they are mechanism-based, advancement in this area should lead to useful predictive information for wildlife hazards. The identification of sensitive endpoints by NAMs could be used to inform endpoints to assess in field studies, which could then be linked to survival and reproduction measures of free-ranging wildlife.
Bioinformatics tools like Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS; LaLone et al., 2016), which compare sequences and structural similarities of key proteins across taxonomic groups for cross-species evaluations, are relevant for wildlife. As more wildlife-specific tools emerge, molecular and cellular endpoints will likely become common in monitoring programs to detect or predict adverse health effects in populations. For example, ongoing efforts are establishing baseline gene expression signatures in arctic bird colonies that may be affected by increased shipping and oil exploration (Zahaby et al., 2021). To fully exploit these technologies for wildlife ERAs, research will need to link molecular and cellular changes to apical wildlife health impacts (see the Challenges to using nonstandard endpoints in wildlife ERAs section below).
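The benchmark dose concept discussed earlier in this section can be made concrete with a minimal sketch: given an already fitted dose–response curve, the BMD is the dose at which the response departs from the control level by a predetermined amount. The Hill-type curve, its parameter values, and the function names below are hypothetical choices for illustration; the sketch omits the model fitting and confidence limits that a real BMD analysis would require.

```python
from functools import partial


def hill_response(dose, control, max_change, ed50, slope):
    """Hill-type dose-response curve (an assumed model form for illustration)."""
    return control + max_change * dose**slope / (ed50**slope + dose**slope)


def benchmark_dose(model, bmr_change, upper_dose=1000.0, tol=1e-9):
    """Dose at which model(dose) - model(0) equals bmr_change.

    Uses bisection, so `model` must be monotonically increasing on
    the interval [0, upper_dose].
    """
    baseline = model(0.0)
    if model(upper_dose) - baseline < bmr_change:
        raise ValueError("benchmark response is not reached below upper_dose")
    lo, hi = 0.0, upper_dose
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if model(mid) - baseline < bmr_change:
            lo = mid  # response change still too small: move right
        else:
            hi = mid  # benchmark response met or exceeded: move left
    return (lo + hi) / 2.0


# Hypothetical fitted curve: control response 1.0 (e.g., normalized expression),
# maximal increase 2.0, half-maximal dose 10, Hill slope 1.
curve = partial(hill_response, control=1.0, max_change=2.0, ed50=10.0, slope=1.0)

# Benchmark response defined here as a 10% increase over the control level.
bmd = benchmark_dose(curve, bmr_change=0.1 * curve(0.0))
print(f"BMD = {bmd:.4f}")
```

For a transcriptomic point of departure, a calculation of this kind would be repeated per gene or gene set and then summarized; how that summary is chosen is outside the scope of this sketch.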
Physiologic to whole-animal endpoints
Researchers often document the physiologic effects of contaminants in wildlife, but they are rarely considered during quantitative characterizations of hazards. Using physiologic biomarkers and other sublethal effects for regulatory studies could reduce animal numbers, their stress, and potential suffering. Many well-studied physiologic measures have been linked to apical responses of regulatory concern, and are more cost-effective and logistically feasible than investigating survival, reproduction, and population-level effects in field studies. Some such measures involve minimally invasive sampling. For example, reduced hemoglobin concentration and hematocrit are sensitive indicators of lead (Pb) toxicosis that are associated with reduced fitness and reproductive success in birds (Buekers et al., 2009; Fronstin et al., 2016). Data on the hematologic effects of Pb from avian and mammalian species have been used to establish toxicity reference values (TRVs) for use in wildlife ERAs (Buekers et al., 2009). Another nondestructive endpoint is the ornamentation of avian integument by carotenoids. These pigments provide important intraspecific signals for mating and reproduction in birds, but they are altered by oxidative stress (Grunst et al., 2020; Lopez-Antia et al., 2015; Spickler et al., 2020; Vallverdú-Coll et al., 2016). Spectrophotometric measurement of integument carotenoid pigmentation could easily be included in ERAs as a measure of health status with implications for avian populations if quantitative exposure–response metrics for coloration were correlated with reproductive endpoints.
Contaminant effects that may be sublethal and without reproductive consequences in laboratory environments with ad libitum access to food may have fitness consequences in free-ranging wildlife that are experiencing multiple stressors (e.g., predators, infectious diseases, extreme weather, limited food resources; Figure 2). For example, sublethal exposures to neurotoxic insecticides can cause appetite suppression, ataxia, hypothermia, loss of body mass, and depressed mentation (Addy-Orduna et al., 2019; Eng et al., 2017; Lopez-Antia et al., 2013; Rattner & Franson, 1984). While these effects may be sublethal and temporary in controlled laboratory-type settings, under field conditions, they could lower survival by increasing susceptibility to ambient temperature extremes, traumatic accidents, or predation (e.g., Mateo et al., 2015). Reduced fueling and mass loss resulting from insecticide exposures have been linked to migration delays in birds, which in turn are associated with reduced reproductive performance and population-level effects (Elliott & Bishop, 2011; Eng et al., 2019; Kokko, 1999; Newton, 2006). Nevertheless, these effects are rarely considered in ERAs because studies that quantitatively link such nonstandard endpoints to direct determinants of population status (i.e., rates of survival and reproduction) are limited.
Population- and ecosystem-level endpoints
Incorporating higher-order nonstandard endpoints, such as effects on populations, communities, and ecosystems, into ERAs could improve their ecological relevance and predictive value. Historically, the strongest wildlife risk assessments and management decisions have relied upon the combination of laboratory exposure–effect data sets with field observations of closely related species (e.g., DDT, PCBs, diclofenac, Hg, Pb, Se; e.g., Arcadis, 2021, 2022; Meyer et al., 2015; Rattner et al., 2011). Once the links between nonstandard endpoints and survival, growth, or reproduction are established based on studies of individuals and groups, population models and field studies could assess relevant higher-level effects. In particular, population modeling is increasingly being used in retrospective ERAs and has the potential for greater use in prospective ERAs (Arcadis, 2021, 2022; Etterson et al., 2021; Forbes et al., 2015, 2016; Luxon et al., 2013; Meyer et al., 2015; USEPA, 2004). A population model uses survival and reproduction input parameters to generate three key population-level endpoints, namely, population growth rate, population size, and probability of extinction. Whole-organism and population endpoints (e.g., age structure, dispersal, survival, reproduction, and growth that is linked to survival or reproduction) are integrated into these three key endpoints to characterize the expected number of animals of a species occupying an environment over time and the viability of that population. Advances in individual-based models are beginning to link suborganismal responses and specific individual behaviors to population dynamics (Silva et al., 2020). Field studies assessing effects on local populations are discussed in a companion paper (Bean et al., 2023).
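As a minimal sketch of how survival and reproduction inputs yield the three population-level endpoints named above, the following two-stage (juvenile, adult) model computes a growth rate, a projected population size, and a simulated extinction probability. The model structure, the `max_young` device for stochastic births, and every parameter value are hypothetical assumptions chosen for illustration, not parameters of any published wildlife model.

```python
import math
import random


def growth_rate(fecundity, juv_survival, adult_survival):
    """Dominant eigenvalue (lambda) of the matrix [[0, F], [Sj, Sa]]."""
    disc = adult_survival**2 + 4.0 * fecundity * juv_survival
    return (adult_survival + math.sqrt(disc)) / 2.0


def project(juveniles, adults, fecundity, juv_survival, adult_survival, years):
    """Deterministic projection; returns total population size after `years`."""
    for _ in range(years):
        juveniles, adults = (
            fecundity * adults,
            juv_survival * juveniles + adult_survival * adults,
        )
    return juveniles + adults


def extinction_probability(juveniles, adults, fecundity, juv_survival,
                           adult_survival, years, max_young=4,
                           n_runs=200, seed=7):
    """Fraction of stochastic runs in which the population reaches zero.

    Assumption: each adult has `max_young` breeding slots, each yielding a
    recruit with probability fecundity / max_young; survival is binomial.
    """
    rng = random.Random(seed)

    def binomial(n, p):
        return sum(1 for _ in range(n) if rng.random() < p)

    extinct = 0
    for _ in range(n_runs):
        j, a = juveniles, adults
        for _ in range(years):
            j, a = (
                binomial(a * max_young, fecundity / max_young),
                binomial(j, juv_survival) + binomial(a, adult_survival),
            )
            if j + a == 0:
                extinct += 1
                break
    return extinct / n_runs


# Hypothetical scenarios: exposure halves fecundity and lowers adult survival.
control = dict(fecundity=1.5, juv_survival=0.4, adult_survival=0.6)
exposed = dict(fecundity=0.75, juv_survival=0.4, adult_survival=0.5)

for name, params in (("control", control), ("exposed", exposed)):
    lam = growth_rate(**params)
    size = project(5, 5, years=25, **params)
    p_ext = extinction_probability(5, 5, years=25, **params)
    print(f"{name}: lambda={lam:.3f}, size after 25 y={size:.1f}, P(ext)={p_ext:.2f}")
```

A growth rate above 1 indicates a growing population and below 1 a declining one, so the comparison between scenarios shows how an organism-level effect on fecundity or survival propagates to population-level endpoints.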
The next higher level of organization, the community, is comprised of many species that interact and may show changes in diversity related to contaminant exposures (e.g., diminished small mammal assemblages at petroleum-polluted sites; Phelps & McBee, 2009) and to reduced contaminant exposures (e.g., recoveries of sea eagle populations that cause declines in surface-nesting seabirds; Hipfner et al., 2012). Risk assessors widely recognize that “other” factors can influence contaminant impacts at these levels of organization. Examples include the effects of agricultural intensification, urbanization, and contaminants on wildlife epigenetics (Supporting Information), food resources, and the spread of infectious pathogens.
Food depletion effects on wildlife
Pesticides and other contaminants can have unintended indirect effects with consequences on wildlife populations by reducing food resources. In the United Kingdom, gray partridge (Perdix perdix) populations declined by 85% over 40 years, likely due to reduced chick survival related to agrichemical use affecting their insect prey base (Sotherton & Holland, 2003). In forest habitats, diflubenzuron application to control tree defoliation (West Virginia, US) seemingly had no direct overt toxic effects on wildlife, but caused dietary shifts of songbirds to less digestible insect taxa that reduced their fat reserves, body condition, and potentially reproduction and survival (Sample et al., 1993; Whitmore et al., 1993). While food availability has long been acknowledged as a regulating factor of wildlife populations and community dynamics, its consideration in contaminant-driven ERAs is rare (additional examples in the Supporting Information).
Complex contaminant–parasite–host interactions
Environmental contaminants can increase risks of infectious disease in several ways, including (1) immunosuppression of hosts; (2) providing or diverting nutrients from other species to intermediate hosts of parasites; and (3) eliminating predators that directly ingest and then digest parasites, intermediate hosts, or highly infected definitive hosts. Amphibians are experiencing global declines, and infectious diseases are among the important problems affecting them. Waterbirds and raccoons are definitive hosts of some adult trematodes that rely on snails and amphibians as intermediate hosts. Rohr et al. (2008) demonstrated that atrazine and phosphate could account for 74% of the variation in the abundance of encysted trematodes in northern leopard frogs (Lithobates pipiens) in Minnesota. Trematodes that encyst in limb bud areas of developing tadpoles can cause debilitating malformations in frogs, and other trematodes may occupy large percentages of the kidneys. Moreover, either of these can be highly lethal when early-stage tadpoles are infected. Phosphate can support increased periphyton, the major food source of snail intermediate hosts of trematodes. The sum of atrazine and desethylatrazine was the best single predictor of the infections, accounting for 51% of the variation in larval trematode loads in frogs. In a mesocosm study, atrazine was associated with reduced phytoplankton abundance, increased nutrient availability, water clarity, and sunlight penetration that presumably promoted the growth of periphyton, as well as greater numbers of snail egg masses and hatchlings, reduced hepatic melanomacrophages and eosinophils consistent with immunosuppression that may decrease resistance to trematode infections, and increased numbers of encysted trematodes in developing frogs. Amphibian trematode infections are also influenced by invertebrate predators that feed on snails or free-swimming trematode larvae (cercariae).
Laboratory mesocosm and field studies have illustrated how diverse aquatic invertebrates prey upon trematode cercariae, thereby protecting developing frogs (Rohr et al., 2015; Schotthoefer et al., 2007). In addition to nutrient and herbicide impacts, insecticides that harm aquatic micropredators that preferentially feed on trematode cercariae may also increase trematode infections of frogs (Jayawardena et al., 2016) (Figure 3).
Ecosystem-level endpoints
Ecosystem-level impacts are the highest tier that can be evaluated in ERAs. Ecosystem services (ESs) may be defined as aspects of ecosystems, utilized actively or passively, to produce human well-being (Fisher et al., 2009). Ecosystem services are wide-ranging, including provisioning services like food and material goods, regulating services such as flood control and disease mitigation, and cultural services such as recreation and esthetics. Ecosystem services concepts can account for the benefits of intact or recovering ecosystems as well as losses when components are degraded or lost. Examples of the adverse effects of chemicals on specific ESs include diclofenac treatment of livestock and the malicious use of poisons, both of which harmed vulture populations, reducing their role as “nature's cleanup crew” (Ogada et al., 2012), and impacts of systemic insecticides on pollinator, predatory, or parasitoid insects that can reduce their benefits to agricultural productivity and native plant communities (Chagnon et al., 2015).
CHALLENGES TO USING NONSTANDARD ENDPOINTS IN WILDLIFE ERAs
1. What is the exposure–response relation for the NAM?
2. Is the NAM response specific to a chemical or chemical group?
3. What level of change of a NAM translates to a higher-level effect?
4. What is the uncertainty bound between the NAM and the higher-level effect?
5. What field conditions apart from the chemical(s) of interest enhance or inhibit the NAM?
Further, NAMs that prove to be well linked to apical endpoints need to be validated in terms of accuracy, precision, and reproducibility before adoption. New approach methodologies can generate enormous amounts of data, which must be evaluated and communicated for effective translation into actionable knowledge. However, remedial project managers (RPMs) are often most comfortable with conventional hazard quotient approaches, and few responsible parties are likely to invest in a NAM-based ERA in the absence of an expectation that the new method will be considered by the RPM. Ecological risk assessments are used to support decisions about releasing chemicals into the environment or cleaning up legacy contamination, and the costs of incorrect decisions—whether ecological, financial, or reputational—can be high. As such, regulators and stakeholders are unlikely to accept lines of evidence that are not clearly linked to ecological assessment endpoints and protection goals, often wildlife populations.
Incorporating higher-order nonstandard endpoints into ERAs presents similar challenges. If a significant behavioral or physiologic effect is observed, determining the magnitude that has relevance for populations can be complicated. One example of a critical effect level being set for a nonstandard endpoint is eggshell thinning, where 18% thinning is considered the relevant threshold for increased cracking and consequently decreased reproductive success (EFSA, 2009, 2017, 2023). The case of eggshell thinning illustrates the merit of investing in laboratory and field studies that link nonstandard endpoints to population-level effects. While many reports have identified biochemical, physiologic, and histopathologic biomarkers of contaminant exposures, such nonstandard endpoints have rarely been linked to higher-order effects on wildlife populations.
Field studies can provide important links between physiologic and behavioral effects deduced in laboratory studies and their ecological relevance. However, setting effect thresholds may be complicated by interspecies differences in sensitivity and variation in field conditions experienced by wildlife. Also, field studies often lack the duration or statistical power to demonstrate significant impacts on reproduction or survival (Ågerstrand et al., 2020). They are also generally limited to postmarket or retrospective risk evaluations. There are no standard protocols or guidance documents to develop and evaluate field studies, although there are recommendations with respect to field studies and their characteristics for consideration in ERAs (Bean et al., 2023; EFSA, 2009, 2021).
Regulatory and logistical challenges to the incorporation of nonstandard endpoints can be significant, as in the case of prescriptive risk assessment methods for pesticides. Nonstandard endpoints are more readily used in postmarket reevaluations of chemicals, in which adaptive management frameworks for ERAs integrate preregistration (i.e., prospective) risk assessment procedures with postregistration studies of exposed wildlife (EFSA, 2009, 2023; Pest Management Regulatory Agency, 2021; USEPA, 1998a, 1998b). Such frameworks are ecologically relevant and may accommodate more diverse, innovative, and ultimately informative data than studies of standard endpoints (Bustnes et al., 2015; Dietz et al., 2021). Unfortunately, many ecological scenarios examined in terms of species, chemicals, and molecular targets have been difficult to include in conventional ERAs (Ågerstrand et al., 2020; Matthiessen et al., 2018). Moreover, in some cases, their relevance to population effects has yet to be established (Crane et al., 2019; Topping & Luttik, 2017).
One regulatory challenge of some preregistration ERAs is that the time may be lengthy between chemical registration and the subsequent detection of unexpected adverse effects at the organism or population level. In addition, damage to natural resources may be compounded by the additional time required for regulators to accumulate evidence sufficient to trigger actions that reduce or eliminate the risk (Attademo et al., 2021; Oaks & Watson, 2011). While it is important to avoid stifling the chemical industry's incentives for innovation, the reevaluation of registered chemicals may be improved by more flexible integration of available nonstandard endpoints into the postregistration assessment (e.g., using data from related chemicals or those sharing similar modes of action and response metrics; Ågerstrand et al., 2017).
Although the value of models that link organism-level to population- or higher-level responses is acknowledged, the use of such models has been hindered by their complexity, uncertainty, resource investment, and data availability. To tackle these technical and data availability challenges, Raimondo et al. (2018, 2021) created an integrated modeling framework and decision guide (Pop-GUIDE) to aid in conceptual population model development and application in ERAs. Pop-GUIDE assists users in selecting the appropriate model complexity commensurate with the quality and quantity of data to fit the risk objectives and uncertainties. This effort is a start toward advancing ERAs and understanding population-level effects. However, practitioners must be trained in the nuances of population or ecosystem-level modeling, which requires a commitment to studying the field and developing strong modeling and programming skills.
SOLUTIONS TO CHALLENGES
For research and new technologies to be applicable and of value in modernizing ERAs, they must meet the requirements of risk assessors. Considerations include (1) the decisions that the ERA is intended to support; (2) the metrics identified in problem formulation that are needed to support the decision; (3) the quality, quantity, and uncertainty that are acceptable for the decisions; and (4) the linkage to exposure measures that will be available (i.e., effects need to match measures of exposure to be integrated into the ERA). Ultimately, NAM effects need to be tied back to problem formulation and exposure to ensure that the data collected can support the decision. Shared insights are needed on topics such as nonstandard endpoints, knowledge collection solutions (e.g., systematic reviews, adverse outcome pathway [AOP] frameworks), data integrations using advanced statistical or mathematical analyses (e.g., DR modeling, Bayesian networks, probabilistic modeling), and models that link or extrapolate across levels of biological organization (species population to ecosystem modeling) to enhance risk estimations (Rohr et al., 2016).
Systematic review
Systematic review is a documented and transparent process for conducting literature searches, screening literature for relevance to assessment goals, rating the quality of evidence, extracting data, conducting data integration, and analyzing the strength and limitations of available evidence for exposure and hazard assessments (e.g., Woodruff & Sutton, 2014). Many aspects of this process can be applied to wildlife ERAs to identify and evaluate evidence in a comprehensive, objective, transparent, and consistent manner. With this process, wildlife ERAs can utilize information from all levels of biological organization, including information derived from NAMs and models. The information considered and process workflows differ depending on the goals and needs of the regulatory program. To date, most such activity has focused on human health and epidemiologic data. Systematic review under the Toxic Substances Control Act not only considers publicly available information in conducting a risk assessment for occupational worker health, consumer health, and general population health but also for aquatic and terrestrial environmental health. From a terrestrial wildlife perspective, knowledge collection solutions include the ECOTOXicology Knowledgebase that uses systematic methods for literature search, review, and data curation, providing data for over 12 000 chemicals from studies using standard and nonstandard species (Olker et al., 2022).
Systematic review leverages existing data, minimizing the need for animal toxicity tests, but a fully documented review can be a laborious undertaking. Wildlife risk assessors might adopt particularly useful elements of systematic review (e.g., study inclusion/exclusion criteria, data quality evaluation), while limiting some elements of documentation (e.g., literature search terms, excluded studies). In a review of the effects of mercury on avian reproduction, Fuchsman et al. (2017) identified study inclusion criteria and documented study exclusions only in cases that required expert judgment (e.g., strength of causal inference). Reliance on predetermined search criteria can be limiting for topics where wide-ranging and sometimes unexpected lines of evidence may be applicable; this can be addressed through preliminary searches and review to support final development of search terms.
Adverse outcome pathways
Despite advances in molecular and cellular toxicology, and vast evidence generated for nonstandard endpoints in ecotoxicology, there is a need for stronger links to ecological relevance to support confidence in using this information for decisions. Research should focus on linking molecular mechanisms of toxicants to population effects to substantiate their use in ERAs and management decisions. Such data gaps are symptomatic of technical challenges in translating mechanistic data to higher levels of biological organization and constitute hurdles for regulatory acceptance of NAMs and other nonstandard endpoints. This knowledge translation challenge can be categorized into issues of synthesis (i.e., evaluation and interpretation of evidence) and communication (transfer of knowledge from experts to nonexpert end users). These categories may entail extrapolating from biochemical to organismal, to population and/or ecological relevance, determining the strength and uncertainty of such extrapolation, and communicating this information to regulatory decision-makers.
One effort to address these challenges is the AOP framework (Ankley et al., 2010), which the Organisation for Economic Co-operation and Development (OECD) has put into practice (Delrue et al., 2016) via the AOP-Wiki (www.aopwiki.org). A goal of AOPs is to provide risk assessors with clear, transparent evidence of the regulatory significance of hazard-related biological measures. Evidence from the scientific literature is systematically cataloged according to a modular framework to establish causal relationships between stressors and quantifiable biological responses (including molecular and cellular levels) and traditional endpoints of regulatory importance (e.g., wildlife populations). The components of the framework are molecular initiating events, key events, adverse outcomes, and their relationships (key event relationships). Adverse outcome pathways are designed to organize and evaluate evidence for causal relationships between measurable key events and to identify data gaps and uncertainties. Adverse outcome pathways can be developed by applying approaches used in systematic review (Huliganga et al., 2022; Svingen et al., 2021), although compiling such information and translating it into pathways requires significant effort. Conversely, AOPs can provide a framework to support systematic reviews (Roth et al., 2020; von Stackelberg et al., 2015).
Researchers who focus on in vitro toxicity testing have most strongly adopted AOP frameworks, but AOP development is not solely the domain of molecular toxicologists. Input from researchers with expertise at higher levels of biological organization, including those who assess survival and reproduction in free-ranging wildlife, is critical to strengthening links between molecular effects and individual- or population-level protection goals. Several efforts have shown that bioenergetics may be a useful way to link suborganismal processes and effects to individual- and population-level endpoints (e.g., Murphy et al., 2018), particularly when food is a limiting factor. Applications of dynamic energy budget (DEB) theory and models have had value in establishing quantitative links across levels of biological organization. For example, a DEB model was used to predict effects on apical endpoints in American mink (Mustela vison) exposed to PCBs (Desforges et al., 2017). Broader qAOP models provide means for developing AOPs for applications in wildlife ERAs (Perkins et al., 2019).
Data requirements for a contaminant could be supplemented or replaced by expanding upon an existing AOP with ecologically relevant key events and key event relationships (e.g., AOP#12 Chronic binding of antagonist to N-methyl-d-aspartate receptors (NMDARs) during brain development leads to neurodegeneration with impairment in learning and memory in aging; Tschudi-Monnet & Fitzgerald, 2021). Lead can initiate this AOP, which is plausibly conserved among vertebrate taxa, but resulting effects (e.g., impaired behavior in mallard ducklings, Anas platyrhynchos; Hoffman et al., 2000) may not be considered in current ERAs. Such developmental effects are relevant to population-level protection goals and might manifest at Pb exposure thresholds lower than those based on hematologic and biochemistry values. To warrant validation for an ERA, AOP#12 would require quantitative evidence linking impaired learning and memory in a representative organism to (1) alterations in behavior, (2) reduced productivity or survival, and (3) reduced populations (see the Population modeling section below). Currently, the vast majority of AOPs described in the AOP-Wiki are qualitative in nature. The aspiration of linking quantified key upstream events to quantified measures of downstream apical outcomes is challenging because of the paucity of data for wildlife. Realization of quantitative AOPs (qAOPs) requires the development of frameworks, including enhanced capabilities for cataloging evidence, standards for qAOP development and evaluation, and tools to link qAOPs to exposure–response relationships for higher-level effects (Spinu et al., 2020). Once causal relationships to the population level or higher levels can be established, quantified, and documented (i.e., validated), risk assessors will likely consider data further upstream in the pathway (e.g., molecular and physiologic data without need for animal testing). 
Pathway gaps may be acceptable if the overall relationship is well supported. For example, if apical level, ecologically relevant key events and relationships become well established, then molecular data (e.g., decreased messenger RNA transcripts for brain-derived neurotrophic factor, an upstream key event in AOP#12) could be accepted in identifying sources of risk to wildlife.
Another technical and data availability challenge is extrapolating toxicological evidence from model species to inform risk estimations for ecologically relevant species. The AOP framework can document evidence of structural and functional conservation, or divergence, of toxicologic mechanisms among species using experimental or bioinformatic evidence. SeqAPASS is a computational bioinformatic tool that is well-suited for cross-species comparisons of molecular structure and function (LaLone et al., 2016). It compares amino acid sequences of key proteins across species (e.g., proteins in an AOP molecular initiating event) to identify an AOP's probable taxonomic domain(s) of applicability. Development of fully quantified and validated AOPs may take decades but is a worthwhile goal for increasing the accuracy and efficiency of wildlife ERAs. Retrospective ERAs may benefit the most initially because specific populations can be evaluated and validated. Over time, prospective ERAs could benefit from information gained from validated AOPs.
Dose–response curves and meta-analyses
Relationships between chemical doses or concentrations and effects on organisms are often quantified using regression analysis. Dose–response (DR) relationships developed for laboratory or field study endpoints are essential to modeling contaminant effects on populations and can inform Bayesian networks (see below). For wildlife ERAs, DR relationships are recommended over no-observed-adverse-effect level (NOAEL) and lowest-observed-adverse-effect level (LOAEL) values, in part because they provide information about the magnitude and severity of responses if an effect threshold is exceeded, and they can identify thresholds of interest (e.g., 20% effect concentration, EC20) instead of relying solely on the statistical power of hypothesis tests (Allard et al., 2010; Mayfield et al., 2014). Hill et al. (2014) provided a useful review of approaches to DR analysis, and additional discussions on model selection and fit, and software options are available (e.g., Erickson & Rattner, 2020; Mayfield & Skall, 2018). Nonstandard, lower-tier endpoints can be directly or indirectly modeled by incorporating effects into a DR model of a higher-tier endpoint.
Benchmark dose models are increasingly being used to develop TRVs for wildlife. Benchmark dose models plot the best-fit DR curve using continuous and discrete data (Jensen et al., 2019) and consider all of the data to plot thresholds for given responses. Newer methods that incorporate trends in prior distributions in response modeling include Bayesian BMD models (Shao & Shapiro, 2018). These methods provide greater precision for estimating toxicity thresholds and are a considerable improvement over NOAEL, LOAEL, and regression methods.
Risk assessors face several decisions when performing DR analyses, including choice of statistical model, form of data (i.e., individual responses vs. mean), and the approach for evaluating multiple studies of the same toxicant and effect type. There may be trade-offs between optimizing the analysis for statistical considerations versus incorporating relevant biological information. For example, normalizing observed biological responses to control results has often been viewed as a convenient way of harmonizing data from multiple studies and facilitating comparability among species (e.g., Fuchsman et al., 2008; Sample et al., 2019), but this approach can lose valuable information on normal variation in biological performance that may reflect data quality and animal husbandry (Blankenship et al., 2008). Normalizing responses to the control violates the assumption of independence that is central in parametric analysis and, depending on the analysis method, has the potential to introduce quantitative distortion of results (Green, 2014, 2016). Control normalization can often be avoided without compromising assessment objectives (Fuchsman et al., 2017). Nevertheless, expressing biological responses relative to control or reference performance can facilitate comparisons of directly related yet disparate lines of evidence for DR analyses that link effects to populations.
Approaches to assessing effects of vanadium (V) on avian reproduction demonstrate the myriad choices when integrating DR information. Because V is a common impurity in poultry feed and can reduce egg production, its effects have been studied extensively. In an application of BMD methodology, Mayfield and Skall (2018) avoided combining data from multiple V studies and identified EC20 values of 3.25 and 13.86 mg V/kg BW/day from just two studies of chicken (Gallus gallus) reproduction. This approach suited the authors' objective of comparing EC20 results with NOAELs that had been used to develop screening benchmarks. However, it also omitted most available data and did not characterize the central tendency DR relationship across studies, which is more useful for interspecies extrapolation and more accurate for population modeling. Figure 4 shows an alternative four-parameter logistic analysis (R Core Team, 2021) of pooled data from 16 studies of V on egg production, where data are not control-normalized, which produces an EC20 of 2.54 mg/kg-day (Supporting Information: Table S1 and Figures S2–S4), similar to the lower EC20 identified by Mayfield and Skall (2018). Other approaches could include pooling control-normalized data or using a random variable to account for between-study differences. The latter approach is rigorous but applies primarily to linear regression and thus can require workarounds to accommodate nonlinear DR relationships (e.g., fitting a series of linear segments; using hierarchical Bayesian Markov chain Monte Carlo methods). The primary challenge to DR analyses is that they are often constrained by limited data. Given the variety of methods available, risk assessors should consider the quantity and quality of available data, the objectives of their analysis, trade-offs among options, and uncertainties in the selected approach.
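The relationship between a fitted four-parameter logistic curve and an effect concentration such as the EC20 can be illustrated with a minimal sketch. The parameterization below (a decreasing curve with upper and lower asymptotes, an EC50, and a slope) is one common form and is assumed here for illustration; it is not necessarily the form fitted for Figure 4. All parameter values are hypothetical and are not taken from the vanadium analysis. Note that the ECx in this sketch is expressed as a fraction of the range between the asymptotes, which equals a reduction relative to control only when the lower asymptote is zero.

```python
def four_param_logistic(dose, lower, upper, ec50, slope):
    """Response predicted by a decreasing four-parameter logistic curve."""
    if dose <= 0:
        return upper  # no exposure: response equals the upper asymptote
    return lower + (upper - lower) / (1.0 + (dose / ec50) ** slope)


def effect_concentration(fraction, ec50, slope):
    """Dose that reduces the response by `fraction` of the upper-to-lower range.

    Obtained by solving the curve above for dose:
    (dose / ec50) ** slope = fraction / (1 - fraction).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return ec50 * (fraction / (1.0 - fraction)) ** (1.0 / slope)


# Hypothetical fitted parameters (illustration only, not values from any study).
UPPER = 0.8    # control egg production, eggs/hen/day
LOWER = 0.0    # response at very high dose
EC50 = 10.0    # mg/kg BW/day
SLOPE = 2.0

ec20 = effect_concentration(0.20, EC50, SLOPE)
response_at_ec20 = four_param_logistic(ec20, LOWER, UPPER, EC50, SLOPE)
print(f"EC20 = {ec20:.2f} mg/kg BW/day; predicted response = {response_at_ec20:.2f}")
```

Because the EC20 is derived from the whole fitted curve, it carries information from every dose group, in contrast to a NOAEL or LOAEL, which depends on the doses that happened to be tested.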
Probabilistic approaches
Many risk estimates for wildlife (e.g., hazard quotients) are not actual estimates of risk, and their use in wildlife ERAs has been critiqued (Allard et al., 2010). Risk is the probability of an adverse outcome. Typically, ecological risk estimates rely upon a deterministic quotient (exposure divided by threshold effects dose), but these quotients can be used to evaluate actual risk by incorporating probabilistic approaches that quantify outputs of the magnitude and likelihood of adverse effects. Probabilistic risk assessment avoids compounded overestimation of risk, which can be an issue in deterministic risk assessments that use highly protective estimates of parameters in food web and effects models. Probabilistic approaches are not warranted when simpler assessments using protective estimates show no concern for risk (USEPA, 1997). Challenges in probabilistic risk assessments include the effort required to estimate distributions of variables driving risk and assuring that assumptions based on limited data are credible. A sensitivity analysis should be part of a probabilistic assessment to evaluate the effects of uncertain assumptions, and ideally, variability and uncertainty in output distributions should be differentiated.
Two common approaches for probabilistic ERAs are Monte Carlo simulations and Bayesian networks (Fenton & Neil, 2011). Both approaches yield probabilistic outputs to help identify likely causal drivers of risk, but Bayesian analysis also allows prior knowledge about the parameters to be incorporated (Kéry & Schaub, 2011) and is the preferred communication tool, with its flexibility to calculate risk forward and identify key causal factors with back calculations to meet management goals (Fenton & Neil, 2011). Care should be taken in discretization at the extremes of distributions in Bayesian networks to prevent misleading interpretations (Marcot & Penman, 2019).
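The contrast between a deterministic quotient and a Monte Carlo estimate of risk can be sketched as follows. This is a minimal illustration that assumes exposure and the effects threshold are each lognormally distributed and independent; the distributions, their parameters, and the function names are hypothetical choices made for the example, not recommendations for any particular assessment.

```python
import math
import random


def hazard_quotient(exposure, threshold):
    """Deterministic quotient: point estimate of exposure over point estimate of effect threshold."""
    return exposure / threshold


def exceedance_probability(n_draws, exp_mu, exp_sigma, thr_mu, thr_sigma, seed=1):
    """Monte Carlo estimate of the probability that exposure exceeds the threshold.

    Exposure and threshold are drawn independently from lognormal
    distributions (an assumption of this sketch); mu and sigma are the
    mean and standard deviation of the natural-log values.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        exposure = rng.lognormvariate(exp_mu, exp_sigma)
        threshold = rng.lognormvariate(thr_mu, thr_sigma)
        if exposure > threshold:
            exceed += 1
    return exceed / n_draws


# Hypothetical inputs (mg/kg BW/day); medians of 2 and 5 with log-scale spread.
hq = hazard_quotient(2.0, 5.0)
risk = exceedance_probability(
    n_draws=20000,
    exp_mu=math.log(2.0), exp_sigma=0.6,
    thr_mu=math.log(5.0), thr_sigma=0.4,
)
print(f"Hazard quotient = {hq:.2f}; estimated probability of exceedance = {risk:.3f}")
```

A hazard quotient below 1 says only that the point estimate of exposure is lower than the point estimate of the threshold, whereas the simulated exceedance frequency is an estimate of probability and can be carried into a sensitivity analysis by varying the input distributions.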
Monte Carlo and Bayesian probabilistic simulations have been applied to assess risks from multiple, disparate stressors. Effects thresholds are often the most sensitive variables affecting risk outcomes (Meyer et al., 2015). Specific to effects assessment, probabilistic approaches allow for consideration of a broader range of interacting stress factors that can bolster evaluations of the ecological relevance of toxic effects (Supporting Information).
A key consideration for risk assessors is that chemicals in the environment occur as components of mixtures. Bayesian networks have been used to predict the additive toxicity of pesticides (Mitchell et al., 2021) and to evaluate a combination of biomarkers to assess impacts of PAHs under different exposures that vary with environmental conditions (Fahd, 2021). Salice (2012) used probabilistic tools to evaluate risk from toxicant exposure, altered hydroperiod, and terrestrial habitat availability in amphibians. The coordinated application of these probabilistic tools could help in estimating toxicant effects in complex, relevant, and realistic contexts of chemical mixtures and multiple stressors. Moreover, Bayesian networks could be of value in linking suborganismal responses in NAMs and AOPs to apical effects (Haselman et al., 2020; Moe et al., 2021; Sample et al., 2022).
Population modeling
Individual organism-level effects assessment for threatened and endangered species can be captured using DR curves, potentially enhanced by interspecies extrapolation methods (see below). As previously mentioned, risk assessed for nonspecial status species often targets protection at the population or community level. In cases where risk has not already been ruled out at the individual level, risk at the population level can be effectively addressed with population models (Raimondo et al., 2018, 2021). This is accomplished by applying the percent reduction obtained from DR curves to reproductive, survival, or dispersal rates in stage matrix models or models that track the individuals in the population (Forbes et al., 2015). Unlike matrix models, the individual-based population models (e.g., Vortex: Lacy & Pollak, 2022; Netlogo: Wilensky, 1999) track movements, reproduction, survival, and even alleles of individuals in the population over time. These models can include adverse effects of an increasing number of deleterious alleles in species that are declining due to a contaminant, an infectious disease, habitat degradation or loss, and/or other stressors, as well as the potential upsides of habitat protection and other stewardship actions. Dose–response curves are developed from available and validated laboratory or field studies and can be based on nonstandard endpoints if there is a linkage to survival, reproduction, or dispersal. Dose–response curves of surrogate species can be used as in standard ERAs or they can be adjusted to extrapolate to species evaluated in an ERA (see below). An advantage of population models is that a sensitivity analysis conducted on the variable and uncertain parameters used to assess risk, including DR models, can reveal whether uncertainty has a minor or substantial effect on risk (Raimondo et al., 2021).
Before use in the population model, the DR curve is standardized so that chemical concentrations that produce no adverse effects are quantified as 100% of control (or reference) survival or reproduction.
An advantage of population modeling is that it integrates across many endpoints typically evaluated in ERAs, producing a more holistic understanding of risk. If reproduction is affected, the best endpoint selected for the DR curve is often an integrative endpoint such as the number of independent young produced/female (e.g., number of amphibian metamorphs, avian fledglings, mammal weanlings), which integrates effects on breeding probability, fertilization, litter or clutch size, and survival of young to independence (Arcadis, 2022; USEPA, 2004). The reproductive endpoint is applied to the first age class and the model then integrates survival of the other age classes with reproduction. Survival of the age classes typically begins with the survival of independent young to the next age class or life stage in the model and then survival of each older age or stage is modeled (e.g., MCnest algorithms; Bennett & Etterson, 2013). Therefore, DR curves on survival should include survival modeled separately for the independent juveniles, subadults, and adults of each age class if survival differs among life stages.
If the chemical impacts only one population-level endpoint (e.g., chick survival), then only a DR curve of that endpoint is needed (see below). That single curve can be used to reduce the background reproduction term (number of independent young produced/female) or the survival term. For example, a 10% reduction in standardized chick survival (based on the DR curve) will reduce the unimpacted estimate of independent young produced/female by 10%.
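The arithmetic of applying a DR-derived reduction to a reproductive term, and its consequence for population growth, can be sketched with a deliberately simple two-stage (juvenile, adult) matrix model. The matrix structure, the assumed 1:1 sex ratio, and all vital rates below are hypothetical and chosen only for illustration; they do not represent any species or model discussed here.

```python
import math


def young_per_female_after_effect(baseline_young_per_female, survival_reduction):
    """Apply a fractional reduction in chick survival to young produced per female."""
    return baseline_young_per_female * (1.0 - survival_reduction)


def growth_rate(fecundity, juvenile_survival, adult_survival):
    """Asymptotic growth rate (dominant eigenvalue) of the 2 x 2 stage matrix

        [[0,                 fecundity     ],
         [juvenile_survival, adult_survival]]

    using the closed-form solution of its characteristic equation.
    """
    return (adult_survival
            + math.sqrt(adult_survival ** 2 + 4.0 * fecundity * juvenile_survival)) / 2.0


# Hypothetical vital rates (illustration only).
BASELINE_YOUNG = 1.0      # independent young produced/female/year
FEMALE_FRACTION = 0.5     # assumed 1:1 sex ratio of young
JUVENILE_SURVIVAL = 0.5
ADULT_SURVIVAL = 0.9
REDUCTION = 0.10          # 10% reduction read from a DR curve

impacted_young = young_per_female_after_effect(BASELINE_YOUNG, REDUCTION)
baseline_lambda = growth_rate(BASELINE_YOUNG * FEMALE_FRACTION,
                              JUVENILE_SURVIVAL, ADULT_SURVIVAL)
impacted_lambda = growth_rate(impacted_young * FEMALE_FRACTION,
                              JUVENILE_SURVIVAL, ADULT_SURVIVAL)
print(f"Young/female: {BASELINE_YOUNG:.2f} -> {impacted_young:.2f}")
print(f"Growth rate:  {baseline_lambda:.3f} -> {impacted_lambda:.3f}")
```

In this sketch the 10% reduction in young per female produces a proportionally smaller change in growth rate because adult survival also contributes to growth, which is one reason a population model can be more informative than the organism-level effect alone.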
A population model is structured differently for retrospective versus prospective wildlife risk assessments. A retrospective assessment compares two scenarios: a chemical-impacted model of current conditions (baseline) and a model with the chemical impacts removed. A prospective assessment is similar but begins with an unimpacted baseline model and adds the chemical effects. To incorporate nonstandard endpoints, the aim is to model the effect of chemically induced changes in survival or reproductive values in the population based on DR curves that account for lower-tier effects. A comparison of model output in terms of the three key endpoints (population growth rate, size, and extinction probability) of the two scenarios will indicate the risk to the receptor species (Raimondo et al., 2021). Dispersal or migration of individuals can be added to account for landscape-level dynamics, in which the effect of the chemical in the source location may be muted by immigration from other local populations or transferred to populations in another location (e.g., salamander risk assessment; Arcadis, 2021). Software may be preexisting population modeling programs (best for beginning learners, e.g., RAMAS, Akçakaya & Root, 2005; Vortex) or designed by advanced modelers in more general software with modeling tools (e.g., Matlab, R, Python, or even Excel).
While the modeling process may seem data-intensive for wildlife risk assessment, the population life history parameters required for the selected receptor species may often be approximated from online data sets (e.g., Monitoring Avian Population Stations, Breeding Bird Surveys; Millsap et al., 2022), population size trend data (Hanley et al., 2022), and the available literature. Combinatorial optimization algorithms have been used to estimate species vital rates (e.g., survival, reproduction, dispersal) from a population abundance time series if rates are unavailable (Hanley et al., 2022). Additionally, abiotic media and tissue concentration data needed to model exposure are often collected on a site for retrospective ERAs or field tests of toxicity. If capture or resighting histories are available, more advanced methods, such as integrated population modeling that incorporates multistate transition modeling, can be used to create the baseline model (Margalida et al., 2020; Nur et al., 2021). Wildlife ERAs, whether prospective or retrospective, could make great advancements if more such studies are completed, with the resulting data becoming inputs for modeling population risk scenarios.
An issue with population modeling in ERAs is the appropriate scale of analysis. Risk is often evaluated on contaminated sites that are too small to capture risk to populations that extend well beyond the site (Tannenbaum, 2022). Evaluating cumulative impacts across contaminated sites within a region is important to understand the effects on populations of more mobile species (Green et al., 2022). While such assessments are feasible, regulatory frameworks to assess cumulative risk at the landscape scale are generally lacking.
Another challenge of incorporating nonstandard lower-level endpoints (molecular to organismal) into population models is quantifying effects on a DR curve, provided linkage has been demonstrated. Such linkages are sometimes revealed by pairing laboratory and field results such that sublethal effects can be quantified in a DR curve. Alternatively, empirical site data on the percentage of deaths due to the contaminant based on necropsies and ancillary data findings (e.g., histologic lesions and contaminant residues) might be used to estimate changes in survival to create the unimpacted scenario (Hanley et al., 2022; Meyer et al., 2016, 2022; Slabe et al., 2022). After the linkage is created, the population model that relates the nonstandard endpoint to survival or reproduction can quantify key population-level endpoints.
Example of incorporating a nonstandard endpoint into a population model
Here, we incorporate a nonstandard endpoint into a population model for a hypothetical island population of herring gulls (Larus argentatus) exposed to flaking Pb paint chips, where the DR relationship for chick survival is adjusted to account for behavioral impairment due to neurotoxicity. Burger and Gochfeld (1990) characterized effects on two-day-old herring gull chicks injected with Pb acetate in the laboratory and recorded behavioral changes at a relatively low dose that had no impact on survival. The experiment was repeated in the field by injecting wild chicks with the same low dose of Pb, and the same behavioral changes seen in the laboratory were observed (Burger & Gochfeld, 1994). However, they reported differences in survival in the field between injected and control chicks that were not observed in the laboratory. Findings from this combined laboratory and field study approach support an AOP linkage of abnormal behavior to reduced survival in wild populations. A concentration of 7.01 mg Pb/kg dw liver was associated with reduced chick attentiveness and begging behavior, ultimately causing higher predation- and starvation-related death rates (38% reduction in survival) than in control chicks. Chicks are likely exposed to Pb by soil ingestion at the nest when being fed, while pelagic-feeding adult gulls generally do not consume paint chips (Finkelstein et al., 2003), and thus are assumed to be unaffected in this hypothetical scenario.
Using these data and other laboratory and field data (Supporting Information: Table S2) for waterbird chicks and juveniles (not adults, which have a different curve), a DR curve is developed showing the relationship that might be expected between liver Pb concentrations and herring gull chick survival (standardized to control values; Arcadis, 2022; Figure 5). For laboratory studies in the DR curve, the reported chick survival is reduced by 38% to account for behavioral effects.
The DR curve is applied to the island population of herring gull chicks exposed to Pb in soil. Liver Pb concentrations of gull chicks at nest sites are estimated throughout the island from randomly collected soil samples paired with site-specific bioaccumulation regressions developed between soil Pb at nests and liver Pb concentrations. After accounting for varying nesting density, the percentage of the island chick population expected to die from Pb toxicosis is estimated with the DR curve. This reduction in survival is incorporated into a stochastic, stage matrix population model that has survival and reproduction initially parameterized to baseline conditions on the island observed over time to represent the impacted scenario. The estimated reduction in survival is inverted when entered into the model to simulate the removal of Pb and an increase in chick survival for the unimpacted scenario. A ceiling on population size is included, and reproduction and survival are assumed to be independent of population density, as often observed for seabirds (Nur et al., 2021; Nur & Sydeman, 1999).
When the model is run 20 000 times for a specified number of years, it can predict change in three key population-level endpoints at the end of the time period due to Pb effects on behavior (Figure 5). The risk of decreasing to a population size of 1700 breeding gulls (i.e., risk of quasi-extinction), identified as a level of potential concern for a starting population of over 13 000 breeding gulls, was less than 2% after 80 years, which did not change when Pb was removed. These hypothetical results indicate that the effects of Pb on population trend and size do not appear to be ecologically distinguishable (Figure 5B,C). However, should differences in the endpoints be greater (e.g., if the overlap of population size distributions of the two scenarios is less than 90% or 95%; Nur et al., 2021), risk to seabirds nesting on the island could be considered unacceptable.
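The logic of comparing an impacted baseline with an unimpacted scenario can be sketched with a much-simplified stochastic simulation. This is not the stage matrix model behind Figure 5: it tracks a single class of breeding adults, applies lognormal year-to-year noise to recruitment, and uses hypothetical vital rates, so its output will not reproduce the results described above. Only the starting size (13 000) and the quasi-extinction level (1700) echo the text. The back-calculation assumes that the impacted rate equals the unimpacted rate multiplied by one minus the fractional reduction.

```python
import math
import random


def remove_chemical_effect(impacted_rate, fractional_reduction):
    """Back-calculate the unimpacted rate, assuming impacted = unimpacted * (1 - reduction)."""
    return impacted_rate / (1.0 - fractional_reduction)


def quasi_extinction_risk(n_runs, n_years, start_size, quasi_threshold, ceiling,
                          adult_survival, recruits_per_adult, noise_sd, seed=1):
    """Fraction of runs in which the population falls below the quasi-extinction level."""
    rng = random.Random(seed)
    n_below = 0
    for _ in range(n_runs):
        size = float(start_size)
        for _ in range(n_years):
            # Recruitment varies among years; survival is held constant (simplification).
            recruitment = recruits_per_adult * math.exp(rng.gauss(0.0, noise_sd))
            # Density independence below a population ceiling.
            size = min(ceiling, size * (adult_survival + recruitment))
            if size < quasi_threshold:
                n_below += 1
                break
    return n_below / n_runs


# Hypothetical rates for the impacted (baseline) scenario.
IMPACTED_RECRUITS = 0.12   # recruits to the breeding population/adult/year
REDUCTION = 0.20           # fraction of chicks lost to the contaminant (from a DR curve)
UNIMPACTED_RECRUITS = remove_chemical_effect(IMPACTED_RECRUITS, REDUCTION)

common = dict(n_runs=2000, n_years=80, start_size=13000, quasi_threshold=1700,
              ceiling=20000, adult_survival=0.85, noise_sd=0.1)
impacted_risk = quasi_extinction_risk(recruits_per_adult=IMPACTED_RECRUITS, **common)
unimpacted_risk = quasi_extinction_risk(recruits_per_adult=UNIMPACTED_RECRUITS, **common)
print(f"Quasi-extinction risk, impacted:   {impacted_risk:.3f}")
print(f"Quasi-extinction risk, unimpacted: {unimpacted_risk:.3f}")
```

The number of runs can be increased (e.g., to the 20 000 used in the example above) at the cost of run time; the comparison of interest is the difference in endpoints between the two scenarios rather than either value alone.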
Interspecific extrapolations
Because laboratory toxicity tests rely on a few wildlife species, interspecies differences are a major source of uncertainty in wildlife ERAs. Interspecies extrapolations of toxicity data are essential, given the practical and ethical limits of testing numerous diverse vertebrate species. For prospective risk assessments, this uncertainty is typically addressed by applying an uncertainty factor, whereas retrospective risk assessments may rely primarily on narrative discussions of uncertainties. Improving such extrapolations would enhance the accuracy of ERAs to support better wildlife risk management decisions. The application of new and advanced methods of extrapolation is an area of active research (see below and Bean et al., 2023). Greater certainty in interspecies extrapolations could also improve the reliability of wildlife species sensitivity distributions (SSDs) used to develop broadly protective effect benchmarks based on data rather than relatively arbitrary uncertainty factors.
Toxicokinetics and toxicodynamics
Evaluations of interspecies differences often consider toxicokinetics, the processes by which a contaminant moves from the point of environmental exposure (e.g., ingestion of mercury-containing prey) to and from the internal target site (e.g., binding to selenoenzymes in brain). Such processes include absorption, distribution, metabolism, and excretion. These evaluations also consider toxicodynamics, the interactions between the contaminant and its target (e.g., differences in aryl hydrocarbon receptor [AhR] binding and activation for dioxin-like compounds [DLC]). Clewell and Fuchsman (2023) discuss applications of NAMs for toxicokinetics and toxicodynamics in interspecies extrapolations in wildlife with a tiered approach depending on available data and resources.
Physiologically based toxicokinetic modeling is the state-of-the-art approach to understand toxicokinetic components of interspecies extrapolations, with some work conducted in common avian wildlife test species (Baier et al., 2022; Nichols et al., 2010). Physiological descriptors (e.g., volume fractions of tissue compartments) are among the model inputs. This basic physiological information is not necessarily available for many wildlife species, although it can be estimated. In vitro methods have the potential to elucidate important processes, such as metabolic clearance (e.g., toxicant removal from plasma through metabolic transformation). Quantitative in vitro to in vivo extrapolations of metabolic clearance are widely applied in drug development and have been considered for environmental contaminants (Yoon et al., 2012). Differences in chemical metabolism among species can be significant and correlated to toxicant sensitivity. Demethylation of methylmercury varies among bird species, with osprey showing particularly effective detoxification through this mechanism (Henny et al., 2009; Hopkins et al., 2007). Owls metabolize the anticoagulant warfarin more slowly than granivorous birds, and it has been suggested that owls and other raptors may therefore be more sensitive to adverse effects of anticoagulant rodenticides (reviewed in Horak et al., 2018).
Increasing the understanding of differences in protein binding, which can affect chemical distribution and excretion, is another opportunity for NAMs to be refined in the application of toxicokinetics for interspecies extrapolations. Computational analysis of binding predictions based on amino acid sequencing of target molecules can be used to predict the toxicity of anticoagulant rodenticides in different species that have varied mutations in the gene that encodes structural forms for the vitamin K 2,3-epoxide reductase, the enzyme that recycles vitamin K to enable its reuse in activation of blood coagulation proteins (Bermejo-Nogales et al., 2022; Takeda et al., 2022). For metals, differences in basal expression and induction potential of metallothioneins and other “metal chaperone” proteins could affect relative species sensitivities (Spurgeon et al., 2020). Predictions of protein binding are especially important for perfluoroalkyl substances (PFAS), as this class of contaminants bioaccumulates via affinity for proteins rather than lipids. In an innovative computational investigation, Cheng et al. (2021) applied the SeqAPASS tool to evaluate variable liver fatty acid binding proteins across species and identified specific amino acid differences that could potentially affect PFAS binding. These differences were then explored further through homology modeling of protein structure and molecular docking simulation (Cheng et al., 2021). This type of analysis could facilitate the screening of PFAS bioaccumulation rates and half-lives across a range of species (Bangma et al., 2022).
Considering the complexity of toxicokinetics, tissue-based TRVs are an appealing approach for retrospective assessments, allowing risk assessors to bypass translation from intake or absorption rates to internal exposures by evaluating internal exposures directly (Beyer & Meador, 2011; Clewell & Fuchsman, 2023; Mayfield et al., 2014). A limiting factor in tissue-based risk analysis is the availability of toxic effect data paired with tissue concentrations. Residue analyses are required in new toxicity studies to support interpretations of wildlife monitoring data. The appropriate tissues for analysis will vary among chemicals. Moreover, residues in minimally invasive nonlethal sample matrices (i.e., feathers, fur, blood, excreta) should be assessed with the goal of relating those residues to exposure–effect thresholds.
Toxicodynamic differences among species also provide fertile ground for NAMs, although their verification requires in vivo toxicity data. For avian embryotoxicity mediated at the AhR, bird species have been classified into three sensitivity groups based on certain polymorphisms in the genetic sequence of the AhR ligand-binding domain (Farmahin et al., 2013). This knowledge can be applied in ERAs by assigning TRVs or DR curves for each of the three avian sensitivity groups. This classification can help identify sensitive species (e.g., type I: domestic chicken, type II: Northern bobwhite Colinus virginianus, type III: American kestrel Falco sparverius) as well as potentially highly exposed but not highly sensitive species (e.g., most piscivorous birds), with site-specific species' vulnerabilities following from combinations of exposure and sensitivity (Hwang et al., 2016). Differences in species sensitivity seem likely to explain why significant DLC effects have been noted only in type II species (e.g., tree swallows Tachycineta bicolor) at sites contaminated with PCB-126 or toxic furans (Custer et al., 2003, 2018; Fredricks et al., 2011) or in type I species, such as European starlings (Sturnus vulgaris) at moderately contaminated sites (Arenal et al., 2004; Halbrook & Arenal, 2003). Robust studies of type III species, even at highly contaminated sites, have not found DLC effects on reproduction or survival, although biochemical endpoints such as CYP1A induction have been reported (e.g., Best et al., 2010; Harris & Elliott, 2011).
The AhR-mediated avian sensitivity model has been supported by in vivo toxicity testing, in vitro enzyme induction assays, genetic sequencing, and molecular docking analysis (Head et al., 2012; Hirano et al., 2015; Kennedy et al., 1996). Further, quantitative response–response analysis has been performed to facilitate the prediction of in vivo effects from in vitro assay results, which could complement the current genetic classification of species (Doering et al., 2018; Head & Kennedy, 2010). Describing and understanding interspecies differences in toxicodynamics for other mechanisms of action will require the development of additional models, which are not yet as well supported by data as the AhR-mediated sensitivity model.
Molecular docking studies
Molecular docking approaches are likely to substantially improve the efficiency of interspecies sensitivity assessments, given the increasing availability of genetic sequences (Feng et al., 2020). These approaches find the “best-fit” orientation and binding affinity of a ligand to a protein of interest. Saxena et al. (2015) performed molecular docking studies for azole pesticides and aromatase inhibition in birds. Docking scores correlated well with human aromatase inhibition among azole compounds, and avian docking results were qualitatively consistent with in vivo data on the relative toxicity of azoles to birds. From a risk assessment perspective, this type of information is suitable for hypothesis generation regarding the relative toxicity of specific azoles and their potential hazards to wildlife. To support conclusions about risks, additional evidence of effects is needed, such as the in vivo demonstration of impact on reproduction and the expression of genes for enzymes involved in biosynthesis of sterols and steroid hormones in red-legged partridges (Alectoris rufa) fed seed treated with the azole fungicide, tebuconazole (Fernández-Vizcaíno et al., 2020; Lopez-Antia et al., 2021).
Another recent investigation exemplified why molecular docking studies require validation. Zhang et al. (2021) performed enzyme induction assays (the selected indicator of species sensitivity) and molecular docking/molecular dynamics modeling to explore the effects of brominated analogs of chlorinated DLCs on species representing AhR-based avian sensitivity groups. They found that the two lines of evidence did not correlate in a simple manner, and machine learning was needed to explore the relationships between molecular conformations and species sensitivities.
Interspecies correlations and SSDs
In contrast to the aforementioned mechanistic approaches, empirical read-across methods have been developed to facilitate interspecies extrapolations. Interspecies correlation estimation (ICE) models are least-squares linear regressions between two species for a range of chemicals that provide information on relative sensitivity to acute toxicity (Raimondo et al., 2010). The premise is that the relative sensitivity of different species is broadly consistent and predictable across toxicants, such that for a new chemical, knowledge of acute sensitivity for a surrogate species can be extended to other species that share a significant pairwise ICE model. Model robustness improves if the model is specific to a mode of action, the underlying data set is robust, and the taxonomic distance between predicted and surrogate species is small (Raimondo et al., 2007). The approach was developed to improve ERAs for threatened or endangered species and has been applied primarily to aquatic ERAs. ICE models are available for predicting acute toxicity values for birds and mammals (Awkerman et al., 2008, 2009; Raimondo et al., 2007), but they are less robust than those for aquatic species and are further limited by fewer test species and insufficient data for chronic toxicity.
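The pairwise regression underlying an ICE model can be illustrated with a short sketch. It assumes that toxicity values are log10-transformed before fitting, which is a common convention but is treated here as an assumption; the function names and the toxicity values in the usage example are invented for illustration and do not come from any published ICE data set.

```python
import math

def fit_ice(surrogate_values, predicted_values):
    """Least-squares regression of log10(predicted species) on log10(surrogate species).

    Each position in the two lists is one chemical tested in both species.
    Returns (slope, intercept) on the log10 scale.
    """
    x = [math.log10(v) for v in surrogate_values]
    y = [math.log10(v) for v in predicted_values]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_toxicity(slope, intercept, surrogate_value):
    """Predict the untested species' toxicity value for a new chemical."""
    return 10 ** (intercept + slope * math.log10(surrogate_value))

if __name__ == "__main__":
    # Hypothetical acute LD50s (mg/kg) for five chemicals in two species.
    surrogate = [5.0, 20.0, 75.0, 300.0, 1200.0]
    other = [8.0, 25.0, 140.0, 410.0, 2500.0]
    slope, intercept = fit_ice(surrogate, other)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
    print(f"predicted LD50 for a new chemical: {predict_toxicity(slope, intercept, 100.0):.0f} mg/kg")
```

In practice, a prediction would be accepted only if the pairwise model is statistically significant and the confidence interval around the predicted value is acceptably narrow.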
An SSD is a statistical approach in which a probability distribution is fitted to toxicity values for a chemical across multiple species to visualize which species may be more sensitive and to predict concentrations that are hazardous to a given percentage of species. The estimated threshold is expressed as a hazardous concentration (HCx), the concentration expected to affect x% of species, and the fitted distribution describes the variation in sensitivity among species exposed to the chemical. In ERAs, x has often been set at 5% (HC5) to estimate a concentration that would protect 95% of species. Species sensitivity distributions have been a valuable tool in ERAs and have been used to develop hazard endpoints protective of various taxa. While SSDs are common for aquatic organisms, the paucity of data for terrestrial wildlife species has limited generation of such estimates for many contaminants. A wildlife species ranking relative to the laboratory surrogate species could be evaluated using an SSD to estimate a TRV. For example, Buekers et al. (2009) used the SSD approach to identify a protective concentration for Pb in avian blood for a suite of molecular to whole-organism endpoints. Notably, Pb inhibits the heme-biosynthetic enzyme, δ-aminolevulinic acid dehydratase (ALAD), at concentrations approximately one order of magnitude lower than concentrations affecting hematology, physiology, growth, and reproduction (Buekers et al., 2009). Thus, TRVs based on ALAD inhibition would tend to overprotect birds with respect to organism-level effects. Other examples of avian SSDs include chlorpyrifos (Moore et al., 2014) and malathion (Supporting Information: Figure S5), which was examined in estimating hazard threshold values for an endangered species risk assessment (USEPA, 2021b).
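As a worked illustration of how an HCx is obtained, the sketch below fits a log-normal distribution to a set of species toxicity values and reads off the chosen percentile. The log-normal form is one common choice and is an assumption here, as are the function names and the example toxicity values; regulatory SSDs typically also report confidence limits and compare alternative distributions.

```python
import math
from statistics import NormalDist, mean, stdev

def _fit_log_normal(toxicity_values):
    """Fit a normal distribution to log10-transformed toxicity values."""
    logs = [math.log10(v) for v in toxicity_values]
    return NormalDist(mean(logs), stdev(logs))

def hazardous_concentration(toxicity_values, fraction=0.05):
    """Concentration expected to be hazardous to `fraction` of species (HC5 by default)."""
    return 10 ** _fit_log_normal(toxicity_values).inv_cdf(fraction)

def fraction_affected(toxicity_values, concentration):
    """Fraction of species whose toxicity value lies at or below `concentration`."""
    return _fit_log_normal(toxicity_values).cdf(math.log10(concentration))

if __name__ == "__main__":
    # Hypothetical acute toxicity values (mg/kg) for eight bird species.
    ld50s = [3.2, 8.5, 12.0, 21.0, 34.0, 55.0, 90.0, 210.0]
    print(f"HC5 = {hazardous_concentration(ld50s):.2f} mg/kg")
    print(f"Fraction of species affected at 10 mg/kg = {fraction_affected(ld50s, 10.0):.2f}")
```

The two functions are inverses of each other: one returns the concentration for a given percentage of species, the other the percentage of species for a given concentration.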
A longstanding issue in the use of SSDs is that they require multiple studies on multiple species to provide acceptable confidence for use in regulatory decision-making. Sufficient wildlife data are not available for many chemicals to support SSD analyses, and such data may rarely be generated going forward as vertebrate testing becomes more limited. One option is to use quantitative structure–activity relationships (QSARs) and/or interspecies extrapolations using read-across methods to predict toxicity values for untested species and fill the data gaps to populate SSDs. Quantitative structure–activity relationship models estimate toxicity based on structural and physicochemical property components of a chemical. However, QSAR models are usually developed to predict chemical toxicities for broad taxonomic groups and generally in aquatic systems (e.g., fish, invertebrates). Additional work is needed to develop QSARs for wildlife. Awkerman et al. (2009) compared SSDs from measured acute mammalian toxicity data to SSDs for the same chemicals as those extrapolated from rat or mouse data using ICE models; although there was general agreement, the variation was likely greater than regulators and stakeholders would prefer. Thus, further advances in interspecies extrapolations are needed to support extrapolation-based SSDs.
Ecosystem services models
Ecosystem services (ESs) can be used to identify, describe, and assign value to assessment endpoints and/or protection goals in ERAs (Forbes et al., 2017). As such, explicit ESs models that identify quantifiable metrics for ERAs are needed for management decisions. Data on ESs can be used to identify how anthropogenic chemicals contribute risks to ecosystem processes and components that concern stakeholders and society (Forbes et al., 2017; Maltby et al., 2018; Munns et al., 2016). An ESs framework can be used to assess multiple ecosystem components and evaluate trade-offs among different, and sometimes competing, services (Galic et al., 2018). The framework is also useful in risk communication because ESs are appreciated by decision-makers and stakeholders. In addition, ESs can be useful in economic analyses of management actions in the calculation of benefits to offset costs associated with risks.
A clear and strong advantage of applying ESs frameworks in ERAs is that potential impacts to receptors can be placed in broader ecological and societal contexts. Fundamental ecosystem functions, such as decomposition (Galic et al., 2018) or microbial function (Brandt et al., 2015), can effectively be considered in an ERA by using an ESs framework. Risks to wildlife can also be considered through the ESs lens. Perhaps the simplest example for wildlife is that many species provide provisioning services when harvested for human consumption (Golden et al., 2014). In addition, a recent review of the ESs provided by birds and mammals in the Pampas region, Argentina, showed that birds of prey and carnivorous mammals were recognized as providing regulating/maintenance services such as nutrient provisioning and pest control (Gorosábel et al., 2020). Gaston et al. (2018) systematically explored multiple ESs delivered by birds, emphasizing seed dispersal, nutrient transport, scavenging, pest control, and cultural services, and concluded that the abundance of bird species generally was positively related to ESs. Birds and other wild animals also provide myriad cultural services that include nonmaterial benefits related to aesthetic, spiritual, educational, cultural heritage, and recreational values. Often, however, these cultural services can be challenging to valuate, which can preclude their inclusion in ERAs (Daniel et al., 2012).
While there is growing recognition that ESs frameworks in ERA have merit, at present, use of this approach is not widespread as an acceptable decision driver for risk management at polluted sites. Most examples are case studies or overviews suggesting that the framework will improve ecological relevance and allow risk assessors to address a broader range of meaningful assessment endpoints. While the veterinary pharmaceutical diclofenac has reduced vulture populations and thus their role in carcass “cleanup” (Ogada et al., 2012), additional data-driven examples that include ESs metrics are needed to incorporate more of the interests and values of society into ERAs.
Readiness, reliability, and relevance of methods that might enhance wildlife ERAs
The value of new technologies for wildlife ERAs should be explored on a case-by-case basis by risk assessors. While a rigorous weight of evidence analysis for each knowledge collection solution, data integration procedure, or model type was not undertaken by the toxicological effects assessment workgroup, our impressions of their readiness, reliability, and relevance are presented in Table 1. All these methodologies are data driven, and for most (the exception being ESs models), their readiness, reliability, and relevance for retrospective wildlife ERAs seem to be high for legacy organic contaminants and some metals. For prospective ERAs, methodology reliability is less certain but will be determined as their uses and applications become commonplace.
Methodology | Readiness | Reliability | Relevance |
---|---|---|---|
Systematic reviews | Ready and in use if data are available, laborious. | High, if uncertainties are evaluated and reported; repurposing data may not always be possible if open literature studies lack detail. | Supportive information to identify strengths and limitations of evidence at all levels of biological organization. |
Adverse outcome pathways | Some ready, require considerable effort to develop and determine if pathway(s) conserved and similarly affected among species. | Data on more toxicants and exposed organisms that show similar effects on specific AOPs are needed before extrapolations among species will be widely accepted. | Supportive information provided, but linkage of AOP findings at realistic concentrations to organismal- or population-level effects is needed. |
Dose–response curves | Ready and in use if data are available. | High, if statistically robust, confidence limits are acceptable, and methods and uncertainty are reported. | High, if used in population models (dose–response function) and in the range of statistically robust TRVs (ECx with acceptable confidence limits). |
Probabilistic approaches | Ready and in use if data are available. | High, if uncertainties are evaluated and reported. | High, improvement over HQs. |
Population modeling | Ready and in use if data are available; however, more training is needed; software readily available and being created for ERAs. | High, if the baseline version of model is calibrated to actual population trend data, and uncertainties and parameter sensitivity are evaluated; can help differentiate chemical effects from other factors. | High, improvement over HQs. |
Interspecific extrapolation | Tools available with varying degrees of sophistication for many chemicals and species, but can be laborious. | Reliable for well-studied chemicals and some species; less sophisticated tools carry greater uncertainty but an improvement over default approaches. | High, if supported by data for chemicals and species of interest. |
Ecosystem services models | Framework and approaches available, but not widely used for terrestrial vertebrate ERAs. | Too early to know, more case studies needed. | Potentially high, links to key human interests may facilitate acceptance. |
- Abbreviations: AOP, adverse outcome pathway; ECx, effect concentration at X% response; ERA, ecological risk assessment; HQ, hazard quotient; TRV, toxicity reference value.
CONCLUSION AND RECOMMENDATIONS
Characterizations of adverse effects in ERAs focused on wildlife have generally relied on toxicity data for survival, growth, and reproduction for new chemicals and pesticides. Such ERAs estimate risk to wildlife populations so that regulatory decision-makers can determine if protective measures are necessary. The situation for contaminated site ERAs is somewhat different, with some focus on sublethal individual-level responses. While exposure–response relationships for survival and reproduction will likely remain central to wildlife risk assessment in the near term, other endpoints at many levels of biological organization have the potential to improve efficiency, reliability, and realism for the longer term. The value of new technologies for ERAs should be explored on a case-by-case basis using a weight-of-evidence approach.
New approach methodologies are increasingly being applied to regulatory decision-making for industrial chemicals and pesticides (Parish et al., 2020; Stucki et al., 2022). Frameworks for establishing confidence in NAMs have been proposed, with elements of fitness for purpose, biological relevance, technical characterization, data integrity and transparency, and independent review (Parish et al., 2020; van der Zalm et al., 2022). However, most framework activities have been principally focused on human health risk assessments, with the “environment” (encompassing wildlife) merely mentioned in passing. This likely reflects societal attention to health and well-being of mankind first, despite the merits of One Health perspectives that are embraced by many. Some professional and organizational barriers, including the need for common language and consensus on validation requirements, metrics, data format, and uncertainty in the use of NAMs in ecotoxicology, will require international cooperation, trust, and data sharing (Mondou et al., 2020). While the value of NAMs to ecotoxicological hazard assessment has been acknowledged for some time (Lillicrap et al., 2016), their development seems to have targeted aquatic species and phylogenetically lower forms, and applications for terrestrial wildlife are less apparent (Ceger et al., 2022).
Integrations of data and qualitative evidence from in silico, in vitro (mechanistic), laboratory animal investigations (including new wildlife models), and field data provide robust opportunities to use relevant toxicity endpoints in deriving TRVs for next-generation wildlife ERAs. Extrapolations of measures of effect to populations, communities, and ecosystems can be greatly enhanced using tools such as population models and Bayesian networks. It seems likely that guidance from nonanimal-based data will be used by risk assessors to provide accurate predictions and reduce uncertainties for managers making decisions on the risks of chemicals and addressing requirements for remediation of contaminated sites. Substantial investment in method validations, including both interlaboratory reproducibility and utility for predicting apical endpoints, will be needed before suborganismal assays can be applied with confidence in ERAs. In addition to the 3Rs of (1) reduced animal testing, (2) refinement of tests, and (3) replacement of animal use with in vitro methods, many advocate prioritizing (4) reproducibility, (5) relevance, and (6) regulatory acceptance (Lillicrap et al., 2016). Supporting this 6Rs approach will require more animal testing in the near term, in the interest of less animal testing in the future. While the investment needed to address these activities may be substantial, it is dwarfed by the animal and monetary resources required to evaluate chemically mediated environmental mishaps and to undertake associated restorations.
Interspecies extrapolation of toxicity data is proving to be an attractive application of bioinformatics and NAMs. Applying genetic information to classify avian sensitivity to AhR-mediated toxicity serves as an example of the payoffs on investments in this area. As in the case of extrapolation across biological levels of organization, the development of effective interspecies extrapolations will also continue to require in vivo toxicity testing with a range of species for validations.
Despite substantial advances in knowledge generation and evaluation, the current era presents time-sensitive challenges to those who assess and manage risks to wildlife. There will remain an ongoing need for data from basic and applied research to underpin evolving methods for increasingly reliable ERAs. At present, it is not possible to predict all ecologically relevant effects from simple endpoint responses generated from in vitro molecular and cellular assays, in vivo toxicity tests, and models predicting organism-, population-, or community-level responses. Our workgroup identified increased realism and ecological relevance as top priorities for improving wildlife effects assessment in the 21st century. Toward this end, we recommend increased attention to linkages of nonstandard molecular- to organism-level endpoints to standard toxicity tests, to effects on wildlife at the population level, and to interactions at the community and ecosystem levels with a goal of preventing harmful effects of contaminants on wildlife populations. Through a futuristic lens of optimism, environmentally relevant mechanisms of toxic action and their consequences at higher levels of biological organization will be comprehensively understood in the 21st century, resulting in more robust estimates of chemical risk to wildlife populations and their supporting habitats.
AUTHOR CONTRIBUTION
Barnett A. Rattner: Conceptualization; funding acquisition; project administration; writing—original draft; writing—review and editing. Thomas G. Bean: Conceptualization; writing—original draft; writing—review and editing. Val R. Beasley: Conceptualization; writing—original draft; writing—review and editing. Philippe Berny: Conceptualization; writing—original draft; writing—review and editing. Karen M. Eisenreich: Conceptualization; writing—original draft; writing—review and editing. John E. Elliott: Conceptualization; writing—original draft; writing—review and editing. Margaret L. Eng: Conceptualization; writing—original draft; writing—review and editing. Phyllis C. Fuchsman: Conceptualization; writing—original draft; writing—review and editing. Mason D. King: Conceptualization; writing—original draft; writing—review and editing. Rafael Mateo: Conceptualization; writing—original draft; writing—review and editing. Carolyn B. Meyer: Conceptualization; writing—original draft; writing—review and editing. Jason M. O'Brien: Conceptualization; writing—original draft; writing—review and editing. Christopher J. Salice: Conceptualization; writing—original draft; writing—review and editing.
ACKNOWLEDGMENT
The authors thank Mark S. Johnson, Wayne G. Landis, and Bradley E. Sample for comments, suggestions, and for providing some published information used in this article. A draft of this manuscript was critically reviewed by Jennifer H. Olker. The SETAC Technical Workshop “Wildlife Risk Assessment in the 21st Century: Integrating Advancements in Ecology, Toxicology, and Conservation” and the contribution of Barnett A. Rattner to this manuscript were supported in part by the Contaminant Biology Program of the US Geological Survey Ecosystems Mission Area. Funding for the workshop was provided by the United States Geological Survey, Teck Resources Ltd., and SETAC. Thomas G. Bean is an employee of FMC Corporation, a manufacturer of pest control technology and products. The other authors have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.
DISCLAIMER
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government. This article may be the work product of an employee or group of employees of the USDOD, USEPA, or other nongovernment organizations; however, the statements, opinions, or conclusions contained therein do not necessarily represent the statements, opinions, or conclusions of these agencies, the United States government, or other organizations, but do represent the views of the US Geological Survey. This article has been peer-reviewed and approved for publication consistent with USGS Fundamental Science Practices (https://pubs.usgs.gov/circ/1367/).
Open Research
DATA AVAILABILITY STATEMENT
This review is based on published information, and data provided in the manuscript and the Supporting Information. Some data, metadata, and calculation tools are available from author Carolyn B. Meyer ([email protected]).