Effect Modeling Quantifies the Difference Between the Toxicity of Average Pesticide Concentrations and Time‐Variable Exposures from Water Quality Monitoring

Synthetic chemicals are frequently detected in water bodies, and their concentrations vary over time. Water monitoring programs typically employ either a sequence of grab samples or continuous sampling, followed by chemical analysis. Continuous time‐proportional sampling yields the time‐weighted average concentration, which is taken as proxy for the real, time‐variable exposure. However, we do not know how much the toxicity of the average concentration differs from the toxicity of the corresponding fluctuating exposure profile. We used toxicokinetic–toxicodynamic models (invertebrates, fish) and population growth models (algae, duckweed) to calculate the margin of safety in moving time windows across measured aquatic concentration time series (7 pesticides) in 5 streams. A longer sampling period (14 d) for time‐proportional sampling leads to more deviations from the real chemical stress than shorter sampling durations (3 d). The associated error is a factor of 4 or less in the margin of safety value toward underestimating and an error of factor 9 toward overestimating chemical stress in the most toxic time windows. Under‐ and overestimations occur with approximate equal frequency and are very small compared with the overall variation, which ranged from 0.027 to 2.4 × 1010 (margin of safety values). We conclude that continuous, time‐proportional sampling for a period of 3 and 14 d for acute and chronic assessment, respectively, yields sufficiently accurate average concentrations to assess ecotoxicological effects. Environ Toxicol Chem 2020;39:2158–2168. © 2020 The Authors. Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.


INTRODUCTION
Synthetic chemicals such as pesticides, pharmaceuticals, biocides, industrial chemicals, and veterinary medicines either are released directly into the environment or reach the environment after their intended use and disposal. They are often termed "micropollutants" because they occur in the range of micrograms per liter (Schwarzenbach et al. 2006). The interplay of intermittent use patterns and environmental fate processes results in highly time-variable exposure concentrations in water bodies, in particular for pesticides in small streams (Kreuger 1998;Leu et al. 2004;Gilliom et al. 2007;Wittmer et al. 2010;Spycher et al. 2018;Willkommen et al. 2019). Assessing the risk to the environment of these chemicals requires measuring their presence in water bodies and establishing their temporal concentration profile. Many countries maintain regular water quality monitoring programs for this purpose, their practical implementation often delegated to regional or local authorities. In theory, one could propose to automatically take hourly samples to achieve high temporal resolution; however, in practice this would quickly result in too many samples (e.g., 720 samples per site per month). The analytical chemistry procedures required to extract micropollutants are costly and not feasible for thousands of samples. Therefore, monitoring strategies typically either use a sequence of grab samples (e.g., European Community 2000;Baas et al. 2016) or continuous sampling where samples are pooled over a certain time period, followed by chemical analysis of the pooled sample for the whole sampling duration (e.g., Moschet et al. 2014;Spycher et al. 2018). Continuous sampling (e.g., with on-site autosamplers) is more likely to capture peak concentrations of short duration than other methods, and these short peaks can be important (Ashauer and Brown 2013;Stehle et al. 2013).
Continuous time-proportional sampling of water bodies has been proposed as one strategy to capture the highly timevariable exposure of aquatic organisms to micropollutants (Spycher et al. 2018). Assuming that the chemicals do not degrade in the sampler and the concentration of exposure peaks is not diluted to concentrations below the limit of quantification, this method will yield the time-weighted average (TWA) concentration over the sampling duration. If sampling times are equally spaced, this corresponds simply to the average concentration. This average concentration is then compared with ecotoxicity data to approximate the chemical stress posed by such exposures (e.g., Spycher et al. 2018). The average concentration measured in the pooled sample is taken as a representative proxy for the real, time-variable exposure. However, we do not know how much the toxicity of the average concentration differs from the toxicity of the corresponding fluctuating exposure profile. In other words, is the realistic, time-variable exposure more or less stressful for aquatic organisms than the exposure to the corresponding average concentration? We aim to answer that question by quantifying how much the toxicity of the average concentration differs from the toxicity of the corresponding fluctuating exposure profile for a range of substances, species, and water bodies.

Toxicity modeling approach
To answer this question, we carried out computer-based toxicity modeling. We used toxicokinetic-toxicodynamic (TKTD) models Jager et al. 2014) for Gammarus pulex, Daphnia magna, and Pimephales promelas as well as simple exponential population growth models for different species of Lemna and algae (see Supplemental Data, Appendix C). These models account for the time course of toxicity and were selected because suitable models and data for parameterizing them were available. The models were calibrated using toxicity test data, and all models included processes which allowed the organisms to recover during periods of lower exposure concentrations. This approach enables modeling the time course of toxicity with better accuracy than using average concentrations. The toxicity of an average concentration only equals the toxicity of the corresponding fluctuating concentration when the organism's toxicokinetics and toxicodynamics are much faster than the timescale of the exposure fluctuations (Rozman and Doull 2000;Ashauer and Brown 2013). This situation is called "Haber's law" or "time effect reciprocity" (Figure 1). At the population level the same principle holds if population dynamics are much faster than the fluctuations of the exposure profile (Ashauer and Brown 2013).
We assume that the effect models used in the present study represent the real time course of toxicity with sufficient accuracy (Ashauer and Brown 2013;Ashauer et al. 2015;Focks et al. 2018;Jager and Ashauer 2018b;Ockleford et al. 2018;Zimmer et al. 2018). Based on that, we simulate the toxic effects of the fluctuating exposure profile and its corresponding average concentration and compare the 2. This predicts how the toxicity of the average concentration (representing timeproportional sampling) might differ from the real toxicity resulting from fluctuating concentrations (based on real monitoring data from Swiss water bodies; see Results). In other words, we provide an evaluation of time-proportional water pollution sampling by approximating the systematic error due to divergence from Haber's law. By doing these simulations and comparisons for different durations (3 and 14 d), we can also assess the consequences of shorter compared with longer sampling durations.

Margin of safety
When simulating the toxicity of exposure profiles from monitoring data, the resulting effects are often 0%, when the concentrations are below the toxic range, or 100%, when the concentrations are above the toxic range (see Results). This is because real concentrations in water bodies vary across several orders of magnitude (Moschet et al. 2014;Spycher et al. 2018). Nothing could be learned about the difference in toxicity caused by different exposure profiles from such all-or-nothing simulation results because we need to assess the differences in toxicity when comparing average versus fluctuating concentrations. To overcome this problem, we use the margin of safety concept . This is built on the idea that every concentration profile can be multiplied with a factor, the margin of safety, to bring it into the toxic range. This factor is larger than 1 if the concentration profile was originally below toxic levels, or it can be smaller than 1 if the original concentrations were higher than the toxic range. We define the margin of safety as the factor that, when used to multiply the exposure concentrations, results in 50% effects at the end of the simulated exposure profile. This factor is also known as the "multiplication factor" to an entire specific exposure profile that causes 50% lethality (LP50; Ockleford et al. 2018). In simulations where the endpoint is survival (mortality, acute toxicity), the margin of safety would be the factor to multiply concentrations in a monitoring time series with so that after simulating, for example, 14 d of exposure 50% of G. pulex (or P. promelas) would be dead. Similarly, we used 50% reduction in growth or reproductive output of D. magna as well as 50% reduction in biomass produced by Lemna or algae populations to calculate the margin of safety. The margin of safety is calculated for simulations with the average concentration (representing time-proportional sampling) and simulations with the fluctuating exposure profile (representing real exposures). Then, the 2 margin of safety values were compared to assess if the time-proportional sampling over-or underestimates the chemical stress and by how much. Note that the margin of safety is similar to the toxicityexposure ratios traditionally used in plant protection product authorization and can be interpreted the same way.

Monitoring data
The aquatic monitoring data originate from a monitoring campaign conducted in 2015 in 5 small water bodies (Table 1) situated in intensively used agricultural areas in Switzerland. Details on the water bodies and the monitoring can be found in Doppler et al. (2017) and Spycher et al. (2018). In short, samples were taken from March to August 2015, and automated samplers took half-day time-proportional composite samples. To reduce the effort of chemical analysis, samples during periods of dry weather were combined before analysis, assuming that concentration peaks are rainfall-driven. This results in samples with a maximum temporal resolution of 12 h. Chemical analysis was performed by liquid chromatography mass spectrometry (high-resolution tandem mass spectrometry; Orbitrap-Technology) after enrichment via online solid-phase extraction (Spycher et al. 2018).

Simulated time windows
For each monitoring data set, we simulated moving time windows of 14 or 3 d duration because these have been proposed as durations of time-proportional water sampling in Switzerland ). An analysis by Spycher et al. (2018) further showed that averaging times should not exceed 3 d for acute risk assessments for pesticides in small streams because peak exposure might be diluted to concentrations below the limit of quantification or well below acute water quality criteria like the maximum acceptable concentration under the European Union Water Framework Directive. The margin of safety was calculated for the resulting effects at the end of each time window. Each sampling time is the starting point of a single time window. Therefore, 2 time windows were created for each day because the exposure data have a time step of 12 h.

Substances studied
Although the monitoring comprised more than 100 substances, the concentration profiles of 7 pesticides were selected for the effect modeling (Table 2) based on the availability of ecotoxicity data suitable for TKTD model calibration. They cover 5 different chemical classes and modes of action (Table 2) as well as 5 different monitoring sites and 5 ecotoxicological endpoints (Table 3).
Endpoint mortality in G. pulex and P. promelas For G. pulex and P. promelas, we simulated toxic effects on survival using the TKTD model general unified threshold model of survival (GUTS; Jager et al. 2011). More specifically, we used the limit cases GUTS scaled internal concentration (SIC) with stochastic death and individual tolerance (Nyman et al. 2012;Ashauer et al. 2016;Jager and Ashauer 2018b). Because the model calibration to toxicity data for GUTS-SIC-individual tolerance resulted in much poorer quality of fit than for GUTS-SICstochastic death in G. pulex for chlorpyrifos (-log-likelihood stochastic death = 942, individual tolerance = 1004) and carbendazim (-log-likelihood stochastic death = 737, individual tolerance = 1006), we used only GUTS-SIC-stochastic death for this species. For P. promelas, we used GUTS-SIC-stochastic death and GUTS-SIC-individual tolerance and averaged the percentage of time data from the 2 models. Five substances (carbendazim, chlorpyrifos, diazinon, dimethoate, and imidacloprid) were assessed for G. pulex in 3 locations (Table 3), depending on the availability of monitoring data. Two substances were assessed for P. promelas in 2 locations (Table 3). Two durations of the time window (3 and 14 d) were assessed. The toxicity data for calibration of GUTS for G. pulex originated from previously published studies (chlorpyrifos, Ashauer et al. 2007; diazinon, imidacloprid, Nyman et al. 2013) and previously unpublished acute toxicity experiments (carbendazim, dimethoate) which were carried out following established protocols ). The toxicity data for calibration of GUTS for P. promelas also originated from the scientific literature (Geiger et al. 1988(Geiger et al. , 1990. See more details on the GUTS model calibration and use in Supplemental Data, Appendix A, and previously unpublished toxicity data (G. pulex, carbendazim, dimethoate) in Supplemental Data, Appendix D.

Endpoints growth and reproduction in D. magna
For D. magna, we simulated the effects of imidacloprid on 2 sublethal endpoints: organism growth ("length") and reproductive output ("offspring") using a dynamic energy budget model with a toxicity module (DEBtox; Jager and Zimmer 2012). The model was calibrated based on published data (Agatz et al. 2013), and we simulated moving 14-d time windows in 3 locations (Eschelisbach, La Tsatonire, and Weierbach). For D. magna only the simulations with the 14-d time window are discussed because that duration is more relevant for the chronic endpoints modeled in D. magna than the 3-d time window. See more details on the DEBtox model parameterization and simulations in Supplemental Data, Appendix B.

Endpoint population biomass in duckweed and algae
To assess responses to different exposure profiles of herbicides, we simulated effects on biomass growth for metazachlor and Lemna gibba, diuron and Lemna minor, metazachlor and Scenedesmus subspicatus, as well as diuron and the blue-green alga Synechococcus sp. Simulations for metazachlor used exposure data from Mooskanal, La Tsatonire, and Weierbach, whereas simulations for diuron used exposure data from Eschelisbach, La Tsatonire, and Weierbach. The effect model was a simple exponential population model for biomass growth as is also used to analyze standard toxicity tests with algae and Lemna. We assessed 3-and 14-d moving time windows.
The simulations were based on published growth rate data. For metazachlor the data were taken from the draft assessment report (European Food Safety Authority 2005). The 7-d growth rates for L. gibba derive from Scheerbaum (2000a). The 3-d growth data for the alga Scenedesmus subspicatus derive from Scheerbaum (2000b). The 3-d growth rates for diuron and L. minor were published by Drost (2011). And the 3-d growth rates for the blue-green alga Synechococcus sp. were taken from Devilla et al. (2005).

Detailed documentation of modeling
The modeling work is described in more detail in 3 appendices available as Supplemental Data: A (G. pulex and P. promelas), B (D. magna), and C (Lemna sp. and algae).

RESULTS
By how much does time-proportional sampling over-or underestimate chemical stress?
To answer this question, we analyzed the margin of safety values from the most toxic time windows, that is, those with the smallest margin of safety values (Figure 2). The most toxic time window varied depending on sampling site, compound, and species (see Supplemental Data. Appendices A-C).
Generally, TWA and fluctuating concentration-based margin of safety values correlate well across orders of magnitude (Figure 2). For G. pulex and P. promelas (Figure 2A), we observe that during the most toxic exposure periods the TWA method generally predicts higher margin of safety values; that is, it underestimates chemical stress. On average, the margin of safety is a factor of 1.4 higher with the TWA method during the most toxic periods, with the difference being more pronounced for 14 d (factor 1.7 higher) than for 3 d (factor 1.1 higher). This means that a time-proportional sampling strategy would underestimate the chemical stress by a factor of 1.4 on average (factor 1.7 when sampling 14 d).
For D. magna ( Figure 2B), we observe that during the most toxic exposure periods the TWA method predicts higher margin of safety values for "offspring" but lower margin of safety values for "length"; that is, it underestimates chemical stress for effects on D. magna offspring but overestimates chemical stress for effects on D. magna length. On average, the margin of safety calculated with the TWA method for length is 0.51 times the value calculated for fluctuating exposure. For effects on offspring, the TWA method calculates 1.1 times the value of the fluctuating exposure (14-d time window). Hence, time-proportional sampling overestimates chemical stress for effects on growth and slightly underestimates chemical stress for effects on offspring.
For Lemna ( Figure 2C) and algae ( Figure 2D), we observe that the margin of safety values calculated with the TWA method are mostly equal to those calculated with the fluctuating concentrations (outlier: Weierbach and diuron), and sometimes the TWA method overestimates chemical stress (i.e., it errs on the conservative side). On average, the margin of safety is a factor of 0.8 lower with the TWA method during the most toxic periods for Lemna with the 3-d time window and equal with the fluctuating concentrations for the 14-d time window. For algae, the margin of safety is a factor of 0.9 lower on average with the TWA method during the most toxic periods with the 3-and the 14-d time windows.
In summary, these findings suggest that time-proportional sampling would underestimate chemical stress for G. pulex, D. magna, and P. promelas and slightly overestimate chemical stress for Lemna and algae. The largest differences, during the most toxic time windows, were found for the chemical stress caused by diazinon on P. promelas in Weierbach, where the margin of safety for the 14-d TWA was 4 times higher than the margin of safety for the corresponding fluctuating concentration profile and for chemical stress of diuron in Lemna in Weierbach, where the margin of safety for the 3-d TWA was 9 times lower than the margin of safety of the corresponding fluctuating concentration profile. However, the chemical stress, approximated as margin of safety values, ranges over 12 orders of magnitude, whereas the differences between the timeproportional sampling method and the real, fluctuating exposure are comparatively small. In other words, a difference in the margin of safety value of <1 order of magnitude is small when margin of safety values range over 12 orders of magnitude (from 0.027 to 2.4 × 10 10 ; see also Supplemental Data). The margin of safety value is a proxy for chemical stress. Our analysis shows that the chemical stress in small Swiss streams can vary over 12 orders of magnitude depending on location, compound, and species. In this context the calculated error of the time-proportional sampling method (i.e., the difference in chemical stress from the real exposure profile) is relatively small.

How often does time-proportional sampling over-or underestimate chemical stress?
To answer this question, we analyzed the margin of safety values from all time windows over the whole length of the monitoring profiles.
The simulations with gammarids and fish show that during a 14-d period the TWA method deviates more often from the fluctuating exposure assessment than during a 3-d window (Figures 3 and 4). Both the 3-and the 14-d duration simulations show the time-proportional sampling more frequently underestimating chemical stress for G. pulex (17 and 32% of the time, respectively) than overestimating chemical stress (11 and 26% of the time, respectively; Figure 3). For P. promelas, the 3-d simulations underestimate chemical stress 13% of the time and overestimate chemical stress 4% of the time, and the 14-d simulations underestimate chemical stress 39% of the time and overestimate chemical stress 0% of the time (Figure 4). This is because peak exposures are driving the toxicity for diazinon and chlorpyrifos in P. promelas. Averaging out the peaks to lower TWA concentrations almost always reduces the toxicity for these 2 compounds in P. promelas; however, the absolute error is a factor of 2.1 or less (see margin of safety values, previous section).
The simulations with D. magna and imidacloprid show that over-and underestimations of chemical stress are frequent ( Figure 5). Overestimation of chemical stress in the timeproportional sampling occurs more frequently overall (44% of the time overestimated vs 24% underestimated). This is due to overestimation of chemical stress for effects on growth (58% of the time overestimated vs 12% underestimated), whereas effects on reproduction are generally underestimated by the time-proportional sampling (30% of the time overestimated vs 36% underestimated).
Simulations of biomass growth with duckweed and algae show that during a 14-d period the TWA method deviates more often from the fluctuating exposure assessment than during a 3-d window ( Figure 6). For duckweed (Lemna), we observe that the 3-d simulations overestimate chemical stress but that the 14-d simulations seem to underestimate chemical stress (28% of the time) more often than overestimating it (17% of the time). For algae, we observe that it depends more strongly on the location and substance sampled whether the 3-or 14-d duration over-or underestimates chemical stress compared with the other species. However, the timeproportional sampling would generally overestimate chemical stress for algae on average (

Insights from the margin of safety analysis and its limitations
To our knowledge, this is the first time that margin of safety  or LP50 (Ockleford et al. 2018) values were calculated for real exposure time series from environmental monitoring. A margin of safety value is a proxy for risk of toxicity and has a similar interpretation as the toxicity-exposure ratio or risk quotient (van Leeuwen and Vermeire 2007). Small margin of safety values, near 1 or below, indicate chemical stress; and large values indicate little or no chemical stress. Our analysis, using TKTD modeling to quantify effects on invertebrates and fish and population modeling for effects on algae and aquatic plants, confirmed the findings of Spycher et al. (2018), who calculated risk quotients for the same monitoring data and found several instances of potential risk to aquatic organisms. In the most toxic time windows, we calculated margin of safety values far below 1 for algae (La Tsatonire, diuron; Figure 2D), which indicates a potential risk to primary producers. We also found several instances of margin of safety values above but within an order of magnitude of 1. These indicate that for the effect endpoints and species combinations modeled, we would not expect effects from exposure to the modeled substances alone. Interestingly, we found a difference in margin of safety values for D. magna when calculated for effects on growth or reproduction, and we observed outlier data points for algae and Lemna with diuron in the Weierbach. The endpointspecific results for D. magna suggest that the physiological mode of action is important (see also Ashauer and Jager 2018), and it should also be noted that D. magna is less sensitive to imidacloprid than other crustacean species Roessink et al. 2013). Further, we found  that either under-or overestimation of risk for D. magna occurred frequently. We know already that TKTD model results are sensitive to changes in exposure patterns ), but we do not know if our findings would be observed for a wider range of exposure profiles too. Understanding the causes of these observations warrants further research.
It is important to bear in mind that we simulated only a small selection of sublethal effects in a very small number of species and locations. A more comprehensive understanding will become possible in the near future because more and more TKTD models parameterized for additional species are becoming available (e.g., Focks et al. 2018;Zimmer et al. 2018;Vighi et al. 2019;Dalhoff et al. 2020) as well as a better understanding of how species traits determine TK and TD (Van Den Berg et al. 2019;Dalhoff et al. 2020). Since our study, conducted mostly in 2017, new software for TKTD modeling with GUTS became available (Openguts 2019), which means that the problem with poor fits for GUTS-SIC-individual tolerance for 2 of our data sets could probably be overcome in the future. Also, we do not know how accurate our models are because TKTD model validation studies are rare (Augusiak et al. 2014;Jager and Ashauer 2018a). Most of our analysis was based on acute toxicity data, and all of our analyses concerned only exposure to a single substance at a time. Other possible effects of micropollutants on aquatic life, such as those on the endocrine system (Ankley and Villeneuve 2015), community-level impacts (Baert et al. 2016;Rohr et al. 2016), and epigenetic effects or behavioral changes (Nyman et al. 2013;Pedersen et al. 2013;Ford et al. 2018), might be ecologically relevant for chronic risk assessment of micropollutants as well as mixture toxicity (Ashauer et al. 2017;Escher et al. 2020) and interactions with other, nonchemical stressors (Holmstrup et al. 2010;Woodward et al. 2012  et al. 2017). Our species were selected based on the availability of appropriate TKTD models and the data required to parameterize these and do not necessarily include the most sensitive species for a given substance (Van Den Berg et al. 2019). However, these aspects were not considered because our aim was to compare TWA concentrations with time-proportional sampling.

Choosing a time-proportional sampling duration
Our aim was to evaluate water sampling strategies (Robertson and Roerish 1999;Ort et al. 2010), specifically from an ecotoxicological effects perspective, because previous research has focused only on comparing average concentrations derived with different sampling methods such as grab samples and passive sampling (Hyne et al. 2004;Fernández et al. 2014) or on theoretical (eco)toxicological considerations (Rozman and Doull 2000;Ashauer and Brown 2013). We found that when using a 3-d sampling duration our simulations showed considerably fewer time periods when time-proportional sampling either under-or overestimates chemical stress compared to 14-d sampling duration. Therefore, it is reasonable to use a 3-d sampling duration, or a similarly short sampling duration, for assessment of acute effects. For chronic effects assessment, a longer sampling duration is more realistic. With the exception of fish, the 14-d sampling duration simulated in the present study shows a balanced frequency of over-or underestimations of chemical stress and a small overall error compared with the magnitude of variation in chemical stress. Hence, using 14-d time-proportional sampling to measure micropollutants for assessment of chronic effects appears reasonable from the perspective of time-variable exposure, but more simulations with different compounds would be desirable to strengthen the evidence base, especially in the case of fish. Furthermore, the question answered in the present study is not the only consideration when setting the duration of the water sampling for the assessment of chronic risk-other aspects need to be considered too, for example, the duration of ecotoxicity studies (Baas et al. 2010;Ashauer and Brown 2013), the ecology of the relevant species (Calow et al. 1997;Rohr et al. 2016), and the risk-management context of the assessment (van Leeuwen and Vermeire 2007). Generally, the model simulations show that a longer sampling period (14 d) for time-proportional water sampling leads to more deviations from the real chemical stress than shorter sampling durations (3 d). However, the overall picture also shows that under-and overestimations of chemical stress occur with approximately equal frequency, although for some locations, compounds, and species we observed clear under-or overestimation of chemical stress (e.g., 14-d fish and 3-d Lemna). When under-or overestimations of chemical stress occur, they are very small compared with the overall variation in chemical stress observed. In this data set, the chemical stress to aquatic organisms from pesticides in Swiss streams ranged over 12 orders of magnitude, from 0.027 to 2.4 × 10 10 (margin of safety values, where small values indicate a chemical stress and large values indicate safety).

CONCLUSIONS
We found that the time-proportional sampling method is associated with an error of factor 4 or less in the margin of safety value toward underestimating chemical stress and an error of factor of 9 toward overestimating chemical stress. This error margin is small compared to 12 orders of magnitude variation in chemical stress across locations, compounds, and species. We conclude that time-proportional sampling is suitable for chemical water quality monitoring because it characterizes the concentration time series sufficiently well. Continuous time-proportional sampling for a period of 3 and 14 d for acute and chronic assessment, respectively, yields sufficiently accurate average concentrations to assess ecotoxicological effects.
Supplemental Data-The Supplemental Data are available on the Wiley Online Library at https://doi.org/10.1002/etc.4838.