fitswavecav {seawaveQ}R Documentation

Fit seasonal wave and continuous ancillary data for trend analysis

Description

Function to prepare data and fit the seawaveQ model.

Usage

fitswavecav(
  cdat,
  cavdat,
  tanm = "trend1",
  pnames,
  yrstart = 0,
  yrend = 0,
  tndbeg = 0,
  tndend = 0,
  iwcav = c("none"),
  dcol = "dates",
  qwcols = c("R", "P"),
  mclass = 1,
  numk = 4,
  alpha = 0.1,
  bootRCS = FALSE,
  nboot = 1000,
  plotfile = FALSE,
  textfile = FALSE
)

Arguments

cdat

is the concentration data

cavdat

is the continuous (daily) ancillary data

tanm

is a character identifier that names the trend analysis run. It is used to label output files.

pnames

are the parameters (water-quality constituents) to analyze (omit the starting character, for example for sulfate data indicated by P00945, enter "00945").

yrstart

is the starting year of the analysis (treated as January 1 of that year). Zero means the start date will be determined by the start date of cavdat, the continuous ancillary data.

yrend

is the ending year of the analysis (treated as December 31 of that year). Zero means the end date will be determined by the end date of cavdat, the continuous ancillary data.

tndbeg

is the beginning (in whole or decimal years) of the trend period. Zero means the begin date will be the beginning of the concentration data, cdat.

tndend

is the end of the trend (treated as December 31 of that year). Zero means the end date will be the end of the concentration data, cdat.

iwcav

is a character vector indicating which continuous ancillary variables to include, if none are used for analysis, use iwcav=c("none").

dcol

is the column name for the dates, should be the same for both cdat and cavdat

qwcols

is a character vector with the beginning of the column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter).

mclass

indicates the class of model to use. A class 1 model is the traditional SEAWAVE-Q model that has a linear time trend. A class 2 model is a newer option for longer trend periods that uses a set of restricted cubic splines on the time variable to provide a more flexible model.

numk

is the number of knots in the restricted cubic spline model. The default is 4, and the recommended number is 3–7.

alpha

is the significance level or alpha values for statistical significance and confidence intervals.

bootRCS

is a logical value indicating whether or not to perform block bootstrapping for an attained significance level for the trend with restricted cubic splines. No bootstrapping is performed for the linear trend model.

nboot

is the number of bootstrap replicates. A large number, 10,000, is recommended, but this takes a long time.

plotfile

is by default FALSE. True will write pdf files of plots to the user's file system.

textfile

is by default FALSE. True will write text output files to the user's file system. These files are useful for detailed model comparisons, documenting session information, and for model archives.

Format

The data frame returned as the first element of the output list has one row for each chemical analyzed and the number of columns depends on the number of continuous ancillary variables used. The general format is as follows:

pname character parameter analyzed
mclass numeric a value of 1 or 2
jmod numeric the choice of pulse input function, an integer 1--14
hlife numeric the model half-life in months, an integer, 1 to 4 months
cmaxt numeric the decimal season of maximum concentration
scl numeric the scale factor from the survreg.object
loglik numeric the log-likelihood for the model
cint numeric coefficient for model intercept
cwave numeric coefficient for the seasonal wave
ctnd[alpahnumeric] numeric coefficient(s) for the trend component(s) of model
c[alphanumeric] numeric 0 or more coefficients for the continuous ancillary variables
seint numeric standard error for the intercept
sewave numeric standard error for the seasonal wave
setnd[alphanumeric] numeric standard error for the trend component(s)
se[alphanumeric] numeric 0 or more standard errors for the continuous ancillary variables
pvaltnd[alphanumeric] numeric the p-value for the trend component(s)

The data frame returned as the sixth element of the output list has one row for each chemical analyzed. The general format for linear models is described in pesticideTrendCalcs. The format for restricted cubic spline models is as described as follows:

pname character parameter analyzed
mclass numeric A value of 1 or 2
baseConc numeric the concentration at the beginning of the trend period
endCon numeric the concentration at the end of the trend period
rcsctndPpor numeric the concentration trend in percent over the trend period
rcsctndOrigPORPercentBase numeric the conc. trend in original units over period of record (calc. based on percent per year and base conc.)
pvalrcstnd numeric p-value, attained significance level, based on bootstrapping
ctndlklhd numeric trend likelihood

Details

Fits the seawaveQ model (Vecchia and others, 2008) using a seasonal wave and continuous ancillary variables (streamflow anomalies and other continuous variables such as conductivity or sediment) to model water quality. The version in the 2.0.0 update to the R package has an option to use restricted cubic splines as a more flexible definition of the temporal trend.

Value

A PDF file containing plots of the data and modeled concentration, a text file containing a summary of the survival regression call for each model selected, and a list. The first element of the list is a data frame described under format. The second element of the list is the summary of the survival regression call. The third element is the observed concentration data (censored and uncensored). The fourth element is the concentration data predicted by the model. The fifth element provides summary statistics for the predicted concentrations. The sixth element is a data frame that provides a summary of the trends with the columns pname (parameter name), mclass (a value of 1, indicating a linear trend model, a value of 2 for models using restricted cubic splines), and the columns describing the trends. For linear models see pesticideTrendCalcs. For models with restricted cubic splines, see the format section. See Ryberg and York (2020) for additional details.

Note

The assumed data format is one with columns for water-quality concentration values and a related column for qualification of those values, such as in the case of left-censored values less than a particular value. For example, a water-quality sample was collected and the laboratory analysis indicated that the concentration was less than 0.01 micrograms per liter. The USGS parameter code for simazine is 04035 (U.S. Geological Survey, 2018b). When the data are retrieved through the National Water Information System: Web Interface (https://waterdata.usgs.gov/nwis; U.S. Geological Survey, 2018a), the concentration values are in a column labeled P04035 and the qualification information, or remark codes, are in a column labeled R04035. To use this function, the argument pnames would be the unique identifier for simazine values and qualifications, 04035, and the qwcols argument would be c("R", "P") to indicate that the qualification column starts with an R and the values column starts with a P.

Other users may have data in different formats that can be modified to use with this function. For example, a user may have concentration values and qualification codes in one column, such as a column labeled simazine with the values 0.05, 0.10, <0.01, <0.01, and 0.90. In this case, the less thans and any other qualification codes should be placed in a separate column. The column names for the qualification codes and the concentration values should be the same with the exception of different beginning letters to indicate which column is which. The columns could be named Rsimazine and Psimazine. Then the argument pnames = "simazine" and the argument qwcols = c("R", "P").

Users should exercise caution when their water-quality data have multiple censoring limits and may want to recensor the data to a single censoring level. Censoring and recensoring issues are discussed in the text and Appendix 1 of Ryberg and others (2010).

None of the variations of SEAWAVE is a simple model. The model complexity increases with flow anomalies and with the addition of restricted cubic splines. As the number of parameters in the model increases or the degree of censoring increases, the sample size must also increase. See Ryberg and others (2020) for more details on sample size.

Author(s)

Aldo V. Vecchia and Karen R. Ryberg

References

Ryberg, K.R. and York, B.C., 2020, seawaveQ—An R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables: U.S. Geological Survey Open-File Report 2020–1082, 25 p., with 4 appendixes.

Ryberg, K.R., Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2010, Trends in pesticide concentrations in urban streams in the United States, 1992–2008: U.S. Geological Survey Scientific Investigations Report 2010–5139, 101 p., https://pubs.usgs.gov/sir/2010/5139/.

U.S. Geological Survey, 2018a, National Water Information System: Web Interface, accessed July 7, 2018, at https://waterdata.usgs.gov/nwis.

U.S. Geological Survey, 2018b, Parameter code definition: National Water Information System: Web Interface, accessed July 18, 2018, at https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes.

Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308–1324, https://dx.doi.org/10.1111/j.1752-1688.2008.00225.x.

See Also

The functions that fitswavecav calls internally:
prepData and fitMod.

Examples

data(swData)
modMoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha)
myfitLinearTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, 
tanm = "myfitLinearTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, 
yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), 
dcol = "dates", qwcols = c("R", "P"))
# trend model results
myfitLinearTrend[[1]]
# example regression call
myfitLinearTrend[[2]][[1]]
# first few lines of observed concentrations
head(myfitLinearTrend[[3]])
# first few lines of predicted concentrations
head(myfitLinearTrend[[4]])
# summary statistics for predicted concentrations
myfitLinearTrend[[5]]
# summary of trends
myfitLinearTrend[[6]]
myfitRCSTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha, 
tanm = "myfitRCSTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995, 
yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"), 
dcol = "dates", qwcols = c("R", "P"), mclass = 2, numk = 4, bootRCS = FALSE,
plotfile = FALSE, textfile = FALSE)

[Package seawaveQ version 2.0.2 Index]