fitswavecav {seawaveQ} | R Documentation |
Fit seasonal wave and continuous ancillary data for trend analysis
Description
Function to prepare data and fit the seawaveQ model.
Usage
fitswavecav(
cdat,
cavdat,
tanm = "trend1",
pnames,
yrstart = 0,
yrend = 0,
tndbeg = 0,
tndend = 0,
iwcav = c("none"),
dcol = "dates",
qwcols = c("R", "P"),
mclass = 1,
numk = 4,
alpha = 0.1,
bootRCS = FALSE,
nboot = 1000,
plotfile = FALSE,
textfile = FALSE
)
Arguments
cdat |
is the concentration data |
cavdat |
is the continuous (daily) ancillary data |
tanm |
is a character identifier that names the trend analysis run. It is used to label output files. |
pnames |
are the parameters (water-quality constituents) to analyze (omit the starting character, for example for sulfate data indicated by P00945, enter "00945"). |
yrstart |
is the starting year of the analysis (treated as January 1 of that year). Zero means the start date will be determined by the start date of cavdat, the continuous ancillary data. |
yrend |
is the ending year of the analysis (treated as December 31 of that year). Zero means the end date will be determined by the end date of cavdat, the continuous ancillary data. |
tndbeg |
is the beginning (in whole or decimal years) of the trend period. Zero means the begin date will be the beginning of the concentration data, cdat. |
tndend |
is the end of the trend (treated as December 31 of that year). Zero means the end date will be the end of the concentration data, cdat. |
iwcav |
is a character vector indicating which continuous ancillary variables to include, if none are used for analysis, use iwcav=c("none"). |
dcol |
is the column name for the dates, should be the same for both cdat and cavdat |
qwcols |
is a character vector with the beginning of the column headers for remarks code (default is R), and beginning of column headers for concentration data (default is P for parameter). |
mclass |
indicates the class of model to use. A class 1 model is the traditional SEAWAVE-Q model that has a linear time trend. A class 2 model is a newer option for longer trend periods that uses a set of restricted cubic splines on the time variable to provide a more flexible model. |
numk |
is the number of knots in the restricted cubic spline model. The default is 4, and the recommended number is 3–7. |
alpha |
is the significance level or alpha values for statistical significance and confidence intervals. |
bootRCS |
is a logical value indicating whether or not to perform block bootstrapping for an attained significance level for the trend with restricted cubic splines. No bootstrapping is performed for the linear trend model. |
nboot |
is the number of bootstrap replicates. A large number, 10,000, is recommended, but this takes a long time. |
plotfile |
is by default FALSE. True will write pdf files of plots to the user's file system. |
textfile |
is by default FALSE. True will write text output files to the user's file system. These files are useful for detailed model comparisons, documenting session information, and for model archives. |
Format
The data frame returned as the first element of the output list has
one row for each chemical analyzed and the number of columns depends on the
number of continuous ancillary variables used. The general format is as
follows:
pname | character | parameter analyzed |
mclass | numeric | a value of 1 or 2 |
jmod | numeric | the choice of pulse input function, an integer 1--14 |
hlife | numeric | the model half-life in months, an integer, 1 to 4 months |
cmaxt | numeric | the decimal season of maximum concentration |
scl | numeric | the scale factor from the
survreg.object |
loglik | numeric | the log-likelihood for the model |
cint | numeric | coefficient for model intercept |
cwave | numeric | coefficient for the seasonal wave |
ctnd[alpahnumeric] | numeric | coefficient(s) for the trend component(s) of model |
c[alphanumeric] | numeric | 0 or more coefficients for the continuous ancillary variables |
seint | numeric | standard error for the intercept |
sewave | numeric | standard error for the seasonal wave |
setnd[alphanumeric] | numeric | standard error for the trend component(s) |
se[alphanumeric] | numeric | 0 or more standard errors for the continuous ancillary variables |
pvaltnd[alphanumeric] | numeric | the p-value for the trend component(s) |
The data frame returned as the sixth element of the output list has
one row for each chemical analyzed. The general format for linear models is
described in pesticideTrendCalcs
. The format for restricted
cubic spline models is as described as follows:
pname | character | parameter analyzed |
mclass | numeric | A value of 1 or 2 |
baseConc | numeric | the concentration at the beginning of the trend period |
endCon | numeric | the concentration at the end of the trend period |
rcsctndPpor | numeric | the concentration trend in percent over the trend period |
rcsctndOrigPORPercentBase | numeric | the conc. trend in original units over period of record (calc. based on percent per year and base conc.) |
pvalrcstnd | numeric | p-value, attained significance level, based on bootstrapping |
ctndlklhd | numeric | trend likelihood |
Details
Fits the seawaveQ model (Vecchia and others, 2008) using a seasonal wave and continuous ancillary variables (streamflow anomalies and other continuous variables such as conductivity or sediment) to model water quality. The version in the 2.0.0 update to the R package has an option to use restricted cubic splines as a more flexible definition of the temporal trend.
Value
A PDF file containing plots of the data and modeled
concentration, a text file containing a summary of the survival
regression call for each model selected, and a list. The first element
of the list is a data frame described under format. The second element
of the list is the summary of the survival regression call. The third
element is the observed concentration data (censored and uncensored).
The fourth element is the concentration data predicted by the model.
The fifth element provides summary statistics for the predicted
concentrations. The sixth element is a data frame that provides a summary of
the trends with the columns pname (parameter name), mclass (a value of 1,
indicating a linear trend model, a value of 2 for models using restricted
cubic splines), and the columns describing the trends. For linear models
see pesticideTrendCalcs
. For models with restricted cubic
splines, see the format section. See Ryberg and York (2020) for additional
details.
Note
The assumed data format is one with columns for water-quality
concentration values and a related column for qualification of
those values, such as in the case of left-censored values less
than a particular value. For example, a water-quality sample
was collected and the laboratory analysis indicated that the
concentration was less than 0.01 micrograms per liter. The
USGS parameter code for simazine is 04035 (U.S. Geological Survey,
2018b). When the data are retrieved through the National Water
Information System: Web Interface
(https://waterdata.usgs.gov/nwis; U.S. Geological Survey, 2018a),
the concentration values are in a column labeled P04035 and the
qualification information, or remark codes, are in a column labeled
R04035. To use this function, the argument pnames would be the unique
identifier for simazine values and qualifications, 04035, and the
qwcols argument would be c("R", "P") to indicate that the
qualification column starts with an R and the values column starts with
a P.
Other users may have data in different formats that can be
modified to use with this function. For example, a user may have
concentration values and qualification codes in one column, such
as a column labeled simazine with the values 0.05, 0.10, <0.01,
<0.01, and 0.90. In this case, the less thans and any other
qualification codes should be placed in a separate column. The
column names for the qualification codes and the concentration values
should be the same with the exception of different beginning
letters to indicate which column is which. The columns could be
named Rsimazine and Psimazine. Then the argument pnames = "simazine"
and the argument qwcols = c("R", "P").
Users should exercise caution when their water-quality data have multiple censoring limits and may want to recensor the data to a single censoring level. Censoring and recensoring issues are discussed in the text and Appendix 1 of Ryberg and others (2010).
None of the variations of SEAWAVE is a simple model. The model complexity increases with flow anomalies and with the addition of restricted cubic splines. As the number of parameters in the model increases or the degree of censoring increases, the sample size must also increase. See Ryberg and others (2020) for more details on sample size.
Author(s)
Aldo V. Vecchia and Karen R. Ryberg
References
Ryberg, K.R. and York, B.C., 2020, seawaveQ—An R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables: U.S. Geological Survey Open-File Report 2020–1082, 25 p., with 4 appendixes.
Ryberg, K.R., Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2010, Trends in pesticide concentrations in urban streams in the United States, 1992–2008: U.S. Geological Survey Scientific Investigations Report 2010–5139, 101 p., https://pubs.usgs.gov/sir/2010/5139/.
U.S. Geological Survey, 2018a, National Water Information System: Web Interface, accessed July 7, 2018, at https://waterdata.usgs.gov/nwis.
U.S. Geological Survey, 2018b, Parameter code definition: National Water Information System: Web Interface, accessed July 18, 2018, at https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308–1324, https://dx.doi.org/10.1111/j.1752-1688.2008.00225.x.
See Also
The functions that fitswavecav
calls internally:
prepData
and fitMod
.
Examples
data(swData)
modMoRivOmaha <- combineData(qwdat = qwMoRivOmaha, cqwdat = cqwMoRivOmaha)
myfitLinearTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha,
tanm = "myfitLinearTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995,
yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"),
dcol = "dates", qwcols = c("R", "P"))
# trend model results
myfitLinearTrend[[1]]
# example regression call
myfitLinearTrend[[2]][[1]]
# first few lines of observed concentrations
head(myfitLinearTrend[[3]])
# first few lines of predicted concentrations
head(myfitLinearTrend[[4]])
# summary statistics for predicted concentrations
myfitLinearTrend[[5]]
# summary of trends
myfitLinearTrend[[6]]
myfitRCSTrend <- fitswavecav(cdat = modMoRivOmaha, cavdat = cqwMoRivOmaha,
tanm = "myfitRCSTrend", pnames = c("04035", "04037", "04041"), yrstart = 1995,
yrend = 2003, tndbeg = 1995, tndend = 2003, iwcav = c("flowa30", "flowa1"),
dcol = "dates", qwcols = c("R", "P"), mclass = 2, numk = 4, bootRCS = FALSE,
plotfile = FALSE, textfile = FALSE)