anlzMWRoutlier {MassWateR} | R Documentation |
Analyze outliers in results file
Description
Analyze outliers in results file
Usage
anlzMWRoutlier(
res = NULL,
param,
acc = NULL,
fset = NULL,
type = c("box", "jitterbox", "jitter"),
group,
dtrng = NULL,
repel = TRUE,
outliers = FALSE,
labsize = 3,
fill = "lightgrey",
alpha = 0.8,
width = 0.8,
yscl = "auto",
ttlsize = 1.2,
runchk = TRUE,
warn = TRUE
)
Arguments
res |
character string of path to the results file or a data frame of results returned by readMWRresults |
param |
character string of the parameter to plot, must conform to entries in the |
acc |
character string of path to the data quality objectives file for accuracy or a data frame returned by readMWRacc |
fset |
optional list of inputs with elements named |
type |
character indicating the plot type, one of "box" (default), "jitterbox", or "jitter", see details |
group |
character indicating whether the summaries are grouped by month, site, or week of year |
dtrng |
character vector of length two giving the start and end of the date range as YYYY-MM-DD, optional |
repel |
logical indicating if overlapping outlier labels are offset |
outliers |
logical indicating if outliers are returned to the console instead of plotting |
labsize |
numeric indicating font size for the outlier labels |
fill |
character indicating fill color for boxplots |
alpha |
numeric from 0 to 1 indicating transparency of fill color |
width |
numeric for width of boxplots |
yscl |
character indicating one of "auto" (default), "log", or "linear", see details |
ttlsize |
numeric value indicating font size of the title relative to other text in the plot |
runchk |
logical to run data checks with |
warn |
logical to return warnings to the console (default) |
Details
Outliers are defined following the standard ggplot definition: observations that fall more than 1.5 times the inter-quartile range beyond the lower or upper quartile of each boxplot. The data frame returned if outliers = TRUE may vary based on the boxplot groupings defined by group.
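The 1.5 times inter-quartile range rule can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the values in x are hypothetical, and the quartiles are taken from quantile() with its default settings, which is assumed here to match how ggplot computes the box edges.

```r
# Hypothetical result values for one boxplot group (not from the example data)
x <- c(6.1, 7.4, 7.9, 8.2, 8.5, 8.8, 9.0, 9.3, 14.6)

# Lower and upper quartiles (the box edges)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fences at 1.5 times the inter-quartile range beyond each quartile
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Observations outside the fences are flagged as outliers
is_outlier <- x < lower | x > upper
outlier_values <- x[is_outlier]
outlier_values
```

Because the fences are computed separately for each box, the same observation can be an outlier under one grouping (for example, by site) and not under another (by month).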
Specifying type = "box" (default) will produce standard boxplots. Specifying type = "jitterbox" will produce boxplots with non-outlier observations jittered on top. Specifying type = "jitter" will suppress the boxplots and show only the jittered points and the outliers.
Specifying group = "week" will group the samples by week of year using an integer specifying the week. Note that no common month/day marks the start of a given week across years, so an integer is the only way to compare summaries if the results data span multiple years.
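As a rough illustration of an integer week of year, the base R sketch below counts seven-day blocks starting on January 1. This numbering is an assumption made for the example; the function may use a different convention internally. The dates are hypothetical.

```r
# Hypothetical sample dates from three different years
dts <- as.Date(c("2020-05-10", "2021-05-10", "2022-05-10"))

# Day of year as an integer (1 to 366)
doy <- as.integer(format(dts, "%j"))

# Week of year, assuming weeks are seven-day blocks counted from January 1
wk <- (doy - 1L) %/% 7L + 1L

data.frame(date = dts, day_of_year = doy, week = wk)
```

Under this assumed numbering, all three dates share the same week integer even though the calendar date on which that week begins shifts by a day in the leap year, which is why the integer is the common key across years.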
The y-axis scaling as arithmetic (linear) or logarithmic can be set with the yscl argument. If yscl = "auto" (default), the scaling is determined automatically from the data quality objective file for accuracy, i.e., parameters with "log" in any of the columns are plotted on log10-scale, otherwise arithmetic. Setting yscl = "linear" or yscl = "log" will set the axis as linear or log10-scale, respectively, regardless of the information in the data quality objective file for accuracy.
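The scaling rule described above can be sketched as a small helper. The function choose_yscale and the column names in the two example data frames are hypothetical and only illustrate the decision logic; they are not part of MassWateR.

```r
# Hypothetical helper: pick the y-axis scale from yscl and the accuracy
# rows for one parameter (a data frame with any number of columns)
choose_yscale <- function(yscl = c("auto", "log", "linear"), acc_rows = NULL) {
  yscl <- match.arg(yscl)

  # An explicit choice overrides the accuracy file
  if (yscl != "auto") return(yscl)

  # "auto": look for the text "log" in any column of the accuracy rows
  entries <- unlist(lapply(acc_rows, as.character))
  if (any(grepl("log", entries, fixed = TRUE))) "log" else "linear"
}

# Hypothetical accuracy rows for two parameters
acc_ecoli <- data.frame(Parameter = "E.coli", `Field Duplicate` = "<= 30% log",
                        check.names = FALSE, stringsAsFactors = FALSE)
acc_do <- data.frame(Parameter = "DO", `Field Duplicate` = "<= 10%",
                     check.names = FALSE, stringsAsFactors = FALSE)

choose_yscale("auto", acc_ecoli)   # accuracy rows mention "log"
choose_yscale("auto", acc_do)      # no "log" entry
choose_yscale("linear", acc_ecoli) # explicit setting wins
```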
Any "BDL" or "AQL" entries in the "Result Value" column of resdat are replaced with the appropriate values from the "Quantitation Limit" column, if present, otherwise the "MDL" or "UQL" columns from the data quality objectives file for accuracy are used. Values as "BDL" use one half of the appropriate limit.
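The replacement rule can be sketched in base R as below. The data frame, its simplified column names, and the mdl and uql values are hypothetical stand-ins for the results file and the accuracy file; the sketch only mirrors the logic described above.

```r
# Hypothetical results with censored entries; quant_limit stands in for
# the "Quantitation Limit" column and is missing for some rows
res <- data.frame(
  result = c("7.2", "BDL", "AQL", "BDL"),
  quant_limit = c(NA, 0.5, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical limits taken from the accuracy file for this parameter
mdl <- 0.2
uql <- 20

is_bdl <- res$result == "BDL"
is_aql <- res$result == "AQL"

# Use the quantitation limit when present, otherwise fall back to MDL or UQL
fallback <- ifelse(is_bdl, mdl, uql)
limit <- ifelse(is.na(res$quant_limit), fallback, res$quant_limit)

# Convert to numeric; "BDL" and "AQL" become NA here and are filled in next
value <- suppressWarnings(as.numeric(res$result))
value[is_bdl] <- limit[is_bdl] / 2 # below detection: one half of the limit
value[is_aql] <- limit[is_aql]     # above quantitation: the limit itself
value
```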
Value
A ggplot object that can be further modified if outliers = FALSE, otherwise a data frame of outliers is returned.
Examples
# results data path
respth <- system.file('extdata/ExampleResults.xlsx', package = 'MassWateR')
# results data
resdat <- readMWRresults(respth)
# accuracy path
accpth <- system.file('extdata/ExampleDQOAccuracy.xlsx',
     package = 'MassWateR')
# accuracy data
accdat <- readMWRacc(accpth)
# outliers by month
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month')
# outliers by site
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'site')
# outliers by site, May through July 2022 only
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'site',
     dtrng = c('2022-05-01', '2022-07-31'))
# outliers by month, type as jitterbox
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month', type = 'jitterbox')
# outliers by month, type as jitter
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month', type = 'jitter')
# data frame output
anlzMWRoutlier(res = resdat, param = 'DO', acc = accdat, group = 'month', outliers = TRUE)