outlier {schoRsch} | R Documentation |
Screen Data for Outliers
Description
A chosen column of a data frame is screened for outliers, and outliers are marked and/or eliminated. Outliers are identified by absolute lower and upper limits, by cutoffs applied to z-transformed data, or by both. At least one set of limits (absolute limits or z-value cutoffs) must be specified.
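The limit-based screening described above can be sketched in base R as follows. `flag_by_limits()` is a hypothetical helper, not part of schoRsch. It assumes that values lying exactly on a limit are not flagged and that an `NA`/`NaN` limit means the limit is not applied, mirroring the `NaN` defaults shown in Usage.

```r
# Hypothetical helper (not part of schoRsch): flag values outside fixed limits.
# Returns 0 = not suspicious, 1 = outlier. An NA/NaN limit is ignored, and
# values exactly equal to a limit are NOT flagged (an assumption).
flag_by_limits <- function(x, lower = NaN, upper = NaN) {
  flag <- rep(0L, length(x))
  if (!is.na(lower)) flag[which(x < lower)] <- 1L
  if (!is.na(upper)) flag[which(x > upper)] <- 1L
  flag
}

rt <- c(350, 420, 95, 510, 2600, 480)
flags <- flag_by_limits(rt, lower = 100, upper = 2000)
print(flags)                             # 0 0 1 0 1 0
rt_na   <- replace(rt, flags == 1L, NA)  # mark outliers as NA
rt_kept <- rt[flags == 0L]               # or drop them entirely
```

Marking with `NA` keeps the vector length (and thus row alignment with other variables) intact, whereas dropping shortens it.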
Usage
outlier(data, dv,
todo = "na", res.name = "outlier",
upper.limit = NaN, lower.limit = NaN,
limit.exact = FALSE,
upper.z = NaN, lower.z = NaN,
z.exact = FALSE, factors = NaN,
z.keep = TRUE, z.name = "zscores",
vsj = FALSE,
print.summary = TRUE)
Arguments
data |
A data frame containing the data to be screened as well as the appropriate condition variables. |
dv |
Character string specifying the name of the variable within |
todo |
Character string specifying the fate of outliers: |
res.name |
Character string specifying the name of the variable to be used for marking outliers, default = "outlier". |
upper.limit |
An optional numeric value specifying the absolute upper limit for defining outliers. |
lower.limit |
An optional numeric value specifying the absolute lower limit for defining outliers. |
limit.exact |
Logical, if |
upper.z |
An optional numeric value specifying by how many standard deviations (computed within a cell) a value must exceed the cell mean to be identified as an outlier. |
lower.z |
An optional numeric value specifying by how many standard deviations (computed within a cell) a value must fall below the cell mean to be identified as an outlier. |
factors |
A string or vector of strings (e.g., |
z.exact |
Logical, if |
z.keep |
Logical, if |
z.name |
Character string specifying the name of the variable used for storing z-scores. |
vsj |
To be implemented in a future version... |
print.summary |
Logical, if |
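The z-based criterion (`upper.z` and `lower.z`, with standard deviations computed within cells) can be sketched in base R as follows. Both helpers are hypothetical and not part of schoRsch. The sketch assumes that cells are defined by one or more grouping columns, that z-scores use each cell's mean and sample standard deviation, and that the lower cutoff is given as a negative z-value; the sign convention `outlier()` expects for `lower.z` may differ.

```r
# Hypothetical helpers (not part of schoRsch).
# z-transform x separately within each cell defined by the grouping vectors.
cell_z <- function(x, ...) {
  ave(x, ..., FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))
}

# Flag z-scores beyond the cutoffs (lower.z assumed to be negative, e.g. -2).
flag_by_z <- function(z, lower.z = NaN, upper.z = NaN) {
  flag <- rep(0L, length(z))
  if (!is.na(lower.z)) flag[which(z < lower.z)] <- 1L
  if (!is.na(upper.z)) flag[which(z > upper.z)] <- 1L
  flag
}

d <- data.frame(
  subject = rep(c("A", "B"), each = 6),
  rt      = c(400, 410, 420, 430, 440, 900,
              600, 610, 620, 630, 640, 650)
)
d$z    <- cell_z(d$rt, d$subject)  # one cell per subject
d$flag <- flag_by_z(d$z, lower.z = -2, upper.z = 2)
print(d$flag)  # only the 900 of subject A is flagged (z of about 2.04)
```

Computing z-scores per cell means a value is judged relative to its own subject or condition rather than to the pooled data.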
Details
If both absolute limits and z-limits are specified, the absolute limits are applied first, and z-scores are then computed for the remaining data points.
Value
outlier(data,...)
returns the original data frame with the outlier correction applied. The returned data frame has one additional column containing outlier flags (0 = not suspicious, 1 = outlier). If z-scores are requested, they are returned in a further additional column.
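The processing order from Details and the shape of the returned data frame can be combined in one sketch. `screen_sketch()` is hypothetical and only imitates the documented behavior: it applies absolute limits first, z-transforms the remaining values within cells, and stores 0/1 flags and z-scores in columns named by `res.name` and `z.name`. Replacing outlying values of `dv` with `NA` is an assumption standing in for the correction step.

```r
# Hypothetical sketch (not part of schoRsch) of the documented processing order.
screen_sketch <- function(data, dv, factors = character(0),
                          lower.limit = NaN, upper.limit = NaN,
                          lower.z = NaN, upper.z = NaN,
                          res.name = "outlier", z.name = "zscores") {
  x <- data[[dv]]
  flag <- rep(0L, length(x))

  # Stage 1: absolute limits.
  if (!is.na(lower.limit)) flag[which(x < lower.limit)] <- 1L
  if (!is.na(upper.limit)) flag[which(x > upper.limit)] <- 1L

  # Stage 2: z-scores within cells, computed only on values surviving stage 1.
  x_rest <- x
  x_rest[flag == 1L] <- NA
  cells <- if (length(factors) > 0) {
    interaction(data[factors], drop = TRUE)
  } else {
    rep(1L, length(x))
  }
  z <- ave(x_rest, cells,
           FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))
  if (!is.na(lower.z)) flag[which(z < lower.z)] <- 1L
  if (!is.na(upper.z)) flag[which(z > upper.z)] <- 1L

  # Assumed correction: replace outlying values with NA; add flag and z columns.
  data[[dv]][flag == 1L] <- NA
  data[[res.name]] <- flag
  data[[z.name]] <- z
  data
}

d2 <- data.frame(
  subject = c(rep("A", 7), rep("B", 6)),
  rt      = c(400, 410, 420, 430, 440, 900, 3000,
              600, 610, 620, 630, 640, 650)
)
res <- screen_sketch(d2, dv = "rt", factors = "subject",
                     upper.limit = 2000, lower.z = -2, upper.z = 2)
print(res$outlier)  # 3000 flagged by the limit, then 900 by its z-score
```

Here the value 900 is flagged only because 3000 is removed by the absolute limit before z-scores are computed; with 3000 included, the cell mean and standard deviation would be inflated and 900 would fall well within the z-cutoffs.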
Author(s)
Markus Janczyk, Roland Pfister