univariate {PDtoolkit} | R Documentation |
Univariate analysis
Description
univariate returns the univariate statistics for risk factors supplied in data frame db.
For numeric risk factors, the univariate report includes:
rf: Risk factor name.
rf.type: Risk factor class. This metric is always equal to numeric.
bin.type: Bin type - special or complete cases.
bin: Bin type. If the sc.method argument is equal to "together", then bin and bin.type have the same value. If the sc.method argument is equal to "separately", then bin will contain all special cases that exist for the analyzed risk factor (e.g. NA, NaN, Inf).
pct: Percentage of observations in each bin.
cnt.unique: Number of unique values per bin.
min: Minimum value.
p1, p5, p25, p50, p75, p95, p99: Percentile values.
avg: Mean value.
avg.se: Standard error of the mean.
max: Maximum value.
neg: Number of negative values.
pos: Number of positive values.
cnt.outliers: Number of outliers. Records above or below Q75 ± 1.5 * IQR, where IQR = Q75 - Q25.
sc.ind: Special case indicator. It takes value 1 if the share of special cases exceeds sc.threshold, otherwise 0.
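As a rough illustration of how a few of the numeric metrics listed above could be reproduced in base R, the sketch below uses a hypothetical helper, num_metrics, which is not part of PDtoolkit. It assumes the conventional Tukey fences for outliers (below Q25 - 1.5 * IQR or above Q75 + 1.5 * IQR) and the default quantile type of quantile(); the package's own computation may differ.

```r
# Hypothetical helper (not a PDtoolkit function): base-R versions of
# several metrics from the numeric report, computed on complete cases only.
num_metrics <- function(x) {
  x <- x[is.finite(x)]                       # drop NA, NaN, Inf and -Inf
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]                         # IQR = Q75 - Q25
  data.frame(
    min = min(x),
    p50 = median(x),
    avg = mean(x),
    avg.se = sd(x) / sqrt(length(x)),        # standard error of the mean
    max = max(x),
    neg = sum(x < 0),
    pos = sum(x > 0),
    # Assumption: conventional Tukey fences, not necessarily the package rule
    cnt.outliers = sum(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
  )
}

m <- num_metrics(c(-2, 1, 2, 3, 4, 5, 100, NA))
print(m)
```

The helper drops special cases before computing anything, mirroring the split between special and complete cases described for bin.type.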
For categorical risk factors, the univariate report includes:
rf: Risk factor name.
rf.type: Risk factor class. This metric is equal to one of: character, factor or logical.
bin.type: Bin type - special or complete cases.
bin: Bin type. If the sc.method argument is equal to "together", then bin and bin.type have the same value. If the sc.method argument is equal to "separately", then bin will contain all special cases that exist for the analyzed risk factor (e.g. NA, NaN, Inf).
pct: Percentage of observations in each bin.
cnt.unique: Number of unique values per bin.
sc.ind: Special case indicator. It takes value 1 if the share of special cases exceeds sc.threshold, otherwise 0.
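The sc.ind rule can be sketched in base R as follows. This is only an illustration of the comparison described above, applied to a made-up vector x; it is not taken from the package source, and the variable names is.sc and sc.share are hypothetical.

```r
# Illustrative only: flag a risk factor whose share of special cases
# exceeds the threshold. x is made-up data, not part of PDtoolkit.
sc <- c(NA, NaN, Inf, -Inf)      # default special case elements
sc.threshold <- 0.2              # default threshold

x <- c(1, 2, NA, NaN, Inf, 5, 6, 7, 8, 9)

is.sc <- x %in% sc               # %in% matches NA with NA and NaN with NaN
sc.share <- mean(is.sc)          # share of special cases among all records
sc.ind <- as.integer(sc.share > sc.threshold)

print(c(sc.share = sc.share, sc.ind = sc.ind))
```

Using %in% rather than == matters here, since comparisons with NA or NaN via == return NA instead of TRUE.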
Usage
univariate(
db,
sc = c(NA, NaN, Inf, -Inf),
sc.method = "together",
sc.threshold = 0.2
)
Arguments
db |
Data frame of risk factors supplied for univariate analysis. |
sc |
Vector of special case elements. Default values are c(NA, NaN, Inf, -Inf). |
sc.method |
Defines how special cases will be treated, all together or in separate bins.
Possible values are "together" and "separately". |
sc.threshold |
Threshold for special cases expressed as percentage of total number of observations.
Default value is 0.2. |
Value
The command univariate returns a data frame with the univariate metrics described above for risk factors of class numeric, character, factor and logical.
Examples
suppressMessages(library(PDtoolkit))
data(gcd)
gcd$age[100:120] <- NA
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual, y.type = "bina")[[2]]
gcd$age.bin <- as.factor(gcd$age.bin)
gcd$maturity.bin <- ndr.bin(x = gcd$maturity, y = gcd$qual, y.type = "bina")[[2]]
gcd$amount.bin <- ndr.bin(x = gcd$amount, y = gcd$qual, y.type = "bina")[[2]]
gcd$all.miss1 <- NaN
gcd$all.miss2 <- NA
gcd$tf <- sample(c(TRUE, FALSE), nrow(gcd), replace = TRUE)
# Create a date variable to confirm that it will not be processed by the function
gcd$dates <- Sys.Date()
str(gcd)
univariate(db = gcd)
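A possible variation on the call above, shown only as a sketch: it passes non-default values for sc.method and sc.threshold, then keeps the rows flagged by sc.ind. The argument names come from the Usage section and the column names from the Description; the threshold value 0.01 and the choice of columns are arbitrary assumptions, and the block is guarded so that it does nothing if PDtoolkit is not installed.

```r
# Sketch only: report special cases in separate bins with a lower threshold.
res <- NULL
if (requireNamespace("PDtoolkit", quietly = TRUE)) {
  suppressMessages(library(PDtoolkit))
  data(gcd)
  gcd$age[100:120] <- NA                      # inject missing values
  res <- univariate(db = gcd[, c("age", "amount")],
                    sc.method = "separately", # one bin per special case
                    sc.threshold = 0.01)      # arbitrary illustrative value
  # Keep only the rows where the special case indicator is raised
  print(res[res$sc.ind == 1, c("rf", "bin.type", "bin", "pct", "sc.ind")])
}
```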