desc.stat {monobinShiny} | R Documentation
Descriptive statistics
Description
desc.stat returns the descriptive statistics of a numeric risk factor. The reported metrics mainly cover univariate analysis and part of bivariate analysis, which are usually standard steps in credit rating model development.
Metrics are reported separately for the special case group (if it exists) and the complete case group.
The report includes:
risk.factor: Risk factor name.
type: Special case or complete case group.
bin: When the special case method is "together", bin is the same as type; otherwise all special cases are reported separately.
cnt: Number of observations.
pct: Percentage of observations.
min: Minimum value.
p1, p5, p25, p50, p75, p95, p99: Percentile values.
avg: Mean value.
avg.se: Standard error of mean.
max: Maximum value.
neg: Number of negative values.
pos: Number of positive values.
cnt.outliers: Number of outliers. Records above and below Q75 + 1.5 * IQR, where IQR = Q75 - Q25 is the interquartile range.
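The metrics listed above can be approximated in base R for the complete cases of a single vector. The sketch below is only an illustration: sketch.desc is a hypothetical helper, not part of monobinShiny, and it assumes the default quantile type of base R, the usual standard error sd / sqrt(n), and the conventional Tukey fences (Q25 - 1.5 * IQR and Q75 + 1.5 * IQR) for outliers. The package may compute any of these differently.

```r
# Hypothetical helper (not part of monobinShiny): summarises a numeric
# vector that contains complete cases only (no NA, NaN or Inf).
sketch.desc <- function(x) {
  probs <- c(0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99)
  q <- quantile(x, probs = probs, names = FALSE)  # default quantile type assumed
  iqr <- q[5] - q[3]                              # IQR = Q75 - Q25
  data.frame(
    cnt = length(x),
    min = min(x),
    p1 = q[1], p5 = q[2], p25 = q[3], p50 = q[4],
    p75 = q[5], p95 = q[6], p99 = q[7],
    avg = mean(x),
    avg.se = sd(x) / sqrt(length(x)),             # assumed definition of the standard error
    max = max(x),
    neg = sum(x < 0),
    pos = sum(x > 0),
    # Assumption: conventional Tukey fences on both sides
    cnt.outliers = sum(x < q[3] - 1.5 * iqr | x > q[5] + 1.5 * iqr)
  )
}

x <- c(-2, 0, 1, 2, 3, 4, 5, 6, 7, 100)
res <- sketch.desc(x)
print(res)
```

Note that zero counts as neither negative nor positive in this sketch, so neg and pos do not have to sum to cnt.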
Usage
desc.stat(x, y, sc = c(NA, NaN, Inf), sc.method = "together")
Arguments
x: Numeric risk factor.
y: Numeric target vector (binary or continuous).
sc: Numeric vector of special case elements. Default values are c(NA, NaN, Inf). It is recommended to always keep the default values and add new ones if needed. If any of these values exist in x but are not defined in the sc vector, the function will report an error.
sc.method: Defines how special cases are treated: all together or in separate bins. Possible values are "together" (default) and "separately".
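To make the role of sc and sc.method more concrete, the following base R sketch shows one way the two grouping modes could be derived from x. It is a hypothetical illustration, not the internal code of the package: the bin labels ("special cases", "complete cases", "SC ...") are invented for this example and may not match what desc.stat prints.

```r
# Hypothetical illustration (not the package's internal code).
x  <- c(25, 40, NA, 33, Inf, NaN, 58, NA)
sc <- c(NA, NaN, Inf)

# %in% relies on match(), which matches NA to NA and NaN to NaN,
# so all three default special values are detected.
is.sc <- x %in% sc

# Analogue of sc.method = "together": one bin for all special cases
bin.together <- ifelse(is.sc, "special cases", "complete cases")
print(table(bin.together))

# Analogue of sc.method = "separately": one bin per distinct special value
bin.separately <- ifelse(is.sc, paste0("SC ", x), "complete cases")
print(table(bin.separately))
```

Flagging the special values up front, as done here with is.sc, keeps them out of the numeric summaries computed for the complete cases.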
Value
Data frame of descriptive statistics metrics, separately for complete and special case groups.
Examples
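suppressMessages(library(monobinShiny))
data(gcd)
# descriptive statistics for a risk factor without special cases
desc.stat(x = gcd$age, y = gcd$qual)
# simulate special cases
gcd$age[1:10] <- NA
gcd$age[50:75] <- Inf
# report all special cases in one group
desc.stat(x = gcd$age, y = gcd$qual, sc.method = "together")
# report each special case value separately
desc.stat(x = gcd$age, y = gcd$qual, sc.method = "separately")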