get_stats.data.frame {COINr} | R Documentation |
Statistics of columns
Description
Takes a data frame and returns a table of statistics with entries for each column.
Usage
## S3 method for class 'data.frame'
get_stats(
x,
t_skew = 2,
t_kurt = 3.5,
t_avail = 0.65,
t_zero = 0.5,
t_unq = 0.5,
nsignif = 3,
...
)
Arguments
x |
A data frame with only numeric columns. |
t_skew |
Absolute skewness threshold. See details. |
t_kurt |
Kurtosis threshold. See details. |
t_avail |
Data availability threshold. See details. |
t_zero |
A threshold between 0 and 1 for flagging indicators with high proportion of zeroes. See details. |
t_unq |
A threshold between 0 and 1 for flagging indicators with low proportion of unique values. See details. |
nsignif |
Number of significant figures to round the output table to. |
... |
Arguments passed to or from other methods. |
Details
The statistics (columns in the output table) are as follows, with one entry for each column of x:
- Min: the minimum
- Max: the maximum
- Mean: the (arithmetic) mean
- Median: the median
- Std: the standard deviation
- Skew: the skew
- Kurt: the kurtosis
- N.Avail: the number of non-NA values
- N.NonZero: the number of non-zero values
- N.Unique: the number of unique values
- Frc.Avail: the fraction of non-NA values
- Frc.NonZero: the fraction of non-zero values
- Frc.Unique: the fraction of unique values
- Flag.Avail: a data availability flag. Columns with Frc.Avail < t_avail are flagged as "LOW", otherwise "ok".
- Flag.NonZero: a flag for columns with a high proportion of zeros. Columns with Frc.NonZero < t_zero are flagged as "LOW", otherwise "ok".
- Flag.Unique: a unique value flag. Columns with Frc.Unique < t_unq are flagged as "LOW", otherwise "ok".
- Flag.SkewKurt: a skew and kurtosis flag, which is an indication of possible outliers. Columns with abs(Skew) > t_skew AND Kurt > t_kurt are flagged as "OUT", otherwise "ok".
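The three fraction-based flags can be illustrated with a short base R sketch. This is not the COINr implementation: flag_sketch() and the toy data frame are hypothetical, and the sketch assumes that each fraction is taken relative to the number of rows, which the text above does not state.

```r
# Illustrative sketch of the availability, non-zero and unique-value flags.
# ASSUMPTION: fractions are relative to the number of rows in the data frame;
# this is not taken from the COINr source code.
flag_sketch <- function(df, t_avail = 0.65, t_zero = 0.5, t_unq = 0.5) {
  n <- nrow(df)

  # Per-column counts, ignoring NA values
  n_avail   <- vapply(df, function(col) sum(!is.na(col)), numeric(1))
  n_nonzero <- vapply(df, function(col) sum(col != 0, na.rm = TRUE), numeric(1))
  n_unique  <- vapply(df, function(col) length(unique(col[!is.na(col)])), numeric(1))

  data.frame(
    Frc.Avail    = n_avail / n,
    Frc.NonZero  = n_nonzero / n,
    Frc.Unique   = n_unique / n,
    Flag.Avail   = ifelse(n_avail / n < t_avail, "LOW", "ok"),
    Flag.NonZero = ifelse(n_nonzero / n < t_zero, "LOW", "ok"),
    Flag.Unique  = ifelse(n_unique / n < t_unq, "LOW", "ok"),
    row.names    = names(df)
  )
}

# Hypothetical toy data: one clean column, one mostly zero, one mostly missing
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(0, 0, 0, 5),
  c = c(NA, NA, NA, 7)
)
flag_sketch(toy)
```

Note that the comparisons are strict, so a column whose fraction equals the threshold exactly (such as Frc.Unique for column b here) is not flagged.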
The aim of this table, among other things, is to check the basic statistics of each column/indicator and to identify any possible issues with each indicator, for example low data availability, a high proportion of zeros and/or a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt column) is a simple test for possible outliers, which may require treatment using Treat().
See also vignette("analysis").
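The combined skew and kurtosis rule can be sketched in base R as follows. The helper names (skew_sketch, kurt_sketch, flag_skewkurt) are hypothetical, and the moment-based skewness and excess kurtosis used here are assumptions: the estimators that COINr actually uses may differ in detail.

```r
# ASSUMED definitions: moment-based skewness and excess kurtosis.
# These are illustrative only and may not match COINr's own estimators.
skew_sketch <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

kurt_sketch <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Flag a column only when BOTH thresholds are exceeded
flag_skewkurt <- function(x, t_skew = 2, t_kurt = 3.5) {
  if (abs(skew_sketch(x)) > t_skew && kurt_sketch(x) > t_kurt) "OUT" else "ok"
}

flag_skewkurt(c(1, 2, 3, 4, 5))    # symmetric values, no extreme point
flag_skewkurt(c(rep(1, 19), 100))  # a single extreme value
```

Requiring both conditions means that a column that is skewed but not heavy-tailed, or heavy-tailed but symmetric, is left unflagged.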
Value
A data frame of statistics for each column of x.
Examples
# stats of mtcars
get_stats(mtcars)
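A further sketch shows how the thresholds might be tightened and the flagged columns picked out. The threshold values are chosen purely for illustration, col_stats and flagged are hypothetical names, and the code assumes that the COINr package is installed (the guard skips the call otherwise).

```r
# Assumes COINr is installed; nothing is run if it is not available.
if (requireNamespace("COINr", quietly = TRUE)) {
  # Stricter thresholds than the defaults (illustrative values only)
  col_stats <- COINr::get_stats(mtcars, t_skew = 1, t_kurt = 2, t_unq = 0.6)

  # Keep only the rows where at least one flag was raised
  flagged <- col_stats[
    col_stats$Flag.Avail == "LOW" |
      col_stats$Flag.NonZero == "LOW" |
      col_stats$Flag.Unique == "LOW" |
      col_stats$Flag.SkewKurt == "OUT",
  ]
  print(flagged)
}
```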