get_stats.data.frame {COINr}    R Documentation

Statistics of columns

Description

Takes a data frame and returns a table of statistics with entries for each column.

Usage

## S3 method for class 'data.frame'
get_stats(
  x,
  t_skew = 2,
  t_kurt = 3.5,
  t_avail = 0.65,
  t_zero = 0.5,
  t_unq = 0.5,
  nsignif = 3,
  ...
)

Arguments

x

A data frame with only numeric columns.

t_skew

Absolute skewness threshold. See details.

t_kurt

Kurtosis threshold. See details.

t_avail

Data availability threshold. See details.

t_zero

A threshold between 0 and 1 for flagging indicators with a high proportion of zeros. See details.

t_unq

A threshold between 0 and 1 for flagging indicators with a low proportion of unique values. See details.

nsignif

Number of significant figures to round the output table to.

...

Arguments passed to or from other methods.

Details

The statistics (columns in the output table) are as follows (entries correspond to each column):

The aim of this table is, among other things, to check the basic statistics of each column/indicator and to identify possible issues with each one, such as low data availability, a high proportion of zeros, or a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt column) is a simple test for possible outliers, which may require treatment using Treat().
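The checks described above can be approximated in base R. The following is only a rough sketch, not the package's implementation: check_column() is a hypothetical helper, the skewness and excess kurtosis are simple moment-based estimates that may differ from those COINr uses, and the direction of each comparison (including requiring both the skew and kurtosis thresholds to be exceeded) is an assumption based on the argument descriptions.

```r
# Hypothetical helper, NOT part of COINr: rough per-column checks in base R.
# Threshold defaults are copied from the Usage section; the comparison rules
# below are assumptions, not the package's documented logic.
check_column <- function(v, t_skew = 2, t_kurt = 3.5, t_avail = 0.65,
                         t_zero = 0.5, t_unq = 0.5) {
  ok <- v[!is.na(v)]                 # available (non-NA) values
  n <- length(ok)
  d <- ok - mean(ok)
  m2 <- mean(d^2)                    # second central moment
  skew <- mean(d^3) / m2^1.5         # moment-based skewness (assumed estimator)
  kurt <- mean(d^4) / m2^2 - 3       # excess kurtosis (assumed estimator)
  c(
    low_avail  = n / length(v) < t_avail,            # low data availability
    many_zeros = mean(ok == 0) > t_zero,             # high proportion of zeros
    few_unique = length(unique(ok)) / n < t_unq,     # low proportion of unique values
    skew_kurt  = abs(skew) > t_skew && kurt > t_kurt # possible outliers
  )
}

# A column dominated by zeros with one extreme value
check_column(c(rep(0, 8), NA, 100))
```

Applying the helper to every column, for example with sapply(mtcars, check_column), gives a logical matrix with one column per indicator, which can be compared informally against the flags reported by get_stats().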

See also vignette("analysis").

Value

A data frame of statistics for each column of x.

Examples

# stats of mtcars
get_stats(mtcars)


[Package COINr version 1.1.7 Index]