descr {collapse}    R Documentation
Detailed Statistical Description of Data Frame
Description
descr offers a fast and detailed description of each variable in a data frame. Since v1.9.0 it fully supports grouped and weighted computations.
Usage
descr(X, ...)
## Default S3 method:
descr(X, by = NULL, w = NULL, cols = NULL,
Ndistinct = TRUE, higher = TRUE, table = TRUE, sort.table = "freq",
Qprobs = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99), Qtype = 7L,
label.attr = "label", stepwise = FALSE, ...)
## S3 method for class 'grouped_df'
descr(X, w = NULL,
Ndistinct = TRUE, higher = TRUE, table = TRUE, sort.table = "freq",
Qprobs = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99), Qtype = 7L,
label.attr = "label", stepwise = FALSE, ...)
## S3 method for class 'descr'
as.data.frame(x, ..., gid = "Group")
## S3 method for class 'descr'
print(x, n = 14, perc = TRUE, digits = .op[["digits"]], t.table = TRUE, total = TRUE,
compact = FALSE, summary = !compact, reverse = FALSE, stepwise = FALSE, ...)
Arguments
X: a (grouped) data frame or list of atomic vectors. Atomic vectors, matrices or arrays can be passed but will first be coerced to a data frame using
by: a factor,
w: a numeric vector of (non-negative) weights. The default method also supports one-sided formulas, i.e.
cols: select columns to describe using column names, indices, a logical vector or a selector function (e.g.
Ndistinct: logical.
higher: logical. Argument is passed down to
table: logical.
sort.table: an integer or character string specifying how the frequency table should be presented:
Qprobs: double. Probabilities for quantiles to compute on numeric variables, passed down to
Qtype: integer. Quantile types 5-9 following Hyndman and Fan (1996), who recommended type 8; the default is 7, as in
label.attr: character. The name of a label attribute to display for each variable (if variables are labeled).
...: for
x: an object of class 'descr'.
n: integer. The maximum number of table elements to print for categorical variables. If the number of distinct elements is
perc: logical.
digits: integer. The number of decimals to print in statistics, quantiles and percentage tables.
t.table: logical.
total: logical.
compact: logical.
summary: logical.
reverse: logical.
stepwise: logical.
gid: character. Name assigned to the group-id column when describing data by groups.
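Since Qtype only changes how sample quantiles are interpolated, the effect of the choice can be previewed with base R alone. The sketch below uses stats::quantile() rather than collapse's fquantile(), on the assumption that both follow the same Hyndman and Fan definitions for a given type; it compares the default type 7 with the recommended type 8 on a small vector:

```r
# Compare two of the sample quantile definitions of Hyndman and Fan (1996)
# using base R's stats::quantile(). Assumption: collapse's fquantile() follows
# the same definitions for a given type; this is not collapse code.
x <- 1:10
probs <- c(0.25, 0.5, 0.75)

q7 <- quantile(x, probs, type = 7)  # default type (also the default of Qtype)
q8 <- quantile(x, probs, type = 8)  # type recommended by Hyndman and Fan

rbind(type7 = q7, type8 = q8)
```

Here both types give a median of 5.5, while the quartiles differ slightly (3.25 vs. about 2.92, and 7.75 vs. about 8.08). In descr() itself the corresponding request would be, for instance, descr(iris, Qprobs = c(0.25, 0.5, 0.75), Qtype = 8L).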
Details
descr was heavily inspired by Hmisc::describe, but is much faster and has more advanced statistical capabilities. It is principally a wrapper around qsu, fquantile (.quantile), and fndistinct for numeric variables, and computes frequency tables for categorical variables using qtab. Date variables are summarized with fnobs, fndistinct and frange.
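To make this division of labour concrete, here is a rough base-R sketch of the kind of per-variable summary described above. describe_var() is a hypothetical helper, not part of collapse; it covers only a subset of the statistics (no skewness, kurtosis or table sorting) and merely mirrors the split between numeric, Date and categorical variables:

```r
# describe_var() is a hypothetical illustration, NOT collapse's implementation.
describe_var <- function(x, probs = c(0.25, 0.5, 0.75)) {
  nobs  <- sum(!is.na(x))                  # number of non-missing observations
  ndist <- length(unique(x[!is.na(x)]))    # number of distinct non-missing values
  if (inherits(x, "Date")) {
    # Dates: count, distinct values and range (cf. fnobs, fndistinct, frange)
    list(N = nobs, Ndist = ndist, Range = range(x, na.rm = TRUE))
  } else if (is.numeric(x)) {
    # Numeric: basic statistics plus quantiles (cf. qsu, fquantile)
    list(Stats = c(N = nobs, Ndist = ndist,
                   Mean = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE),
                   Min = min(x, na.rm = TRUE), Max = max(x, na.rm = TRUE)),
         Quant = quantile(x, probs, na.rm = TRUE))
  } else {
    # Categorical: frequency table of the non-missing values (cf. qtab)
    list(N = nobs, Table = table(x))
  }
}

res <- lapply(iris, describe_var)
res$Species$Table

dates <- as.Date("2020-01-01") + c(0, 10, 10, NA)
describe_var(dates)
```

The real descr() additionally reports skewness and kurtosis (higher = TRUE), handles labels, weights and groups, and is far faster on large data.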
Since v1.9.0, grouped and weighted computations are fully supported. Sampling weights produce a weighted mean, sd, skewness and kurtosis, and weighted quantiles for numeric data. For categorical data, tables display the sum of weights instead of the frequencies, and percentage tables, as well as the percentage of missing values indicated next to 'Statistics' in print, are relative to the total sum of weights. All of this can also be done by groups. Grouped (weighted) quantiles are computed using BY.
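The weighting rules above can be checked by hand. This base-R sketch uses hypothetical toy data and is not collapse code; it computes a weighted mean, a table of summed weights with percentages relative to the total sum of weights, and the weighted share of missing values:

```r
# Toy data (hypothetical) illustrating the weighting rules described above.
x <- c(2, 4, 6, 8)                   # numeric variable
f <- factor(c("a", "b", "a", "c"))   # categorical variable without NAs
g <- c("a", NA, "b", NA)             # categorical variable with NAs
w <- c(1, 2, 3, 4)                   # non-negative sampling weights

# Numeric data: weighted mean, (2*1 + 4*2 + 6*3 + 8*4) / 10
wmean <- weighted.mean(x, w)

# Categorical data: sum of weights per category instead of counts ...
wtab <- tapply(w, f, sum)
# ... and percentages relative to the total sum of weights
wperc <- 100 * wtab / sum(w)

# Percentage of missing values, also relative to the total sum of weights
miss_perc <- 100 * sum(w[is.na(g)]) / sum(w)

print(wmean); print(wtab); print(wperc); print(miss_perc)
```

Here the weighted mean is 6, the weighted table is a = 4, b = 2, c = 4 (40%, 20%, 40%), and 60% of the total weight in g falls on missing values. With descr() the same kind of output would come from a call such as descr(X, w = ~ weight_col), where weight_col is a placeholder column name.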
For larger datasets, calling the stepwise option directly from descr() is recommended, as precomputing the statistics for all variables before digesting the results can be time-consuming.
The list object returned from descr can be efficiently converted to a tidy data frame using the as.data.frame method. This representation does not include the frequency tables computed for categorical variables.
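A short sketch of the grouped case, assuming collapse is installed: iris is described by Species without tables, and the result is converted to a data frame. Per the gid argument described above, the group identifier should land in a column named "Group" by default, and passing gid renames it:

```r
library(collapse)

# Describe iris by Species; tables are skipped because the tidy
# representation drops them anyway.
d <- descr(iris, ~ Species, table = FALSE)

# Convert to a tidy data frame; the group-id column is named by 'gid'.
df <- as.data.frame(d)                    # default gid = "Group"
head(df)

df2 <- as.data.frame(d, gid = "Species")  # custom name for the group-id column
```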
Value
A two-level nested list-based object of class 'descr'. The list has one element per variable in the dataset and contains the statistics computed for each variable, which are themselves stored in a list containing the class, the label, the basic statistics, and the quantiles / tables computed for the variable (in matrix form).
The object has attributes attached providing the 'name' of the dataset, the number of rows in the dataset ('N'), an attribute 'arstat' indicating whether arrays of statistics were generated by passing arguments (e.g. pid) down to qsu.default, an attribute 'table' indicating whether table = TRUE (i.e. the object could contain tables for categorical variables), and attributes 'groups' and/or 'weights' providing a GRP object and/or weight vector for grouped and/or weighted data descriptions.
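The structure described above can be inspected directly. A minimal sketch, assuming collapse is installed (iris has 150 rows and 5 columns):

```r
library(collapse)

d <- descr(iris)

inherits(d, "descr")             # TRUE
length(d)                        # one element per variable of iris: 5
attr(d, "N")                     # number of rows in iris: 150
names(attributes(d))             # the attributes attached to the object
str(d[["Species"]], max.level = 1)  # inner list for a categorical variable

# Grouped descriptions carry a GRP object in the 'groups' attribute
dg <- descr(iris, ~ Species)
class(attr(dg, "groups"))
```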
See Also
qsu, qtab, fquantile, pwcor, Summary Statistics, Fast Statistical Functions, Collapse Overview
Examples
## Simple Use
descr(iris)
descr(wlddev)
descr(GGDC10S)
# Some useful print options (also try the stepwise argument)
print(descr(GGDC10S), reverse = TRUE, t.table = FALSE)
# For bigger data consider: descr(big_data, stepwise = TRUE)
# Generating a data frame
as.data.frame(descr(wlddev, table = FALSE))
## Weighted Descriptions
descr(wlddev, w = ~ replace_na(POP)) # replacing NAs with 0s for fquantile()
## Grouped Descriptions
descr(GGDC10S, ~ Variable)
descr(wlddev, ~ income)
print(descr(wlddev, ~ income), compact = TRUE)
## Grouped & Weighted Descriptions
descr(wlddev, ~ income, w = ~ replace_na(POP))
## Passing Arguments down to qsu.default: for Panel Data Statistics
descr(iris, pid = iris$Species)
descr(wlddev, pid = wlddev$iso3c)