Desc {DescTools} | R Documentation |
Describe Data
Description
Produce summaries of various types of variables. Calculate descriptive statistics for x and use Word as a reporting tool for the numeric results and for descriptive plots. The appropriate statistics are chosen depending on the class of x. The general intention is to simplify the description process for lazy typers and to return a quick but rich summary.
Usage
Desc(x, ..., main = NULL, plotit = NULL, wrd = NULL)
## S3 method for class 'numeric'
Desc(
x,
main = NULL,
maxrows = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'integer'
Desc(
x,
main = NULL,
maxrows = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'factor'
Desc(
x,
main = NULL,
maxrows = NULL,
ord = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'labelled'
Desc(
x,
main = NULL,
maxrows = NULL,
ord = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'ordered'
Desc(
x,
main = NULL,
maxrows = NULL,
ord = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'character'
Desc(
x,
main = NULL,
maxrows = NULL,
ord = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'ts'
Desc(x, main = NULL, plotit = NULL, sep = NULL, digits = NULL, ...)
## S3 method for class 'logical'
Desc(
x,
main = NULL,
ord = NULL,
conf.level = 0.95,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'Date'
Desc(
x,
main = NULL,
dprobs = NULL,
mprobs = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'table'
Desc(
x,
main = NULL,
conf.level = 0.95,
verbose = 2,
rfrq = "111",
margins = c(1, 2),
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## Default S3 method:
Desc(
x,
main = NULL,
maxrows = NULL,
ord = NULL,
conf.level = 0.95,
verbose = 2,
rfrq = "111",
margins = c(1, 2),
dprobs = NULL,
mprobs = NULL,
plotit = NULL,
sep = NULL,
digits = NULL,
...
)
## S3 method for class 'data.frame'
Desc(x, main = NULL, plotit = NULL, enum = TRUE, sep = NULL, ...)
## S3 method for class 'list'
Desc(x, main = NULL, plotit = NULL, enum = TRUE, sep = NULL, ...)
## S3 method for class 'formula'
Desc(
formula,
data = parent.frame(),
subset,
main = NULL,
plotit = NULL,
digits = NULL,
...
)
## S3 method for class 'Desc'
print(
x,
digits = NULL,
plotit = NULL,
nolabel = FALSE,
sep = NULL,
nomain = FALSE,
...
)
## S3 method for class 'Desc'
plot(x, main = NULL, ...)
## S3 method for class 'palette'
Desc(x, ...)
Arguments
x |
the object to be described. This can be a data.frame, a list, a table or a vector of the classes: numeric, integer, factor, ordered factor, logical. |
... |
further arguments to be passed to or from other methods. For the internal default method these can include:
|
main |
(character|
|
plotit |
logical. Should a plot be created? The plot type will be
chosen according to the classes of variables (roughly following a
numeric-numeric, numeric-categorical, categorical-categorical logic).
Default can be defined by option |
wrd |
the pointer to a running MS Word instance, as created by
|
maxrows |
numeric; defines the maximum number of rows in a frequency
table to be reported. For factors with many levels it is often not
interesting to see all of them. Default is set to 12 most frequent ones
(resp. the first ones if For a numeric argument x If Setting |
sep |
character. The separator for the title. By default a line of
|
digits |
integer. With how many digits should the relative frequencies be formatted? Default can be set by DescToolsOptions(digits=x). |
ord |
character out of |
conf.level |
confidence level of the interval. If set to |
dprobs , mprobs |
a vector with the probabilities for the Chi-Square test
for days, resp. months, when describing a |
verbose |
integer out of |
rfrq |
a string with 3 characters, each of them being |
margins |
a vector consisting of 1 and/or 2. Defines the margin
sums to be included. Row margins are reported if margins is set to 1. Set it
to 2 for column margins and c(1,2) for both. |
enum |
logical, determining whether a sequential number should be included in the main title for data.frames and lists. Default is TRUE. The option exists because the numbers may be redundant or inconsistent if a Word report with enumerated headings is created. |
formula |
a formula of the form |
data |
an optional matrix or data frame containing the variables in the
formula |
subset |
an optional vector specifying a subset of observations to be used. |
nolabel |
logical, defining if labels (defined as attribute with the
name |
nomain |
logical, determines if the main title of the output is printed
or not, default is |
Details
Desc is a generic function. It dispatches to one of the methods above
depending on the class of its first argument. Typing ?Desc + TAB at the
prompt should present a choice of links: the help pages for each of these
Desc methods (at least if you're using RStudio, which is recommended
anyway). You don't need to use the full name of the method, although you
may if you wish; i.e., Desc(x) is idiomatic R, but you can bypass method
dispatch by calling a method directly: Desc.numeric(x).
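The dispatch mechanism described above can be sketched in base R. The generic describe and its methods are hypothetical names used only for illustration; they are not part of DescTools and do not reproduce what Desc computes.

```r
# Hypothetical generic 'describe' (NOT part of DescTools); it only
# illustrates how an S3 generic picks a method by the class of x.
describe <- function(x, ...) UseMethod("describe")

describe.numeric <- function(x, ...) {
  paste("numeric vector, mean =", mean(x, na.rm = TRUE))
}

describe.factor <- function(x, ...) {
  paste("factor with", nlevels(x), "levels")
}

describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

via_generic <- describe(c(1, 2, 3))          # dispatches to describe.numeric
direct      <- describe.numeric(c(1, 2, 3))  # bypasses method dispatch
fac         <- describe(factor(c("a", "b", "a")))

identical(via_generic, direct)  # both routes give the same result
```

Calling the generic is usually preferable, because the call keeps working when the class of x changes.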
For a factor, Desc produces a rich description containing length,
number of NAs, number of levels and detailed frequencies of all levels. The
order of the frequency table can be chosen between descending/ascending
frequency, labels or levels. For ordered factors the default order is
"level". Character vectors are treated as unordered factors: Desc.character
converts x to a factor and processes it as such.
Desc.ordered does nothing more than change the standard order of the
frequencies to the intrinsic order of the levels, which means order "level"
instead of "desc" as in the factor case.
Description interface for dates. We do here what seems reasonable for describing dates. We start with a short summary of length, number of NAs and extreme values, before we describe the frequencies of the weekdays and months, rounded off by a chi-square test.
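The weekday part of such a description can be approximated in base R. This is a minimal sketch under stated assumptions, not the code used by Desc: the data are made up, and equal probabilities for all seven weekdays are assumed for the test (the dprobs argument presumably allows other probabilities).

```r
# Illustrative data: 10 full weeks starting on Monday 2024-01-01,
# with the Sundays of the last 5 weeks removed.
dates <- as.Date("2024-01-01") + 0:69
wday  <- as.POSIXlt(dates)$wday                 # 0 = Sunday, ..., 6 = Saturday
dates <- dates[!(wday == 0 & seq_along(dates) > 35)]

# Short summary: length, number of NAs, extreme values
n_total <- length(dates)
n_na    <- sum(is.na(dates))
rng     <- range(dates, na.rm = TRUE)

# Weekday frequencies and chi-square goodness-of-fit test
# against equal probabilities (an assumption of this sketch)
wd_tab <- table(factor(as.POSIXlt(dates)$wday, levels = 0:6,
                       labels = c("Sun", "Mon", "Tue", "Wed",
                                  "Thu", "Fri", "Sat")))
wd_test <- chisq.test(wd_tab, p = rep(1 / 7, 7))

wd_tab
wd_test
```

The same pattern applies to months, using as.POSIXlt(dates)$mon and twelve probabilities.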
A 2-dimensional table will be described with its relative frequencies, a
short summary containing the total cases, the dimensions of the table,
chi-square tests and some association measures such as the phi coefficient,
the contingency coefficient and Cramer's V.
Tables with higher dimensions will simply be printed as a flat table,
with marginal sums for the first and for the last dimension.
Note that NAs cannot be handled by this interface, as tables in general come
in "as is", that is, basically as a matrix without any further information
about NAs that may have been removed earlier.
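The quantities reported for a 2-dimensional table can be reproduced approximately in base R. The sketch below uses the textbook definitions of the phi coefficient, the contingency coefficient and Cramer's V; that Desc uses exactly these definitions is an assumption.

```r
# 2-dimensional table: hair colour by eye colour, summed over sex
tab <- as.table(apply(HairEyeColor, c(1, 2), sum))

n    <- sum(tab)                       # total cases
dims <- dim(tab)                       # dimensions of the table

# relative frequencies: overall, by row, by column
rel_all <- prop.table(tab)
rel_row <- prop.table(tab, 1)
rel_col <- prop.table(tab, 2)

# Pearson chi-square statistic (no continuity correction)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

# association measures, textbook definitions (assumed, see lead-in)
phi       <- sqrt(chi2 / n)
cont_coef <- sqrt(chi2 / (chi2 + n))
cramers_v <- sqrt(chi2 / (n * (min(dims) - 1)))

round(c(phi = phi, cont.coef = cont_coef, cramers.v = cramers_v), 3)
```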
Description of a dichotomous variable. This can either be a logical vector,
a factor with two levels or a numeric variable with only two unique values.
The confidence intervals for the relative frequencies are calculated by
BinomCI(), method "Wilson", at the confidence level defined by conf.level.
Dichotomous variables can easily be condensed into one
graphical representation. Desc for a set of flags (= dichotomous variables)
calculates the frequencies and a binomial confidence interval and produces a
kind of dotplot with error bars. The motivation for this function is that
dichotomous variables in general do not carry much information.
Therefore it makes sense to condense the description of sets of dichotomous
variables.
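For orientation, the Wilson score interval can be written out in base R. The helper wilson_ci is a hypothetical name introduced for this sketch; it is not the implementation of BinomCI(), only the standard formula for the method named above.

```r
# Hypothetical helper (not part of DescTools): Wilson score interval
# for x successes out of n trials.
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  denom  <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(est = p, lwr.ci = centre - half, upr.ci = centre + half)
}

# Illustrative flag with 30 TRUE, 18 FALSE and 2 NA
flag  <- rep(c(TRUE, FALSE, NA), times = c(30, 18, 2))
valid <- flag[!is.na(flag)]            # NAs are excluded before counting

ci <- wilson_ci(sum(valid), length(valid), conf.level = 0.95)
ci
```

The result can be cross-checked against prop.test(30, 48, correct = FALSE)$conf.int, which returns the same score interval.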
The formula interface accepts the formula operators +, :, *, I(), 1
and evaluates any function. The left hand
side and right hand side of the formula are evaluated the same way. The
variable pairs are processed depending on their classes.
Word
This function is not intended to be run directly by the end user.
It will normally be called automatically when a pointer to a Word instance
is passed to the function Desc().
However, DescWrd takes some more specific arguments concerning the Word
output (like font or fontsize), which can make it necessary to call the
function directly.
Value
A list containing the following components:
length |
the length of the vector (n + NAs). |
n |
the valid entries (NAs are excluded) |
NAs |
number of NAs |
unique |
number of unique values. |
0s |
number of zeros |
mean |
arithmetic mean |
MeanSE |
standard error of the mean, as calculated by |
quant |
a table of quantiles, as calculated by quantile(x, probs = c(.05,.10,.25,.5,.75,.9,.95), na.rm = TRUE). |
sd |
standard deviation |
vcoef |
coefficient of variation: |
mad |
median absolute deviation ( |
IQR |
interquartile range |
skew |
skewness, as calculated by |
kurt |
kurtosis, as calculated by |
highlow |
the lowest and the highest values, reported with their frequencies in brackets, if > 1. |
frq |
a data.frame of absolute and relative frequencies given by
|
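Several of the components listed above can be computed with base R alone. The following is a rough sketch on made-up data, not the code used by Desc; in particular, MeanSE as sd/sqrt(n) and vcoef as sd/mean are assumptions based on the usual definitions.

```r
# Illustrative data with two zeros, one duplicate and one NA
x     <- c(2.5, 3.1, 0, 4.8, NA, 3.1, 0, 7.2, 5.5, 6.0)
valid <- x[!is.na(x)]

res <- list(
  length = length(x),                       # n + NAs
  n      = length(valid),                   # valid entries
  NAs    = sum(is.na(x)),
  unique = length(unique(valid)),
  `0s`   = sum(valid == 0),
  mean   = mean(valid),
  MeanSE = sd(valid) / sqrt(length(valid)), # assumed definition
  quant  = quantile(x, probs = c(.05, .10, .25, .5, .75, .9, .95),
                    na.rm = TRUE),
  sd     = sd(valid),
  vcoef  = sd(valid) / mean(valid),         # assumed definition
  mad    = mad(valid),
  IQR    = IQR(valid)
)

str(res)
```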
Author(s)
Andri Signorell andri@signorell.net
See Also
Other Statistical summary functions:
Abstract()
Examples
opt <- DescToolsOptions()
# implemented classes:
Desc(d.pizza$wrongpizza) # logical
Desc(d.pizza$driver) # factor
Desc(d.pizza$quality) # ordered factor
Desc(as.character(d.pizza$driver)) # character
Desc(d.pizza$week) # integer
Desc(d.pizza$delivery_min) # numeric
Desc(d.pizza$date) # Date
Desc(d.pizza)
Desc(d.pizza$wrongpizza, main="The wrong pizza delivered", digits=5)
Desc(table(d.pizza$area)) # 1-dim table
Desc(table(d.pizza$area, d.pizza$operator)) # 2-dim table
Desc(table(d.pizza$area, d.pizza$operator, d.pizza$driver)) # n-dim table
# expressions
Desc(log(d.pizza$temperature))
Desc(d.pizza$temperature > 45)
# supported labels
Label(d.pizza$temperature) <- "This is the temperature in degrees Celsius
measured at the time when the pizza is delivered to the client."
Desc(d.pizza$temperature)
# try as well: Desc(d.pizza$temperature, wrd=GetNewWrd())
z <- Desc(d.pizza$temperature)
print(z, digits=1, plotit=FALSE)
# plot (additional arguments are passed on to the underlying plot function)
plot(z, main="The pizza's temperature in Celsius", args.hist=list(breaks=50))
# formula interface for single variables
Desc(~ uptake + Type, data = CO2, plotit = FALSE)
# bivariate
Desc(price ~ operator, data=d.pizza) # numeric ~ factor
Desc(driver ~ operator, data=d.pizza) # factor ~ factor
Desc(driver ~ area + operator, data=d.pizza) # factor ~ several factors
Desc(driver + area ~ operator, data=d.pizza) # several factors ~ factor
Desc(driver ~ week, data=d.pizza) # factor ~ integer
Desc(driver ~ operator, data=d.pizza, rfrq="111") # all rel. frequencies
Desc(driver ~ operator, data=d.pizza, rfrq="000",
verbose=3) # no rel. frequencies
Desc(price ~ delivery_min, data=d.pizza) # numeric ~ numeric
Desc(price + delivery_min ~ operator + driver + wrongpizza,
data=d.pizza, digits=c(2,2,2,2,0,3,0,0) )
Desc(week ~ driver, data=d.pizza, digits=c(2,2,2,2,0,3,0,0)) # define digits
Desc(delivery_min + weekday ~ driver, data=d.pizza)
# without defining data-parameter
Desc(d.pizza$delivery_min ~ d.pizza$driver)
# with functions and interactions
Desc(sqrt(price) ~ operator : factor(wrongpizza), data=d.pizza)
Desc(log(price+1) ~ cut(delivery_min, breaks=seq(10,90,10)),
data=d.pizza, digits=c(2,2,2,2,0,3,0,0))
# response versus all the rest
Desc(driver ~ ., data=d.pizza[, c("temperature","wine_delivered","area","driver")])
# all the rest versus response
Desc(. ~ driver, data=d.pizza[, c("temperature","wine_delivered","area","driver")])
# pairwise Descriptions
p <- CombPairs(c("area","count","operator","driver","temperature","wrongpizza","quality"))
for(i in 1:nrow(p))
print(Desc(formula(gettextf("%s ~ %s", p$X1[i], p$X2[i])), data=d.pizza))
# get more flexibility, create the table first
tab <- as.table(apply(HairEyeColor, c(1,2), sum))
tab <- tab[,c("Brown","Hazel","Green","Blue")]
# display only absolute values, row and columnwise percentages
Desc(tab, row.vars=c(3, 1), rfrq="011", plotit=FALSE)
# do the plot by hand, while setting the colours for the mosaics
cols1 <- SetAlpha(c("sienna4", "burlywood", "chartreuse3", "slategray1"), 0.6)
cols2 <- SetAlpha(c("moccasin", "salmon1", "wheat3", "gray32"), 0.8)
plot(Desc(tab), col1=cols1, col2=cols2)
# choose alternative flavours for graphing numeric ~ factor using pipe
# (colors are recycled)
Desc(temperature ~ driver, data = d.pizza) |> plot(type="dens", col=Pal("Tibco"))
# use global format options for presentation
Fmt(abs=as.fmt(digits=0, big.mark=""))
Fmt(per=as.fmt(digits=2, fmt="%"))
Desc(area ~ driver, d.pizza, plotit=FALSE)
Fmt(abs=as.fmt(digits=0, big.mark="'"))
Fmt(per=as.fmt(digits=3, ldigits=0))
Desc(area ~ driver, d.pizza, plotit=FALSE)
# plot arguments can be fixed in detail
z <- Desc(BoxCox(d.pizza$temperature, lambda = 1.5))
plot(z, mar=c(0, 2.1, 4.1, 2.1), args.rug=TRUE, args.hist=list(breaks=50),
args.dens=list(from=0))
# The default description for count variables can be inappropriate,
# the density curve does not represent the variable well.
set.seed(1972)
x <- rpois(n = 500, lambda = 5)
Desc(x)
# but setting maxrows to Inf gives a better plot
Desc(x, maxrows = Inf)
# Output into word document (Windows-specific example) -----------------------
# by simply setting wrd=GetNewWrd()
## Not run:
# create a new word instance and insert title and contents
wrd <- GetNewWrd(header=TRUE)
# let's have a subset
d.sub <- d.pizza[,c("driver", "date", "operator", "price", "wrongpizza")]
# do just the univariate analysis
Desc(d.sub, wrd=wrd)
## End(Not run)
DescToolsOptions(opt)