summarize {dataMaid}    R Documentation
Summarize a variable/dataset
Description
Generic shell function that produces a summary of a variable (or of each variable in an entire dataset), using a set of summary functions that are chosen according to the data class of the variable.
Usage
summarize(v, reportstyleOutput = FALSE, summaries = setSummaries(), ...)
Arguments
v
The variable (vector) or dataset (data.frame) to be summarized.
reportstyleOutput
Logical indicating whether the output should be formatted for inclusion in the report (escaped matrix) or not. Defaults to FALSE.
summaries
A list of summaries to use on each supported variable type. We recommend using setSummaries() to construct this list.
...
Additional arguments passed to data class specific methods.
Details
Summary functions are supplied using their names (as character strings) in the class-specific argument, e.g. characterSummaries = c("countMissing", "uniqueValues") for character variables, and similarly for the remaining 7 data classes (factor, Date, labelled, haven_labelled, numeric, integer, logical).
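To make the class-based selection concrete, here is a minimal base-R sketch of how a set of summary function names could be looked up from the class of a variable. The lookup table summaryNames and the helper pickSummaries are hypothetical and not part of dataMaid; the table only stores function names as character strings, mirroring how they are supplied to summarize(). "countZeros" refers to the user-defined function from the Examples.

```r
# Hypothetical lookup table: data class -> names of summary functions.
# The entries are illustrative, not dataMaid's defaults.
summaryNames <- list(
  character = c("countMissing", "uniqueValues"),
  numeric   = c("countMissing", "countZeros")
)

# Return the function names registered for the first class of v that
# has an entry in the table, or character(0) if no class matches.
pickSummaries <- function(v, table) {
  cls <- intersect(class(v), names(table))
  if (length(cls) == 0) character(0) else table[[cls[1]]]
}

pickSummaries(c("a", "b"), summaryNames)  # character entry
pickSummaries(c(1.5, 2), summaryNames)    # numeric entry
pickSummaries(0:10, summaryNames)         # integer: no entry in this table
```

Note that 0:10 has class "integer", not "numeric", so it gets no summaries from this table; this is why integer and numeric appear as separate data classes in the list above.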
Note that an overview of all available summaryFunctions can be obtained by calling allSummaryFunctions(). The default choices of summaryFunctions are available in data class specific functions, e.g. defaultCharacterSummaries() and defaultNumericSummaries(). A complete overview of all default options can be obtained by calling setSummaries().
A user-defined summary function can be supplied using its function name. Note, however, that it should take a vector as argument and return a list of the form list(feature="Feature name", result="The result"). More details on how to construct valid summary functions are found in summaryFunction.
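The contract above (take a vector, return a list with feature and result) can be checked with plain R before the function is handed to summarize(). The function countNegative and the checker isValidSummaryOutput below are hypothetical illustrations written in base R; they are not part of dataMaid.

```r
# Hypothetical user-defined summary function following the documented
# contract: takes a vector and returns list(feature = ..., result = ...).
countNegative <- function(v, ...) {
  res <- sum(v < 0, na.rm = TRUE)  # NA comparisons are ignored
  list(feature = "No. negative values", result = res)
}

# Check that a return value has the documented shape:
# a list with a single-string feature and a result entry.
isValidSummaryOutput <- function(x) {
  is.list(x) && all(c("feature", "result") %in% names(x)) &&
    is.character(x$feature) && length(x$feature) == 1
}

out <- countNegative(c(-1, 2, -3, NA))
isValidSummaryOutput(out)  # TRUE
out$result                 # 2
```

Following the Details, such a function would then be supplied by its name, "countNegative". The package's own Examples additionally wrap the returned list in summaryResult() and include a value entry.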
Value
The return value depends on the value of reportstyleOutput.
If reportstyleOutput = FALSE (the default): if v is a variable, a list of summaryResult objects, one summaryResult for each summary function called on v. If v is a dataset, summarize() instead returns a list of lists of summaryResult objects, one list for each variable in v.
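The nesting of the return value can be illustrated with a small base-R emulation. This is a hypothetical sketch of the shape only, not dataMaid's implementation: summarizeSketch and the two summary helpers are made-up names, and the results are plain lists rather than summaryResult objects.

```r
# Hypothetical summary helpers returning list(feature, result).
sketchCountMissing <- function(v) list(feature = "No. missing obs.", result = sum(is.na(v)))
sketchCountUnique  <- function(v) list(feature = "No. unique values", result = length(unique(v)))

# Emulates the return shape described above (not dataMaid's code):
# a variable gives one result per summary function; a data.frame
# gives a named list with one such list per column.
summarizeSketch <- function(v, funs = list(sketchCountMissing, sketchCountUnique)) {
  if (is.data.frame(v)) {
    return(lapply(v, summarizeSketch, funs = funs))
  }
  lapply(funs, function(f) f(v))
}

res <- summarizeSketch(data.frame(x = c(1, NA, 3), y = c("a", "a", "b")))
names(res)             # "x" "y"
length(res$x)          # 2, one entry per summary function
res$y[[2]]$result      # 2 unique values in y
```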
If reportstyleOutput = TRUE: if v is a single variable, a matrix with two columns, feature and result, and one row for each summary function that was called. Character strings in this matrix are escaped so that they are ready for R Markdown rendering. If v is a full dataset, a list of matrices as described above, one for each variable in the dataset.
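The two-column report-style matrix can be pictured as a collapse of a list of feature/result pairs. The helper toReportMatrix below is a hypothetical base-R sketch, not dataMaid's code, and it deliberately skips the escaping for R Markdown that the real output applies.

```r
# Feature/result pairs computed from the charV vector used in the Examples.
charV <- c("a", "b", "c", "a", "a", NA, "b", "0")
results <- list(
  list(feature = "No. missing obs.",  result = sum(is.na(charV))),
  list(feature = "No. unique values", result = length(unique(charV)))
)

# Hypothetical helper: one row per summary, columns feature and result.
# Multi-element results are pasted into one string; no escaping is done.
toReportMatrix <- function(results) {
  feature <- vapply(results, function(r) as.character(r$feature), character(1))
  result  <- vapply(results, function(r) paste(format(r$result), collapse = ", "), character(1))
  cbind(feature = feature, result = result)
}

m <- toReportMatrix(results)
dim(m)        # 2 2
colnames(m)   # "feature" "result"
```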
References
Petersen AH, Ekstrøm CT (2019). “dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R.” Journal of Statistical Software, 90(6), 1-38. doi: 10.18637/jss.v090.i06.
See Also
setSummaries, summaryFunction, allSummaryFunctions, summaryResult, defaultCharacterSummaries, defaultFactorSummaries, defaultLabelledSummaries, defaultHavenlabelledSummaries, defaultNumericSummaries, defaultIntegerSummaries, defaultLogicalSummaries
Examples
#Default summary for a character vector:
charV <- c("a", "b", "c", "a", "a", NA, "b", "0")
summarize(charV)
#Inspect default character summary functions:
defaultCharacterSummaries()
#Define a new summary function and add it to the summary for character vectors:
countZeros <- function(v, ...) {
res <- length(which(v == 0))
summaryResult(list(feature="No. zeros", result = res, value = res))
}
summarize(charV,
summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))
#Has no effect on intV, as the character summaries are not applied to integer variables
intV <- c(0:10)
summarize(intV,
summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))
#But supplying the argument for integer variables changes the summary:
summarize(intV, summaries = setSummaries(integer = "countZeros"))
#Summarize a full dataset:
data(cars)
summarize(cars)
#Summarize a variable and obtain report-style output (formatted for markdown)
summarize(charV, reportstyleOutput = TRUE)