summarize {dataMaid}    R Documentation

Summarize a variable/dataset

Description

Generic shell function that produces a summary of a variable, or of each variable in a dataset. The summary functions that are applied depend on the data class of the variable.

Usage

summarize(v, reportstyleOutput = FALSE, summaries = setSummaries(), ...)

Arguments

v

The variable (vector) or dataset (data.frame) to be summarized.

reportstyleOutput

Logical indicating whether the output should be formatted for inclusion in the report (as an escaped matrix). Defaults to FALSE.

summaries

A list of summaries to use on each supported variable type. We recommend creating this list with setSummaries; see the documentation of that function for more details.

...

Additional arguments passed to data class specific methods.

Details

Summary functions are supplied using their names (in character strings) in the class-specific argument, e.g. characterSummaries = c("countMissing", "uniqueValues") for character variables and similarly for the remaining 7 data classes (factor, Date, labelled, haven_labelled, numeric, integer, logical). Note that an overview of all available summaryFunctions can be obtained by calling allSummaryFunctions.

The default choices of summaryFunctions are available in data class specific functions, e.g. defaultCharacterSummaries() and defaultNumericSummaries(). A complete overview of all default options can be obtained by calling setSummaries().
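As a minimal sketch of the two paragraphs above, the following restricts the summary of a character vector to the two functions named in the text and then lists the available and default summary functions. It assumes that dataMaid is installed (the guard skips the calls otherwise) and that setSummaries() accepts a character vector of function names for its character argument, as the Examples section does for integer. The vector charV is illustrative only.

```r
# Illustrative input vector (not taken from the package).
charV <- c("a", "b", NA, "a")

# Guard: only run the calls if dataMaid is available.
if (requireNamespace("dataMaid", quietly = TRUE)) {
  # Use only two summary functions for character variables.
  res <- dataMaid::summarize(
    charV,
    summaries = dataMaid::setSummaries(character = c("countMissing", "uniqueValues"))
  )
  print(res)

  # Overview of every available summary function.
  print(dataMaid::allSummaryFunctions())

  # Defaults for one data class, and for all data classes at once.
  print(dataMaid::defaultCharacterSummaries())
  print(dataMaid::setSummaries())
}
```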

A user-defined summary function can be supplied using its function name. Note, however, that it should take a vector as argument and return a list of the form list(feature="Feature name", result="The result"). More details on how to construct valid summary functions are found in summaryFunction.
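The shape described above can be sketched in base R without loading the package. The function name countEmptyStrings and its feature label are hypothetical, chosen only to illustrate the required input (a vector) and output (a list with feature and result).

```r
# Hypothetical user-defined summary function: counts empty strings in v.
# It takes a vector (plus ...) and returns a list with `feature` and `result`.
countEmptyStrings <- function(v, ...) {
  n <- sum(!is.na(v) & v == "")
  list(feature = "No. empty strings", result = n)
}

out <- countEmptyStrings(c("a", "", NA, ""))
out$feature  # "No. empty strings"
out$result   # 2
```

In the Examples section, the comparable function countZeros wraps its list in summaryResult() and is registered by name through setSummaries().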

Value

The return value depends on the value of reportstyleOutput.

If reportstyleOutput = FALSE (the default): If v is a variable, a list of summaryResult objects, one summaryResult for each summary function called on v. If v is a dataset, then summarize() returns a list of lists of summaryResult objects instead; one list for each variable in v.

If reportstyleOutput = TRUE: If v is a single variable: A matrix with two columns, feature and result, and one row for each summary function that was called. Character strings in this matrix are escaped such that they are ready for Rmarkdown rendering.

If v is a full dataset: A list of matrices as described above, one for each variable in the dataset.
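The return shapes described above can be inspected with a short sketch like the following. It assumes dataMaid is installed (the guard skips the calls otherwise). The vector charV and the object names res, mat and allRes are illustrative. No output is shown, since it depends on the default summary functions in the installed version.

```r
# Illustrative input vector (not taken from the package).
charV <- c("a", "b", NA, "a")

if (requireNamespace("dataMaid", quietly = TRUE)) {
  # Default output: a list with one element per summary function.
  res <- dataMaid::summarize(charV)
  str(res, max.level = 1)

  # Report-style output: a matrix with one row per summary function.
  mat <- dataMaid::summarize(charV, reportstyleOutput = TRUE)
  print(dim(mat))

  # Dataset input: a list with one element per variable.
  allRes <- dataMaid::summarize(cars)
  print(length(allRes))
}
```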

References

Petersen AH, Ekstrøm CT (2019). “dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R.” _Journal of Statistical Software_, *90*(6), 1-38. doi: 10.18637/jss.v090.i06.

See Also

setSummaries, summaryFunction, allSummaryFunctions, summaryResult, defaultCharacterSummaries, defaultFactorSummaries, defaultLabelledSummaries, defaultHavenlabelledSummaries, defaultNumericSummaries, defaultIntegerSummaries, defaultLogicalSummaries

Examples

# Default summary for a character vector:
charV <- c("a", "b", "c", "a", "a", NA, "b", "0")
summarize(charV)

# Inspect the default character summary functions:
defaultCharacterSummaries()

# Define a new summary function and add it to the summary for character vectors:
countZeros <- function(v, ...) {
  res <- length(which(v == 0))
  summaryResult(list(feature = "No. zeros", result = res, value = res))
}
summarize(charV,
  summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))

# Has no effect, as intV is an integer vector and is not affected by the character summaries:
intV <- c(0:10)
summarize(intV,
  summaries = setSummaries(character = defaultCharacterSummaries(add = "countZeros")))

# But supplying the function for integer variables changes the summary:
summarize(intV, summaries = setSummaries(integer = "countZeros"))

# Summarize a full dataset:
data(cars)
summarize(cars)

# Summarize a variable and obtain report-style output (formatted for markdown):
summarize(charV, reportstyleOutput = TRUE)


[Package dataMaid version 1.4.1 Index]