| freq {cleaner} | R Documentation | 
Frequency table
Description
Create a frequency table of a vector or a data.frame. It supports tidyverse's quasiquotation and RMarkdown for reports. The easiest way to use it with the tidyverse is: data %>% freq(var).
top_freq can be used to get the top/bottom n items of a frequency table, with counts as names. It respects ties.
Usage
freq(x, ...)
## Default S3 method:
freq(
  x,
  sort.count = TRUE,
  nmax = getOption("max.print.freq"),
  na.rm = TRUE,
  row.names = TRUE,
  markdown = !interactive(),
  digits = 2,
  quote = NULL,
  header = TRUE,
  title = NULL,
  na = "<NA>",
  sep = " ",
  decimal.mark = getOption("OutDec"),
  big.mark = "",
  wt = NULL,
  ...
)
## S3 method for class 'factor'
freq(x, ..., droplevels = FALSE)
## S3 method for class 'matrix'
freq(x, ..., quote = FALSE)
## S3 method for class 'table'
freq(x, ..., sep = " ")
## S3 method for class 'numeric'
freq(x, ..., digits = 2)
## S3 method for class 'Date'
freq(x, ..., format = "yyyy-mm-dd")
## S3 method for class 'hms'
freq(x, ..., format = "HH:MM:SS")
is.freq(f)
top_freq(f, n)
header(f, property = NULL)
## S3 method for class 'freq'
print(
  x,
  nmax = getOption("max.print.freq", default = 10),
  markdown = !interactive(),
  header = TRUE,
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."),
  ...
)
Arguments
| x | vector of any class or a  | 
| ... | up to nine different columns of  | 
| sort.count | sort on count, i.e. frequencies. This will be  | 
| nmax | number of rows to print. The default,  | 
| na.rm | a logical value indicating whether NA values should be removed from the frequency table | 
| row.names | a logical value indicating whether row indices should be printed as  | 
| markdown | a logical value indicating whether the frequency table should be printed in markdown format. This will print all rows (except when  | 
| digits | how many significant digits are to be used for numeric values in the header (not for the items themselves, that depends on  | 
| quote | a logical value indicating whether or not strings should be printed with surrounding quotes. Default is to print them only around characters that are actually numeric values. | 
| header | a logical value indicating whether an informative header should be printed | 
| title | text to show above the frequency table; by default, it tries to coerce a title from the variables passed to  | 
| na | a character string that should be used to show empty (NA) values | 
| sep | a character string to separate the terms when selecting multiple columns | 
| decimal.mark | the character to be used to indicate the numeric decimal point | 
| big.mark | character; if not empty used as mark between every 'big.interval' decimals before (hence big) the decimal point | 
| wt | frequency weights. If a variable, computes  | 
| droplevels | a logical value indicating whether in factors empty levels should be dropped | 
| format | a character to define the printing format (it supports  | 
| f | a frequency table | 
| n | number of top n items to return, use -n for the bottom n items. It will include more than n rows if there are ties. | 
| property | property in header to return this value directly | 
Details
Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the 'freq' function, you can create univariate frequency tables. Multiple variables will be pasted into one variable, so it forces a univariate distribution.
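The pasting of multiple variables described above can be pictured with a small base R sketch. It uses a made-up data frame (the column names and values are assumptions for illustration only) and is not the package's internal code; it only shows how two columns collapse into one variable before counting:

```r
# Hypothetical data; column names and values are illustrative only
df <- data.frame(
  gender = c("F", "M", "F", "F"),
  ward   = c("ICU", "ICU", "ICU", "Clinical"),
  stringsAsFactors = FALSE
)

# Paste both columns into a single variable, separated by a space
combined <- paste(df$gender, df$ward, sep = " ")

# Counting the pasted values gives a univariate distribution
combined_counts <- table(combined)
print(combined_counts)
```

The separator plays the role that the sep argument is described as having when multiple columns are selected.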
Input can be done in many different ways. Base R methods are:
freq(df$variable)
freq(df[, "variable"])
Tidyverse methods are:
df$variable %>% freq()
df[, "variable"] %>% freq()
df %>% freq("variable")
df %>% freq(variable)
For numeric values of any class, these additional values will all be calculated with na.rm = TRUE and shown in the header:
- Mean, using mean
- Standard Deviation, using sd
- Coefficient of Variation (CV), the standard deviation divided by the mean 
- Median Absolute Deviation (MAD), using mad
- Tukey Five-Number Summaries (minimum, Q1, median, Q3, maximum), see NOTE below 
- Interquartile Range (IQR) calculated as Q3 - Q1, see NOTE below
- Coefficient of Quartile Variation (CQV, sometimes called coefficient of dispersion) calculated as (Q3 - Q1) / (Q3 + Q1), see NOTE below
- Outliers (total count and percentage), using boxplot.stats
NOTE: These values are calculated using the same algorithm as used by Minitab and SPSS: p[k] = E[F(x[k])]. See Type 6 on the quantile page.
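The header statistics listed above can be approximated with base R alone. The following is a minimal sketch on a made-up vector, assuming the functions named in the list are called with na.rm = TRUE and that quartiles use type = 6 as the NOTE states; it is not taken from the package source, and all variable names are illustrative:

```r
# Made-up numeric data with one missing value
x <- c(2, 4, 4, 5, 7, 9, 10, 12, NA)

# Central tendency and spread
avg       <- mean(x, na.rm = TRUE)
std       <- sd(x, na.rm = TRUE)
cv        <- std / avg              # coefficient of variation
mad_value <- mad(x, na.rm = TRUE)   # stats::mad, based on the median

# Five-number summary with the type 6 quantile algorithm
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
                     type = 6, na.rm = TRUE)
q1 <- five_num[["25%"]]
q3 <- five_num[["75%"]]

# Quartile-based measures from the list above
iqr_value <- q3 - q1
cqv       <- (q3 - q1) / (q3 + q1)

# Outliers as reported by boxplot.stats (NAs are omitted by that function)
outliers <- boxplot.stats(x)$out

print(five_num)
print(c(IQR = iqr_value, CQV = cqv))
```

Note that quantile() defaults to type = 7, so the type argument has to be set explicitly to match the values described here.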
For dates and times of any class, these additional values will be calculated with na.rm = TRUE and shown in the header:
In factors, levels that do not occur in the input data can be dropped by setting droplevels = TRUE; the factor method shown under Usage defaults to droplevels = FALSE.
The function top_freq will include more than n rows if there are ties. Use a negative number for n (like n = -3) to select the bottom n values.
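To make the tie rule concrete, here is a base R sketch of tie-aware top/bottom selection. The helper top_n_with_ties is a hypothetical name introduced only for this illustration; it is not part of the cleaner package and is not claimed to match how top_freq is implemented. It returns the items with their counts as names, mirroring the description of top_freq:

```r
# Hypothetical helper for illustration only (not part of cleaner)
top_n_with_ties <- function(x, n) {
  stopifnot(n != 0)
  tab <- table(x)
  counts <- setNames(as.vector(tab), names(tab))
  k <- min(abs(n), length(counts))
  if (n > 0) {
    # Count of the n-th most frequent item; keep everything at or above it
    cutoff <- sort(counts, decreasing = TRUE)[[k]]
    keep <- sort(counts[counts >= cutoff], decreasing = TRUE)
  } else {
    # Count of the n-th least frequent item; keep everything at or below it
    cutoff <- sort(counts)[[k]]
    keep <- sort(counts[counts <= cutoff])
  }
  # Items as values, counts as names
  setNames(names(keep), unname(keep))
}

x <- c("a", "a", "a", "b", "b", "c", "c", "d")
top_two    <- top_n_with_ties(x, n = 2)   # "b" and "c" tie, so three items
bottom_one <- top_n_with_ties(x, n = -1)
print(top_two)
print(bottom_one)
```

Because "b" and "c" both occur twice, asking for the top 2 yields three items, which is the behaviour the paragraph above describes for ties.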
Value
A data.frame (with an additional class "freq") with five columns: item, count, percent, cum_count and cum_percent.
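The meaning of the five columns can be sketched in base R as follows. This is an illustration on made-up data, not the package's code: it assumes percent is stored as a proportion between 0 and 1, it does not add the "freq" class, and the name freq_sketch is hypothetical:

```r
# Made-up input; table() drops the NA by default
x <- c("F", "M", "F", "F", NA, "M")

# Counts sorted from most to least frequent
tab <- sort(table(x), decreasing = TRUE)

# Build the five columns named in the Value section
freq_sketch <- data.frame(
  item  = names(tab),
  count = as.vector(tab),
  stringsAsFactors = FALSE
)
freq_sketch$percent     <- freq_sketch$count / sum(freq_sketch$count)
freq_sketch$cum_count   <- cumsum(freq_sketch$count)
freq_sketch$cum_percent <- cumsum(freq_sketch$percent)

print(freq_sketch)
```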
Extending the freq() function
Interested in extending the freq() function with your own class? Add a method like the one below to your package, and optionally define some header info by passing a list to the .add_header parameter, as in the example below for class difftime. This example assumes that you use the roxygen2 package for package development.
#' @method freq difftime
#' @importFrom cleaner freq.default
#' @export
#' @noRd
freq.difftime <- function(x, ...) {
  freq.default(x = x, ...,
               .add_header = list(units = attributes(x)$units))
}
Be sure to call freq.default in your function and not just freq. Also, add cleaner to the Imports: field of your DESCRIPTION file, to make sure that it will be installed with your package, e.g.:
Imports: cleaner
Examples
freq(unclean$gender, markdown = FALSE)
freq(x = clean_factor(unclean$gender, 
                      levels = c("^m" = "Male", 
                                 "^f" = "Female")),
     markdown = TRUE,
     title = "Frequencies of a cleaned version for a markdown report!",
     header = FALSE,
     quote = TRUE)