data_tabulate {datawizard} | R Documentation
Create frequency and crosstables of variables
Description
This function creates frequency or crosstables of variables, including the number of levels/values as well as the distribution of raw, valid and cumulative percentages. For crosstables, row, column and cell percentages can be calculated.
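As a rough illustration of the three percentage types named above, the following base R sketch computes raw, valid and cumulative percentages by hand for a small made-up vector. It assumes the usual definitions (raw percentages use all observations including missing values, valid percentages use non-missing observations only, cumulative percentages accumulate the valid percentages). It does not call data_tabulate() and is not taken from its implementation.

```r
# Hypothetical example vector with two missing values
x <- c(1, 1, 2, 3, 3, 3, NA, NA)

# Raw percentages: denominator is the full length, NA gets its own row
counts <- table(x, useNA = "ifany")
raw_pct <- 100 * counts / length(x)

# Valid percentages: denominator is the number of non-missing values
valid_counts <- table(x)
valid_pct <- 100 * valid_counts / sum(!is.na(x))

# Cumulative percentages: running sum of the valid percentages
cum_pct <- cumsum(valid_pct)

raw_pct
valid_pct
cum_pct
```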
Usage
data_tabulate(x, ...)
## Default S3 method:
data_tabulate(
  x,
  by = NULL,
  drop_levels = FALSE,
  weights = NULL,
  remove_na = FALSE,
  proportions = NULL,
  name = NULL,
  verbose = TRUE,
  ...
)
## S3 method for class 'data.frame'
data_tabulate(
  x,
  select = NULL,
  exclude = NULL,
  ignore_case = FALSE,
  regex = FALSE,
  by = NULL,
  drop_levels = FALSE,
  weights = NULL,
  remove_na = FALSE,
  proportions = NULL,
  collapse = FALSE,
  verbose = TRUE,
  ...
)
## S3 method for class 'datawizard_tables'
as.data.frame(
  x,
  row.names = NULL,
  optional = FALSE,
  ...,
  stringsAsFactors = FALSE,
  add_total = FALSE
)
Arguments
x |
A (grouped) data frame, a vector or factor. |
... |
not used. |
by |
Optional vector or factor. If supplied, a crosstable is created.
If |
drop_levels |
Logical, if |
weights |
Optional numeric vector of weights. Must be of the same length
as |
remove_na |
Logical, if |
proportions |
Optional character string, indicating the type of
percentages to be calculated. Only applies to crosstables, i.e. when |
name |
Optional character string, which includes the name that is used for printing. |
verbose |
Toggle warnings. |
select |
Variables that will be included when performing the required tasks. Can be either
If |
exclude |
See |
ignore_case |
Logical, if |
regex |
Logical, if |
collapse |
Logical, if |
row.names |
|
optional |
logical. If |
stringsAsFactors |
logical: should the character vector be converted to a factor? |
add_total |
For crosstables (i.e. when |
Details
There is an as.data.frame() method that returns the frequency tables as a data frame. The returned object is a nested data frame: the first column contains the name of the variable for which frequencies were calculated, and the second column is a list column that contains the frequency tables as data frames. See 'Examples'.
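A minimal sketch of working with that nested structure, assuming the datawizard package is installed and reusing the efc data and variable names from the 'Examples' section. The list column is accessed as table, as in the examples; the object names tabs and nested are only illustrative.

```r
library(datawizard)
data(efc)

# One frequency table per selected variable
tabs <- data_tabulate(efc, c("e42dep", "c172code"))

# Nested data frame: one row per variable, tables stored in a list column
nested <- as.data.frame(tabs)

# First column: names of the tabulated variables
nested[[1]]

# Extract the frequency table of the first variable as a plain data frame
first_table <- nested$table[[1]]
first_table
```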
Value
A data frame, or a list of data frames, with one frequency table as data frame per variable.
Crosstables
If by is supplied, a crosstable is created. The crosstable includes <NA> (missing) values by default. The first column indicates values of x, the first row indicates values of by (including missing values). The last row and column contain the total frequencies for each row and column, respectively. Setting remove_na = TRUE will omit missing values from the crosstable. Setting proportions to "row" or "column" will add row or column percentages. Setting proportions to "full" will add relative frequencies for the full table.
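To make the three proportion types concrete, the sketch below computes row, column and full-table percentages with base R on two small made-up vectors. It assumes the standard definitions of these percentages and does not call data_tabulate(); in particular, how data_tabulate() treats missing values when computing percentages may differ from this illustration.

```r
# Hypothetical data: x is tabulated in rows, by in columns
x  <- c("a", "a", "b", "b", "b", NA)
by <- c("m", "f", "m", "m", NA, "f")

# Cross-tabulated counts, keeping missing values as their own category
counts <- table(x, by, useNA = "ifany")

# Row percentages: each row sums to 100
row_pct <- 100 * prop.table(counts, margin = 1)

# Column percentages: each column sums to 100
col_pct <- 100 * prop.table(counts, margin = 2)

# Full-table percentages: all cells together sum to 100
full_pct <- 100 * prop.table(counts)

round(row_pct, 1)
round(col_pct, 1)
round(full_pct, 1)
```

Here margin = 1 and margin = 2 select the denominator (row totals or column totals); omitting margin divides every cell by the grand total.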
Note
There are print_html() and print_md() methods available for printing frequency or crosstables in HTML and markdown format, e.g. print_html(data_tabulate(x)).
Examples
# frequency tables -------
# ------------------------
data(efc)
# vector/factor
data_tabulate(efc$c172code)
# drop missing values
data_tabulate(efc$c172code, remove_na = TRUE)
# data frame
data_tabulate(efc, c("e42dep", "c172code"))
# grouped data frame
suppressPackageStartupMessages(library(poorman, quietly = TRUE))
efc %>%
  group_by(c172code) %>%
  data_tabulate("e16sex")
# collapse tables
efc %>%
  group_by(c172code) %>%
  data_tabulate("e16sex", collapse = TRUE)
# for larger N's (> 100000), a big mark is automatically added
set.seed(123)
x <- sample(1:3, 1e6, TRUE)
data_tabulate(x, name = "Large Number")
# to remove the big mark, use print(..., big_mark = "")
print(data_tabulate(x), big_mark = "")
# weighted frequencies
set.seed(123)
efc$weights <- abs(rnorm(n = nrow(efc), mean = 1, sd = 0.5))
data_tabulate(efc$e42dep, weights = efc$weights)
# crosstables ------
# ------------------
# add some missing values
set.seed(123)
efc$e16sex[sample.int(nrow(efc), 5)] <- NA
data_tabulate(efc, "c172code", by = "e16sex")
# add row and column percentages
data_tabulate(efc, "c172code", by = "e16sex", proportions = "row")
data_tabulate(efc, "c172code", by = "e16sex", proportions = "column")
# omit missing values
data_tabulate(
  efc$c172code,
  by = efc$e16sex,
  proportions = "column",
  remove_na = TRUE
)
# round percentages
out <- data_tabulate(efc, "c172code", by = "e16sex", proportions = "column")
print(out, digits = 0)
# coerce to data frames
result <- data_tabulate(efc, "c172code", by = "e16sex")
as.data.frame(result)
as.data.frame(result)$table
as.data.frame(result, add_total = TRUE)$table