dt_counts_and_percents {libbib} | R Documentation
Group by, count, and percent count in a data.table
Description
This function takes a (quoted) column to group by, counts the number of occurrences, sorts descending, and adds the percent of occurrences for each level of the grouped-by column.
Usage
dt_counts_and_percents(DT, group_by_this, percent.cutoff = 0, big.mark = FALSE)
Arguments
DT | The data.table object to operate on
group_by_this | A quoted column to group by
percent.cutoff | A percent (out of 100); levels whose percent of the total count is lower than this number are grouped into "OTHER" in the returned data.table (default is 0)
big.mark | If
Details
For long-tailed count distributions, a cutoff can be placed on the percent: levels whose percent of the total count is lower than the cutoff are grouped into a category called "OTHER". The cutoff is a number out of 100.
The final row is a total count.
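The counting, cutoff, and total-row steps described above can be sketched in plain data.table. This is only an illustration, not libbib's implementation: the helper name sketch_counts, the output column names count and percent, and the "TOTAL" label are assumptions made for the example.

```r
library(data.table)

# Hypothetical helper (not part of libbib): count, percent, lump, total.
sketch_counts <- function(DT, group_col, cutoff = 0) {
  # Count rows per level and sort descending by count
  out <- DT[, .(count = .N), by = group_col][order(-count)]
  out[, percent := 100 * count / sum(count)]
  # Use character labels so that "OTHER" and "TOTAL" can be added
  out[, (group_col) := as.character(get(group_col))]
  # Lump levels whose percent is lower than the cutoff into "OTHER"
  out[percent < cutoff, (group_col) := "OTHER"]
  out <- out[, .(count = sum(count), percent = sum(percent)), by = group_col]
  # Append a total row (label is an assumption), then round the percents
  total <- data.table(g = "TOTAL", count = sum(out$count), percent = 100)
  setnames(total, "g", group_col)
  out <- rbind(out, total)
  out[, percent := round(percent, 2)]
  out[]
}

mt <- as.data.table(mtcars)
res <- sketch_counts(mt, "cyl", cutoff = 25)
print(res)
```

With mtcars, the 6-cylinder level accounts for under 25 percent of the rows, so in this sketch it is the only level folded into "OTHER".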
The quoted group-by variable must be a character or factor. If it is not, it is temporarily converted into one and a warning is issued.
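A type check of this kind might look like the sketch below. The helper name ensure_groupable, the warning text, and the choice of character (rather than factor) as the target type are assumptions for illustration; the actual function may differ.

```r
library(data.table)

# Hypothetical helper (not part of libbib): return the grouping column as
# character or factor, warning when a conversion was needed.
ensure_groupable <- function(DT, group_col) {
  x <- DT[[group_col]]
  if (!is.character(x) && !is.factor(x)) {
    warning(sprintf("column '%s' is %s; converting to character for grouping",
                    group_col, class(x)[1]))
    # Assumption: convert to character; a factor would serve equally well
    x <- as.character(x)
  }
  x  # a converted copy; DT itself is left unchanged
}

mt <- as.data.table(mtcars)
grp <- suppressWarnings(ensure_groupable(mt, "cyl"))
```

Because the helper works on a copy of the column, the original data.table keeps its numeric cyl column, which matches the "temporarily converted" behaviour described above.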
Value
Returns a data.table with three columns: the grouped-by column, a count column, and a percent column (out of 100) to two decimal places.
Examples
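library(data.table)
library(libbib)
# Count and percent of each iris species
iris_dt <- as.data.table(iris)
dt_counts_and_percents(iris_dt, "Species")
# Convert cyl to a factor first so that no conversion warning is issued
mt <- as.data.table(mtcars)
mt[, cyl := factor(cyl)]
dt_counts_and_percents(mt, "cyl")
# Group levels under 25 percent of the total into "OTHER"
dt_counts_and_percents(mt, "cyl", percent.cutoff = 25)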