R: Group by, count, and percent count in a data.table

dt_counts_and_percents {libbib}

R Documentation

Group by, count, and percent count in a data.table

Description

This function takes a data.table and a (quoted) column name to group by, counts the number of occurrences of each level, sorts the counts in descending order, and adds each level's percent of the total count.
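
To make the steps in the description concrete, here is a minimal sketch of the same group, count, sort, and percent sequence written in plain data.table. It is not the package's implementation, it leaves out the total row and the "OTHER" grouping, and the table `dt` and its `genre` column are hypothetical names made up for illustration.

```r
library(data.table)

# Hypothetical toy data (not part of libbib)
dt <- data.table(genre = c("fiction", "poetry", "fiction",
                           "drama", "fiction", "poetry"))

# Count the rows in each group
counts <- dt[, .(count = .N), by = genre]

# Sort by count, largest first
counts <- counts[order(-count)]

# Add the percent (out of 100), rounded to two decimal places
counts[, percent := round(100 * count / sum(count), 2)]

print(counts)
```

Here "fiction" comes first with a count of 3 and a percent of 50, followed by "poetry" (33.33) and "drama" (16.67).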

Usage

dt_counts_and_percents(DT, group_by_this, percent.cutoff = 0, big.mark = FALSE)

Arguments

`DT`	The data.table object to operate on
`group_by_this`	A quoted column to group by
`percent.cutoff`	A percent (out of 100). Levels whose percent of the total count is lower than this number are grouped into "OTHER" in the returned data.table (default is 0)
`big.mark`	If `FALSE` (default) the "count" column is left as an integer. If not `FALSE`, it must be a character to separate every three digits of the count. This turns the count column into a string.
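
As an illustration of what a `big.mark` separator does to a count column, the sketch below uses base R's `format()` on a hypothetical vector of counts. Whether the function formats its counts in exactly this way is an assumption; the point is only that the result is a character vector rather than an integer one.

```r
# Hypothetical integer counts (not produced by libbib)
counts <- c(1234567L, 45210L, 890L)

# Insert a comma every three digits; trim = TRUE avoids left-padding
formatted <- format(counts, big.mark = ",", trim = TRUE)

print(formatted)            # "1,234,567" "45,210" "890"
print(is.character(formatted))  # TRUE: no longer an integer vector
```

Because the formatted counts are strings, numeric operations such as sorting or summing should be done before a separator is applied.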

Details

For long-tailed count distributions, a percent cutoff can be set: levels whose percent of the total count is lower than the cutoff are grouped into a category called "OTHER". The cutoff is a number out of 100.
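
The cutoff behavior described above can be sketched in plain data.table as follows. This is an assumption about the general technique, not the package's code; the table `counts`, its `genre` column, and the variable `cutoff` are hypothetical, and the position of the "OTHER" row in the function's actual output may differ.

```r
library(data.table)

# Hypothetical per-level counts with a long tail
counts <- data.table(genre = c("fiction", "poetry", "drama", "essay"),
                     count = c(60L, 30L, 6L, 4L))
cutoff <- 10  # percent out of 100

# Percent of the total for each level
counts[, percent := 100 * count / sum(count)]

# Relabel every level below the cutoff as "OTHER"
counts[percent < cutoff, genre := "OTHER"]

# Collapse the relabeled rows and recompute the percent
collapsed <- counts[, .(count = sum(count)), by = genre]
collapsed[, percent := round(100 * count / sum(count), 2)]

print(collapsed)
```

With a cutoff of 10, "drama" (6 percent) and "essay" (4 percent) are merged into a single "OTHER" row holding 10 percent of the count.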

The final row is a total count.

The quoted group-by variable must be a character or factor. If it is not, it is temporarily converted into one and a warning is issued.

Value

Returns a data.table with three columns: the grouped-by column, a count column, and a percent column (out of 100) rounded to two decimal places.

Examples


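library(data.table)
library(libbib)
# Counts and percents for each species
iris_dt <- as.data.table(iris)
dt_counts_and_percents(iris_dt, "Species")
# Convert cyl to a factor first so that no conversion warning is issued
mt <- as.data.table(mtcars)
mt[, cyl := factor(cyl)]
dt_counts_and_percents(mt, "cyl")
# Group levels below 25 percent into "OTHER"
dt_counts_and_percents(mt, "cyl", percent.cutoff=25)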

[Package libbib version 1.6.4 Index]