colSums_if {quest}    R Documentation

Column Sums Conditional on Frequency of Observed Values

Description

colSums_if calculates the sum of every column in a numeric or logical matrix conditional on the frequency of observed data. If the frequency of observed values in a column is less than ov.min (or equal to it when inclusive = FALSE), then NA is returned for that column. It also has the option to return a value other than 0 (e.g., NA) when every value in a column is NA, which differs from colSums(x, na.rm = TRUE).

Usage

colSums_if(
  x,
  ov.min = 1,
  prop = TRUE,
  inclusive = TRUE,
  impute = TRUE,
  allNA = NA_real_
)

Arguments

x

numeric or logical matrix. If not a matrix, it will be coerced to one.

ov.min

minimum frequency of observed values required per column. If prop = TRUE, then this is a decimal between 0 and 1. If prop = FALSE, then this is an integer between 0 and nrow(x).

prop

logical vector of length 1 specifying whether ov.min should refer to the proportion of observed values (TRUE) or the count of observed values (FALSE).

inclusive

logical vector of length 1 specifying whether the sum should be calculated if the frequency of observed values in a column is exactly equal to ov.min.

impute

logical vector of length 1 specifying whether missing values should be imputed with the mean of the observed values of x[, i]. If TRUE (default), sums become comparable across columns that have different amounts of observed data.

allNA

numeric vector of length 1 specifying what value should be returned for columns that are all NA. This is most applicable when ov.min = 0 and inclusive = TRUE. The default is NA, which differs from colSums with na.rm = TRUE, where 0 is returned. Note that the value is overwritten by NA if the frequency of observed values in that column is less than ov.min (or equal to it when inclusive = FALSE).
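
The effect of impute can be checked by hand in base R. The sketch below is not taken from the package; it assumes that mean imputation replaces each missing entry of a column with the mean of that column's observed entries, as the impute argument describes. All variable names are illustrative.

```r
# One column with three observed values and one missing value.
col <- c(2, 4, 6, NA)

# Without imputation: the sum of the observed values only.
sum_observed <- sum(col, na.rm = TRUE)

# With mean imputation: replace NA by the mean of the observed values, then sum.
col_imputed <- ifelse(is.na(col), mean(col, na.rm = TRUE), col)
sum_imputed <- sum(col_imputed)

# Equivalent shortcut: mean of the observed values times the number of rows.
sum_shortcut <- mean(col, na.rm = TRUE) * length(col)
```

Under this assumption the imputed sum is rescaled to the full number of rows, so a column with missing entries is not penalized relative to a complete column.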

Details

Conceptually, this function is equivalent to apply(X = x, MARGIN = 2, FUN = sum_if, ov.min = ov.min, prop = prop, inclusive = inclusive). It is not implemented that way, however, because the check on the frequency of observed values would then not be vectorized. Instead, for computational efficiency, it calls colSums and then inserts NA for columns that have too few observed values.
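
The vectorized strategy described above can be sketched in base R. This is a minimal illustration and not the quest source code: col_sums_if_sketch is a hypothetical name, and the order of the steps (sum, rescale for imputation, apply allNA, then mask with NA) is an assumption based on the argument descriptions.

```r
# Hypothetical helper, NOT the quest implementation: a base R sketch of
# "call colSums once, then insert NA for columns with too few observed values".
col_sums_if_sketch <- function(x, ov.min = 1, prop = TRUE, inclusive = TRUE,
                               impute = TRUE, allNA = NA_real_) {
  x <- as.matrix(x)
  n_obs <- colSums(!is.na(x))              # observed values per column
  freq <- if (prop) n_obs / nrow(x) else n_obs
  # Columns that meet the threshold; `inclusive` decides whether equality counts.
  keep <- if (inclusive) freq >= ov.min else freq > ov.min

  out <- colSums(x, na.rm = TRUE)          # single vectorized pass over columns
  if (impute) {
    # Mean imputation: the sum equals mean(observed) * nrow(x).
    out <- out / n_obs * nrow(x)
  }
  out[n_obs == 0] <- allNA                 # columns that are entirely NA
  out[!keep] <- NA_real_                   # too few observed values
  out
}

x <- data.frame("x" = c(1, 2, NA), "y" = c(1, NA, NA), "z" = c(NA, NA, NA))
col_sums_if_sketch(x, ov.min = 0, impute = FALSE, allNA = 0)
```

Every step operates on whole vectors of per-column counts, so no per-column function call is needed, which is the efficiency argument made above.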

Value

numeric vector of length = ncol(x) with names = colnames(x) providing the sum of each column or NA depending on the frequency of observed values.

See Also

colMeans_if, rowSums_if, rowMeans_if, colSums

Examples

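# defaults (ov.min = 1, prop = TRUE): every value in a column must be observed
colSums_if(airquality)
# require at least 150 observed values per column
colSums_if(x = airquality, ov.min = 150, prop = FALSE)
x <- data.frame("x" = c(1, 2, NA), "y" = c(1, NA, NA), "z" = c(NA, NA, NA))
colSums_if(x)
# ov.min = 0: no minimum; all-NA columns return allNA (NA by default)
colSums_if(x, ov.min = 0)
# return 0 rather than NA for all-NA columns
colSums_if(x, ov.min = 0, allNA = 0)
# with impute = FALSE, ov.min = 0, and allNA = 0, the result is identical to
# colSums(x, na.rm = TRUE)
identical(x = colSums(x, na.rm = TRUE),
   y = colSums_if(x, impute = FALSE, ov.min = 0, allNA = 0))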

[Package quest version 0.2.0 Index]