colSums_if {quest} | R Documentation |
Column Sums Conditional on Frequency of Observed Values
Description
colSums_if calculates the sum of every column in a numeric or logical
matrix conditional on the frequency of observed data. If the frequency of
observed values in a column is less than (or, when inclusive = FALSE, equal
to) that specified by ov.min, then NA is returned for that column. It also
has the option to return a value other than 0 (e.g., NA) for columns that are
all NA, which differs from colSums(x, na.rm = TRUE).
Usage
colSums_if(
x,
ov.min = 1,
prop = TRUE,
inclusive = TRUE,
impute = TRUE,
allNA = NA_real_
)
Arguments
x |
numeric or logical matrix. If not a matrix, it will be coerced to one. |
ov.min |
minimum frequency of observed values required per column. If
|
prop |
logical vector of length 1 specifying whether |
inclusive |
logical vector of length 1 specifying whether the sum should
be calculated if the frequency of observed values in a column is exactly
equal to |
impute |
logical vector of length 1 specifying if missing values should
be imputed with the mean of observed values of |
allNA |
numeric vector of length 1 specifying what value should be
returned for columns that are all NA. This is most applicable when
|
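As a rough illustration of what mean imputation does to a sum, the base R sketch below works through a single made-up column. It is not taken from the package; the vector v and the intermediate names are hypothetical, and it assumes imputation is applied within one column at a time.

```r
# Illustrative column with one missing value (hypothetical data).
v <- c(1, 2, NA)

# Mean of the observed values only.
obs_mean <- mean(v, na.rm = TRUE)

# Replace each missing value with that mean, then sum.
v_imputed <- ifelse(is.na(v), obs_mean, v)
sum_imputed <- sum(v_imputed)

# For comparison: the sum when missing values are simply dropped.
sum_dropped <- sum(v, na.rm = TRUE)
```

Under this assumption the imputed sum equals the observed mean times the number of rows, so columns with different amounts of missing data end up on the same scale, whereas the dropped-NA sum shrinks as more values go missing.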
Details
Conceptually this function does: apply(X = x, MARGIN = 2, FUN = sum_if,
ov.min = ov.min, prop = prop, inclusive = inclusive). For computational
efficiency it does not actually do so, because the conditioning on observed
values would then not be vectorized. Instead, it uses colSums and then
inserts NAs for columns that have too few observed values.
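The vectorized strategy described above can be sketched in base R as follows. This is a minimal illustration and not the package's source: the helper name col_sums_if_sketch and the matrix m are hypothetical, and the sketch leaves out the impute and allNA behavior entirely.

```r
# Hypothetical helper (not the quest implementation): column sums that become
# NA when a column has too few observed (non-missing) values.
col_sums_if_sketch <- function(x, ov.min = 1, prop = TRUE, inclusive = TRUE) {
  x <- as.matrix(x)
  # Frequency of observed values per column, as a proportion or a count.
  observed <- if (prop) colMeans(!is.na(x)) else colSums(!is.na(x))
  # Columns that meet the threshold keep their sum.
  keep <- if (inclusive) observed >= ov.min else observed > ov.min
  # One vectorized pass for the sums, then overwrite the failing columns.
  out <- colSums(x, na.rm = TRUE)
  out[!keep] <- NA_real_
  out
}

# Made-up data: column a is complete, b is mostly missing, c is all missing.
m <- cbind(a = c(1, 2, 3), b = c(1, NA, NA), c = c(NA, NA, NA))
res <- col_sums_if_sketch(m, ov.min = 0.5)
```

Here only column a has at least half of its values observed, so it keeps its sum while b and c are set to NA. The threshold check runs once over all columns rather than once per column inside apply.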
Value
numeric vector of length = ncol(x) with names = colnames(x) providing the
sum of each column or NA depending on the frequency of observed values.
See Also
colMeans_if
rowSums_if
rowMeans_if
colSums
Examples
colSums_if(airquality)
colSums_if(x = airquality, ov.min = 150, prop = FALSE)
x <- data.frame("x" = c(1, 2, NA), "y" = c(1, NA, NA), "z" = c(NA, NA, NA))
colSums_if(x)
colSums_if(x, ov.min = 0)
colSums_if(x, ov.min = 0, allNA = 0)
# identical to colSums(x, na.rm = TRUE)
identical(x = colSums(x, na.rm = TRUE),
   y = colSums_if(x, impute = FALSE, ov.min = 0, allNA = 0))