R: Describe Number of Cases in Data by Group

ncases_desc {quest}

R Documentation

Describe Number of Cases in Data by Group

Description

ncases_desc computes descriptive statistics about the number of cases by group in a data.frame. This is often done in diary studies to obtain information about compliance in the sample. Through the ov.min, prop, and inclusive arguments, the user can specify how many observed (non-missing) values a row must have for it to be counted as a case. ncases_desc is simply ncases_by followed by psych::describe.
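The counting rule described above can be sketched in base R. This is only an illustration of one reading of the documented behavior, not the package's implementation: the helper `count_cases_by_group` and the `toy` data are hypothetical, the sketch assumes a single grouping column, and `summary()` stands in for the `psych::describe` step.

```r
# Hypothetical helper (not part of quest): count the rows per group whose
# observed (non-missing) values meet the ov.min threshold.
count_cases_by_group <- function(data, vrb.nm, grp.nm, ov.min = 1,
                                 prop = TRUE, inclusive = TRUE) {
  observed <- rowSums(!is.na(data[vrb.nm]))            # observed values per row
  freq <- if (prop) observed / length(vrb.nm) else observed
  # inclusive = TRUE keeps rows exactly at the threshold; FALSE drops them
  keep <- if (inclusive) freq >= ov.min else freq > ov.min
  tapply(keep, data[[grp.nm]], sum)                    # number of cases per group
}

# Made-up data: two groups, two variables, some missing values
toy <- data.frame(
  id = c("a", "a", "a", "b", "b"),
  x  = c(1, NA, 3, 4, NA),
  y  = c(1, 2, NA, 4, NA)
)

# Defaults: a row counts only if all of its values are observed
n_by <- count_cases_by_group(toy, vrb.nm = c("x", "y"), grp.nm = "id")
n_by

# Count-based threshold: a row counts if at least one value is observed
n_by_any <- count_cases_by_group(toy, vrb.nm = c("x", "y"), grp.nm = "id",
                                 ov.min = 1, prop = FALSE)
n_by_any

summary(n_by_any)  # rough stand-in for describing the per-group counts
```

With the defaults, only fully observed rows count, so each group in the toy data contributes one case; relaxing the threshold to one observed value raises group "a" to three cases.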

Usage

ncases_desc(
  data,
  vrb.nm = str2str::pick(names(data), val = grp.nm, not = TRUE),
  grp.nm,
  ov.min = 1,
  prop = TRUE,
  inclusive = TRUE,
  interp = FALSE,
  skew = TRUE,
  ranges = TRUE,
  trim = 0.1,
  type = 3,
  quant = c(0.25, 0.75),
  IQR = FALSE
)

Arguments

`data`	data.frame of data.
`vrb.nm`	character vector of colnames from `data` specifying the set of variables to base the ncases on.
`grp.nm`	character vector of colnames from `data` specifying the grouping variables.
`ov.min`	minimum frequency of observed values required per row. If `prop` = TRUE, then this is a decimal between 0 and 1. If `prop` = FALSE, then this is an integer between 0 and `length(vrb.nm)`.
`prop`	logical vector of length 1 specifying whether `ov.min` should refer to the proportion of observed values (TRUE) or the count of observed values (FALSE).
`inclusive`	logical vector of length 1 specifying whether the case should be included if the frequency of observed values in a row is exactly equal to `ov.min`.
`interp`	logical vector of length 1 specifying whether the median should be standard (FALSE) or interpolated (TRUE).
`skew`	logical vector of length 1 specifying whether skewness and kurtosis should be calculated (TRUE) or not (FALSE).
`ranges`	logical vector of length 1 specifying whether the minimum, maximum, and range (i.e., maximum - minimum) should be calculated (TRUE) or not (FALSE). Note, if `ranges` = FALSE, the trimmed mean and median absolute deviation are also not computed, as per the `psych::describe` function behavior.
`trim`	numeric vector of length 1 specifying the top and bottom quantiles of data that are to be excluded when calculating the trimmed mean. For example, the default value of 0.1 means that only data within the 10th - 90th quantiles are used for calculating the trimmed mean.
`type`	numeric vector of length 1 specifying the type of skewness and kurtosis coefficients to compute. See the details of `psych::describe`. The options are 1, 2, or 3.
`quant`	numeric vector specifying the quantiles to compute. For example, the default value of c(0.25, 0.75) computes the 25th and 75th quantiles of the group number of cases. If `quant` = NULL, then no quantiles are returned.
`IQR`	logical vector of length 1 specifying whether to compute the Interquartile Range (TRUE) or not (FALSE), which is simply the 75th quantile - 25th quantile.
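As a rough illustration of what the `trim`, `quant`, and `IQR` arguments control, the base R calls below compute the analogous quantities on a made-up vector of per-group case counts. The vector `n_cases` is hypothetical, and the exact quantile method used internally by `psych::describe` is assumed here to match the base R default.

```r
# Hypothetical per-group case counts (illustrative only, not package output)
n_cases <- c(12, 12, 11, 10, 12, 8, 12, 7, 12, 2)

mean_all  <- mean(n_cases)              # ordinary mean of the counts
mean_trim <- mean(n_cases, trim = 0.1)  # drops the lowest and highest 10% first

q   <- quantile(n_cases, probs = c(0.25, 0.75))  # the default `quant` values
iqr <- IQR(n_cases)                              # 75th quantile - 25th quantile

mean_all
mean_trim
q
iqr
```

Because one group has only 2 cases, the trimmed mean sits above the ordinary mean here, which is the kind of outlier sensitivity that `trim` is meant to dampen.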

Value

numeric vector containing descriptive statistics about number of cases by group. Note, which elements are returned depends on the arguments. See each argument's description.

n: number of groups
mean: mean
sd: standard deviation
median: median (standard if interp = FALSE, interpolated if interp = TRUE)
trimmed: trimmed mean based on trim
mad: median absolute deviation
min: minimum
max: maximum
range: maximum - minimum
skew: skewness
kurtosis: kurtosis
se: standard error of the mean
IQR: 75th quantile - 25th quantile
QX.XX: quantiles, which are named by quant (e.g., 0.25 = "Q0.25")
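Several of the returned elements have direct base R counterparts, sketched below on a made-up vector of per-group case counts. The vector `n_cases` is hypothetical, and the sketch assumes the usual definitions (standard error as sd divided by the square root of n, and the base R `mad()` default scaling); it is not taken from the package source.

```r
# Hypothetical per-group case counts (illustrative only, not package output)
n_cases <- c(12, 12, 11, 10, 12, 8, 12, 7, 12, 2)

n_groups <- length(n_cases)              # n: number of groups
se       <- sd(n_cases) / sqrt(n_groups) # se: standard error of the mean
mad_val  <- mad(n_cases)                 # mad: base R scales by 1.4826 by default
rng      <- max(n_cases) - min(n_cases)  # range: maximum - minimum

# Quantile element names follow the "Q" + probability pattern, e.g. "Q0.25"
q_names <- paste0("Q", c(0.25, 0.75))

c(n = n_groups, se = se, mad = mad_val, range = rng)
q_names
```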

Examples

# repeated sessions nested within cases
tmp_nm <- c("outcome","case","session","trt_time")
dat <- as.data.frame(lmeInfo::Bryant2016)[tmp_nm]
stats_by <- psych::statsBy(dat, group = "case") # doesn't include everything you want
ncases_desc(data = dat, grp.nm = "case")

# repeated weight measurements nested within chicks
dat2 <- as.data.frame(ChickWeight)
ncases_desc(data = dat2, grp.nm = "Chick")
ncases_desc(data = dat2, grp.nm = "Chick", trim = .05) # trim 5% from each tail
ncases_desc(data = dat2, grp.nm = "Chick", ranges = FALSE) # no min, max, range, trimmed, or mad
ncases_desc(data = dat2, grp.nm = "Chick", quant = NULL) # no quantiles
ncases_desc(data = dat2, grp.nm = "Chick", IQR = TRUE) # add the interquartile range