R: Subset a Data Frame Using Value Labels

slab {labelr}

R Documentation

Subset a Data Frame Using Value Labels

Description

slab ("subset using labels") allows one to filter rows and select columns from a data.frame based on variable-specific value label attributes.

Usage

slab(data, condition, ...)

Arguments

`data`	the data.frame from which columns rows will be filtered (and, possibly, columns selected)
`condition`	row-filtering conditions along the lines of base::subset() and/or dplyr::filter(). Note: Row-filtering conditions (to include condition==NULL) must be supplied. Conditions involving value-labeled variables must be expressed in terms of the value labels (see examples), else try `ssubset.`
`...`	Optionally supply one or more unquoted, comma-separated column names that identify columns to be retained (Note: If no columns are listed, all columns will be retained). Note: While row-filtering conditions may leverage standard operators (e.g., &, \|, ==, !=), the column-selection portion of call may not incorporate special characters or symbols, such as quotes, parentheses, colons, minus signs, exclamation points, or other operators. Only positive selection of columns is permitted (negative selection – i.e., "select not / select-all-except" specified columns is not supported)

Details

slab does base::subset-style data subsetting using variable value label meta-data that are associated with variable values but are not themselves values (i.e., will not appear in response to View(), head(), etc.). In other words, value labels are supplied to the slab() call to direct the filtering process, but those value labels are not displayed in the cells of the returned data.frame – the raw values themselves are. This functionality may be useful for interactively subsetting a data.frame, where character value labels may be more intuitive and easily recalled than the underlying variable values themselves (e.g., raceth=="White" & gender="F" may be more intuitive or readily recalled than raceth==3 & gender==2).

slabtakes as its arguments a labelr value-labeled data.frame, followed by condition-based row-filtering instructions (required) and a list of unquoted names of variables to be retained (optional; all variables returned by default).

Note 1: When using slab, any conditional row-filtering syntax involving value-labeled variables must be expressed in terms of those variables' value labels, not the raw values themselves. Filtering on non-value-labeled variables is also permitted, with those variables' filtering conditions being expressed in terms of raw values. Further, slab() calls may reference both types of columns (i.e., value-labeled variables and non-value-labeled variables), provided filtering conditions for the former are expressed in terms of value labels.

Note 2: slab (and labelr more broadly) is intended for moderate-sized (or smaller) data.frames, defined loosely as those with a few million or fewer rows. With a conventional (c. 2024) laptop, labelr operations on modest- sized (~100K rows) take seconds (or less); with larger (> a few million rows) data.frames, labelr may take several minutes (or run out of memory and fail altogether!), depending on the complexity of the call and the number and type of cells implicated in it.

See also flab, use_val_labs, add_val_labs, add_val1,add_quant_labs, add_quant1,
get_val_labs, drop_val_labs. For label-preserving subsetting tools that subset in terms of raw values (not value labels), see sfilter, sbrac, ssubset, sdrop.

Value

a labelr label attribute-preserving data.frame consisting of the selected rows that meet the filtering condition(s) and the columns whose (unquoted, comma-separated) names are passed to dots (...) (or all columns if no column names are passed).

Examples

# make toy demographic (gender, raceth, etc.) data set
set.seed(555)
df <- make_demo_data(n = 1000)

# let's add variable VALUE labels for variable "raceth"
df <- add_val_labs(df,
  vars = "raceth", vals = c(1:7),
  labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"),
  max.unique.vals = 50
)

# let's add variable VALUE labels for variable "gender"
# note that, if we are labeling a single variable, we can use add_val1()
# distinction between add_val1() and add_val_labs() will become more
# meaningful when we get to our Likert example
df <- add_val1(
  data = df, gender, vals = c(0, 1, 2, 3, 4),
  labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50
)

# see what we did
# get_val_labs(df)
get_val_labs(df, "gender")
get_val_labs(df, "raceth")

# use --labels-- to subset w/ slab() ("*S*ubset using *lab*els")
dflab <- slab(df, raceth == "Asian" & gender == "F", id, gender)
head(dflab, 4)

# equivalently, use --values--- to filter w/ sfilter() ("*S*afe filter")
dfsf <- ssubset(df, raceth == 3 & gender == 1, gender, raceth)
head(dfsf, 4)

[Package labelr version 0.1.7 Index]