flab {labelr} | R Documentation |
Filter Data Frame Rows Using Variable Value Labels
Description
flab
("filter using labels") allows one to filter-subset a data.frame
based on variable-specific value label attributes.
Usage
flab(data, condition)
Arguments
data |
the data.frame from which columns will be selected. |
condition |
row-filtering conditions along the lines of base::subset() and/or dplyr::filter(), which may involve a combination of value labels (for value-labeled variables only) and actual values (for non-value-labeled variables only). |
Details
flab
accepts a labelr value-labeled data.frame, followed by condition-
based row-filtering instructions (akin to base::subset
or dplyr::filter
),
expressed in terms of variable value labels that exist only as meta-data
(i.e., not visible using View()
, head()
, etc.), and returns the filtered
data.frame in terms of the values themselves. In other words, value labels
are supplied to the flab()
call to direct the filtering process, but those
value labels are not displayed in the cells of the returned data.frame – the
raw values themselves are. This functionality may be useful for interactively
subsetting a data.frame, where character value labels may be more intuitive
and easily recalled than the underlying variable values themselves (e.g.,
raceth=="White" & gender="F" may be more intuitive or readily recalled than
raceth==3 & gender==2).
Note 1: When using flab
, any conditional row-filtering syntax involving
value-labeled variables must be expressed in terms of those variables' value
labels, not the raw values themselves. Filtering on non-value-labeled
variables is also permitted, with those variables' filtering conditions being
expressed in terms of raw values. Further, flab()
calls may reference both
types of columns (i.e., value-labeled variables and non-value-labeled
variables), provided filtering conditions for the former are expressed in
terms of value labels.
Note 2: flab
(and labelr more broadly) is intended for moderate-sized (or
smaller) data.frames, defined loosely as those with a few million or fewer
rows. With a conventional (c. 2024) laptop, labelr operations on modest-
sized (~100K rows) take seconds (or less); with larger (> a few million rows)
data.frames, labelr may take several minutes (or run out of memory and fail
altogether!), depending on the complexity of the call and the number and type
of cells implicated in it.
See also slab
, use_val_labs
, add_val_labs
, add_val1
, add_quant_labs
, add_quant1
,
get_val_labs
, drop_val_labs
. For label-preserving subsetting tools that subset in
terms of raw values (not value labels), see sfilter
, sbrac
, ssubset
, sdrop
.
Value
a labelr label attribute-preserving data.frame consisting of the selected rows that meet the filtering condition(s).
Examples
# make toy demographic (gender, raceth, etc.) data set
set.seed(555)
df <- make_demo_data(n = 1000) # another labelr:: function
# let's add variable VALUE labels for variable "raceth"
df <- add_val_labs(df,
vars = "raceth", vals = c(1:7),
labs = c("White", "Black", "Hispanic", "Asian", "AIAN", "Multi", "Other"),
max.unique.vals = 50
)
# let's add variable VALUE labels for variable "gender"
# note that, if we are labeling a single variable, we can use add_val1()
# distinction between add_val1() and add_val_labs() will become more meaningful
# when we get to our Likert example
df <- add_val1(
data = df, gender, vals = c(0, 1, 2, 3, 4),
labs = c("M", "F", "TR", "NB", "Diff-Term"), max.unique.vals = 50
)
# see what we did
# get_val_labs(df)
get_val_labs(df, "gender")
get_val_labs(df, "raceth")
# use --labels-- to filter w/ flab() ("*F*ilter *lab*el")
dflab <- flab(df, raceth == "Asian" & gender == "F")
head(dflab, 4)
# equivalently, use --values--- to filter w/ sfilter() ("*S*afe filter")
dfsf <- sfilter(df, raceth == 3 & gender == 1)
head(dfsf, 4)