filter_cip {midfieldr} | R Documentation |
Subset rows that include matches to search strings
Description
Subset a CIP data frame, retaining rows that match or partially match a vector of character strings. Columns are not subset unless named in the optional select argument.
Usage
filter_cip(keep_text = NULL, ..., drop_text = NULL, cip = NULL, select = NULL)
Arguments
keep_text |
Character vector of search text for retaining rows, not case-sensitive. Can be empty if drop_text is used. |
... |
Not used for passing values; forces subsequent arguments to be referable only by name. |
drop_text |
Optional character vector of search text for dropping rows, default NULL. |
cip |
Data frame to be searched. Default cip, the CIP data set included with midfieldr. |
select |
Optional character vector of column names to return, default all columns. |
Details
Search terms can include regular expressions. Matching uses grepl(), so non-character columns (if any) that can be coerced to character are also searched for matches. Columns are subset by the values in select after the search concludes.
If none of the optional arguments are specified, the function returns the original data frame.
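The matching rule described above can be illustrated with a minimal base-R sketch. It is not the midfieldr source: it assumes that the elements of a search vector are combined with regular-expression alternation ("|"), that matching is case-insensitive via grepl(ignore.case = TRUE), and that a row matches if any of its columns, coerced to character, matches. The data frame toy_cip (including its numeric credits column) and the helper row_matches() are hypothetical.

```r
# Hypothetical toy data frame standing in for a CIP table; the numeric
# column `credits` is included only to show that non-character columns are searched.
toy_cip <- data.frame(
  cip6     = c("141901", "140801", "150201", "540101"),
  cip6name = c("Mechanical Engineering", "Civil Engineering",
               "Civil Engineering Technology", "History, General"),
  credits  = c(128, 130, 120, 120),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row in which any column matches any
# element of `text`. Elements are joined into one regex with "|", and
# matching ignores case.
row_matches <- function(df, text) {
  pattern <- paste(text, collapse = "|")
  per_column <- lapply(df, function(col) {
    grepl(pattern, as.character(col), ignore.case = TRUE)
  })
  # Combine the per-column results with logical OR, starting from all FALSE
  Reduce(`|`, per_column, rep(FALSE, nrow(df)))
}

# Case-insensitive keyword search
toy_cip[row_matches(toy_cip, "ENGINEERING"), "cip6"]
# returns c("141901", "140801", "150201")

# Anchored regular expression: "^14" matches codes beginning with 14
toy_cip[row_matches(toy_cip, "^14"), "cip6"]
# returns c("141901", "140801")

# A numeric column, coerced to character, is searched too
toy_cip[row_matches(toy_cip, "^128$"), "cip6"]
# returns "141901"
```

Joining the search strings into one alternation pattern means a single grepl() call per column covers every element of the search vector.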
Value
A data frame in data.table format, a subset of cip, with the following properties: rows that match elements of drop_text are excluded; of the remaining rows, those that match elements of keep_text are included; if select is empty, all columns are preserved, otherwise only the columns named in select are retained; grouping structures are not preserved.
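The properties listed above can be sketched in base R as three steps applied in order: drop rows matching drop_text, keep the remaining rows matching keep_text, then subset columns by select. This is an illustrative assumption, not the package implementation; the sketch returns a plain data.frame rather than a data.table, and the names sketch_filter_cip(), any_match(), and toy_cip are hypothetical.

```r
# Hypothetical sketch of the documented filtering order (not midfieldr code).
sketch_filter_cip <- function(cip, keep_text = NULL, drop_text = NULL, select = NULL) {
  # TRUE for rows where any column matches any search string, ignoring case
  any_match <- function(df, text) {
    pattern <- paste(text, collapse = "|")
    hits <- lapply(df, function(col) grepl(pattern, as.character(col), ignore.case = TRUE))
    Reduce(`|`, hits, rep(FALSE, nrow(df)))
  }
  out <- cip
  # 1. Exclude rows that match elements of drop_text
  if (!is.null(drop_text)) out <- out[!any_match(out, drop_text), , drop = FALSE]
  # 2. Of the remaining rows, include those that match elements of keep_text
  if (!is.null(keep_text)) out <- out[any_match(out, keep_text), , drop = FALSE]
  # 3. Subset columns only after the search concludes
  if (!is.null(select)) out <- out[, select, drop = FALSE]
  rownames(out) <- NULL
  out
}

toy_cip <- data.frame(
  cip6     = c("141901", "140801", "150201", "540101"),
  cip6name = c("Mechanical Engineering", "Civil Engineering",
               "Civil Engineering Technology", "History, General"),
  stringsAsFactors = FALSE
)

# "Civil Engineering Technology" matches both keep_text and drop_text,
# so it is excluded; only "Civil Engineering" remains.
result <- sketch_filter_cip(toy_cip, keep_text = "civil", drop_text = "technology",
                            select = "cip6")
result$cip6
# returns "140801"
```

With no optional arguments the sketch returns every row and column, mirroring the behavior stated in Details.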
Examples
library(midfieldr)

# Subset using keywords
filter_cip(keep_text = "engineering")
# Multiple passes to narrow the results
first_pass <- filter_cip("civil")
second_pass <- filter_cip("engineering", cip = first_pass)
filter_cip(drop_text = "technology", cip = second_pass)
# drop_text argument, when used, must be named
filter_cip("civil engineering", drop_text = "technology")
# Subset using numerical codes
filter_cip(keep_text = c("050125", "160501"))
# Subset using regular expressions
filter_cip(keep_text = "^54")
filter_cip(keep_text = c("^1407", "^1408"))
# Select columns
filter_cip(keep_text = "^54", select = c("cip6", "cip4name"))