filter_cip {midfieldr}    R Documentation

Subset rows that include matches to search strings

Description

Subset a CIP data frame, retaining rows that match or partially match a vector of character strings. Columns are not subset unless selected in an optional argument.

Usage

filter_cip(keep_text = NULL, ..., drop_text = NULL, cip = NULL, select = NULL)

Arguments

keep_text

Character vector of search text for retaining rows, not case-sensitive. Can be empty if drop_text is used.

...

Not used for passing values; forces subsequent arguments to be referable only by name.

drop_text

Optional character vector of search text for dropping rows, default NULL.

cip

Data frame to be searched. Default NULL, in which case the cip data set is searched.

select

Optional character vector of column names to return, default all columns.

Details

Search terms can include regular expressions. Matching uses grepl(), so any non-character columns that can be coerced to character are also searched for matches. Columns are subset by the values in select only after the search concludes.
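As a rough illustration of the matching behavior described above, the following base R sketch applies grepl() to a small made-up data frame. The data frame toy and its values are hypothetical, and collapsing the search strings with "|" is only an assumption about one way to combine several patterns, not necessarily what filter_cip() does internally.

```r
# Hypothetical data: one numeric column and one character column
toy <- data.frame(
  code = c(140801, 540101, 260101),
  name = c("Civil Engineering", "History", "Biology"),
  stringsAsFactors = FALSE
)

# Combine several search strings into one regular expression (an assumption)
pattern <- paste(c("^54", "civil"), collapse = "|")

# grepl() coerces each column to character before matching,
# so the numeric column `code` is searched as well
hits_by_col <- lapply(toy, function(col) grepl(pattern, col, ignore.case = TRUE))

# A row is retained if any of its columns matches
keep_row <- Reduce(`|`, hits_by_col)
matched <- toy[keep_row, , drop = FALSE]
matched
```

Here the first row is retained because "civil" matches the name regardless of case, and the second because "^54" matches the numeric code once it is coerced to character.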

If none of the optional arguments are specified, the function returns the original data frame.

Value

A data frame in data.table format, a subset of cip, with the following properties: rows that match elements of drop_text are excluded; of the remaining rows, those that match elements of keep_text are included; if select is empty, all columns are preserved, otherwise only the columns named in select are retained; grouping structures are not preserved.
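The order of operations listed above (drop, then keep, then select) can be sketched in base R. The helper sketch_filter() and the data frame toy are hypothetical names introduced only for illustration. The sketch returns a plain data frame rather than a data.table and is not the package's implementation.

```r
# Hypothetical helper, not part of midfieldr: drop, then keep, then select
sketch_filter <- function(df, keep_text = NULL, drop_text = NULL, select = NULL) {
  # TRUE for each row in which any column matches any of the search strings
  row_matches <- function(d, text) {
    pattern <- paste(text, collapse = "|")
    hits <- lapply(d, function(col) grepl(pattern, col, ignore.case = TRUE))
    Reduce(`|`, hits)
  }
  # Step 1: exclude rows that match drop_text
  if (!is.null(drop_text)) {
    df <- df[!row_matches(df, drop_text), , drop = FALSE]
  }
  # Step 2: of the remaining rows, retain those that match keep_text
  if (!is.null(keep_text)) {
    df <- df[row_matches(df, keep_text), , drop = FALSE]
  }
  # Step 3: subset columns only after the row search has finished
  if (!is.null(select)) {
    df <- df[, select, drop = FALSE]
  }
  df
}

# Made-up rows for illustration
toy <- data.frame(
  code = c("140801", "150201", "540101"),
  name = c("Civil Engineering", "Civil Engineering Technology", "History"),
  stringsAsFactors = FALSE
)

result <- sketch_filter(toy, keep_text = "civil", drop_text = "technology",
                        select = "name")
result
```

Because columns are subset last in this sketch, a search string can match on a column that select later removes from the result.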

Examples

    # Subset using keywords
    filter_cip(keep_text = "engineering")

    # Multiple passes to narrow the results
    first_pass <- filter_cip("civil")
    second_pass <- filter_cip("engineering", cip = first_pass)
    filter_cip(drop_text = "technology", cip = second_pass)
    
    # drop_text argument, when used, must be named
    filter_cip("civil engineering", drop_text = "technology")
    
    # Subset using numerical codes
    filter_cip(keep_text = c("050125", "160501"))
    
    # Subset using regular expressions
    filter_cip(keep_text = "^54")
    filter_cip(keep_text = c("^1407", "^1408"))
    
    # Select columns
    filter_cip(keep_text = "^54", select = c("cip6", "cip4name"))


[Package midfieldr version 1.0.2 Index]