split_map_filter_reduce {libbib} | R Documentation |
Split, Map, Filter, and Reduce a string vector
Description
This function takes a vector of strings, splits those strings on a particular character; string; or regex patters, applies a user-specified function to each sub-element of the now split element, filters those sub-elements using a user-specified function, and, finally, recombines each element's sub-elements using a user specified reduction function.
Usage
split_map_filter_reduce(
x,
sep = ";",
fixed = TRUE,
mapfun = identity,
filterfun = identity,
reduxfun = car,
cl = 0
)
Arguments
x |
A vector of strings |
sep |
A character to use containing a character, string, or
regular expression pattern to split each element by.
If |
fixed |
Should it be split by a fixed string/character or
a regular expression (default is |
mapfun |
A vectorized function that will be applied to the
sub-elements (after splitting) of each element in x
(default is |
filterfun |
A vectorized function that, when given a vector
returns the same vector with un-wanted elements
removed
(default is |
reduxfun |
A vectorized function that, when given a vector,
will combine all of it's elements into one value
(default is |
cl |
An integer to indicate the number of child processes should be used to parallelize the work-load. If 0, the workload will not be parallelized. Can also take a cluster object created by 'makeCluster' (default is 0) |
Details
Since this operation cannot be vectorized, if the user specifies
a non-zero cl
argument, the workload will be parallelized
and cl
many child processes will be spawned to do the work.
The package pbapply
will be used to do this.
See examples
for more information and ideas on why this
might be useful for, as an example, batch normalizing ISBNs that,
for each bibliographic record, is separated by a semicolon
Value
Returns a vector
See Also
Examples
someisbns <- c("9782711875177;garbage-isbn;2711875172;2844268900",
"1861897952; 978-1-86189-795-4")
# will return only the first ISBN for each record
split_map_filter_reduce(someisbns)
# "9782711875177" "1861897952"
# will return only the first ISBN for each record, after normalizing
# each ISBN
split_map_filter_reduce(someisbns, mapfun=function(x){normalize_isbn(x, convert.to.isbn.13=TRUE)})
# "9782711875177" "9781861897954"
# will return all ISBNs, for each record, separated by a semicolon
# after applying normalize_isbn to each ISBN
# note the duplicates introduced after normalization occurs
split_map_filter_reduce(someisbns, mapfun=function(x){normalize_isbn(x, convert.to.isbn.13=TRUE)},
reduxfun=recombine_with_sep_closure())
# "9782711875177;NA;9782711875177;9782844268907" "9781861897954;9781861897954"
# After splitting each items ISBN list by semicolon, this runs
# normalize_isbn in each of them. Duplicates are produced when
# an ISBN 10 converts to an ISBN 13 that is already in the ISBN
# list for the item. NAs are produced when an ISBN fails to normalize.
# Then, all duplicates and NAs are removed. Finally, the remaining
# ISBNs, for each record, are pasted together using a space as a separator
split_map_filter_reduce(someisbns, mapfun=function(x){normalize_isbn(x, convert.to.isbn.13=TRUE)},
filterfun=remove_duplicates_and_nas,
reduxfun=recombine_with_sep_closure(" "))
# "9782711875177 9782844268907" "9781861897954"