bnumrange {bread}    R Documentation
Pre-filters a data file by numeric column ranges before loading it into memory
Description
Simple wrapper for data.table::fread() that filters rows by numeric value with the Unix 'awk' command before the file is read into R. This is useful when a file is too large for your available memory (for example, when you encounter the 'cannot allocate vector of size' error).
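The awk pre-filtering idea can be sketched in R as follows. This is a minimal illustration, not the bread package's actual code: `awk_range_filter()` is a hypothetical helper, and it assumes a delimited file with a single header row, unquoted numeric fields, '.' as the decimal separator, and an `awk` executable on the PATH. It relies on the `cmd` argument of data.table::fread(), which reads the output of a shell command.

```r
# Hypothetical helper: a minimal sketch of awk pre-filtering, NOT the
# bread package's implementation.
# Assumes: one header row, unquoted numeric fields, '.' decimal separator,
# and an 'awk' executable available on the PATH.
awk_range_filter <- function(file, col, lo, hi, sep = ",") {
  # Keep the header line (NR == 1) plus every row whose value in column
  # `col` lies in the inclusive range [lo, hi].
  cmd <- sprintf(
    "awk -F'%s' 'NR == 1 || ($%d >= %s && $%d <= %s)' %s",
    sep, col, lo, col, hi, shQuote(file)
  )
  # fread() only ever sees the rows that passed the awk filter.
  data.table::fread(cmd = cmd, sep = sep)
}
```

Because awk streams the file line by line and discards non-matching rows, R only allocates memory for the filtered result rather than for the whole file.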
Usage
bnumrange(
  file = NULL,
  range_min = NULL,
  range_max = NULL,
  numrange_columns = NULL,
  ...
)
Arguments
file
Character string. Name of, or full path to, a file compatible with data.table::fread().
range_min
Numeric vector. One or more minimum values used to filter the data from the input file. The comparison is inclusive (greater than or equal to the value). Each element corresponds to the column at the same position in numrange_columns.
range_max
Numeric vector. One or more maximum values used to filter the data from the input file. The comparison is inclusive (less than or equal to the value). Each element corresponds to the column at the same position in numrange_columns.
numrange_columns
Character or numeric vector. The columns to be filtered, given by name or by index. Each element corresponds to the range_min and range_max values at the same position, which are used to filter it.
...
Additional arguments passed to data.table::fread(), such as 'sep' and 'dec'.
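The positional pairing between numrange_columns, range_min and range_max can be illustrated with the sketch below. Both helpers are hypothetical and are not part of the bread package; the way several conditions are combined (a logical AND across columns) is an assumption made for illustration. Column names are resolved to indices by reading only the header row, using data.table::fread() with `nrows = 0`.

```r
# Hypothetical helpers, NOT part of the bread package.
# Assumption: conditions on several columns are combined with a logical AND.

# Resolve column names to 1-based indices by reading only the header row.
resolve_columns <- function(file, cols, ...) {
  if (is.numeric(cols)) return(as.integer(cols))
  header <- names(data.table::fread(file, nrows = 0, ...))
  idx <- match(cols, header)
  if (anyNA(idx)) {
    stop("Unknown column(s): ", paste(cols[is.na(idx)], collapse = ", "))
  }
  idx
}

# Build one inclusive range test per column; element i of each vector
# refers to the same column.
build_awk_condition <- function(cols, range_min, range_max) {
  stopifnot(length(cols) == length(range_min),
            length(cols) == length(range_max))
  parts <- sprintf("($%d >= %s && $%d <= %s)",
                   cols, range_min, cols, range_max)
  paste(parts, collapse = " && ")
}

# Usage: filter column 1 on [2006, 2010] and column 3 on [1500, 1990].
build_awk_condition(c(1, 3), c(2006, 1500), c(2010, 1990))
# returns "($1 >= 2006 && $1 <= 2010) && ($3 >= 1500 && $3 <= 1990)"
```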
Value
A dataframe
Warning
Value comparisons are inclusive: a row is kept when its value is greater than or equal to range_min and less than or equal to range_max.
Examples
file <- system.file('extdata', 'test.csv', package = 'bread')
## Filtering with only min value
## Filtering on 2 columns
bnumrange(file = file, range_min = c(2006, 1500), range_max = c(2010, 1990),
          numrange_columns = c(1, 3))
bnumrange(file = file, range_min = c(2000, 1500), range_max = c(2005, 1990),
          numrange_columns = c('YEAR', 'PRICE'), sep = ';')