bsubset {bread}    R Documentation

Pre-subsets rows of a data file by index number before loading it in memory

Description

A simple wrapper around data.table::fread() that subsets rows of a file with the Unix 'sed' or 'awk' commands before the data is read. This is useful when a file is too large for the available memory and loading it whole fails with the 'cannot allocate vector of size' error. Supply the index numbers of the first and last rows to load with fread(), or use the head or tail argument to load only the first or last rows of the file.
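The idea of filtering lines with 'sed' before R parses them can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name subset_rows_sketch is hypothetical, it uses utils::read.csv() on a pipe() instead of data.table::fread() so that it runs without extra packages, and it assumes that the file has a header on line 1 and that row indices count data rows only.

```r
# Hypothetical helper, not part of bread: load data rows first_row..last_row
# of a CSV file while keeping the header line.
subset_rows_sketch <- function(file, first_row, last_row, ...) {
  # Assumption: line 1 of the file is the header, so data row i sits on
  # file line i + 1.
  sed_script <- sprintf("1p;%d,%dp",
                        as.integer(first_row) + 1L,
                        as.integer(last_row) + 1L)
  cmd <- paste("sed -n", shQuote(sed_script), shQuote(file))
  # 'sed' drops the unwanted lines before R parses anything, so only the
  # selected rows are read into memory.
  utils::read.csv(pipe(cmd), ...)
}

# Demo on a small temporary file with 20 data rows.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:20, value = (1:20)^2), tmp, row.names = FALSE)
res <- subset_rows_sketch(tmp, first_row = 5, last_row = 10)
```

Because the filtering happens in an external process, R never needs to hold the rows outside the requested range.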

Usage

bsubset(
  file = NULL,
  head = NULL,
  tail = NULL,
  first_row = NULL,
  last_row = NULL,
  ...
)

Arguments

file

String. Full path to the file.

head

Numeric. Number of rows to load, counted from the first row of the file.

tail

Numeric. Number of rows to load, counted back from the last row of the file.

first_row

Numeric. First row of the portion of the file to subset.

last_row

Numeric. Last row of the portion of the file to subset.

...

Additional arguments passed on to data.table::fread(), such as 'sep'.
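One way to picture how the four row arguments relate is to reduce them to a single first/last range. The sketch below is an assumption about that mapping, not the package's code: the function name resolve_rows_sketch and its n_rows parameter are hypothetical, and giving head priority over tail when both are supplied is a choice made only for this illustration. The defaults for first_row and last_row follow the ones stated in the Examples.

```r
# Hypothetical helper, not part of bread: turn head / tail / first_row /
# last_row into one c(first, last) range for a file with n_rows data rows.
resolve_rows_sketch <- function(n_rows, head = NULL, tail = NULL,
                                first_row = NULL, last_row = NULL) {
  # Assumption: head takes priority over tail, and both override
  # first_row / last_row.
  if (!is.null(head)) {
    return(c(first = 1, last = min(head, n_rows)))
  }
  if (!is.null(tail)) {
    return(c(first = max(n_rows - tail + 1, 1), last = n_rows))
  }
  # first_row defaults to 1 and last_row to the last row of the file.
  if (is.null(first_row)) first_row <- 1
  if (is.null(last_row)) last_row <- n_rows
  c(first = first_row, last = last_row)
}

head_range <- resolve_rows_sketch(20, head = 5)
tail_range <- resolve_rows_sketch(20, tail = 5)
open_range <- resolve_rows_sketch(20, first_row = 5)
```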

Value

A data frame containing the subsetted rows.

Examples

file <- system.file('extdata', 'test.csv', package = 'bread')
## Use head or tail to load the first n or last n rows
bsubset(file = file, head = 5)
bsubset(file = file, tail = 5)
## Subset from the middle of a file
bsubset(file = file, first_row = 5, last_row = 10)
## first_row defaults to 1 and last_row to the last row of the file
bsubset(file = file, first_row = 5)
bsubset(file = file, last_row = 10)

[Package bread version 0.4.1 Index]