R: Read chunkwise data from text files

read_csv_chunkwise {chunked}

R Documentation

Read chunkwise data from text files

Description

read_csv_chunk will open a connection to a text file. Subsequent dplyr verbs and commands are recorded until collect, write_csv_chunkwise is called. In that case the recorded commands will be executed chunk by chunk. This

Usage

read_csv_chunkwise(
  file,
  chunk_size = 10000L,
  header = TRUE,
  sep = ",",
  dec = ".",
  stringsAsFactors = FALSE,
  ...
)

read_csv2_chunkwise(
  file,
  chunk_size = 10000L,
  header = TRUE,
  sep = ";",
  dec = ",",
  ...
)

read_table_chunkwise(
  file,
  chunk_size = 10000L,
  header = TRUE,
  sep = " ",
  dec = ".",
  ...
)

read_laf_chunkwise(laf, chunk_size = 10000L)

Arguments

`file`	path of texst file
`chunk_size`	size of the chunks te be read
`header`	Does the csv file have a header with column names?
`sep`	field separator to be used
`dec`	decimal separator to be used
`stringsAsFactors`	`logical` should string be read as factors?
`...`	not used `read_laf_chunkwise` reads chunkwise from a LaF object created with `laf_open`. It offers more control over data specification.
`laf`	laf object created using LaF

Details

read_csv_chunkwise can be best combined with write_csv_chunkwise or insert_chunkwise_into (see example)

Examples

# create csv file for demo purpose
in_file <- file.path(tempdir(), "in.csv")
write.csv(women, in_file, row.names = FALSE, quote = FALSE)

#
women_chunked <-
  read_chunkwise(in_file) %>%  #open chunkwise connection
  mutate(ratio = weight/height) %>%
  filter(ratio > 2) %>%
  select(height, ratio) %>%
  inner_join(data.frame(height=63:66)) # you can join with data.frames!

# no processing done until
out_file <- file.path(tempdir(), "processed.csv")
women_chunked %>%
  write_chunkwise(file=out_file)

head(women_chunked) # works (without processing all data...)

iris_file <- file.path(tempdir(), "iris.csv")
write.csv(iris, iris_file, row.names = FALSE, quote= FALSE)

iris_chunked <-
  read_chunkwise(iris_file, chunk_size = 49) %>% # 49 for demo purpose
  group_by(Species) %>%
  summarise(sepal_length = mean(Sepal.Length), n=n()) # note that mean is per chunk

[Package chunked version 0.6.0 Index]