R: Read an Arrow multi-file dataset and create 'sf' object

read_sf_dataset {sfarrow}

R Documentation

Read an Arrow multi-file dataset and create `sf` object

Description

Read an Arrow multi-file dataset and create an `sf` object.

Usage

read_sf_dataset(dataset, find_geom = FALSE)

Arguments

`dataset`	a `Dataset` object created by `arrow::open_dataset` or an `arrow_dplyr_query`
`find_geom`	logical. Only needed when returning a subset of columns. Should all available geometry columns be selected and added to the dataset query without being named? Default is `FALSE`, which requires the geometry column(s) to be selected explicitly.

Details

This function is primarily for use after opening a dataset with `arrow::open_dataset`. Users can then query the Arrow `Dataset` using dplyr methods such as `filter` or `select`. Passing the resulting query to this function parses the dataset and creates an `sf` object. The function expects consistent geographic metadata to be stored with the dataset in order to create `sf` objects.
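
As a minimal sketch of how `find_geom` interacts with a `select` query, the code below writes a small partitioned dataset and reads a column subset in two ways. It assumes the `sf`, `arrow`, `dplyr` and `sfarrow` packages are installed and that the geometry column of the `nc` example data is named `geometry`. The object names (`q1`, `q2`, `nc_named`, `nc_found`) are illustrative only, and the `find_geom = TRUE` behaviour is taken from the argument description above, not verified against every `arrow` version.

```r
library(sfarrow)

# Write a small partitioned dataset to a temporary directory
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$group <- rep_len(1:3, nrow(nc))  # deterministic grouping, for illustration
tf <- tempfile()
write_sf_dataset(dplyr::group_by(nc, group), path = tf)

ds <- arrow::open_dataset(tf)

# Option 1: name the geometry column explicitly in the query
# (assumes the geometry column is called "geometry")
q1 <- dplyr::select(ds, NAME, geometry)
nc_named <- read_sf_dataset(q1)

# Option 2: leave the geometry column out of the query and ask
# read_sf_dataset() to add the geometry column(s) recorded in the metadata
q2 <- dplyr::select(ds, NAME)
nc_found <- read_sf_dataset(q2, find_geom = TRUE)

# Clean up the temporary dataset directory
unlink(tf, recursive = TRUE)
```

With the default `find_geom = FALSE`, a query that drops the geometry column is not expected to produce a valid `sf` object, so one of the two options above is needed whenever columns are subset.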

Value

An object of class `sf`.

Examples

# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)

# create random grouping
nc$group <- sample(1:3, nrow(nc), replace = TRUE)

# use dplyr to group the dataset. %>% also allowed
nc_g <- dplyr::group_by(nc, group)

# write out to parquet datasets
tf <- tempfile()  # create temporary location
on.exit(unlink(tf, recursive = TRUE))  # tf becomes a directory, so remove it recursively
# partitioning determined by dplyr 'group_vars'
write_sf_dataset(nc_g, path = tf)

list.files(tf, recursive = TRUE)

# open parquet files from dataset
ds <- arrow::open_dataset(tf)

# create a query. %>% also allowed
q <- dplyr::filter(ds, group == 1)

# read the dataset (piping syntax also works)
nc_d <- read_sf_dataset(dataset = q)

nc_d
plot(sf::st_geometry(nc_d))

[Package sfarrow version 0.4.1 Index]
