sedona_spatial_join_count_by_key {apache.sedona}

R Documentation

Perform a spatial count-by-key operation based on two Sedona spatial RDDs.

Description

For each element p from spatial_rdd, count the number of unique elements q from query_window_rdd such that (p, q) satisfies the spatial relation specified by join_type.
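
The counting rule can be illustrated without a Spark connection. The following is a minimal base R sketch, not the package's implementation: the data frame `points` stands in for `spatial_rdd`, the axis-aligned rectangles in `windows` stand in for `query_window_rdd`, and `count_by_key` is a hypothetical helper introduced here for illustration only.

```r
# Hypothetical stand-ins for the two RDDs: points (p) and rectangular windows (q).
points <- data.frame(
  id = c("p1", "p2", "p3"),
  x = c(1, 5, 9),
  y = c(1, 5, 9),
  stringsAsFactors = FALSE
)
windows <- data.frame(
  id = c("q1", "q2"),
  xmin = c(0, 4), ymin = c(0, 4),
  xmax = c(6, 10), ymax = c(6, 10),
  stringsAsFactors = FALSE
)

# count_by_key is an illustrative helper, not part of apache.sedona.
# For each point, count the distinct windows whose extent includes the point.
count_by_key <- function(points, windows) {
  counts <- vapply(seq_len(nrow(points)), function(i) {
    inside <- points$x[i] >= windows$xmin & points$x[i] <= windows$xmax &
      points$y[i] >= windows$ymin & points$y[i] <= windows$ymax
    length(unique(windows$id[inside]))
  }, integer(1))
  names(counts) <- points$id
  counts
}

counts <- count_by_key(points, windows)
print(counts)
```

Here `p2` falls inside both windows and so receives a count of 2, while `p1` and `p3` each fall inside one window. The result is keyed by the elements of the first input, which mirrors how the function keys its counts by elements of `spatial_rdd`.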

Usage

sedona_spatial_join_count_by_key(
  spatial_rdd,
  query_window_rdd,
  join_type = c("contain", "intersect"),
  partitioner = c("quadtree", "kdbtree"),
  index_type = c("quadtree", "rtree")
)

Arguments

`spatial_rdd`	Spatial RDD containing geometries to be queried.
`query_window_rdd`	Spatial RDD containing the query window(s).
`join_type`	Type of the join query (must be either "contain" or "intersect"). If `join_type` is "contain", then a geometry from `spatial_rdd` will match a geometry from the `query_window_rdd` if and only if the former is fully contained in the latter. If `join_type` is "intersect", then a geometry from `spatial_rdd` will match a geometry from the `query_window_rdd` if and only if the former intersects the latter.
`partitioner`	Spatial partitioning to apply to both `spatial_rdd` and `query_window_rdd` to facilitate the join query. Can be either a grid type (currently "quadtree" and "kdbtree" are supported) or a custom spatial partitioner object. If `partitioner` is NULL, the function assumes that the same spatial partitioner has already been applied to both `spatial_rdd` and `query_window_rdd` and skips the partitioning step.
`index_type`	Controls how `spatial_rdd` and `query_window_rdd` will be indexed (unless they are indexed already). If "NONE", then no index will be constructed and matching geometries will be identified by a doubly nested loop over all possible pairs of elements from `spatial_rdd` and `query_window_rdd`, which will be inefficient for large data sets.
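
The difference between the two `join_type` values can be sketched in base R with axis-aligned rectangles. This is an illustration under simplifying assumptions, not how Sedona evaluates the predicates: the rectangles and the helpers `is_contained` and `intersects` are hypothetical and exist only in this example.

```r
# Axis-aligned rectangles as c(xmin, ymin, xmax, ymax); illustrative data only.
window  <- c(0, 0, 10, 10)
inner   <- c(2, 2, 4, 4)       # lies entirely inside the window
partial <- c(8, 8, 12, 12)     # overlaps the window but extends beyond it
outside <- c(20, 20, 22, 22)   # shares no area with the window

# Hypothetical helpers, not part of apache.sedona.
# "contain": the geometry g lies fully within the window w.
is_contained <- function(g, w) {
  g[1] >= w[1] && g[2] >= w[2] && g[3] <= w[3] && g[4] <= w[4]
}
# "intersect": the geometry g and the window w overlap somewhere.
intersects <- function(g, w) {
  g[1] <= w[3] && g[3] >= w[1] && g[2] <= w[4] && g[4] >= w[2]
}

geoms <- list(inner = inner, partial = partial, outside = outside)
contain_match   <- vapply(geoms, is_contained, logical(1), w = window)
intersect_match <- vapply(geoms, intersects, logical(1), w = window)
print(rbind(contain = contain_match, intersect = intersect_match))
```

Only `inner` matches under "contain", whereas both `inner` and `partial` match under "intersect". Counts produced with `join_type = "intersect"` are therefore never smaller than those produced with `join_type = "contain"` for the same inputs.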

Value

A spatial RDD containing the join-count-by-key results.

Examples

library(sparklyr)
library(apache.sedona)

sc <- spark_connect(master = "spark://HOST:PORT")

if (!inherits(sc, "test_connection")) {
  input_location <- "/dev/null" # replace it with the path to your input file
  rdd <- sedona_read_dsv_to_typed_rdd(
    sc,
    location = input_location,
    delimiter = ",",
    type = "point",
    first_spatial_col_index = 1L
  )
  query_rdd_input_location <- "/dev/null" # replace it with the path to your input file
  query_rdd <- sedona_read_shapefile_to_typed_rdd(
    sc,
    location = query_rdd_input_location,
    type = "polygon"
  )
  # For each point in `rdd`, count the polygons in `query_rdd` that it intersects.
  join_result_rdd <- sedona_spatial_join_count_by_key(
    rdd,
    query_rdd,
    join_type = "intersect",
    partitioner = "quadtree"
  )
}