R: Locate Event

locate_event {ulex}

R Documentation

Locate Event

Description

Geolocates events described in text, using a landmark gazetteer, roads, and areas as reference data.

Usage

locate_event(
  text,
  landmark_gazetteer,
  landmark_gazetteer.name_var = "name",
  landmark_gazetteer.type_var = "type",
  roads,
  roads.name_var = "name",
  areas,
  areas.name_var = "name",
  event_words,
  prepositions_list = list(c("at", "next to", "around", "just after", "opposite", "opp",
    "apa", "hapa", "happened at", "just before", "at the", "outside", "right before"),
    c("near", "after", "toward", "along", "towards", "approach"), c("past", "from",
    "on")),
  junction_words = c("intersection", "junction"),
  false_positive_phrases = "",
  type_list = NULL,
  clost_dist_thresh = 500,
  fuzzy_match = TRUE,
  fuzzy_match.min_word_length = c(5, 11),
  fuzzy_match.dist = c(1, 2),
  fuzzy_match.ngram_max = 3,
  fuzzy_match.first_letters_same = TRUE,
  fuzzy_match.last_letters_same = TRUE,
  quiet = TRUE,
  mc_cores = 1
)

Arguments

`text`	Vector of texts to be geolocated.
`landmark_gazetteer`	`sf` spatial data.frame representing landmarks.
`landmark_gazetteer.name_var`	Name of variable indicating `name` of landmark.
`landmark_gazetteer.type_var`	Name of variable indicating `type` of landmark.
`roads`	`sf` spatial data.frame representing roads.
`roads.name_var`	Name of variable indicating `name` of road.
`areas`	`sf` spatial data.frame representing areas, such as administrative areas or neighborhoods.
`areas.name_var`	Name of variable indicating `name` of area.
`event_words`	Vector of event words, representing events to be geocoded.
`prepositions_list`	List of vectors of prepositions. Order of list determines order of preposition precedence. (Default: `list(c("at", "next to","around", "just after", "opposite","opp", "apa", "hapa","happened at", "just before","at the","outside", "right before"), c("near", "after", "toward", "along", "towards", "approach"), c("past","from","on"))`).
`junction_words`	Vector of junction words to check for when determining intersection of roads. (Default: `c("intersection", "junction")`).
`false_positive_phrases`	Common phrases found in text that contain spurious location references (e.g., "githurai bus" is the name of a bus, but "githurai" is also a place). These phrases are checked for and ignored in the text. (Default: `""`).
`type_list`	List of vectors of types. Order of list determines order of type precedence. (Default: `NULL`).
`clost_dist_thresh`	Distance (in meters) within which features are considered "close"; for example, when determining whether a landmark is close to a road. (Default: `500`).
`fuzzy_match`	Whether to implement fuzzy matching of landmarks using Levenshtein distance. (Default: `TRUE`).
`fuzzy_match.min_word_length`	Minimum word length to use for fuzzy matching; vector length must be the same as `fuzzy_match.dist`. (Default: `c(5,11)`).
`fuzzy_match.dist`	Allowable Levenshtein distances for fuzzy matching; vector length must be the same as `fuzzy_match.min_word_length`. (Default: `c(1,2)`).
`fuzzy_match.ngram_max`	Maximum number of words per n-gram extracted from the text and compared against landmarks by Levenshtein distance. For example, if the text is composed of 5 words, w1 w2 w3 w4 w5, and `fuzzy_match.ngram_max = 3`, the function extracts `w1 w2 w3` and computes its Levenshtein distance to all landmarks. Then it checks `w2 w3 w4`, etc. (Default: `3`).
`fuzzy_match.first_letters_same`	When implementing a fuzzy match, should the first letter of the original and found word be the same? (Default: `TRUE`).
`fuzzy_match.last_letters_same`	When implementing a fuzzy match, should the last letter of the original and found word be the same? (Default: `TRUE`).
`quiet`	If `FALSE`, prints text that is being geocoded. (Default: `TRUE`).
`mc_cores`	If > 1, geolocates events in parallel across multiple cores, relying on the `parallel` package. (Default: `1`).
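
The fuzzy matching arguments can be illustrated with a small base R sketch. This is not the ulex implementation; the helpers `make_ngrams()`, `allowed_edits()`, and `fuzzy_candidates()` are hypothetical names introduced here. It assumes that `fuzzy_match.min_word_length` and `fuzzy_match.dist` pair up element by element, so that a landmark name with at least 5 characters tolerates 1 edit and one with at least 11 characters tolerates 2. Distances are computed with `utils::adist()`, which returns the generalized Levenshtein distance.

```r
# Illustrative sketch only; not the code used inside ulex.

# Build all n-grams of 1..ngram_max consecutive words from a text.
make_ngrams <- function(text, ngram_max = 3) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  grams <- character(0)
  for (n in seq_len(min(ngram_max, length(words)))) {
    for (i in seq_len(length(words) - n + 1)) {
      grams <- c(grams, paste(words[i:(i + n - 1)], collapse = " "))
    }
  }
  grams
}

# Assumed pairing: a name with at least min_word_length[k] characters
# tolerates dist[k] edits; shorter names must match exactly.
allowed_edits <- function(name, min_word_length = c(5, 11), dist = c(1, 2)) {
  stopifnot(length(min_word_length) == length(dist))
  eligible <- nchar(name) >= min_word_length
  if (any(eligible)) max(dist[eligible]) else 0
}

# Return the n-gram/landmark pairs that pass the distance and letter checks.
# Exact matches have distance 0 and are returned as well.
fuzzy_candidates <- function(text, landmarks, ngram_max = 3,
                             min_word_length = c(5, 11), dist = c(1, 2),
                             first_letters_same = TRUE,
                             last_letters_same = TRUE) {
  grams <- make_ngrams(text, ngram_max)
  # expand.grid varies `ngram` fastest, matching the column-major
  # order of the adist() matrix (rows = n-grams, columns = landmarks).
  pairs <- expand.grid(ngram = grams, landmark = landmarks,
                       stringsAsFactors = FALSE)
  pairs$distance <- as.vector(utils::adist(grams, landmarks))
  limit <- vapply(pairs$landmark, allowed_edits, numeric(1),
                  min_word_length = min_word_length, dist = dist,
                  USE.NAMES = FALSE)
  keep <- pairs$distance <= limit
  if (first_letters_same) {
    keep <- keep & substr(pairs$ngram, 1, 1) == substr(pairs$landmark, 1, 1)
  }
  if (last_letters_same) {
    last_char <- function(x) substr(x, nchar(x), nchar(x))
    keep <- keep & last_char(pairs$ngram) == last_char(pairs$landmark)
  }
  pairs[keep, , drop = FALSE]
}

# "restaurnt" is one deletion away from "restaurant".
fuzzy_candidates("crash near restaurnt", c("restaurant", "bank", "hotel"))
```

Under these assumptions, a misspelling of a short name such as "bank" (4 characters) is never matched, because it falls below the smallest `fuzzy_match.min_word_length`.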

Value

`sf` spatial data.frame of geolocated events.

Examples

library(ulex)
library(sf)

## Landmarks
landmarks_sf <- data.frame(lat = runif(3),
                           lon = runif(3),
                           name = c("restaurant", "bank", "hotel"),
                           type = c("poi", "poi", "poi")) |>
  st_as_sf(coords = c("lon", "lat"),
           crs = 4326)

## Road
coords <- matrix(runif(4), ncol = 2)
road_sf <- coords |>
  st_linestring() |>
  st_sfc(crs = 4326)
road_sf <- st_sf(geometry = road_sf)
road_sf$name <- "main st"

## Area
n <- 5
coords <- matrix(runif(2 * n, min = 0, max = 10), ncol = 2)
coords <- rbind(coords, coords[1,])
polygon <- st_polygon(list(coords))
area_sf <- st_sfc(polygon, crs = 4326)
area_sf <- st_sf(geometry = area_sf)
area_sf$name <- "place"

## Locate Event
event_sf <- locate_event(text = "accident near hotel",
                         landmark_gazetteer = landmarks_sf,
                         roads = road_sf,
                         areas = area_sf,
                         event_words = c("accident", "crash"))
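
The text in the example above uses the preposition "near", which sits in the second vector of the default `prepositions_list`. The sketch below shows one plausible way to look up the precedence tier of a text in base R. The helper `preposition_tier()` is hypothetical and is not part of ulex; how the package applies the tiers internally is not described on this page.

```r
# Default tiers, copied from the Usage section.
prepositions_list <- list(
  c("at", "next to", "around", "just after", "opposite", "opp",
    "apa", "hapa", "happened at", "just before", "at the", "outside",
    "right before"),
  c("near", "after", "toward", "along", "towards", "approach"),
  c("past", "from", "on")
)

# Hypothetical helper: index of the first (highest-precedence) tier that
# has a preposition appearing as whole word(s) in the text, or NA if none.
# Words are assumed to be separated by single spaces, without punctuation.
preposition_tier <- function(text, tiers) {
  padded <- paste0(" ", tolower(text), " ")
  for (i in seq_along(tiers)) {
    hits <- vapply(tiers[[i]],
                   function(p) grepl(paste0(" ", p, " "), padded, fixed = TRUE),
                   logical(1))
    if (any(hits)) return(i)
  }
  NA_integer_
}

preposition_tier("accident near hotel", prepositions_list)
preposition_tier("crash just after hotel", prepositions_list)
```

Because earlier tiers are checked first, "just after" resolves to the first tier even though "after" alone belongs to the second.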

[Package ulex version 0.1.0 Index]