locate_event {ulex}R Documentation

Locate Event

Description

Locate Event

Usage

locate_event(
  text,
  landmark_gazetteer,
  landmark_gazetteer.name_var = "name",
  landmark_gazetteer.type_var = "type",
  roads,
  roads.name_var = "name",
  areas,
  areas.name_var = "name",
  event_words,
  prepositions_list = list(c("at", "next to", "around", "just after", "opposite", "opp",
    "apa", "hapa", "happened at", "just before", "at the", "outside", "right before"),
    c("near", "after", "toward", "along", "towards", "approach"), c("past", "from",
    "on")),
  junction_words = c("intersection", "junction"),
  false_positive_phrases = "",
  type_list = NULL,
  clost_dist_thresh = 500,
  fuzzy_match = TRUE,
  fuzzy_match.min_word_length = c(5, 11),
  fuzzy_match.dist = c(1, 2),
  fuzzy_match.ngram_max = 3,
  fuzzy_match.first_letters_same = TRUE,
  fuzzy_match.last_letters_same = TRUE,
  quiet = TRUE,
  mc_cores = 1
)

Arguments

text

Vector of texts to be geolocated.

landmark_gazetteer

sf spatial data.frame representing landmarks.

landmark_gazetteer.name_var

Name of variable indicating name of landmark.

landmark_gazetteer.type_var

Name of variable indicating type of landmark.

roads

sf spatial data.frame representing roads.

roads.name_var

Name of variable indicating name of road.

areas

sf spatial data.frame representing areas, such as administrative areas or neighborhoods.

areas.name_var

Name of variable indicating name of area.

event_words

Vector of event words, representing events to be geocoded.

prepositions_list

List of vectors of prepositions. Order of list determines order of preposition precedence. (Default: list(c("at", "next to","around", "just after", "opposite","opp", "apa", "hapa","happened at", "just before","at the","outside", "right before"), c("near", "after", "toward", "along", "towards", "approach"), c("past","from","on"))).

junction_words

Vector of junction words to check for when determining intersection of roads. (Default: c("intersection", "junction")).

false_positive_phrases

Common words found in text that include spurious location references (eg, githurai bus is the name of a bus, but githurai is also a place). These may be common phrases that should be checked and ignored in the text. (Default: "").

type_list

List of vectors of types. Order of list determines order or type precedence. (Default: NULL).

clost_dist_thresh

Distance (meters) as to what is considered "close"; for example, when considering whether a landmark is close to a road. (Default: 500).

fuzzy_match

Whether to implement fuzzy matching of landmarks using levenstein distance. (Default: TRUE).

fuzzy_match.min_word_length

Minimum word length to use for fuzzy matching; vector length must be the same as fuzzy_match.dist. (Default: c(5,11)).

fuzzy_match.dist

Allowable levenstein distances for fuzzy matching; vector length must be same as fuzzy_match.min_word_length. (Default: c(1,2)).

fuzzy_match.ngram_max

The number of n-grams that should be extracted from text to calculate a levensteing distance against landmarks. For example, if the text is composed of 5 words: w1 w2 w3 w4 and fuzzy_match.ngram_max = 3, the function extracts ⁠w1 w2 w3⁠ and compares the levenstein distance to all landmarks. Then in checks ⁠w2 w3 w4⁠, etc. (Default: 3).

fuzzy_match.first_letters_same

When implementing a fuzzy match, should the first letter of the original and found word be the same? (Default: TRUE).

fuzzy_match.last_letters_same

When implementing a fuzzy match, should the last letter of the original and found word be the same? (Default: TRUE).

quiet

If FALSE, prints text that is being geocoded. (Default: TRUE).

mc_cores

If > 1, uses geolocates events in parallel across multiple cores relying on the parallel package. (Default: 1).

Value

sf spatial dataframe of geolocated events.

Examples

library(ulex)
library(sf)

## Landmarks
landmarks_sf <- data.frame(lat = runif(3),
                           lon = runif(3),
                           name = c("restaurant", "bank", "hotel"),
                           type = c("poi", "poi", "poi")) |>
  st_as_sf(coords = c("lon", "lat"),
           crs = 4326)

## Road
coords <- matrix(runif(4), ncol = 2)
road_sf <- coords |>
  st_linestring() |>
  st_sfc(crs = 4326)
road_sf <- st_sf(geometry = road_sf)
road_sf$name <- "main st"

## Area
n <- 5
coords <- matrix(runif(2 * n, min = 0, max = 10), ncol = 2)
coords <- rbind(coords, coords[1,])
polygon <- st_polygon(list(coords))
area_sf <- st_sfc(polygon, crs = 4326)
area_sf <- st_sf(geometry = area_sf)
area_sf$name <- "place"

## Locate Event
event_sf <- locate_event(text = "accident near hotel",
                         landmark_gazetteer = landmarks_sf,
                         roads = road_sf,
                         areas = area_sf,
                         event_words = c("accident", "crash"))

[Package ulex version 0.1.0 Index]