locate_event {ulex} | R Documentation |
Locate Event
Description
Geolocates events described in text using a gazetteer of landmarks together with roads and areas.
Usage
locate_event(
text,
landmark_gazetteer,
landmark_gazetteer.name_var = "name",
landmark_gazetteer.type_var = "type",
roads,
roads.name_var = "name",
areas,
areas.name_var = "name",
event_words,
prepositions_list = list(c("at", "next to", "around", "just after", "opposite", "opp",
"apa", "hapa", "happened at", "just before", "at the", "outside", "right before"),
c("near", "after", "toward", "along", "towards", "approach"), c("past", "from",
"on")),
junction_words = c("intersection", "junction"),
false_positive_phrases = "",
type_list = NULL,
clost_dist_thresh = 500,
fuzzy_match = TRUE,
fuzzy_match.min_word_length = c(5, 11),
fuzzy_match.dist = c(1, 2),
fuzzy_match.ngram_max = 3,
fuzzy_match.first_letters_same = TRUE,
fuzzy_match.last_letters_same = TRUE,
quiet = TRUE,
mc_cores = 1
)
Arguments
text |
Vector of texts to be geolocated. |
landmark_gazetteer |
|
landmark_gazetteer.name_var |
Name of variable indicating the name of the landmark. (Default: "name") |
landmark_gazetteer.type_var |
Name of variable indicating the type of the landmark. (Default: "type") |
roads |
|
roads.name_var |
Name of variable indicating the name of the road. (Default: "name") |
areas |
|
areas.name_var |
Name of variable indicating the name of the area. (Default: "name") |
event_words |
Vector of event words, representing events to be geocoded. |
prepositions_list |
List of vectors of prepositions. The order of the list determines the order of preposition precedence. (Default: the three-vector list shown in Usage) |
junction_words |
Vector of junction words to check for when determining the intersection of roads. (Default: c("intersection", "junction")) |
false_positive_phrases |
Phrases commonly found in the text that contain spurious location references (e.g., "githurai bus" is the name of a bus, but "githurai" is also a place). These phrases are checked for and ignored in the text. (Default: "") |
type_list |
List of vectors of types. The order of the list determines the order of type precedence. (Default: NULL) |
clost_dist_thresh |
Distance (in meters) within which features are considered "close"; for example, when determining whether a landmark is close to a road. (Default: 500) |
fuzzy_match |
Whether to apply fuzzy matching of landmarks using Levenshtein distance. (Default: TRUE) |
fuzzy_match.min_word_length |
Minimum word length to use for fuzzy matching; vector length must be the same as fuzzy_match.dist. (Default: c(5, 11)) |
fuzzy_match.dist |
Allowable Levenshtein distances for fuzzy matching; vector length must be the same as fuzzy_match.min_word_length. (Default: c(1, 2)) |
fuzzy_match.ngram_max |
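The maximum size of the n-grams extracted from the text to calculate a Levenshtein distance against landmarks. For example, if the text is composed of 5 words (w1 w2 w3 w4 w5) and fuzzy_match.ngram_max is 3, n-grams such as "w1 w2 w3" and "w2 w3 w4" are compared against the landmarks. (Default: 3) |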
fuzzy_match.first_letters_same |
When implementing a fuzzy match, should the first letter of the original and found word be the same? (Default: TRUE) |
fuzzy_match.last_letters_same |
When implementing a fuzzy match, should the last letter of the original and found word be the same? (Default: TRUE) |
quiet |
If |
mc_cores |
If > 1, geolocates events in parallel across multiple cores, relying on the |
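The fuzzy-matching arguments above can be illustrated with base R alone. The following is a minimal sketch, not the internal implementation of ulex. It assumes that fuzzy_match.min_word_length and fuzzy_match.dist are paired element-wise (so, with the defaults, landmarks of at least 5 characters tolerate a distance of 1 and landmarks of at least 11 characters tolerate a distance of 2) and that n-grams of every size up to fuzzy_match.ngram_max are tried. The helpers allowed_dist(), make_ngrams() and is_fuzzy_match() are hypothetical names introduced only for illustration.

```r
# Hypothetical helpers illustrating the fuzzy-matching arguments.
# This is NOT ulex's internal code, only a sketch under the stated assumptions.

# Largest Levenshtein distance tolerated for a word of a given length,
# assuming length thresholds and distances are paired element-wise.
allowed_dist <- function(word,
                         min_word_length = c(5, 11),
                         dist = c(1, 2)) {
  stopifnot(length(min_word_length) == length(dist))
  ok <- nchar(word) >= min_word_length
  if (!any(ok)) return(0)  # too short: exact matches only
  max(dist[ok])
}

# All word n-grams of size 1..ngram_max from a single text.
make_ngrams <- function(text, ngram_max = 3) {
  words <- strsplit(text, "\\s+")[[1]]
  out <- character(0)
  for (n in seq_len(min(ngram_max, length(words)))) {
    starts <- seq_len(length(words) - n + 1)
    out <- c(out, vapply(starts, function(i) {
      paste(words[i:(i + n - 1)], collapse = " ")
    }, character(1)))
  }
  out
}

# Does a candidate n-gram match a landmark name within the allowed distance,
# optionally requiring identical first and last letters?
is_fuzzy_match <- function(candidate, landmark,
                           first_letters_same = TRUE,
                           last_letters_same = TRUE) {
  d <- utils::adist(candidate, landmark)[1, 1]
  if (d > allowed_dist(landmark)) return(FALSE)
  n_c <- nchar(candidate)
  n_l <- nchar(landmark)
  first_ok <- substr(candidate, 1, 1) == substr(landmark, 1, 1)
  last_ok <- substr(candidate, n_c, n_c) == substr(landmark, n_l, n_l)
  (!first_letters_same || first_ok) && (!last_letters_same || last_ok)
}

# A misspelled landmark ("resturant") is still matched to "restaurant".
ngrams <- make_ngrams("crash near resturant today", ngram_max = 3)
matches <- ngrams[vapply(ngrams, is_fuzzy_match, logical(1),
                         landmark = "restaurant")]
```

Short names such as "bank" fall below the first length threshold in this sketch, so they would only be matched exactly; this mirrors the intent of a minimum word length, which is to avoid spurious matches on short words.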
Value
sf
spatial dataframe of geolocated events.
Examples
library(ulex)
library(sf)
## Landmarks
landmarks_sf <- data.frame(lat = runif(3),
                           lon = runif(3),
                           name = c("restaurant", "bank", "hotel"),
                           type = c("poi", "poi", "poi")) |>
  st_as_sf(coords = c("lon", "lat"),
           crs = 4326)
## Road
coords <- matrix(runif(4), ncol = 2)
road_sf <- coords |>
  st_linestring() |>
  st_sfc(crs = 4326)
road_sf <- st_sf(geometry = road_sf)
road_sf$name <- "main st"
## Area
n <- 5
coords <- matrix(runif(2 * n, min = 0, max = 10), ncol = 2)
coords <- rbind(coords, coords[1,])
polygon <- st_polygon(list(coords))
area_sf <- st_sfc(polygon, crs = 4326)
area_sf <- st_sf(geometry = area_sf)
area_sf$name <- "place"
## Locate Event
event_sf <- locate_event(text = "accident near hotel",
                         landmark_gazetteer = landmarks_sf,
                         roads = road_sf,
                         areas = area_sf,
                         event_words = c("accident", "crash"))
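The call above relies on the default prepositions_list. As a hedged sketch of how a custom list might be supplied, the block below builds a shorter three-tier list, where earlier vectors take precedence over later ones as described under Arguments. The helper preposition_tier() is hypothetical and not part of ulex; it only makes the tier ordering visible. The fixed coordinates are illustrative stand-ins for the random inputs used above, and the locate_event() call is only attempted when both ulex and sf are installed.

```r
# Custom precedence tiers: earlier vectors take precedence over later ones.
my_prepositions <- list(
  c("at", "next to", "opposite"),  # tier 1: highest precedence
  c("near", "towards"),            # tier 2
  c("past", "from")                # tier 3: lowest precedence
)

# Hypothetical helper (not part of ulex): index of the tier that contains
# a preposition, or NA if the preposition is not listed.
preposition_tier <- function(prep, prepositions_list) {
  hit <- which(vapply(prepositions_list,
                      function(tier) prep %in% tier,
                      logical(1)))
  if (length(hit) == 0) NA_integer_ else hit[1]
}

tiers <- vapply(c("near", "at", "behind"), preposition_tier, integer(1),
                prepositions_list = my_prepositions)

# Pass the custom list to locate_event() only if ulex and sf are available.
if (requireNamespace("ulex", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # Illustrative inputs with fixed coordinates (assumed, not from real data).
  landmarks_sf <- sf::st_as_sf(
    data.frame(lon = c(0.2, 0.5, 0.8),
               lat = c(0.2, 0.5, 0.8),
               name = c("restaurant", "bank", "hotel"),
               type = "poi"),
    coords = c("lon", "lat"), crs = 4326)
  road_sf <- sf::st_sf(
    name = "main st",
    geometry = sf::st_sfc(sf::st_linestring(rbind(c(0, 0), c(1, 1))),
                          crs = 4326))
  area_sf <- sf::st_sf(
    name = "place",
    geometry = sf::st_sfc(sf::st_polygon(list(rbind(
      c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))), crs = 4326))

  event_sf <- ulex::locate_event(text = "accident near hotel",
                                 landmark_gazetteer = landmarks_sf,
                                 roads = road_sf,
                                 areas = area_sf,
                                 event_words = c("accident", "crash"),
                                 prepositions_list = my_prepositions)
}
```

Keeping the list short makes it easier to reason about which preposition wins when a text contains several; prepositions that are absent from every tier (such as "behind" here) are simply not listed.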