step_geodist {recipes} | R Documentation |
Distance between two locations
Description
step_geodist() creates a specification of a recipe step that will calculate the distance from points on a map to a reference location.
Usage
step_geodist(
recipe,
lat = NULL,
lon = NULL,
role = "predictor",
trained = FALSE,
ref_lat = NULL,
ref_lon = NULL,
is_lat_lon = TRUE,
log = FALSE,
name = "geo_dist",
columns = NULL,
keep_original_cols = TRUE,
skip = FALSE,
id = rand_id("geodist")
)
Arguments
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
lon, lat |
Selector functions to choose which variables are used by the step. |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the new columns created by this step from the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
ref_lon, ref_lat |
Single numeric values for the location of the reference point. |
is_lat_lon |
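A logical: Are coordinates in latitude and longitude? If TRUE, the Haversine formula is used and the result is in meters; if FALSE, the Euclidean distance is calculated. |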
log |
A logical: should the distance be transformed by the natural log function? |
name |
A single character value to use for the new predictor column. If a column exists with this name, an error is issued. |
columns |
A character string of the selected variable names. This field is a placeholder and will be populated once prep() is used. |
keep_original_cols |
A logical to keep the original variables in the output. Defaults to TRUE. |
skip |
A logical. Should the step be skipped when the recipe is baked by bake()? |
id |
A character string that is unique to this step to identify it. |
Details
step_geodist() uses the Pythagorean theorem to calculate Euclidean distances if is_lat_lon is FALSE. If is_lat_lon is TRUE, the Haversine formula is used to calculate the great-circle distance in meters.
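The two calculations described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the function names haversine_m() and euclid_dist() are hypothetical, and the Earth radius of 6371 km is an assumption, since this page does not state the value that step_geodist() uses.

```r
# Great-circle distance in meters via the Haversine formula.
# `radius` is an assumed mean Earth radius; the value used by the package
# is not stated on this page.
haversine_m <- function(lat, lon, ref_lat, ref_lon, radius = 6371e3) {
  to_rad <- pi / 180
  d_lat <- (ref_lat - lat) * to_rad
  d_lon <- (ref_lon - lon) * to_rad
  a <- sin(d_lat / 2)^2 +
    cos(lat * to_rad) * cos(ref_lat * to_rad) * sin(d_lon / 2)^2
  # pmin() guards against rounding pushing `a` slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Euclidean distance via the Pythagorean theorem, for planar coordinates.
euclid_dist <- function(x, y, ref_x, ref_y) {
  sqrt((x - ref_x)^2 + (y - ref_y)^2)
}

# A made-up point measured against the reference location from the Examples.
haversine_m(38.8888, -77.0260, ref_lat = 38.8986312, ref_lon = -77.0062457)

# Planar case: a 3-4-5 right triangle.
euclid_dist(3, 4, ref_x = 0, ref_y = 0)
```

Both helpers are vectorized over lat/lon (or x/y), which mirrors how a step would apply the calculation to whole columns.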
Value
An updated version of recipe with the new step added to the sequence of any existing operations.
Tidying
When you tidy() this step, a tibble is returned with columns latitude, longitude, ref_latitude, ref_longitude, is_lat_lon, name, and id:
- latitude: character, name of latitude variable
- longitude: character, name of longitude variable
- ref_latitude: numeric, location of latitude reference point
- ref_longitude: numeric, location of longitude reference point
- is_lat_lon: logical, are the coordinates in latitude and longitude?
- name: character, name of resulting variable
- id: character, id of this step
Case weights
The underlying operation does not allow for case weights.
References
https://en.wikipedia.org/wiki/Haversine_formula
See Also
Other multivariate transformation steps: step_classdist(), step_classdist_shrunken(), step_depth(), step_ica(), step_isomap(), step_kpca(), step_kpca_poly(), step_kpca_rbf(), step_mutate_at(), step_nnmf(), step_nnmf_sparse(), step_pca(), step_pls(), step_ratio(), step_spatialsign()
Examples
library(recipes)
data(Smithsonian, package = "modeldata")

# How close are the museums to Union Station?
near_station <- recipe(~., data = Smithsonian) %>%
  update_role(name, new_role = "location") %>%
  step_geodist(
    lat = latitude, lon = longitude, log = FALSE,
    ref_lat = 38.8986312, ref_lon = -77.0062457,
    is_lat_lon = TRUE
  ) %>%
  prep(training = Smithsonian)

# Museums ordered by distance (in meters) from the reference point
bake(near_station, new_data = NULL) %>%
  arrange(geo_dist)

# Summary of the step: selected columns, reference point, and output name
tidy(near_station, number = 1)
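The example above covers the latitude/longitude case. As a complementary sketch, the following uses is_lat_lon = FALSE with planar coordinates. The data frame pts and the column name dist_to_origin are made up for illustration and do not come from the package.

```r
library(recipes)

# Hypothetical planar coordinates (for example, projected x/y positions).
pts <- data.frame(x = c(3, 0, 6), y = c(4, 0, 8))

planar <- recipe(~., data = pts) %>%
  step_geodist(
    lat = y, lon = x,            # y plays the role of lat, x the role of lon
    ref_lat = 0, ref_lon = 0,    # reference point at the origin
    is_lat_lon = FALSE,          # Euclidean distance instead of Haversine
    name = "dist_to_origin"
  ) %>%
  prep()

res <- bake(planar, new_data = NULL)
res
```

Because is_lat_lon is FALSE here, the new column is in the same units as the input coordinates rather than in meters.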