R: Shortest Paths and/ or Distances between Locations

shortest_paths {spaths}

R Documentation

Shortest Paths and/ or Distances between Locations

Description

The shortest paths and/ or distance between locations in a grid according to Dijkstra's (1959) algorithm.

Usage

shortest_paths(
  rst,
  origins,
  destinations = NULL,
  output = c("distances", "lines", "both"),
  output_class = NULL,
  origin_names = NULL,
  destination_names = NULL,
  pairwise = FALSE,
  contiguity = c("queen", "rook"),
  spherical = TRUE,
  radius = 6371010,
  extent = NULL,
  dist_comp = c("spaths", "terra"),
  tr_fun = NULL,
  v_matrix = FALSE,
  tr_directed = TRUE,
  pre = TRUE,
  early_stopping = TRUE,
  bidirectional = FALSE,
  update_rst = NULL,
  touches = TRUE,
  ncores = NULL,
  par_lvl = c("update_rst", "points"),
  show_progress = FALSE,
  bar_limit = 150L,
  path_type = c("int", "unsigned short int"),
  distance_type = c("double", "float", "int", "unsigned short int")
)

Arguments

`rst`	SpatRaster (terra), RasterLayer (raster), matrix, or list of matrices object. RasterLayers are converted to SpatRasters. Pixels with non-NA values in all layers mark the cells through which the algorithm may pass. The values of `rst` can be accessed in `tr_fun`.
`origins`	Origin points at which the shortest paths start. If `rst` is a SpatRaster or RasterLayer object, these points can be passed as a single SpatVector (terra), sf (sf), or Spatial* (sp) object. sf and sp objects are converted to SpatVectors. Polygons and lines are converted to points using their centroid. If `rst` is a matrix or list of matrices, `origins` must be a single matrix, data.frame, or data.table of coordinates with columns named `"x"` and `"y"`. The coordinates must refer to points in the reference system that `rst` utilizes. Lines and polygons are thus not accepted in this case. Details on which points the function connects are outlined below.
`destinations`	Destination points to which the shortest paths are derived. It defaults to `NULL`, resulting in the function to compute shortest paths between the `origins` points. Otherwise, the same input rules as for `origins` apply. Details on which points the function connects are outlined below.
`output`	`"distances"` (default), `"lines"`, or `"both"`. `"distances"` lists the total transition costs along the shortest paths. By default, it is the distance between origin and destination in meters, if `rst` is an unprojected SpatRaster or RasterLayer or if `dist_comp = "terra"`. Otherwise, it is denoted in the projection's units. If you pass another function to `tr_fun`, the total transition cost is measured in the units of `tr_fun`'s results. `"lines"` returns the shortest paths as spatial lines. `"both"` returns both distances and lines. `"distances"` is faster and requires less RAM than `"lines"` or `"both"`.
`output_class`	Class of the returned object. With `output = "distances"`, the options are `"data.table"` (default) and `"data.frame"`. With `output = "lines"` or `output = "both"`, the options are `"SpatVector"` (default when `rst` is a SpatRaster or RasterLayer) and `"list"` (default when `rst` is a matrix or a list of matrices). In the case of `"list"`, the attributes, the line coordinates, and the CRS are returned as individual list elements. The first element in the list of line coordinates refers to the first row in the attributes table etc. `"SpatVector"` is only available with a SpatRaster or RasterLayer `rst`. `output_class` changes the format of the returned object, not the information it contains.
`origin_names`	Character specifying the name of the column in the `origins` object used to label the origins in the output object. It defaults to row numbers.
`destination_names`	Character specifying the name of the column in the `destinations` object used to label the destinations in the output object. It defaults to row numbers.
`pairwise`	Logical specifying whether to compute pairwise paths, if `origins` and `destinations` have equally many rows. If `TRUE`, the function computes the shortest path between the first origin and the first destination, the second origin and the second destination, etc. Otherwise, it derives the shortest paths from all origins to all destinations. `pairwise = TRUE` can alter the order in which results are returned. Check the output's origins and destinations columns for the respective order.
`contiguity`	`"queen"` (default) for queen's case contiguity or `"rook"` for rook's case contiguity. In the latter case, the algorithm only moves between horizontally or vertically adjacent cells. In the former case, it is also travels between diagonally adjacent cells. `"rook"` is more efficient than `"queen"` as it implies fewer edges.
`spherical`	Logical specifying whether coordinates are unprojected, i.e. lonlat, and refer to a sphere, e.g. a planet, if `rst` is a matrix or a list of matrices. It defaults to `TRUE`. If `FALSE`, the function assumes the coordinates to originate from a planar projection. This argument has no effect when `rst` is a SpatRaster or RasterLayer.
`radius`	Radius of the object, e.g. planet, if `spherical = TRUE`. This argument has no effect when `rst` is a SpatRaster or RasterLayer.
`extent`	Vector of length 4, specifying the extent of `rst`, if `rst` is a matrix or a list of matrices. It must contain xmin, xmax, ymin, and ymax, in that order. The argument has no effect when `rst` is a SpatRaster or RasterLayer.
`dist_comp`	Method to compute distances between adjacent cells. `"spaths"` (default) or `"terra"`. The default `"spaths"` uses spherical (Haversine) distances in case of lonlat data and planar (Euclidean) distances in case of projected (non-lonlat) data. The functions are optimized based on the fact that many inter-pixel distances are identical. Modelling the planet as a perfect sphere is in line with e.g. the s2 package, but is of course an oversimplification. With `"terra"`, the function derives distances via `terra::distance`. Because this computes all inter-pixel distances separately, it is slower than the `"spaths"` approach. It does take the non-spherical nature of the planet into account though. With `tr_fun`, you can specify a custom distance function that uses neither `"spaths"` nor `"terra"` distances.
`tr_fun`	The transition function based on which to compute edge weights, i.e. the travel cost between adjacent grid cells. Defaults to the geographic distance between pixel centroids. Permitted function parameter names are `d` (distance between the pixel centroids), `x1` (x coordinate or longitude of the first cell), `x2` (x coordinate or longitude of the second cell), `y1` (y coordinate or latitude of the first cell), `y2` (y coordinate or latitude of the second cell), `v1` (`rst` layers' values from the first cell), `v2` (`rst` layers' values from the second cell), and `nc` (number of CPU cores according to the `ncores` argument). If the data is unprojected, i.e. lonlat, or if `dist_comp = "terra"`, `d` is measured in meters. Otherwise, it uses the units of the CRS. If `rst` has one layer, the values are passed to `v1` and `v2` as vectors, otherwise they are passed as a data table where the first column refers to the first layer, the second column to the second layer etc. Note that data tables are data frames.
`v_matrix`	Logical specifying whether to pass values to `v1` and `v2` in `tr_fun` as matrices (`TRUE`) instead of data tables in the multi-layer case and vectors in the single-layer case (`FALSE`). It defaults to `FALSE`. Setting it to `TRUE` might e.g. be useful when defining `tr_fun` as a C++ Armadillo function.
`tr_directed`	Logical specifying whether `tr_fun` creates a directed graph. In a directed graph, transition costs can be asymmetric. Traveling from cells A to B may imply a different cost than traveling from B to A. It defaults to `TRUE` and only has an effect when `tr_fun` is not `NULL`. The default without `tr_fun` constructs an undirected graph.
`pre`	Logical specifying whether to compute the distances between neighboring cells before executing the shortest paths algorithm in C++. `pre` only has an effect, when no `tr_fun` is specified and `dist_comp = "spaths"`, as the distances are otherwise imported from R. `TRUE` (default) is in the vast majority of cases faster than `FALSE`. `FALSE` computes distances between neighboring cells while the shortest paths algorithm traverses the graph. This requires less RAM, but is slower than `TRUE`, unless `early_stopping = TRUE` and all points are close to each other. `TRUE`'s speed advantage is even larger, when `update_rst` is not `NULL`.
`early_stopping`	Logical specifying whether to stop the shortest path algorithm once the target cells are reached. It defaults to `TRUE`, which can be faster than `FALSE`, if the points are close to each other compared to the full set of `rst` cells. If at least one points pair is far from each other, `FALSE` is the faster setting. `FALSE` computes the distance to all cells and then extracts the distance to the target cells. It, therefore, does not check for each visited cell, whether it is in the set of targets. `TRUE` and `FALSE` produce the same result and only differ in terms of computational performance.
`bidirectional`	Logical specifying whether to produce paths or distances in both directions, if destinations are not specified and no directed transition function is given. In that case, the distance and the path from point A to point B is the same as the distance and path from point B to point A. `FALSE` (default) only returns distances or paths in one direction. Declaring `TRUE` returns distances or paths in both directions. This parameter's objective is to control the return object's RAM requirement. It only has an effect, if destinations are not specified and no directed transition function is given.
`update_rst`	Object updating `rst` with moving barriers. It defaults to `NULL`, corresponding to `rst` not being updated. If `rst` is a SpatRaster or RasterLayer, `update_rst` can be a SpatVector (terra), sf (sf), or Spatial* (sp) object, or a list of them. sf and sp objects are converted to SpatVectors. The function updates `rst` by setting any cell intersecting with `update_rst` to NA, thereby not allowing the shortest paths algorithm to pass through that cell. `update_rst` only sets non-NA cells to NA, not vice versa. The elements of `update_rst` always update the unmodified `rst`. I.e. if `update_rst` is a list of two polygons, the shortest paths are derived three times: once based on the not updated `rst`, denoted layer 0 in the output, once based on `rst` updated with the first polygon, referred to as layer 1, and once based on `rst` updated with the second polygon, termed layer 2. The second polygon updates the unmodified `rst`, not the `rst` updated by the first polygon. If `rst` is a matrix or a list of matrices, `update_rst` can be a vector of cell numbers, a matrix, or a list of either. Analogously to the SpatRaster case, these objects mark which cells to set to NA. As in terra, cell numbers start with 1 in the top left corner and then increase first from left to right and then from top to bottom. The cell numbers in the vector and the NA cells in the matrix identify the pixels to set to NA. Accordingly, the matrix is of equal dimensions as `rst`.
`touches`	Logical specifying the `touches` argument of `terra::extract` used when `update_rst` is a SpatVector, sf, or Spatial* object. It defaults to `TRUE`. If `FALSE`, the function only removes cells on the line render path or with the center point inside a polygon.
`ncores`	An integer specifying the number of CPU cores to use. It defaults to the number of cores installed on the machine. A value of 1 induces a single-threaded process.
`par_lvl`	`"points"` or `"update_rst"`, indicating the level at which to parallelize when using multiple cores and `update_rst` is a list. `"points"` parallelizes over the origin (and destination) point combinations in both the base grid not updated by `update_rst` and the grids updated with `update_rst`. The default `"update_rst"` is equivalent to `"points"` in the base grid, but parallelizes at the `update_rst` list level in the updated grid stage.
`show_progress`	Logical specifying whether the function prints messages on progress. It defaults to `FALSE`.
`bar_limit`	Integer specifying until up to how many paths or list elements of `update_rst` to display a progress bar, if `show_progress = TRUE`. It defaults to 150, in which case the function prints one `=` per computed path, if there are no more than 150 paths requested. In the grids updated with `update_rst`, the function displays one `=` per processed `update_rst` list element, not per path. In parallel applications, the progress bar can notably slow the execution as the functions only permit one thread to write to output at a time. Do not set the argument too high to avoid R crashes from text buffer overflows.
`path_type`	Data type with which C++ stores cell numbers. `"int"` (default) is the 4 byte signed integers that R also uses and is the fastest option. `"unsigned short int"` is a 2 byte unsigned integer which requires less RAM than `"int"`, but only works if there are less than 65,535 non-NA cells and is comparatively slower because it requires type conversions.
`distance_type`	Data type with which C++ stores distances. `"double"` (default) is a double precision 8 byte floating point number. It is the fastest and most precise option and also used by R as its `numeric` data type. `"float"` is a single precision 4 byte floating point number, which stores decimal values less precisely than `"double"` does and is comparatively slower because it requires type conversions. `"int"` and `"unsigned short int"` are the integer types described in the `path_type` documentation above. With `"int"` and `"unsigned short int"`, distances are rounded to integers. When employing these integers types, the distance between any cells in `rst`, not just the cells of interest, must not exceed 2,147,483,647 and 65,535 respectively. The distance difference caused by rounding double values to another type can accumulate along the shortest paths and can result in notable distance deviation in the output. The recommendations is to stick with the default `"double"` unless the machine does not have enough RAM to run the function otherwise.

Details

This function computes shortest paths and/ or distance between locations in a grid using Dijkstra's algorithm. Examples are a ship navigating around land masses or a hiker traversing mountains.

Let us explore the ship example to illustrate how shortest_paths works. To compute shortest paths between ports around the world, you start with a global SpatRaster, in which all land pixels are set to NA and all ocean pixels are set a non-NA value, e.g. 1. A SpatVector marks port locations as points on water pixels. Passing these two objects to the parameters rst and origins respectively derives the shortest paths from each port to all other ports conditional on ships solely traversing water pixels and returns the distances, i.e. lengths of these paths. If you are not interested in the distances, but in the spatial lines themselves, set output to "lines". If you want to obtain both, set it to "both".

In a different application, you do not want to compute the paths between all ports, but only the paths between the ports on the northern hemisphere and the ports on the southern hemisphere, but not the paths between ports within the same hemisphere. To assess this, you split the ports into two SpatVectors and pass them to origins and destinations respectively. What if you do not want to connect all orgins to all destinations? Set pairwise to TRUE to connect the first origin just to the first destination, the second origin to the second destination, etc.

By default, the distance or transition cost between adjacent cells of the input grid is the geographic distance between the cells' centroids. What if the boat is a sailing vessel that minimizes travel time conditional on wind speed, wind direction, and ocean currents? Construct a SpatRaster with three layers containing information on the three variables respectively. Define a transition function that combines the three layers into a travel time measure and pass the SpatRaster to rst and this function to tr_fun. tr_fun makes this package very versatile. With custom transition functions, you can take this software out of the geo-spatial context and e.g. apply it to biomedical research.

Going back to the ship routing example, consider that there are hurricanes in the Caribbean. Ships traveling from India to Australia do not care, but ships traveling from Mexico to the Netherlands have to go around the storm and must not take the shortest path through the hurricane. You have ten SpatVector polygons delineating the extent of the hurricane on ten different days. You want to know what the shortest paths are given that ships must go around the polygon on that day. Calling shortest_paths ten times with ten different SpatRasters would be very inefficient. This would assemble the graph ten times and recompute also paths unaffected by the hurricanes, such as the path between India and Australia, in each iteration. Instead, pass the SpatVector polygons to update_rst. shortest_paths then produces the shortest paths for a hurricane-free route and all ten hurricane days, only reestimating the paths that are affected by the hurricane polygon on a specific day.

Applications to Earth should always pass a SpatRaster or RasterLayer to rst. The option to use a matrix or a list of matrices is meant for applications to other planets, non-geo-spatial settings, and users who cannot install the terra package on their system.

The largest source of runtime inefficiency is the quantity of non-NA pixels in the rst grid. Limit the rst argument to the relevant area. E.g. crop the grid to the North Atlantic when computing shipping routes between Canada and France. And set regions through which the shortest path does certainly not pass to NA.

shortest_paths is optimized for computational performance. Most of its functions are written in C++ and it does not use a general purpose graph library, but comes with its custom graph-theoritical implementation tailored to gridded inputs.

The vignette provides further details on the package.

Value

If 'output = "distances"', the output is by default returned as a data table. If you want the result to be a data frame only, not a data table, set 'output_class' to '"data.frame"'. If 'output' is '"lines"' or '"both"', the the function returns a SpatVector, if 'rst' is a SpatRaster or a RasterLayer, and a list, if 'rst' is a matrix or a list of matrices. Explicitly setting 'output_class' to '"list"' returns a list in any case. 'output_class = "SpatVector"', however, returns a SpatVector only if 'rst' is a SpatRaster or a RasterLayer.

If output = "distances" or output = "both", the distances variable marks which points are connected. Unconnected point pairs have an Inf distance, if distance_type = "double" or distance_type = "float", and an NA distance, if distance_type = "int" or distance_type = "unsigned short int". If output = "lines", the connected variable marks which points are connected. Points are connected, when it is possible to travel between them via non-NA cells in rst.

References

Dijkstra, E. W. 1959. "A note on two problems in connexion with graphs." Numerische Mathematik 1 (1): 269–71.

Examples


# Generate example data
set.seed(2L)
input_grid <- terra::rast(crs = "epsg:4326", resolution = 2, vals = sample(c(1L, NA_integer_),
  16200L, TRUE, c(0.8, 0.2)))
origin_pts <- rnd_locations(5L, output_type = "SpatVector")
origin_pts$name <- sample(letters, 5)
destination_pts <- rnd_locations(5L, output_type = "SpatVector")

# Compute distances
shortest_paths(input_grid, origin_pts)
shortest_paths(input_grid, origin_pts, bidirectional = TRUE)
shortest_paths(input_grid, origin_pts, destination_pts)
shortest_paths(input_grid, origin_pts, origin_names = "name")
shortest_paths(input_grid, origin_pts, destination_pts, pairwise = TRUE)

# Compute lines
shortest_paths(input_grid, origin_pts, output = "lines")

# Compute distances and lines
shortest_paths(input_grid, origin_pts, output = "both")

# Use custom transition function
input_grid[input_grid == 1L] <- stats::runif(terra::freq(input_grid, value = 1L)$count,
  max = 1000)
shortest_paths(input_grid, origin_pts, tr_fun = function(d, v1, v2) sqrt(d^2 + abs(v2 - v1)^2),
  tr_directed = FALSE)

# Compute distances with grid updating
barrier <- terra::vect("POLYGON ((-179 1, 30 1, 30 0, -179 0, -179 1))", crs = "epsg:4326")
shortest_paths(input_grid, origin_pts, update_rst = barrier)
barriers <- list(barrier, terra::vect("POLYGON ((0 0, 0 89, 1 89, 1 0, 0 0))",
  crs = "epsg:4326"))
shortest_paths(input_grid, origin_pts, update_rst = barriers)

[Package spaths version 1.1.3 Index]