mt_filter_unique {move2} | R Documentation |
Filter out duplicated records from a move2 object
Description
- mt_filter_unique: returns a move2 object from which duplicated records have been removed.
- mt_unique: returns a logical vector indicating the unique records.

By default, records that share a timestamp and track identifier with another record are treated as duplicates and filtered.
Usage
mt_filter_unique(x, ...)
mt_unique(
x,
criterion = c("subsets", "subsets_equal", "sample", "first", "last"),
additional_columns = NULL,
...
)
Arguments
x |
The move2 object. |
... |
Arguments passed on to the mt_unique function. |
criterion |
The criterion to decide what records to filter out. For more information see Details below. |
additional_columns |
In some cases different sensors or tracking devices might have the same combination of time and track identifier. It might, for example, be desirable to retain records from an accelerometer and a GPS recorded at the same time. This argument can be used to indicate additional columns to include in the grouping within which records should not be duplicated. See the examples below for its usage. |
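The effect of widening the grouping can be sketched in base R. This only illustrates the idea with a plain data.frame and base::duplicated(). It is not how move2 implements the argument, and the data and the column names time, track and sensor are made up for the example.

```r
# Hypothetical records: a GPS and an accelerometer record share time and track
records <- data.frame(
  time   = c(1, 1, 2, 2),
  track  = c("a", "a", "a", "a"),
  sensor = c("gps", "acc", "gps", "gps")
)

# Grouping on time and track only: rows 2 and 4 are flagged as duplicates
dup_default <- duplicated(records[, c("time", "track")])

# Adding the sensor column to the grouping: only row 4 is still a duplicate
dup_with_sensor <- duplicated(records[, c("time", "track", "sensor")])

# Keep the records that are unique within time, track and sensor
records[!dup_with_sensor, ]
```

With the wider grouping the accelerometer record in row 2 is kept, because it no longer shares all grouping values with row 1.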
Details
To make an informed choice about how to remove duplicates, we recommend first trying to understand why the data set contains duplicates.
Several methods for filtering duplicates are available; the method is selected through the criterion argument:
- "subsets": Only records that are a subset of other records are omitted. Some tracking devices first transmit a smaller dataset that does not contain all information, so some records may be the same as others except that they contain additional NA values. This strategy omits only those (duplicated) records. As a result, duplicates that contain unique information are retained; the dataset is thus not guaranteed to be free of duplicated records afterwards.
- "subsets_equal": The same as "subsets", except that equivalence is tested with base::all.equal() rather than with base::identical(). This allows small numeric differences to be treated as equal. It can, however, reduce speed considerably.
- "sample": One record is randomly selected from each set of duplicated records.
- "first": Select the first location from a set of duplicated locations. Note that reordering the data will affect which record is selected. For Movebank data no specific order is enforced, so ensure that the locations are in the order you expect (the same goes for "last").
- "last": Select the last location from a set of duplicated locations.
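The difference between "subsets" and "subsets_equal" can be sketched in base R. This is a minimal illustration under stated assumptions, not the code move2 uses: is_subset_of() is a hypothetical helper written for this example, and the two records are made-up numeric vectors.

```r
# Hypothetical helper: TRUE when the non-NA values of `a` match the values
# of `b` at the same positions, so `a` holds no information that `b` lacks
is_subset_of <- function(a, b, exact = TRUE) {
  keep <- !is.na(a)
  if (exact) {
    identical(a[keep], b[keep])          # exact equivalence
  } else {
    isTRUE(all.equal(a[keep], b[keep]))  # equivalence within a tolerance
  }
}

partial  <- c(x = 0.3, y = NA)       # early transmission, y not yet known
complete <- c(x = 0.1 + 0.2, y = 5)  # later record, x differs by rounding error

exact_result  <- is_subset_of(partial, complete, exact = TRUE)
approx_result <- is_subset_of(partial, complete, exact = FALSE)
```

Here the exact comparison does not recognise the partial record as a subset because 0.1 + 0.2 is not bit-for-bit equal to 0.3, while the tolerant comparison does. That is the kind of case the all.equal()-based criterion is meant to cover.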
Value
mt_unique returns a logical vector indicating the unique records.
mt_filter_unique returns a filtered move2 object.
See Also
Other filter: mt_filter_movebank_visible(), mt_filter_per_interval()
Examples
m <- mt_sim_brownian_motion(1:2)[rep(1:4, 4), ]
m$sensor_type <- as.character(gl(2, 4))
m$sensor_type_2 <- as.character(gl(2, 8))
table(mt_unique(m, "sample"))
mt_filter_unique(m[, c("time", "track", "geometry")])
mt_filter_unique(m[, c("time", "track", "geometry", "sensor_type")],
  additional_columns = sensor_type
)
if (requireNamespace("dplyr")) {
  mt_filter_unique(m, additional_columns = across(all_of(c("sensor_type", "sensor_type_2"))))
}
mt_filter_unique(m, "sample")
mt_filter_unique(m, "first")
m$sensor_type[1:12] <- NA
mt_filter_unique(m[, c("time", "track", "geometry", "sensor_type")])
## Sometimes it is desirable to not consider specific columns when finding
## the unique records, for example a record identifier like `event_id`
## in Movebank. This can be done by reducing the data.frame used to identify
## the unique records, e.g.:
m$event_id <- seq_len(nrow(m))
m[mt_unique(m |> dplyr::select(-event_id, -ends_with("type_2"))), ]
## Note that because we subset the full original data.frame, the
## columns are not lost.
## This example retains the duplicate entry that contains the fewest
## columns with NA values
mv <- mt_read(mt_example())
mv <- dplyr::bind_rows(mv, mv[1:10, ])
mv[, "eobs:used-time-to-get-fix"] <- NA
mv_no_dup <- mv |>
  dplyr::mutate(n_na = rowSums(is.na(pick(everything())))) |>
  dplyr::arrange(n_na) |>
  mt_filter_unique(criterion = "first")