funique {collapse} | R Documentation
Fast Unique Elements / Rows
Description
funique is an efficient alternative to unique (or unique.data.table, kit::funique, dplyr::distinct).
fnunique is an alternative to NROW(unique(x)) (or data.table::uniqueN, kit::uniqLen, dplyr::n_distinct).
fduplicated is an alternative to duplicated (or duplicated.data.table, kit::fduplicated).
The collapse versions are versatile and highly competitive.
any_duplicated(x) is faster than any(fduplicated(x)). Note that for atomic vectors, anyDuplicated is currently more efficient if there are duplicates at the beginning of the vector.
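As a quick orientation, the following sketch runs each function next to the base R call it is described as replacing. The vector x is made up for illustration, and the snippet assumes the collapse package is installed; on a plain integer vector like this the results are expected to agree with base R.

```r
library(collapse)

# Illustrative input (not from the documentation): a small integer vector
x <- c(3L, 1L, 3L, 2L, 1L)

# Unique elements: collapse vs. base
funique(x)
unique(x)

# Number of unique elements
fnunique(x)
NROW(unique(x))

# Logical vector flagging repeated elements
fduplicated(x)
duplicated(x)

# Single TRUE/FALSE answer: are there any duplicates?
any_duplicated(x)
any(fduplicated(x))
```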
Usage
funique(x, ...)
## Default S3 method:
funique(x, sort = FALSE, method = "auto", ...)
## S3 method for class 'data.frame'
funique(x, cols = NULL, sort = FALSE, method = "auto", ...)
## S3 method for class 'sf'
funique(x, cols = NULL, sort = FALSE, method = "auto", ...)
# Methods for indexed data / compatibility with plm:
## S3 method for class 'pseries'
funique(x, sort = FALSE, method = "auto", drop.index.levels = "id", ...)
## S3 method for class 'pdata.frame'
funique(x, cols = NULL, sort = FALSE, method = "auto", drop.index.levels = "id", ...)
fnunique(x) # Fast NROW(unique(x)), for vectors and lists
fduplicated(x, all = FALSE) # Fast duplicated(x), for vectors and lists
any_duplicated(x) # Simple logical TRUE|FALSE duplicates check
Arguments
x | an atomic vector or data frame / list of equal-length columns.
sort | logical.
method | an integer or character string specifying the method of computation:
cols | compute unique rows according to a subset of columns. Columns can be selected using column names, indices, a logical vector or a selector function (e.g.
... | arguments passed to
drop.index.levels | character. Either
all | logical.
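The cols argument can be given in several equivalent forms. The sketch below selects the same three mtcars columns (cyl, vs, am, which are columns 2, 8 and 9) by index, by name and by logical vector. The variable names are illustrative only, and the snippet assumes collapse is installed.

```r
library(collapse)

# Same column subset expressed three ways (hypothetical variable names)
by_index <- funique(mtcars, cols = c(2, 8, 9))
by_name  <- funique(mtcars, cols = c("cyl", "vs", "am"))
by_lgl   <- funique(mtcars, cols = names(mtcars) %in% c("cyl", "vs", "am"))

# Uniqueness is judged on the selected columns only,
# but every column of mtcars is kept in the result
nrow(by_index)
ncol(by_index)

# The row count should match the number of unique combinations
fnunique(gv(mtcars, c(2, 8, 9)))
```

Note the difference from funique(gv(mtcars, c(2,8,9))), which first subsets the data frame and therefore returns only those three columns.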
Details
If all values/rows are already unique, then x is returned. Otherwise a copy of x with duplicate elements/rows removed is returned. See group for some additional computational details.
The sf method simply ignores the geometry column when determining unique values.
Methods for indexed data also subset the index accordingly.
any_duplicated is currently simply implemented as fnunique(x) < NROW(x), which means it does not have facilities to terminate early. Users are advised to use anyDuplicated with atomic vectors if chances are high that there are duplicates at the beginning of the vector. When there are no duplicate values, or when the input is a data frame, any_duplicated is considerably faster than anyDuplicated.
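The relationship between any_duplicated, fnunique and base anyDuplicated described above can be checked directly. This is a minimal sketch with made-up vectors x and y, assuming collapse is installed. Note that anyDuplicated returns an integer index (0 if there are no duplicates), so it is compared against 0 to obtain a logical.

```r
library(collapse)

# Illustrative inputs: x contains a duplicate, y does not
x <- c(1, 2, 2, 3)
y <- c(1, 2, 3, 4)

# These three expressions answer the same question
any_duplicated(x)
fnunique(x) < NROW(x)
anyDuplicated(x) > 0L

# With all values unique, funique has nothing to remove
funique(y)
any_duplicated(y)

# Data frames are checked row-wise
any_duplicated(mtcars)
anyDuplicated(mtcars) > 0L
```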
Value
funique returns x with duplicate elements/rows removed, fnunique returns an integer giving the number of unique values/rows, and fduplicated gives a logical vector with TRUE indicating duplicated elements/rows.
Note
These functions treat lists like data frames, unlike unique, which has a list method to determine uniqueness of (non-atomic/heterogeneous) elements in a list.
No matrix method is provided. Please use the alternatives provided in package kit with matrices.
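To make the difference in list handling concrete, the sketch below uses a small made-up list l whose two elements are identical vectors. Base unique compares the list elements with each other, while the collapse functions treat the elements as columns and look for duplicate rows. The snippet assumes collapse is installed.

```r
library(collapse)

# Illustrative list: two equal-length "columns" with identical contents
l <- list(a = c(1, 1, 2), b = c(1, 1, 2))

# Base R: a and b are the same element, so one unique element remains
length(unique(l))

# collapse: rows are (1,1), (1,1), (2,2), so two unique rows
fnunique(l)
funique(l)

# The second row repeats the first
fduplicated(l)
```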
See Also
fndistinct, group, Fast Grouping and Ordering, Collapse Overview.
Examples
funique(mtcars$cyl)
funique(gv(mtcars, c(2,8,9)))
funique(mtcars, cols = c(2,8,9))
fnunique(gv(mtcars, c(2,8,9)))
fduplicated(gv(mtcars, c(2,8,9)))
fduplicated(gv(mtcars, c(2,8,9)), all = TRUE)
any_duplicated(gv(mtcars, c(2,8,9)))
any_duplicated(mtcars)