stri_unique {stringi} | R Documentation
Extract Unique Elements
Description
This function returns a character vector like str, but with duplicate elements removed.
Usage
stri_unique(str, ..., opts_collator = NULL)
Arguments
str | a character vector
... | additional settings for opts_collator
opts_collator | a named list with ICU Collator's options, see stri_opts_collator
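A minimal sketch of the two ways to supply collator settings, assuming that options given through ... are forwarded to stri_opts_collator, as the argument descriptions suggest. The sample vector x is made up for illustration.

```r
library(stringi)

# Hypothetical sample data: the same word in three letter cases
x <- c("hladny", "HLADNY", "Hladny")

# Option 1: pass a collator setting directly through `...`
a <- stri_unique(x, strength = 2)

# Option 2: build the options list explicitly
b <- stri_unique(x, opts_collator = stri_opts_collator(strength = 2))

# Both calls should use the same collator configuration
identical(a, b)
```

At strength 2 the collator ignores case differences, so both calls are expected to keep a single element.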
Details
As usual in stringi, no attributes are copied.
Unlike unique, this function tests for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such an operation is locale-dependent. Hence, stri_unique is significantly slower (but much better suited for natural language processing) than its base R counterpart.
See also stri_duplicated for indicating non-unique elements.
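The difference between bytewise comparison and canonical equivalence can be sketched as follows. This is an illustration only: the vector x is made up, and the default collator options are assumed.

```r
library(stringi)

# 'a with ogonek' as a single precomposed code point ...
composed <- "\u0105"
# ... and as 'a' followed by a combining ogonek (NFD form)
decomposed <- stri_trans_nfd(composed)

x <- c(composed, decomposed, "b")

# Base R compares the underlying bytes, so both spellings are kept
n_base <- length(unique(x))

# stri_unique treats the two spellings as the same string
n_stri <- length(stri_unique(x))

# stri_duplicated marks the elements that stri_unique would drop
dup <- stri_duplicated(x)

c(base = n_base, stringi = n_stri)
dup
```

Here unique is expected to keep all three elements, while stri_unique keeps two and stri_duplicated flags the second one.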
Value
Returns a character vector.
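As noted in Details, attributes of the input are not carried over to the result. A short sketch with a hypothetical named vector:

```r
library(stringi)

# Hypothetical input with a names attribute
x <- c(first = "apple", second = "banana", third = "apple")

res <- stri_unique(x)

is.character(res)         # the result is a plain character vector
is.null(attributes(res))  # names (and any other attributes) are gone
```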
Author(s)
Marek Gagolewski and other contributors
References
Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/
See Also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_wrap()
Examples
# normalized and non-Unicode-normalized version of the same code point:
stri_unique(c('\u0105', stri_trans_nfkd('\u0105')))
unique(c('\u0105', stri_trans_nfkd('\u0105')))
# at primary strength, case and the sharp s vs. 'ss' distinction are ignored:
stri_unique(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)