xtfrm2 {stringx} | R Documentation |
Sort Strings
Description
The sort method for objects of class character (sort.character) uses the locale-sensitive Unicode collation algorithm to arrange strings in a vector with regard to a chosen lexicographic order.
xtfrm2 and [DEPRECATED] xtfrm generate an integer vector that sorts in the same way as its input, and hence can be used in conjunction with order or rank.
Usage
xtfrm2(x, ...)
## Default S3 method:
xtfrm2(x, ...)
## S3 method for class 'character'
xtfrm2(
x,
...,
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalisation = FALSE,
numeric = FALSE
)
xtfrm(x)
## Default S3 method:
xtfrm(x)
## S3 method for class 'character'
xtfrm(x)
## S3 method for class 'character'
sort(
x,
...,
decreasing = FALSE,
na.last = NA,
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalisation = FALSE,
numeric = FALSE
)
Arguments
x |
character vector whose elements are to be sorted |
... |
further arguments passed to other methods |
locale |
|
strength |
|
alternate_shifted |
|
french |
|
uppercase_first |
|
case_level |
|
normalisation |
|
numeric |
|
decreasing |
single logical value; if |
na.last |
single logical value; if |
Details
The current author does not know what 'xtfrm' stands for and would appreciate being enlightened on the matter.
Value
sort.character returns a character vector, with only the names attribute preserved. Note that the output vector may be shorter than the input one.
xtfrm2.character and xtfrm.character return an integer vector; most attributes are preserved.
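The attribute handling described above can be illustrated with a minimal sketch. It uses base::sort.default rather than sort.character, on the assumption (stated under Differences from Base R) that the stringx method keeps the base behaviour in this respect; the vector x and its "note" attribute are made up for illustration.

```r
# Base R only; stringx is not needed to run this sketch.
x <- c(k1 = "b", k2 = NA, k3 = "a")
attr(x, "note") <- "hypothetical extra attribute"

y <- base::sort.default(x)   # na.last = NA by default, so the NA is removed

length(x)                    # 3
length(y)                    # 2: the output is shorter than the input
names(y)                     # "k3" "k1": names follow their elements
names(attributes(y))         # "names": the "note" attribute is gone
```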
Differences from Base R
Replacements for the default S3 methods sort and xtfrm for character vectors implemented with stri_sort and stri_rank.
- Collation in different locales is difficult and non-portable across platforms [fixed here, using services provided by ICU]
- Overloading xtfrm.character has no effect in R, because S3 method dispatch is done internally with hard-coded support for character arguments. Thus, we needed to replace the generic xtfrm with the one that calls UseMethod [fixed here]
- xtfrm does not support customisation of the linear ordering relation it is based upon [fixed by introducing the ... argument to the new generic, xtfrm2]
- Neither order, rank, nor sort.list is a generic, therefore they would have to be rewritten from scratch to allow the inclusion of our patches; interestingly, order even calls xtfrm, but only for classed objects [not fixed here; see Examples for a workaround]
- xtfrm for objects of type character does not preserve the names attribute (but does so for numeric) [fixed here]
- sort seems to preserve only the names attribute, which makes sense if na.last is NA, because the resulting vector might be shorter [not fixed here as it would break compatibility with other sorting methods]
- Note that sort by default removes missing values altogether, whereas order has na.last=TRUE [not fixed here as it would break compatibility with other sorting methods]
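Two of the base R behaviours listed above can be checked with a short sketch. It calls base::xtfrm.default and base::order explicitly so that the result does not depend on whether stringx is attached; the input vectors are hypothetical.

```r
# Base R only; illustrates the behaviour that stringx sets out to address.
s <- c(k1 = "b", k2 = NA, k3 = "a")   # named character vector with a missing value
n <- c(k1 = 2, k2 = 1)                # named numeric vector

# The default xtfrm method drops names for character input but keeps them for numeric input.
chr_names <- names(base::xtfrm.default(s))   # NULL
num_names <- names(base::xtfrm.default(n))   # "k1" "k2"

# order() puts missing values last by default (na.last = TRUE),
# so the index of the NA is still part of the permutation.
o <- base::order(s)                          # 3 1 2
```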
Author(s)
See Also
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): strcoll
Examples
x <- c("a1", "a100", "a101", "a1000", "a10", "a10", "a11", "a99", "a10", "a1")
base::sort.default(x) # lexicographic sort
sort(x, numeric=TRUE) # calls stringx:::sort.character
xtfrm2(x, numeric=TRUE) # calls stringx:::xtfrm2.character
rank(xtfrm2(x, numeric=TRUE), ties.method="average") # ranks with averaged ties
order(xtfrm2(x, numeric=TRUE)) # ordering permutation
x[order(xtfrm2(x, numeric=TRUE))] # equivalent to sort()
# order a data frame w.r.t. decreasing ids and increasing vals
d <- data.frame(vals=round(runif(length(x)), 1), ids=x)
d[order(-xtfrm2(d[["ids"]], numeric=TRUE), d[["vals"]]), ]