rename_cols {ricu} | R Documentation |
ICU class data utilities
Description
Several utility functions for working with id_tbl and ts_tbl objects are
available, including functions for changing column names, removing columns,
as well as aggregating or removing rows. An important thing to note is that
as id_tbl (and consequently ts_tbl) inherits from data.table, there are
several functions provided by the data.table package that are capable of
modifying id_tbl in a way that results in an object with inconsistent state.
An example of this is data.table::setnames(): if an ID column or the index
column name is modified without updating the attribute marking the column as
such, this leads to an invalid object. As data.table::setnames() is not an S3
generic function, the only way to control its behavior with respect to id_tbl
objects is masking the function. As such an approach has its own downsides, a
separate function, rename_cols(), is provided, which is able to handle column
renaming correctly.
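As a minimal sketch of the difference this makes, the snippet below renames the ID column of a small id_tbl with rename_cols() and checks that the ID attribute follows the new name. The toy table and the column names (a, b, patient) are made up for illustration, and the call assumes the signature shown under Usage.

```r
library(ricu)

# Toy table with "a" explicitly marked as ID column
tbl <- id_tbl(a = 1:3, b = c(0.5, 1.5, 2.5), id_vars = "a")

# Rename the ID column; by_ref = FALSE (the default) leaves `tbl` untouched
renamed <- rename_cols(tbl, new = "patient", old = "a")

id_vars(tbl)       # ID column name of the original object
id_vars(renamed)   # ID column name after renaming
colnames(renamed)  # column names after renaming
```

Passing by_ref = TRUE instead would modify tbl in place, which avoids a copy but also changes every other reference to the same object.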
Usage
rename_cols(
  x,
  new,
  old = colnames(x),
  skip_absent = FALSE,
  by_ref = FALSE,
  ...
)
rm_cols(x, cols, skip_absent = FALSE, by_ref = FALSE)
change_interval(x, new_interval, cols = time_vars(x), by_ref = FALSE)
change_dur_unit(x, new_unit, by_ref = FALSE)
rm_na(x, cols = data_vars(x), mode = c("all", "any"))
## S3 method for class 'id_tbl'
sort(
  x,
  decreasing = FALSE,
  by = meta_vars(x),
  reorder_cols = TRUE,
  by_ref = FALSE,
  ...
)
is_sorted(x)
## S3 method for class 'id_tbl'
duplicated(x, incomparables = FALSE, by = meta_vars(x), ...)
## S3 method for class 'id_tbl'
anyDuplicated(x, incomparables = FALSE, by = meta_vars(x), ...)
## S3 method for class 'id_tbl'
unique(x, incomparables = FALSE, by = meta_vars(x), ...)
is_unique(x, ...)
## S3 method for class 'id_tbl'
aggregate(
  x,
  expr = NULL,
  by = meta_vars(x),
  vars = data_vars(x),
  env = NULL,
  ...
)
dt_gforce(
  x,
  fun = c("mean", "median", "min", "max", "sum", "prod", "var", "sd", "first", "last",
    "any", "all"),
  by = meta_vars(x),
  vars = data_vars(x),
  na_rm = !fun %in% c("first", "last")
)
replace_na(x, val, type = "const", ...)
Arguments
x |
Object to query |
new , old |
Replacement names and existing column names for renaming columns |
skip_absent |
Logical flag for ignoring non-existent column names |
by_ref |
Logical flag indicating whether to perform the operation by reference |
... |
Ignored |
cols |
Column names of columns to consider |
new_interval |
Replacement interval length specified as scalar-valued
|
new_unit |
New |
mode |
Switch between |
decreasing |
Logical flag indicating the sort order |
by |
Character vector indicating which combinations of columns from
|
reorder_cols |
Logical flag indicating whether to move the by columns to the front |
incomparables |
Not used. Here for S3 method consistency |
expr |
Expression to apply over groups |
vars |
Column names to apply the function to |
env |
Environment to look up names in |
fun |
Function name (as string) to apply over groups |
na_rm |
Logical flag indicating how to treat NA values |
val |
Replacement value (if type is "const") |
type |
character, one of "const", "locf" or "nocb". Defaults to "const" |
Details
Apart from a function for renaming columns while respecting attributes
marking columns as index or ID columns, several other utility functions are
provided to make handling of id_tbl and ts_tbl objects more convenient.
Sorting
An id_tbl or ts_tbl object is considered sorted when rows are in ascending
order according to columns as specified by meta_vars(). This means that for
an id_tbl object rows have to be ordered by id_vars() and for a ts_tbl object
rows have to be ordered first by id_vars(), followed by the index_var().
Calling the S3 generic function base::sort() on an object that inherits from
id_tbl using default arguments yields an object that is considered sorted.
For convenience (mostly in printing), the columns by which the table was
sorted are moved to the front (this can be disabled by passing FALSE as the
reorder_cols argument). Internally, sorting is handled by either setting a
data.table::key() in case decreasing = FALSE or by calling
data.table::setorder() in case decreasing = TRUE.
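The sorting rules above can be sketched on a small ts_tbl as follows. This is only an illustration under assumed toy data: the column names (id, time, val) and the one-hour interval are hypothetical, and hours() is the ricu helper for creating difftime vectors.

```r
library(ricu)

# Toy time series table: rows are deliberately out of order
tbl <- ts_tbl(
  id = c(2, 1, 2, 1),
  time = hours(c(1, 2, 0, 0)),
  val = c(10, 20, 30, 40),
  id_vars = "id",
  index_var = "time",
  interval = hours(1L)
)

is_sorted(tbl)   # status before sorting

# Default arguments: ascending order by id_vars(), then index_var()
srt <- sort(tbl)

is_sorted(srt)   # status after sorting
srt
```

With decreasing = TRUE the same call orders rows in descending order, which by the definition above is not considered sorted.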
Uniqueness
An object inheriting from id_tbl is considered unique if it is unique in
terms of the columns as specified by meta_vars(). This means that for an
id_tbl object, either zero or a single row is allowed per combination of
values in columns id_vars() and consequently for ts_tbl objects a maximum of
one row is allowed per combination of time step and ID. In order to create a
unique id_tbl object from a non-unique id_tbl object, aggregate() will
combine observations that represent repeated measurements within a group.
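A short sketch of how the uniqueness checks behave, using a hypothetical table in which ID 1 occurs twice. It assumes that the id_tbl methods for duplicated() and unique() follow the usual data.table semantics of flagging, respectively dropping, all but the first row per group.

```r
library(ricu)

# Toy table: ID 1 has two rows, ID 2 has one
tbl <- id_tbl(id = c(1, 1, 2), val = c(1, 3, 5), id_vars = "id")

is_unique(tbl)    # checks for repeated IDs
duplicated(tbl)   # one logical entry per row

# Dropping repeated rows (rather than combining them) also yields a
# unique object, at the cost of discarding the extra measurements
dedup <- unique(tbl)
is_unique(dedup)
```

Whether dropping or aggregating is appropriate depends on the data: aggregate() keeps information from all repeated measurements, whereas unique() simply discards rows.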
Aggregating
In order to turn a non-unique id_tbl or ts_tbl object into an object
considered unique, the S3 generic function stats::aggregate() is available.
This applies the expression (or function specification) passed as expr to
each combination of grouping variables. The columns to be aggregated can be
controlled using the vars argument and the grouping variables can be changed
using the by argument. The argument expr is fairly flexible: it can take an
expression that will be evaluated in the context of the data.table in a clean
environment inheriting from env, it can be a function, or it can be a string
in which case dt_gforce() is called. The default value NULL chooses a string
dependent on data types, where numeric resolves to median, logical to sum and
character to first.
As aggregation is used in concept loading (see load_concepts()), performance
is important. For this reason, dt_gforce() allows for any of the available
functions to be applied using the GForce optimization of data.table (see
data.table::datatable.optimize).
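As a rough illustration of the string form of expr, the sketch below aggregates a hypothetical table with repeated IDs, once through aggregate() and once by calling dt_gforce() directly. The table and its column names are assumptions made for this example only.

```r
library(ricu)

# Toy table: ID 1 has two measurements, ID 2 has one
tbl <- id_tbl(id = c(1, 1, 2), val = c(1, 3, 5), id_vars = "id")

# A string passed as expr is handed on to dt_gforce()
by_string <- aggregate(tbl, "max")

# Equivalent direct call; grouping and data columns use their defaults
by_gforce <- dt_gforce(tbl, "max")

by_string
is_unique(by_string)
```

Using one of the function names listed for fun keeps the aggregation on the optimized path, whereas an arbitrary expression or function may be evaluated per group without that optimization.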
Value
Most of the utility functions return an object inheriting from id_tbl,
potentially modified by reference, depending on the type of the object passed
as x. The functions is_sorted(), anyDuplicated() and is_unique() return
logical flags, while duplicated() returns a logical vector of length nrow(x).
Examples
tbl <- id_tbl(a = rep(1:5, 4), b = rep(1:2, each = 10), c = rnorm(20),
              id_vars = c("a", "b"))
is_unique(tbl)
is_sorted(tbl)
is_sorted(tbl[order(c)])
identical(aggregate(tbl, list(c = sum(c))), aggregate(tbl, "sum"))
tbl <- aggregate(tbl, "sum")
is_unique(tbl)
is_sorted(tbl)