dt_mutate {maditr} | R Documentation |
'dplyr'-like interface for data.table.
Description
Subset of 'dplyr' verbs to work with data.table. Note that there is no group_by verb; use the by or keyby argument when needed.
- dt_mutate adds new variables or modifies existing variables. If data is a data.table, it is modified in place.
- dt_summarize computes summary statistics. It splits the data into subsets, computes summary statistics for each, and returns the result as a data.table.
- dt_summarize_all is the same as dt_summarize but works over all non-grouping variables.
- dt_filter selects rows/cases where conditions are true. Rows where the condition evaluates to NA are dropped.
- dt_select selects columns/variables from the data set. Ranges of variables are supported, e.g. vs:carb. Character arguments which start with '^' or end with '$' are treated as Perl-style regular expression patterns. For example, '^Petal' returns all variables whose names start with 'Petal', and 'Width$' returns all variables whose names end with 'Width'. The pattern '^.' matches all variables, and the pattern '^.*my_str' is equivalent to contains "my_str". See examples.
- dt_arrange sorts the dataset by variable(s). Use '-' to sort in descending order. If data is a data.table, it is modified in place.
Usage
dt_mutate(data, ..., by)
dt_summarize(data, ..., by, keyby, fun = NULL)
dt_summarize_all(data, fun, by, keyby)
dt_summarise(data, ..., by, keyby, fun = NULL)
dt_summarise_all(data, fun, by, keyby)
dt_select(data, ...)
dt_filter(data, ...)
dt_arrange(data, ..., na.last = FALSE)
Arguments
data |
data.table/data.frame. A data.frame will be automatically converted
to data.table. |
... |
List of variables or name-value pairs of summary/modifications
functions. The name will be the name of the variable in the result. In the
|
by |
Unquoted name of a grouping variable, or a list of unquoted names of grouping variables. For details see data.table |
keyby |
Same as |
fun |
function which will be applied to all variables in
|
na.last |
logical. FALSE by default. If TRUE, missing values in the data are put last; if FALSE, they are put first. |
Value
data.table
Examples
# examples from 'dplyr'
# newly created variables are available immediately
mtcars %>%
    dt_mutate(
        cyl2 = cyl * 2,
        cyl4 = cyl2 * 2
    ) %>%
    head()
# you can also use dt_mutate() to remove variables and
# modify existing variables
mtcars %>%
    dt_mutate(
        mpg = NULL,
        disp = disp * 0.0163871 # convert to litres
    ) %>%
    head()
# window functions are useful for grouped mutates
mtcars %>%
    dt_mutate(
        rank = rank(-mpg, ties.method = "min"),
        keyby = cyl) %>%
    print()
# You can drop variables by setting them to NULL
mtcars %>% dt_mutate(cyl = NULL) %>% head()
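# A minimal sketch (not part of the original examples) of the in-place
# behaviour noted in the Description. 'dt', 'df' and 'res' are illustrative
# names, and the data.table package is assumed to be installed.
dt <- data.table::as.data.table(mtcars)
dt_mutate(dt, cyl2 = cyl * 2)
"cyl2" %in% names(dt) # the data.table itself now has the new column
df <- mtcars
res <- dt_mutate(df, cyl2 = cyl * 2)
# a data.frame is converted to data.table first, so 'df' is expected to keep
# its original columns while 'res' carries the new one
names(df)
names(res)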
# A summary applied without by returns a single row
mtcars %>%
    dt_summarise(mean = mean(disp), n = .N)
# Usually, you'll want to group first
mtcars %>%
    dt_summarise(mean = mean(disp), n = .N, by = cyl)
# Multiple 'by' - variables
mtcars %>%
    dt_summarise(cyl_n = .N, by = list(cyl, vs))
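# A hedged sketch (not from the original examples): 'keyby' is assumed here
# to behave as in data.table, i.e. to group like 'by' and additionally key
# (sort) the result by the grouping column. 'res_key' is an illustrative name.
res_key <- dt_summarise(mtcars, cyl_n = .N, keyby = cyl)
res_key
data.table::key(res_key) # inspect the key set on the result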
# Newly created summaries do not immediately
# overwrite existing variables
mtcars %>%
    dt_summarise(disp = mean(disp),
                 sd = sd(disp),
                 by = cyl)
# You can group by expressions:
mtcars %>%
    dt_summarise_all(mean, by = list(vsam = vs + am))
# filter by condition
mtcars %>%
    dt_filter(am == 0)
# filter by compound condition
mtcars %>%
    dt_filter(am == 0, mpg > mean(mpg))
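# A small sketch with made-up data (not from the original examples):
# rows where the condition evaluates to NA are dropped.
# 'df_na' and 'kept' are illustrative names.
df_na <- data.frame(x = c(1, NA, 3))
kept <- dt_filter(df_na, x > 1)
kept # only the row with x == 3 remains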
# select
mtcars %>% dt_select(vs:carb, cyl)
mtcars %>% dt_select(-am, -cyl)
# regular expression pattern
dt_select(iris, "^Petal") # variables which start with 'Petal'
dt_select(iris, "Width$") # variables which end with 'Width'
# move Species variable to the front.
# pattern "^." matches all variables
dt_select(iris, Species, "^.")
# pattern "^.*i" means "contains 'i'"
dt_select(iris, "^.*i")
dt_select(iris, 1:4) # numeric indexing - all variables except Species
# sorting
dt_arrange(mtcars, cyl, disp)
dt_arrange(mtcars, -disp)
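# A sketch with made-up data (not from the original examples) showing the
# 'na.last' argument. 'dt_na' is an illustrative name; it is a data.table,
# so it is sorted in place, and data.table is assumed to be installed.
dt_na <- data.table::data.table(x = c(2, NA, 1))
dt_arrange(dt_na, x) # default na.last = FALSE: missing values first
dt_na$x
dt_arrange(dt_na, x, na.last = TRUE) # missing values last
dt_na$x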