loop {matrixset} | R Documentation |
Apply functions to each matrix of a matrixset
Description
The apply_matrix function applies functions to each matrix of a matrixset. The apply_row/apply_column functions do the same but separately for each row/column. The functions can be applied to all matrices or only a subset.
The dfl/dfw versions differ in their output format and, when possible, always return a tibble().
Empty matrices are simply left unevaluated. How that impacts the returned result depends on which flavor of apply_* has been used. See ‘Value’ for more details.
If .matrix_wise is FALSE, the function (or expression) is multivariate in the sense that all matrices are accessible at once, as opposed to each of them in turn. See section "Multivariate".
Usage
apply_row(.ms, ..., .matrix = NULL, .matrix_wise = TRUE, .input_list = FALSE)
apply_row_dfl(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE,
.force_name = FALSE
)
apply_row_dfw(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE,
.force_name = FALSE
)
apply_column(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE
)
apply_column_dfl(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE,
.force_name = FALSE
)
apply_column_dfw(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE,
.force_name = FALSE
)
apply_matrix(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE
)
apply_matrix_dfl(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE,
.force_name = FALSE
)
apply_matrix_dfw(
.ms,
...,
.matrix = NULL,
.matrix_wise = TRUE,
.input_list = FALSE,
.force_name = FALSE
)
Arguments
.ms |
|
... |
expressions, separated by commas. They can be specified in one of the following ways:
The expressions can be named; these names will be used to provide names to the results. |
.matrix |
matrix indices of which matrix to apply functions to. The default, NULL, applies the functions to all matrices.
Numeric values are coerced to integer. Character vectors will be matched to the matrix names of the object.
Can also be logical vectors, indicating which matrices to use. Such vectors are NOT recycled, which is an important difference with usual matrix indexing.
Can also be negative integers, indicating matrices to leave out. |
.matrix_wise |
|
.input_list |
|
.force_name |
This can be useful when grouping is used. As the functions are evaluated independently within each group, there could be situations where function outcomes are of length 1 for some groups and length 2 or more in other groups. See examples. |
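As a brief illustration of the .matrix argument, the sketch below restricts the computation to a subset of matrices by position. It assumes the student_results example dataset used in the Examples section and relies only on the positional and negative indexing described above; the object names first_only and all_but_first are illustrative.

```r
library(matrixset)

# Apply mean() to the first matrix only (positional index)
first_only <- apply_matrix(student_results, mean, .matrix = 1)

# A negative index leaves the first matrix out instead
all_but_first <- apply_matrix(student_results, mean, .matrix = -1)
```

Selecting matrices this way is also a simple means of skipping matrices known to be empty.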
Value
A list with one element for every matrix in the matrixset object. Each element is itself a list. For apply_matrix, it is a list of the function values, or NULL if the matrix was empty. For apply_row/apply_column, it is a list with one element for each row/column; these elements will be NULL if the corresponding matrix was empty. Each of these sub-lists is in turn a list holding the results of each function.
If each function returns a vector of the same length, you can use either the _dfl or the _dfw version. They return a list of tibbles. The dfl version will stack the function results in a long format while the dfw version will put them side by side, in a wide format. An empty matrix will be returned for empty input matrices.
If the functions returned vectors of more than one element, there will be a column to store the values and one for the function ID (dfl), or one column per combination of function/result (dfw).
See the grouping section to learn about the result format in the grouping context.
Pronouns
The rlang pronouns .data and .env are available. Two scenarios for which they can be useful are:
The annotation names are stored in a character variable. You can make use of the variable by using .data[[var]]. See the example for an illustration of this.
You want to make use of a global variable that has the same name as an annotation. You can use .env[[var]] or .env$var to make sure to use the proper variable.
The matrixset package defines its own pronouns: .m, .i and .j, which are discussed in the function specification argument (...).
It is not necessary to import any of the pronouns (or load rlang in the case of .data and .env) in an interactive session.
It is useful however when writing a package to avoid the R CMD check notes. As needed, you can import .data and .env (from rlang) or any of .m, .i or .j from matrixset.
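For package authors, the imports mentioned above could be declared with roxygen2 tags, as in the sketch below. The function name fit_by_annotation is hypothetical and only serves as an illustration; the call inside it mirrors the .data example of the Examples section.

```r
# Hypothetical package function; the name fit_by_annotation is illustrative.
# The roxygen2 tags generate the NAMESPACE imports, which keeps
# R CMD check from flagging the pronouns as undefined global variables.

#' @importFrom rlang .data
#' @importFrom matrixset apply_column .j
#' @importFrom stats lm
fit_by_annotation <- function(ms, nm) {
  stopifnot(is.character(nm), length(nm) == 1)
  # .j is the current column; .data[[nm]] looks up the annotation named by nm
  apply_column(ms, lm(.j ~ .data[[nm]]))
}
```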
Multivariate
The default behavior is to apply a function or expression to a single matrix, and each matrix of the matrixset object is provided sequentially to the function/expression.
If .matrix_wise is FALSE, all matrices are provided at once to the functions/expressions. They can be provided in two fashions:
Separately (default behavior). Each matrix can be referred to by .m1, ..., .mn, where n is the number of matrices. Note that this is the number as determined by .matrix.
For apply_row (and dfl/dfw variants), use .i1, .i2 and so on instead. What the functions/expressions have access to in this case is the first row of the first matrix, the first row of the second matrix and so on. Then, continuing the loop, the second row of each matrix will be accessible, and so on.
Similarly, use .j1 and so on for the apply_column family.
Anonymous functions will be understood as a function with multiple arguments. In the example apply_row(ms, mean, .matrix_wise = FALSE), if there are 3 matrices in the ms object, mean is understood as mean(.i1, .i2, .i3). Note that this would fail because mean() expects a single data vector as its first argument.
In a list (.input_list = TRUE). The list will have an element per matrix. The list can be referred to using the same pronouns (.m, .i, .j), and the matrix, by the matrix names or position.
For the multivariate setting, empty matrices are given as is, so it is important that provided functions can deal with such a scenario. An alternative is to skip the empty matrices with the .matrix argument.
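The two input fashions described in this section can be sketched as follows. This is a minimal illustration, assuming the student_results example dataset holds at least two matrices; the result names (diff_m, diff_i, diff_l) and the choice of a simple difference are illustrative only.

```r
library(matrixset)

# Separate inputs: .m1 and .m2 are the first and second matrices
diff_m <- apply_matrix(student_results, d = ~ .m2 - .m1,
                       .matrix_wise = FALSE)

# Row by row: .i1 and .i2 are the same row taken from each matrix
diff_i <- apply_row(student_results, d = ~ .i2 - .i1,
                    .matrix_wise = FALSE)

# List input: .m is a list with one element per matrix,
# indexed here by position
diff_l <- apply_matrix(student_results, d = ~ .m[[2]] - .m[[1]],
                       .matrix_wise = FALSE, .input_list = TRUE)
```

The list form can be convenient when the number of matrices is not known in advance, since the expression does not need to name each matrix individually.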
Grouped matrixsets
If groups have been defined, functions will be evaluated within them. When both row and column groupings have been registered, functions are evaluated at each cross-combination of row/column groups.
The output format is different when the .ms matrixset object is grouped. A list for every matrix is still returned, but each of these lists now holds a tibble.
Each tibble has a column called .vals, where the function results are stored. This column is a list, one element per group. The group labels are given by the other columns of the tibble. For a given group, things are like the ungrouped version: further sub-lists for rows/columns - if applicable - and function values.
The dfl/dfw versions are more similar in their output format to their ungrouped version. The format is almost identical, except that additional columns are reported to identify the group labels.
See the examples.
Examples
# The first example takes the whole matrix average, while the second takes
# every row average
(mn_mat <- apply_matrix(student_results, mean))
(mn_row <- apply_row(student_results, mean))
# More than one function can be provided. It's a good idea in this case to
# name them
(mn_col <- apply_column(student_results, avr=mean, med=median))
# The dfl/dfw versions return nice tibbles - if the functions return values
# of the same length.
(mn_l <- apply_column_dfl(student_results, avr=mean, med=median))
(mn_w <- apply_column_dfw(student_results, avr=mean, med=median))
# There is no difference between the two versions for length-1 vector results.
# These will differ, however
(rg_l <- apply_column_dfl(student_results, rg=range))
(rg_w <- apply_column_dfw(student_results, rg=range))
# More complex examples can be used, by using pronouns and data annotation
(vals <- apply_column(student_results, avr=mean, avr_trim=mean(.j, trim=.05),
reg=lm(.j ~ teacher)))
# You can wrap complex function results, such as for lm, into a list, to use
# the dfl/dfw version
(vals_tidy <- apply_column_dfw(student_results, avr=mean, avr_trim=mean(.j, trim=.05),
reg=list(lm(.j ~ teacher))))
# You can provide complex expressions by using formulas
(r <- apply_column(student_results,
                   res = ~ {
                     log_score <- log(.j)
                     p <- predict(lm(log_score ~ teacher + class))
                     .j - exp(p)
                   }))
# the .data pronoun can be useful to use names stored in variables
fn <- function(nm) {
  # nm must be a single character string naming an annotation
  if (!is.character(nm) || length(nm) != 1) stop("this example won't work")
  apply_column(student_results, lm(.j ~ .data[[nm]]))
}
fn("teacher")
# You can use variables that are outside the scope of the matrixset object.
# You don't need to do anything special if that variable is not named as an
# annotation
pass_grade <- 0.5
(passed <- apply_row_dfw(student_results, pass = ~ .i >= pass_grade))
# Use .env if the variable shares its name with an annotation
previous_year_score <- 0.5
(passed <- apply_row_dfw(student_results, pass = ~ .i >= .env$previous_year_score))
# Grouping structure makes looping easy. Look at the output format
cl_prof_gr <- row_group_by(student_results, class, teacher)
(gr_summ <- apply_column(cl_prof_gr, avr=mean, med=median))
(gr_summ_tidy <- apply_column_dfw(cl_prof_gr, avr=mean, med=median))
# to showcase how we can play with format
(gr_summ_tidy_long <- apply_column_dfl(cl_prof_gr, summ = ~ c(avr=mean(.j), med=median(.j))))
# It is even possible to combine groupings
cl_prof_program_gr <- column_group_by(cl_prof_gr, program)
(mat_summ <- apply_matrix(cl_prof_program_gr, avr = mean, med = median, rg = range))
# it doesn't make much sense, but this is to showcase format
(summ_gr <- apply_matrix(cl_prof_program_gr, avr = mean, med = median, rg = range))
(summ_gr_long <- apply_column_dfl(cl_prof_program_gr,
ct = ~ c(avr = mean(.j), med = median(.j)),
rg = range))
(summ_gr_wide <- apply_column_dfw(cl_prof_program_gr,
ct = c(avr = mean(.j), med = median(.j)),
rg = range))
# This is an example where you may want to use the .force_name argument
(apply_matrix_dfl(column_group_by(student_results, program), FC = colMeans(.m)))
(apply_matrix_dfl(column_group_by(student_results, program), FC = colMeans(.m),
.force_name = TRUE))