R: Add meta info from another 'matrixset' or a 'data.frame'

join {matrixset}

R Documentation

Add meta info from another `matrixset` or a `data.frame`

Description

join_row_info() adds meta info by joining the row meta info data.frame of .ms with y (or with the row meta info data.frame of y, if y is a matrixset object). The function join_column_info() does the equivalent operation for column meta info.

The default join operation is a left join (type = 'left'), but most of dplyr's joins are available: 'left', 'inner', 'right', 'full', 'semi' or 'anti'.

The matrixset paradigm of unique row/column names is enforced: if a row of the .ms meta info data.frame matches multiple rows in y, an error is raised.
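
The uniqueness rule can be illustrated with a minimal sketch. It assumes the student_results dataset shipped with the package and uses row_info() to read the row names; the data.frame dup and its hobby column are hypothetical and exist only for this illustration.

```r
library(matrixset)

# Hypothetical data.frame in which the first row name appears twice.
first_id <- row_info(student_results)$.rowname[1]
dup <- data.frame(.rowname = c(first_id, first_id),
                  hobby = c("chess", "judo"))

# A single .ms row now matches two rows of `dup`, so the join is
# expected to fail; the error is captured instead of stopping the script.
res <- tryCatch(join_row_info(student_results, dup, by = ".rowname"),
                error = function(e) e)
inherits(res, "error")
```

Capturing the condition with tryCatch() makes it possible to inspect res$message for the reason of the failure.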

Usage

join_row_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

join_column_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

Arguments

`.ms`	A `matrixset` object.
`y`	A `matrixset` object or a `data.frame`.
`type`	Joining type, one of 'left', 'inner', 'right', 'full', 'semi' or 'anti'.
`by`	The names of the variables to join by. The default, `NULL`, results in slightly different behavior depending on whether `y` is a `matrixset` or a `data.frame`. If a `matrixset`, the meta info tag of each object is used for `by` (the tag is the column that holds the row names/column names in the meta info data frame, typically ".rowname" or ".colname" unless specified otherwise at `matrixset` creation). If a `data.frame`, a natural join is used. For more details, see `dplyr`'s `dplyr::join()`. Note that the cross-join is not available.
`adjust`	A logical. By default (`FALSE`), the join operation is not permitted to filter or augment the number of rows of the meta info data frame. If `TRUE`, this is allowed. When the data frame is augmented, the matrices of `.ms` are augmented accordingly by padding with `NA`s (except for the `NULL` matrices). Alternatively, `adjust` can be a single string, one of "pad_x" or "from_y". Choosing "pad_x" is equivalent to `TRUE`. When choosing "from_y", padding is done using values from `y`, but only if `y` is a `matrixset`, and only for `y` matrices that have the same name in `.ms`. If padding rows, only columns common to `.ms` and `y` will use `y` values. The same logic applies when padding columns. Other values are padded with `NA`.
`suffix`	Suffixes added to disambiguate trait variables. See `dplyr`'s `dplyr::join()`.
`na_matches`	How to handle missing values when matching. See `dplyr`'s `dplyr::join()`.
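
As a sketch of how `by` and `adjust` interact when `y` is a `data.frame`, the following assumes the `student_results` dataset shipped with the package. The data frame `extra` and its `tutor` column are hypothetical and exist only for this illustration.

```r
library(matrixset)

# Hypothetical extra annotation, available for the first three rows only.
ids <- row_info(student_results)$.rowname
extra <- data.frame(.rowname = ids[1:3], tutor = c("yes", "no", "yes"))

# y is a data.frame and by = NULL: a natural join on the shared
# column (.rowname). The default left join keeps every row of .ms.
ms_left <- join_row_info(student_results, extra)

# An inner join would drop the unmatched rows, which is only
# permitted when adjust = TRUE.
ms_inner <- join_row_info(student_results, extra, by = ".rowname",
                          type = "inner", adjust = TRUE)
dim(student_results)
dim(ms_inner)
```

Leaving `adjust` at its default in the second call would be expected to raise an error, since the inner join changes the number of rows.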

Value

A matrixset with updated row or column meta info, containing all .ms traits and all y traits. If some traits share the same name and were not included in by, suffixes are appended to these names.

If adjustment was allowed, the dimensions of the new matrixset may differ from the original one.

Groups

When y is a matrixset, only the groups from .ms are used, if any. Groups are updated the same way as in dplyr.

Examples

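# Remove two row annotations, then restore them from the original object
ms1 <- remove_row_annotation(student_results, class, teacher)
ms <- join_row_info(ms1, student_results)

# Also join on the trait shared by both objects, so it is not duplicated with suffixes
ms <- join_row_info(ms1, student_results, by = c(".rowname", "previous_year_score"))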

# This will throw an error: the full join would add rows, which is not
# allowed when adjust = FALSE
ms2 <- remove_row_annotation(filter_row(student_results, class %in% c("classA", "classC")),
                             class, teacher, previous_year_score)
ms <- tryCatch(join_row_info(ms2, student_results, type = "full"),
               error = function(e) e)
is(ms, "error") # TRUE
ms$message

# Now it works: with adjust = TRUE, the added rows are padded with NA in the matrices
ms <- join_row_info(ms2, student_results, type = "full", adjust = TRUE)
dim(ms2)
dim(ms)
matrix_elm(ms, 1)

[Package matrixset version 0.3.0]