join {matrixset}    R Documentation

Add meta info from another matrixset or a data.frame

Description

join_row_info() joins the row meta info data.frame of .ms with y (or with y's row meta info data.frame if y is a matrixset object). The function join_column_info() does the equivalent operation for column meta info.

The default is a left join (type = 'left'), but most of dplyr's joins are available ('left', 'inner', 'right', 'full', 'semi' or 'anti').

The matrixset paradigm of unique row/column names is enforced: if a row of the .ms meta info data.frame matches multiple rows in y, an error is raised.
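To illustrate why a multiple match cannot be accepted, here is a minimal sketch that uses base R's merge() as a stand-in for the join (an assumption made for illustration; it is not how matrixset performs the operation). The data frames x_info and y are hypothetical.

```r
# Hypothetical row meta info: one line per row name, stored in the tag column
x_info <- data.frame(.rowname = c("r1", "r2"), score = c(10, 20))

# y holds two entries for "r1"
y <- data.frame(.rowname = c("r1", "r1", "r2"), class = c("A", "B", "A"))

# Left join sketched with base R (stand-in, not the matrixset implementation)
joined <- merge(x_info, y, by = ".rowname", all.x = TRUE)

nrow(joined)                        # 3: "r1" now appears twice
anyDuplicated(joined$.rowname) > 0  # TRUE: row names are no longer unique
```

Since the joined meta info would carry the row name "r1" twice, it could not describe the rows of a matrixset, which is the situation the error guards against.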

Usage

join_row_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

join_column_info(
  .ms,
  y,
  type = "left",
  by = NULL,
  adjust = FALSE,
  suffix = c(".x", ".y"),
  na_matches = c("na", "never")
)

Arguments

.ms

A matrixset object

y

A matrixset object or a data.frame.

type

Joining type, one of 'left', 'inner', 'right', 'full', 'semi' or 'anti'.

by

The names of the variables to join by. The default, NULL, results in slightly different behavior depending on whether y is a matrixset or a data.frame. If y is a matrixset, the meta info tag of each object is used for by (the tag is the column that holds the row names/column names in the meta info data frame, typically ".rowname" or ".colname" unless specified otherwise at matrixset creation). If y is a data.frame, a natural join is used. For more details, see dplyr's dplyr::join(). Note that the cross-join is not available.
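As a rough sketch of the data.frame case, a natural join uses every variable name shared by both tables as the key. The snippet below reproduces that idea with base R's merge() as a stand-in (an assumption for illustration only); x_info, y and by_vars are hypothetical names.

```r
# Hypothetical row meta info and a lookup table sharing the "school" column
x_info <- data.frame(.rowname = c("r1", "r2", "r3"),
                     school = c("s1", "s1", "s2"))
y <- data.frame(school = c("s1", "s2"),
                district = c("north", "south"))

# Natural join: the keys are the variable names common to both tables
by_vars <- intersect(names(x_info), names(y))  # "school"

# Left join on those keys (base R stand-in, not the matrixset implementation)
joined <- merge(x_info, y, by = by_vars, all.x = TRUE)

nrow(joined)  # 3: each row of x_info matches exactly one row of y
```

Several rows of x_info matching the same row of y is fine; only a row of x_info matching several rows of y would break row name uniqueness.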

adjust

A logical. By default (FALSE), the join operation is not permitted to reduce or increase the number of rows of the meta info data frame. If TRUE, this is allowed. When the data frame is augmented, the matrices of .ms are augmented accordingly by padding with NAs (except for the NULL matrices).

Alternatively, adjust can be a single string, one of 'pad_x' or 'from_y'. Choosing 'pad_x' is equivalent to TRUE. When choosing 'from_y', padding is done using values from y, but only under the following conditions:

  1. y is a matrixset.

  2. The y matrix has the same name as a matrix in x (that is, .ms).

  3. When padding rows, only columns common to x and y use y values. The same logic applies when padding columns.

Other values are padded with NA.
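The two padding modes can be sketched on plain matrices. This is a base R illustration of the rules stated above, not the package's internal code; the matrices x and y and the helper variables (new_rows, padded, from_y, added, common) are hypothetical.

```r
# x stands for a matrix of .ms, y for the same-named matrix of the y matrixset
x <- matrix(1:4, 2, 2, dimnames = list(c("r1", "r2"), c("c1", "c9")))
y <- matrix(11:19, 3, 3,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))

# Suppose the join leaves these rows in the meta info
new_rows <- c("r1", "r2", "r3")

# 'pad_x' (same as TRUE): new rows are filled with NA
padded <- matrix(NA_integer_, length(new_rows), ncol(x),
                 dimnames = list(new_rows, colnames(x)))
padded[rownames(x), ] <- x

# 'from_y': new rows take y values, but only for columns common to x and y
from_y <- padded
added <- setdiff(new_rows, rownames(x))          # "r3"
common <- intersect(colnames(x), colnames(y))    # "c1"
from_y[added, common] <- y[added, common]

padded["r3", ]  # NA NA
from_y["r3", ]  # c1 = 13 (taken from y), c9 = NA (not a column of y)
```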

suffix

Suffixes added to disambiguate trait variables. See dplyr's dplyr::join().

na_matches

How to handle missing values when matching. See dplyr's dplyr::join().

Value

A matrixset with updated row or column meta info, with all .ms traits and y traits. If some traits share the same names - and were not included in by - suffixes will be appended to these names.

If adjustment was allowed, the dimensions of the new matrixset may differ from the original one.
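The suffix rule for traits that share a name can be illustrated with base R's merge(), whose suffixes argument follows the same ".x"/".y" convention (a stand-in for illustration, not the matrixset implementation). The data frames below are hypothetical.

```r
# Both tables carry a "score" trait that is not part of the join key
x_info <- data.frame(.rowname = c("r1", "r2"), score = c(10, 20))
y <- data.frame(.rowname = c("r1", "r2"), score = c(11, 21))

# Join on the tag only; the clashing trait gets one suffix per source
joined <- merge(x_info, y, by = ".rowname", suffixes = c(".x", ".y"))

names(joined)  # ".rowname" "score.x" "score.y"
```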

Groups

When y is a matrixset, only groups from .ms are used, if any. Group update is the same as in dplyr.

Examples

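# Remove two row traits, then bring them back from the original object
ms1 <- remove_row_annotation(student_results, class, teacher)
ms <- join_row_info(ms1, student_results)

# Same join, using the row tag and a shared trait as keys
ms <- join_row_info(ms1, student_results, by = c(".rowname", "previous_year_score"))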

# This will throw an error: the full join would add rows, but adjust is FALSE
ms2 <- remove_row_annotation(filter_row(student_results, class %in% c("classA", "classC")),
                             class, teacher, previous_year_score)
ms <- tryCatch(join_row_info(ms2, student_results, type = "full"),
               error = function(e) e)
is(ms, "error") # TRUE
ms$message

# Now it works, because adjust = TRUE allows the dimensions to change.
ms <- join_row_info(ms2, student_results, type = "full", adjust = TRUE)
dim(ms2)
dim(ms)
matrix_elm(ms, 1)


[Package matrixset version 0.3.0 Index]