Join {str2str}R Documentation

Join (or Merge) a List of Data-frames

Description

Join merges a list of data.frames into a single data.frame. It is a looped version of plyr::join that allows you to merge more than 2 data.frames in the same function call. It is different from plyr::join_all because it allows you to join by the row.names.

Usage

Join(
  data.list,
  by,
  type = "full",
  match = "all",
  rownamesAsColumn = FALSE,
  rtn.rownames.nm = "row_names"
)

Arguments

data.list

list of data.frames of data.

by

character vector specifying what colnames to merge data.list by. It can include "0" which specifies the rownames of data.list. If you are merging by rownames, then you can only merge by rownames and not other columns as well. This is because rownames, by definition, have all unique values. Note, it is assumed that no data.frame in data.list has a colname of "0", otherwise unexpected results are possible. If by is NULL, then all common columns will be used for merging. This is not recommended as it can result in Join merging different data.frames in data.list by different columns.

type

character vector of length 1 specifying the type of merge. Options are the following: 1. "full" = all rows from any of the data.frames in data.list, 2. "left" = only rows from the first data.frame in data.list: data.list[[1L]]), 3. "right" = only rows from the last data.frame in data.list: data.list[[length(data.list)]], 4. "inner" = only rows present in each and every of the data.frames in data.list. See join.

match

character vector of length 1 specifying whether merged elements should be repeated in each row of the return object when duplicate values exist on the by columns. If "all", the merged elements will only appear in every row of the return object with repeated values. If "first", only the merged elements will only appear in the first row of the return object with subsequent rows containing NAs. See join.

rownamesAsColumn

logical vector of length 1 specifying whether the original rownames in data.list should be a column in the return object. If TRUE, the rownames are a column and the returned data.frame has default row.names 1:nrow. If FALSE, the returned data.frame has rownames from the merging.

rtn.rownames.nm

character vector of length 1 specifying what the names of the rownames column should be in the return object. The rtn.rownames.nm argument is only used if rownamesAsColumn = TRUE.

Details

Join is a polished rendition of Reduce(f = plyr::join, x = data.list). A future version of the function might allow for the init and right arguments from Reduce.

Value

data.frame of all uniquely colnamed columns from data.list with the rows included specified by type and rownames specified by rownamesAsColumn. Similar to plyr::join, Join returns the rows in the same order as they appeared in data.list.

See Also

join_all join merge

Examples


# by column
mtcars1 <- mtcars
mtcars1$"id" <- row.names(mtcars)
mtcars2 <- data.frame("id" = mtcars1$"id", "forward" = 1:32)
mtcars3 <- data.frame("id" = mtcars1$"id", "backward" = 32:1)
mtcars_list <- list(mtcars1, mtcars2, mtcars3)
by_column <- Join(data.list = mtcars_list, by = "id")
by_column2 <- Join(data.list = mtcars_list, by = "id", rownamesAsColumn = TRUE)
by_column3 <- Join(data.list = mtcars_list, by = NULL)

# by rownames
mtcars1 <- mtcars
mtcars2 <- data.frame("forward" = 1:32, row.names = row.names(mtcars))
mtcars3 <- data.frame("backward" = 32:1, row.names = row.names(mtcars))
by_rownm <- Join(data.list = list(mtcars1, mtcars2, mtcars3), by = "0")
by_rownm2 <- Join(data.list = list(mtcars1, mtcars2, mtcars3), by = "0",
   rownamesAsColumn = TRUE)
identical(x = by_column[names(by_column) != "id"],
   y = by_rownm) # same as converting rownames to a column in the data
identical(x = by_column2[names(by_column2) != "id"],
   y = by_rownm2) # same as converting rownames to a column in the data

# inserted NAs (by columns)
mtcars1 <- mtcars[1:4]
mtcars2 <- setNames(obj = as.data.frame(scale(x = mtcars1[-1],
   center = TRUE, scale = FALSE)), nm = paste0(names(mtcars1[-1]), "_c"))
mtcars3 <- setNames(obj = as.data.frame(scale(x = mtcars1[-1],
   center = FALSE, scale = TRUE)), nm = paste0(names(mtcars1[-1]), "_s"))
tmp <- lapply(X = list(mtcars1, mtcars2, mtcars3), FUN = function(dat)
   dat[sample(x = row.names(dat), size = 10), ])
mtcars_list <- lapply(X = tmp, FUN = reshape::namerows)
by_column_NA <- Join(data.list = mtcars_list, by = "id") # join by row.names
by_column_NA2 <- Join(data.list = mtcars_list, by = "id", rownamesAsColumn = TRUE)
identical(x = row.names(by_column_NA), # rownames from any data.frame are retained
   y = Reduce(f = union, x = lapply(X = mtcars_list, FUN = row.names)))

# inserted NAs (by rownames)
mtcars1 <- mtcars[1:4]
mtcars2 <- setNames(obj = as.data.frame(scale(x = mtcars1, center = TRUE, scale = FALSE)),
   nm = paste0(names(mtcars1), "_c"))
mtcars3 <- setNames(obj = as.data.frame(scale(x = mtcars1, center = FALSE, scale = TRUE)),
   nm = paste0(names(mtcars1), "_s"))
mtcars_list <- lapply(X = list(mtcars1, mtcars2, mtcars3), FUN = function(dat)
   dat[sample(x = row.names(dat), size = 10), ])
by_rownm_NA <- Join(data.list = mtcars_list, by = "0") # join by row.names
by_rownm_NA2 <- Join(data.list = mtcars_list, by = "0", rownamesAsColumn = TRUE)
identical(x = row.names(by_rownm_NA), # rownames from any data.frame are retained
   y = Reduce(f = union, x = lapply(X = mtcars_list, FUN = row.names)))

# types of joins
Join(data.list = mtcars_list, by = "0", type = "left") # only rows included in mtcars1
Join(data.list = mtcars_list, by = "0", type = "right") # only rows included in mtcars3
Join(data.list = mtcars_list, by = "0", type = "inner") # only rows included in
   # all 3 data.frames (might be empty due to random chance from sample() call)

# errors returned
tmp <- str2str::try_expr(
   Join(data.list = list(mtcars, as.matrix(mtcars), as.matrix(mtcars)))
)
print(tmp[["error"]]) # "The elements with the following positions in
   # `data.list` are not data.frames: 2 , 3"
tmp <- str2str::try_expr(
   Join(data.list = replicate(n = 3, mtcars, simplify = FALSE), by = 0)
)
print(tmp[["error"]]) # "Assertion on 'by' failed: Must be of type
   # 'character' (or 'NULL'), not 'double'."
tmp <- str2str::try_expr(
   Join(data.list = replicate(n = 3, mtcars, simplify = FALSE), by = c("0","mpg"))
)
print(tmp[["error"]]) # "If '0' is a value in `by`, then it must be the
   # only value and `by` must be length 1."
tmp <- str2str::try_expr(
   Join(data.list = list(attitude, attitude, mtcars), by = "mpg")
)
print(tmp[["error"]]) # "The data.frames associated with the following positions in
   # `data.list` do not contain the `by` columns: 1 , 2"


[Package str2str version 1.0.0 Index]