join {analogue}

R Documentation

Merge species data sets on common columns (species)

Description

Merges any number of species matrices on their common columns to create a new data set with as many columns as there are unique columns across all the data frames (for the default outer join). This is needed when analysing fossil data sets with respect to training set samples.
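The core idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the analogue implementation: the helper `conform_columns()` and the toy data frames `train` and `fossil` are hypothetical. Each data frame is expanded to the full set of unique species columns, with `NA` wherever a species was not recorded in that data set.

```r
## Hypothetical helper (not part of analogue): expand a data frame so
## that it has exactly the columns in `cols`, in that order, filling
## any columns it lacks with NA.
conform_columns <- function(df, cols) {
  for (m in setdiff(cols, names(df))) {
    df[[m]] <- NA_real_            # recycled to every row
  }
  df[, cols, drop = FALSE]         # reorder; row names are kept
}

## Toy species data: two samples each, partly overlapping taxa
train  <- data.frame(sp1 = c(10, 20), sp2 = c(5, 0),
                     row.names = c("t1", "t2"))
fossil <- data.frame(sp2 = c(3, 7), sp3 = c(1, 2),
                     row.names = c("f1", "f2"))

## All unique columns across both data sets
cols <- unique(c(names(train), names(fossil)))

train2  <- conform_columns(train, cols)
fossil2 <- conform_columns(fossil, cols)
```

Both `train2` and `fossil2` end up with three columns, one per unique species, so they can be compared sample by sample.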

Usage

join(..., verbose = FALSE, na.replace = TRUE, split = TRUE, value = 0,
     type = c("outer", "left", "inner"))

## S3 method for class 'join'
head(x, ...)

## S3 method for class 'join'
tail(x, ...)

Arguments

`...`	for `join`, data frames containing the data sets to be merged. For the `head` and `tail` methods, additional arguments passed to `head` and `tail`, in particular `n`, which controls the number of rows of each joined data set to display.
`verbose`	logical; if `TRUE`, the function prints out the dimensions of the data frames in `...`, as well as those of the returned, merged data frame.
`na.replace`	logical; after merging, samples from a data frame that lacks a column found in another data frame will contain missing values (`NA`) in that column. If `na.replace` is `TRUE`, these missing values are replaced with `value` (zero by default). Replacing them with zeros is standard practice in ecology and palaeoecology. To use a different replacement, either set `value` or set `na.replace` to `FALSE` and do the replacement later.
`split`	logical; should the samples of the merged data set be split back into individual data frames, now sharing common columns (i.e. species)?
`value`	numeric; value to replace `NA` with if `na.replace` is `TRUE`.
`type`	character; the type of join to perform. `"outer"` returns the union of the variables in the data frames to be merged, such that the resulting objects have columns for all variables found across all the data frames. `"left"` returns the left outer (or simply left) join, such that the merged data frames contain the set of variables found in the first supplied data frame. `"inner"` returns the inner join, such that the merged data frames contain the intersection of the variables in the supplied data frames. See Details.
`x`	an object of class `"join"`, usually the result of a call to `join`.
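The interplay of `na.replace` and `value` can be illustrated with a small base R sketch. The helper `replace_missing()` and the toy `merged` data frame are hypothetical and only mirror the behaviour described above; they are not taken from the analogue source.

```r
## A merged toy data set: sp3 was not recorded in samples t1 and t2,
## and sp1 was not recorded in samples f1 and f2
merged <- data.frame(sp1 = c(10, 20, NA, NA),
                     sp2 = c(5, 0, 3, 7),
                     sp3 = c(NA, NA, 1, 2),
                     row.names = c("t1", "t2", "f1", "f2"))

## Hypothetical helper: if na.replace is TRUE, every NA becomes `value`
replace_missing <- function(df, na.replace = TRUE, value = 0) {
  if (na.replace) {
    df[is.na(df)] <- value         # logical-matrix indexing of a data frame
  }
  df
}

zeros   <- replace_missing(merged)                      # NA -> 0
flagged <- replace_missing(merged, value = -99.9)       # NA -> -99.9
kept    <- replace_missing(merged, na.replace = FALSE)  # NA retained
```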

Details

When merging multiple data frames, the set of variables in the merged data can be determined in several ways. join currently provides three join types: the outer join, the left outer (or simply left) join, and the inner join. The type of join performed is determined by the argument type.

The outer join returns the union of the set of variables found in the data frames to be merged. This means that the resulting data frame(s) contain columns for all the variables observed across all the data frames supplied for merging.
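For the outer join, the column set is simply the union of the column names. A minimal base R sketch with three hypothetical toy data frames:

```r
## Three toy data frames with overlapping species columns
a <- data.frame(sp1 = 1, sp2 = 2)
b <- data.frame(sp2 = 3, sp3 = 4)
d <- data.frame(sp3 = 5, sp4 = 6)

## Outer join: union of the column names, in order of first appearance
outer_cols <- Reduce(union, lapply(list(a, b, d), names))
outer_cols   # "sp1" "sp2" "sp3" "sp4"
```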

With the left outer join the resulting data frame(s) contain only the set of variables found in the first data frame provided.
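A base R sketch of the left join, assuming (as the description implies) that columns of later data frames absent from the first are dropped, while columns of the first that a later data frame lacks are filled with `NA`. The toy data frames `a` and `b` are hypothetical.

```r
a <- data.frame(sp1 = 1, sp2 = 2)   # first data frame defines the columns
b <- data.frame(sp2 = 3, sp3 = 4)

left_cols <- names(a)

## Drop b's columns that are not in the first data frame (sp3) ...
b_left <- b[, intersect(names(b), left_cols), drop = FALSE]
## ... add the first data frame's columns that b lacks (sp1) as NA ...
for (m in setdiff(left_cols, names(b_left))) b_left[[m]] <- NA_real_
## ... and put the columns in the same order as in the first data frame
b_left <- b_left[, left_cols, drop = FALSE]
```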

The inner join returns the intersection of the set of variables found in the supplied data frames. The resulting data frame(s) contain the variables common to all supplied data frames.
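For the inner join, the column set is the intersection of the column names. In this hypothetical base R sketch every retained column exists in every data frame, so the subsetting step introduces no `NA` values.

```r
a <- data.frame(sp1 = 1, sp2 = 2, sp3 = 3)
b <- data.frame(sp2 = 4, sp3 = 5, sp4 = 6)
d <- data.frame(sp3 = 7, sp2 = 8)

## Inner join: columns present in every data frame (order from `a`)
inner_cols <- Reduce(intersect, lapply(list(a, b, d), names))
inner_cols   # "sp2" "sp3"

## Subset each data frame to the shared columns
inner <- lapply(list(a, b, d), function(df) df[, inner_cols, drop = FALSE])
```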

Value

If split = TRUE, an object of class "join", a list of data frames, with as many components as the number of data frames originally merged.

Otherwise, an object of class c("join", "data.frame"), a data frame containing the merged data sets.
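The difference between the two return forms can be sketched in base R: one stacked data frame versus a list with one component per input data set. This is a hypothetical illustration using toy data already conformed to the same columns; it does not reproduce the `"join"` class attribute set by analogue.

```r
train  <- data.frame(sp1 = c(10, 20), sp2 = c(5, 0),
                     row.names = c("t1", "t2"))
fossil <- data.frame(sp1 = c(0, 0), sp2 = c(3, 7),
                     row.names = c("f1", "f2"))
dfs <- list(train, fossil)

## Analogue of split = FALSE: a single stacked data frame
merged <- do.call(rbind, dfs)

## Analogue of split = TRUE: record which input each row came from,
## then split the stacked data back into one data frame per input
origin <- rep(seq_along(dfs), times = vapply(dfs, nrow, integer(1)))
pieces <- split(merged, origin)
```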

head.join and tail.join return a list, each component of which is the result of a call to head or tail on the corresponding data set component of the joined object.
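Conceptually, this amounts to applying head or tail to each component and passing `n` through. A base R sketch on a hypothetical plain list standing in for a split `"join"` object:

```r
## Hypothetical stand-in for a split "join" object
x <- list(train = data.frame(sp1 = 1:10, sp2 = 11:20),
          test  = data.frame(sp1 = 21:30, sp2 = 31:40))

## Apply head() / tail() to every component, passing n through
first4 <- lapply(x, head, n = 4)
last4  <- lapply(x, tail, n = 4)
```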

Author(s)

Gavin L. Simpson

Examples

## load the example data
data(swapdiat, swappH, rlgh)

## merge training and test set on columns
dat <- join(swapdiat, rlgh, verbose = TRUE)

## extract the merged data sets and convert to proportions
swapdiat <- dat[[1]] / 100
rlgh <- dat[[2]] / 100

## merge training and test set using left join
head(join(swapdiat, rlgh, verbose = TRUE, type = "left"))

## load the example data
data(ImbrieKipp, SumSST, V12.122)

## merge training and test set on columns
dat <- join(ImbrieKipp, V12.122, verbose = TRUE)

## extract the merged data sets and convert to proportions
ImbrieKipp <- dat[[1]] / 100
V12.122 <- dat[[2]] / 100

## show just the first few lines of each data set
head(dat, n = 4)

## show just the last few lines of each data set
tail(dat, n = 4)

## merge training and test set using inner join
head(join(ImbrieKipp, V12.122, verbose = TRUE, type = "inner"))

## merge training and test set using outer join and replace
## NA with -99.9
head(join(ImbrieKipp, V12.122, verbose = TRUE, value = -99.9))

[Package analogue version 0.17-6 Index]