add_columns {sjmisc}

R Documentation

Add or replace data frame columns

Description

add_columns() combines two or more data frames, but unlike cbind() or dplyr::bind_cols(), it binds data as the last columns of the result (i.e., after the columns of the data frames specified in ...). This is useful in a "pipe" workflow, where a data frame returned by a previous function should be appended to the end of another data frame that is passed to add_columns() via ....

replace_columns() replaces all columns in data with identically named columns in ..., and adds remaining (non-duplicated) columns from ... to data.

add_id() adds an ID column to the data frame, with values from 1 to nrow(data) or, for grouped data frames, from 1 to the group size within each group. See 'Examples'.
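
The numbering that add_id() is described as producing can be illustrated in base R. This is only a sketch of the logic, not the sjmisc implementation; the data frame `df` and its columns `g` and `x` are made up for the illustration.

```r
# Toy data: `g` plays the role of a grouping variable (hypothetical names).
df <- data.frame(g = c("a", "b", "a", "b", "a"), x = c(10, 20, 30, 40, 50))

# Ungrouped case: IDs run from 1 to nrow(df); the ID column comes first.
ungrouped <- cbind(ID = seq_len(nrow(df)), df)

# Grouped case: IDs restart at 1 within each level of `g` and run up to
# the size of that group.
grouped <- cbind(ID = ave(seq_len(nrow(df)), df$g, FUN = seq_along), df)

print(ungrouped)
print(grouped)
```

In the grouped result, the three rows of group "a" receive the IDs 1 to 3 and the two rows of group "b" receive the IDs 1 and 2.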

Usage

add_columns(data, ..., replace = TRUE)

replace_columns(data, ..., add.unique = TRUE)

add_id(data, var = "ID")

Arguments

`data`	A data frame. For `add_columns()`, `data` is bound after the data frames specified in `...`. For `replace_columns()`, duplicated columns in `data` are replaced by columns in `...`.
`...`	More data frames to combine or, for `replace_columns()`, more data frames with columns that should replace columns in `data`.
`replace`	Logical, if `TRUE` (default), columns in `...` with identical names in `data` will replace the columns in `data`. The order of columns after replacing is preserved.
`add.unique`	Logical, if `TRUE` (default), remaining columns in `...` that did not replace any column in `data` are appended as new columns to `data`.
`var`	Name of the new ID-variable.

Value

For add_columns(), a data frame, where columns of data are appended after columns of ....

For replace_columns(), a data frame where columns in data will be replaced by identically named columns in ..., and remaining columns from ... will be appended to data (if add.unique = TRUE).

For add_id(), data with a new column of ID numbers. This column is always the first column in the returned data frame.
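
The return value described for replace_columns() can be mimicked in base R for a single additional data frame. The helper `replace_sketch()` below is hypothetical and is not the package's code; it assumes that replaced columns keep their position in data and that non-matching columns are appended at the end.

```r
# Hypothetical helper mirroring the documented behaviour for ONE extra
# data frame `new`; not the sjmisc implementation.
replace_sketch <- function(data, new, add.unique = TRUE) {
  # Columns present in both data frames are overwritten in place,
  # so the column order of `data` is kept.
  shared <- intersect(names(data), names(new))
  data[shared] <- new[shared]

  # Columns that exist only in `new` are appended if requested.
  extra <- setdiff(names(new), names(data))
  if (add.unique && length(extra) > 0) {
    data <- cbind(data, new[extra])
  }
  data
}

old <- data.frame(a = 1:3, b = 4:6, c = 7:9)
new <- data.frame(b = c(40, 50, 60), d = c("x", "y", "z"))

with_extra <- replace_sketch(old, new)
without_extra <- replace_sketch(old, new, add.unique = FALSE)

names(with_extra)     # "a" "b" "c" "d"
names(without_extra)  # "a" "b" "c"
```

In both results column b holds the values from `new`; only the first result also carries the extra column d.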

Note

For add_columns(), by default, columns in data that have the same names as columns in one of the data frames in ... will be dropped (i.e. variables with identical names in ... will replace existing variables in data). Use replace = FALSE to keep all columns. Identical column names will then be renamed to ensure unique column names (this is what dplyr::bind_cols() does by default). When replacing columns, replaced columns are not added to the end of the data frame. Rather, the original order of columns is preserved.

Examples

library(sjmisc)
data(efc)
d1 <- efc[, 1:3]
d2 <- efc[, 4:6]

if (require("dplyr") && require("sjlabelled")) {
head(bind_cols(d1, d2))
add_columns(d1, d2) %>% head()

d1 <- efc[, 1:3]
d2 <- efc[, 2:6]

add_columns(d1, d2, replace = TRUE) %>% head()
add_columns(d1, d2, replace = FALSE) %>% head()

# use case: we take the original data frame, select specific
# variables and do some transformations or recodings
# (standardization in this example) and add the new, transformed
# variables *to the end* of the original data frame
efc %>%
  select(e17age, c160age) %>%
  std() %>%
  add_columns(efc) %>%
  head()

# new variables with same name will overwrite old variables
# in "efc". order of columns is not changed.
efc %>%
  select(e16sex, e42dep) %>%
  to_factor() %>%
  add_columns(efc) %>%
  head()

# keep both old and new variables, automatically
# rename variables with identical name
efc %>%
  select(e16sex, e42dep) %>%
  to_factor() %>%
  add_columns(efc, replace = FALSE) %>%
  head()

# create sample data frames
d1 <- efc[, 1:10]
d2 <- efc[, 2:3]
d3 <- efc[, 7:8]
d4 <- efc[, 10:12]

# show original
head(d1)

library(sjlabelled)
# slightly change variables, to see effect
d2 <- as_label(d2)
d3 <- as_label(d3)

# replace duplicated columns, append remaining
replace_columns(d1, d2, d3, d4) %>% head()

# replace duplicated columns, omit remaining
replace_columns(d1, d2, d3, d4, add.unique = FALSE) %>% head()

# add ID to dataset
library(dplyr)
data(mtcars)
add_id(mtcars)

mtcars %>%
  group_by(gear) %>%
  add_id() %>%
  arrange(gear, ID) %>%
  print(n = 100)
}

[Package sjmisc version 2.8.10 Index]