R: Determine row or column on the fly

.find {tabshiftr}

R Documentation

Determine row or column on the fly

Description

Find the location of a variable based not on its column/row positions, but on a regular expression or function.

Usage

.find(
  fun = NULL,
  pattern = NULL,
  col = NULL,
  row = NULL,
  invert = FALSE,
  relative = FALSE
)

Arguments

`fun`	[`character(1)`] function to identify columns or rows in the input table on the fly.
`pattern`	[`character(1)`] character string containing a regular expression to identify columns or rows in the input table on the fly.
`col`	[`integerish(1)`] optionally, in case this function should only be applied to certain columns, provide them here.
`row`	[`integerish(1)`] optionally, in case this function should only be applied to certain rows, provide them here.
`invert`	[`logical(1)`] whether or not the identified columns or rows should be inverted, i.e., all other columns or rows should be selected.
`relative`	[`logical(1)`] whether the values provided in `col` or `row` are relative to the cluster position(s) or are absolute positions, i.e., refer to the overall table.
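
The effect of `invert` can be pictured with plain index arithmetic. The following is a minimal base R sketch, not the package's code: the row count and the matched positions are made-up values, and it assumes that inversion is taken against all rows (or columns) under consideration.

```r
# Hypothetical values: a table with 6 rows in which rows 2 and 5 matched
n_rows  <- 6L
matched <- c(2L, 5L)

# Inverting the selection keeps every row that did NOT match
inverted <- setdiff(seq_len(n_rows), matched)
inverted
```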

Details

This function is basically a wildcard for cases where columns or rows are not known in advance, but have to be determined on the fly. This can be very helpful when several tables contain the same variables, but in a slightly different arrangement.
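
To illustrate why this is useful, consider two tables that hold the same variables in a different column order. The sketch below uses base R only; the tables and the helper `locate_column` are hypothetical and merely mimic the idea of locating a column by its content rather than by a fixed position.

```r
# Two hypothetical tables with the same variables in a different column order
tab_a <- data.frame(X1 = c("commodities", "maize"),
                    X2 = c("harvested", "10"),
                    X3 = c("production", "20"),
                    stringsAsFactors = FALSE)
tab_b <- data.frame(X1 = c("commodities", "maize"),
                    X2 = c("production", "20"),
                    X3 = c("harvested", "10"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: index of every column with at least one matching cell
locate_column <- function(tab, pattern) {
  which(vapply(tab, function(x) any(grepl(pattern, x)), logical(1),
               USE.NAMES = FALSE))
}

pos_a <- locate_column(tab_a, "^production$")
pos_b <- locate_column(tab_b, "^production$")
```

The same pattern yields a different position for each table, so a single description can serve both layouts.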

Value

The index values where the target was found.

How does this work

The first step in using any schema is validating it via the function validateSchema. This happens by default in reorganise, but can also be done manually, for example when debugging complicated schema descriptions.

When that function encounters a schema that should find columns or rows on the fly via .find, it combines the cells of each column and the cells of each row into one character string and matches the regular expression or function against those strings. Columns/rows that have a match are returned as the respective column/row value.
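
The matching step described above can be approximated in base R. This is a minimal sketch under stated assumptions, not the package's internal code: the helper `find_by_pattern` and the toy table are hypothetical, and the actual implementation may collapse cells or treat missing values differently.

```r
# Hypothetical helper: collapse each column (or row) into one string,
# then test the regular expression against that string.
find_by_pattern <- function(tab, pattern, margin = c("col", "row"),
                            invert = FALSE) {
  margin <- match.arg(margin)
  m <- as.matrix(tab)
  m[is.na(m)] <- ""                      # treat missing cells as empty text
  collapsed <- apply(m, if (margin == "col") 2 else 1, paste, collapse = " ")
  hit <- grepl(pattern, collapsed)
  if (invert) hit <- !hit                # select all other columns/rows
  unname(which(hit))
}

# Toy input with generic column names, as in an unprocessed table
toy <- data.frame(
  X1 = c("unit 1", "soybean", "maize"),
  X2 = c("harvested", "1111", "1221"),
  X3 = c("production", "1112", "1222"),
  stringsAsFactors = FALSE
)

cols_found <- find_by_pattern(toy, "harv", margin = "col")
rows_found <- find_by_pattern(toy, "maize", margin = "row")
cols_other <- find_by_pattern(toy, "harv", margin = "col", invert = TRUE)
```

Here `cols_found` holds the index of the column containing "harvested", `rows_found` the index of the row containing "maize", and `cols_other` the remaining columns.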

Examples

library(tabshiftr)
library(magrittr)  # provides the %>% pipe used below

# use regular expressions to find cell positions
(input <- tabs2shift$clusters_messy)

schema <- setCluster(id = "territories",
                     left = .find(pattern = "comm*"), top = .find(pattern = "comm*")) %>%
  setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9)) %>%
  setIDVar(name = "year", columns = 4, rows = c(3:6), distinct = TRUE) %>%
  setIDVar(name = "commodities", columns = c(1, 1, 4)) %>%
  setObsVar(name = "harvested", columns = c(2, 2, 5)) %>%
  setObsVar(name = "production", columns = c(3, 3, 6))

schema
validateSchema(schema = schema, input = input)

# use a function to find rows
(input <- tabs2shift$messy_rows)

schema <-
  setFilter(rows = .find(fun = is.numeric, col = 1, invert = TRUE)) %>%
  setIDVar(name = "territories", columns = 1) %>%
  setIDVar(name = "year", columns = 2) %>%
  setIDVar(name = "commodities", columns = 3) %>%
  setObsVar(name = "harvested", columns = 5) %>%
  setObsVar(name = "production", columns = 6)

reorganise(schema = schema, input = input)

[Package tabshiftr version 0.4.1 Index]