as.lama_dictionary {labelmachine}    R Documentation

Coerce to a lama_dictionary class object

Description

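This function coerces an object to a lama_dictionary class object. It accepts two types of arguments: a named list of translations, or a data.frame holding the label assignment rules (see the argument .data).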

Usage

as.lama_dictionary(.data, ...)

## S3 method for class 'list'
as.lama_dictionary(.data, ...)

## S3 method for class 'lama_dictionary'
as.lama_dictionary(.data, ...)

## Default S3 method:
as.lama_dictionary(.data = NULL, ...)

## S3 method for class 'data.frame'
as.lama_dictionary(.data, translation, col_old,
  col_new, ordering = rep("row", length(translation)), ...)

Arguments

.data

An object holding the translations. .data can be of the following data types:

  • named list: A named list object, where each list entry is a translation (a named character vector)

  • data.frame: A data.frame holding one or more column pairs, where each column pair consists of one column holding the original variable values and a second column holding the new labels, which should be assigned to the original values.

...

Various arguments, depending on the data type of .data.

translation

A character vector holding the names of all translations.

col_old

This argument is only used if .data is a data.frame. In this case, col_old must be a character vector (of the same length as translation) holding the names of the columns in .data that contain the original variable values. These columns can be of any type: character, logical, numeric or factor.

col_new

This argument is only used if .data is a data.frame. In this case, col_new must be a character vector (of the same length as translation) holding the names of the columns in .data that contain the new labels to be assigned to the original values. These columns can be character vectors or factors with character labels.

ordering

This argument is only used if .data is a data.frame. In this case, ordering must be a character vector (of the same length as translation) in which each element is one of the following strings and configures the ordering of the corresponding translation:

  • "row": The corresponding translation will be ordered exactly in the same way as the rows are ordered in the data.frame .data.

  • "old": The corresponding translation will be ordered by the given original values which are contained in the corresponding column col_old. If the column contains a factor variable, then the ordering of the factor will be used. If it just contains a plain character variable, then it will be ordered alphanumerically.

  • "new": The corresponding translation will be ordered by the given new labels which are contained in the corresponding column col_new. If the column contains a factor variable, then the ordering of the factor will be used. If it just contains a plain character variable, then it will be ordered alphanumerically.
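
The three ordering options can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the mapping table df_map and the index vectors row_order, old_order and new_order are hypothetical names introduced only for this illustration.

```r
# Hypothetical mapping table (assumption: modelled on the Examples section)
df_map <- data.frame(
  l_old = c("fr", "en", "de"),
  l_new = factor(
    c("French", "English", "German"),
    levels = c("German", "French", "English")
  ),
  stringsAsFactors = FALSE
)

# "row": keep the row order of the data.frame
row_order <- seq_len(nrow(df_map))

# "old": l_old is a plain character column, so it is sorted alphanumerically
old_order <- order(df_map$l_old)

# "new": l_new is a factor, so order() follows the order of its levels
new_order <- order(df_map$l_new)

print(df_map$l_old[row_order])
print(df_map$l_old[old_order])
print(as.character(df_map$l_new[new_order]))
```

In this sketch the factor column is sorted by its level order ("German" first), whereas the plain character column is sorted alphanumerically.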

Value

A new lama_dictionary class object holding the passed in translations.

Translations

A translation is a named character vector of non-zero length. It defines which labels (of type character) are assigned to which values (of type character, logical or numeric). For example, the translation c("0" = "urban", "1" = "rural") assigns the label "urban" to the value 0 and "rural" to the value 1, so the variable x = c(0, 0, 1) is translated to x_new = c("urban", "urban", "rural"). A translation therefore contains two pieces of information: its names are the original values and its entries are the labels assigned to them.

The function lama_translate() is used in order to apply a translation on a variable. The resulting vector with the assigned labels can be of the following types:

The original variable can be of the following types:
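
The urban/rural example from this section can be reproduced with plain named-vector indexing in base R. This is only a sketch of the idea behind a translation, under the assumption that a lookup by name is an adequate illustration; it does not call lama_translate() and makes no claim about how that function works internally.

```r
# A translation: names are the original values, entries are the labels
translation <- c("0" = "urban", "1" = "rural")

# Variable to translate
x <- c(0, 0, 1)

# Names are character strings, so the values are converted before the lookup
x_new <- unname(translation[as.character(x)])
print(x_new)
```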

Missing values

It is also possible to handle missing values with lama_translate(). To do so, the translation must contain an entry that tells how to handle a missing value. To define such a translation, the missing value (NA) can be escaped with the character string "NA_". This can be useful in two situations:
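
The effect of the "NA_" escape can be sketched in base R. The helper vector keys is a hypothetical name used only here, and the lookup below imitates the idea rather than calling lama_translate(): missing values are replaced by the key "NA_" before the label is looked up.

```r
# Translation with an escaped missing value (taken from the Examples section)
translation <- c(uk = "United Kingdom", fr = "France", NA_ = "other countries")

# Variable containing a missing value
x <- c("uk", NA, "fr")

# Replace NA by the escape string "NA_" so that it can be used as a name
keys <- ifelse(is.na(x), "NA_", x)

x_new <- unname(translation[keys])
print(x_new)
```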

lama_dictionary class objects

Each lama_dictionary class object can contain multiple translations, each with a unique name under which the translation can be found. The function lama_translate() uses a lama_dictionary class object to translate a normal vector or to translate one or more columns in a data.frame. Sometimes it may be necessary to have different translations for the same variable. In this case, it is best to have multiple translations with different names (e.g. area_short = c("0" = "urb", "1" = "rur") and area = c("0" = "urban", "1" = "rural")).
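
As a sketch, the two translations for the same variable mentioned above can be collected in one named list and, if the labelmachine package is installed, coerced to a dictionary in the same way as in Example-1. The list name translations is an assumption made for this illustration.

```r
# Two translations for the same variable, stored under different names
translations <- list(
  area_short = c("0" = "urb", "1" = "rur"),
  area = c("0" = "urban", "1" = "rural")
)

# Each translation is found under its own unique name
print(names(translations))

# Coerce to a lama_dictionary only if the package is available
if (requireNamespace("labelmachine", quietly = TRUE)) {
  dict <- labelmachine::as.lama_dictionary(translations)
  print(dict)
}
```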

Examples

  ## Example-1: Initialize a lama-dictionary from a list object
  ##            holding the translations
  obj <- list(
    country = c(uk = "United Kingdom", fr = "France", NA_ = "other countries"),
    language = c(en = "English", fr = "French")
  )
  dict <- as.lama_dictionary(obj)
  dict
  
  ## Example-2: Initialize a lama-dictionary from a data frame
  ##            holding the label assignment rules
  df_map <- data.frame(
    c_old = c("uk", "fr", NA),
    c_new = c("United Kingdom", "France", "other countries"),
    l_old = c("en", "fr", NA),
    l_new = factor(c("English", "French", NA), levels = c("French", "English"))
  )
  dict <- as.lama_dictionary(
    df_map,
    translation = c("country", "language"),
    col_old = c("c_old", "l_old"),
    col_new = c("c_new", "l_new"),
    ordering = c("row", "new")
  )
  # 'country' is ordered as in the 'df_map'
  # 'language' is ordered differently ("French" first)
  dict

[Package labelmachine version 1.0.0 Index]