apply_data_dictionary {basecamb}                R Documentation

Clean column names, types and levels

Description

Use a data dictionary data.frame to tidy your data.frame: rename columns, coerce columns to the specified data types, and recode factor levels.

Usage

apply_data_dictionary(
  data,
  data_dictionary,
  na_action_default = "keep_NA",
  print_coerced_NA = TRUE
)

Arguments

data

data.frame to be cleaned

data_dictionary

data.frame with the following columns:

  • old_column_name : character with the old column name

  • new_data_type : character denoting the tidy data type. Supported types are:

    • character

    • integer

    • float

    • factor

    • date

  • new_column_name : tidy column name. Can be left blank to keep the old column name

  • coding (factor and date columns only):

    • factor columns: character denoting old value (key) and new value (value) in a standardised fashion:

      • key-value pairs are separated from each other by a comma (",")

      • key and value of the same pair are separated by an equal sign ("=")

      • quotations around individual keys and values are recommended for clarity, but do not affect functionality.

      • all values are coerced to type character, except "NA", which is parsed as a missing value (NA)

      • using "default" as a key will assign the specified value to all current values that do not match any of the specified keys, excluding NA

      • using "NA" as a key will assign the specified value to all current NA values

      • example coding: "'key1' = 'val1', 'key2' = 'val2', 'default' = 'Other', 'NA' = NA"

      • if no coding is specified for a column, the coding remains unchanged

    • date columns: character denoting the date format (see the format argument of as.Date)

  • Optional other columns (do not affect behaviour)
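The dictionary layout described above can be sketched in base R. This is an illustration only, not the package's implementation: the raw data, the column values, and the helper parse_coding() are hypothetical, and representing a blank coding as NA is an assumption. The helper shows one way a factor coding string could be split into keys and values; the last lines show how a date coding maps onto the format argument of as.Date.

```r
# Hypothetical raw data; names and values are made up for illustration.
raw <- data.frame(
  PatID   = c("a1", "b2", "c3"),
  Sex     = c("m", "f", "x"),
  VisitDt = c("03.01.2021", "15.02.2021", "28.02.2021"),
  stringsAsFactors = FALSE
)

# One dictionary row per column of `raw`, using the documented column names.
# Assumption: a column without a coding is represented by NA.
data_dictionary <- data.frame(
  old_column_name = c("PatID", "Sex", "VisitDt"),
  new_data_type   = c("character", "factor", "date"),
  new_column_name = c("patient_id", "sex", "visit_date"),
  coding          = c(
    NA,
    "'m' = 'male', 'f' = 'female', 'default' = 'Other', 'NA' = NA",
    "%d.%m.%Y"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a factor coding string into a named character
# vector (names = keys, elements = values). It does not handle keys or
# values that themselves contain "," or "=".
parse_coding <- function(coding) {
  pairs <- strsplit(coding, ",", fixed = TRUE)[[1]]
  parts <- strsplit(pairs, "=", fixed = TRUE)
  # Strip surrounding whitespace and optional quotation marks
  clean <- function(x) gsub("^[[:space:]'\"]+|[[:space:]'\"]+$", "", x)
  keys   <- vapply(parts, function(p) clean(p[1]), character(1))
  values <- vapply(parts, function(p) clean(p[2]), character(1))
  values[values == "NA"] <- NA_character_  # "NA" is treated as a missing value
  stats::setNames(values, keys)
}

lookup <- parse_coding(data_dictionary$coding[2])

# A date coding is a format string as understood by as.Date()
visit_date <- as.Date(raw$VisitDt, format = data_dictionary$coding[3])
```

Here lookup holds the keys "m", "f", "default" and "NA", and visit_date is a Date vector parsed from the day.month.year strings.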

na_action_default

character: Specify what to do with NA values. Defaults to 'keep_NA'. Options are:

  • 'keep_NA': NA values remain NA values

  • 'assign_default': NA values are assigned the value specified as 'default'. Requires a 'default' value to be specified. Can be overwritten for individual columns by specifying a value for key 'NA'
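The difference between the two options can be sketched in base R. The function recode_values() below is a hypothetical stand-in written for this illustration, not the package's code. It assumes that values without a matching key stay unchanged when no 'default' is given.

```r
# Hypothetical sketch of the documented recoding rules.
# `lookup` is a named character vector: names are keys, elements are values.
recode_values <- function(x, lookup, na_action_default = "keep_NA") {
  x <- as.character(x)
  was_na <- is.na(x)
  explicit <- lookup[!names(lookup) %in% c("default", "NA")]

  out <- x
  matched <- !was_na & x %in% names(explicit)
  out[matched] <- explicit[x[matched]]

  # 'default' covers non-NA values that match no explicit key
  if ("default" %in% names(lookup)) {
    out[!was_na & !matched] <- lookup[["default"]]
  }

  # An explicit 'NA' key takes precedence over na_action_default
  if ("NA" %in% names(lookup)) {
    out[was_na] <- lookup[["NA"]]
  } else if (na_action_default == "assign_default") {
    if (!"default" %in% names(lookup)) {
      stop("'assign_default' requires a 'default' value in the coding")
    }
    out[was_na] <- lookup[["default"]]
  }
  out
}

x <- c("m", "f", "x", NA)
lookup <- c(m = "male", f = "female", default = "Other")

kept     <- recode_values(x, lookup, "keep_NA")
assigned <- recode_values(x, lookup, "assign_default")

# A column-specific 'NA' key overrides 'assign_default'
overridden <- recode_values(x, c(lookup, "NA" = "Unknown"), "assign_default")
```

With 'keep_NA' the fourth element stays NA, with 'assign_default' it becomes "Other", and with an explicit 'NA' key it becomes "Unknown".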

print_coerced_NA

logical: whether to print a message giving the location of NAs that apply_data_dictionary() introduces into data.
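As a rough illustration of what "introduced" NAs are, the base R sketch below coerces a character column to integer and reports the positions that were not NA before coercion but are NA afterwards. The variable names and the message wording are made up for this example and do not reproduce the package's output.

```r
old <- c("12", "7", "n/a", NA)

# "n/a" cannot be parsed as an integer, so coercion turns it into NA
new <- suppressWarnings(as.integer(old))

# Positions where coercion created an NA that was not present before
coerced <- which(is.na(new) & !is.na(old))

if (length(coerced) > 0) {
  message("NAs introduced by coercion at position(s): ",
          paste(coerced, collapse = ", "))
}
```

The pre-existing NA in the fourth position is not reported, because it was already missing in the input.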

Value

clean data.frame

Author(s)

J. Peter Marquardt


[Package basecamb version 1.1.5 Index]