general_clean_directory {podcleaner}    R Documentation

Mutate operations on the columns of a Scottish post office general directory data.frame

Description

Attempts to clean the provided Scottish post office general directory data.frame.

Usage

general_clean_directory(directory, progress = TRUE, verbose = FALSE)

Arguments

directory

A Scottish post office general directory in the form of a data.frame or any other object that inherits from the data.frame class, such as a tibble. Columns must include at least forename, surname, occupation and addresses.
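As a rough illustration of the column requirement above, the following base R sketch reports which required columns are absent from an input. The helper name missing_directory_columns is hypothetical and not part of podcleaner; the package may validate its input differently, or not at all.

```r
# Hypothetical helper (not part of podcleaner): report which of the
# columns required by general_clean_directory() are absent.
missing_directory_columns <- function(directory) {
  required <- c("forename", "surname", "occupation", "addresses")
  stopifnot(inherits(directory, "data.frame"))
  setdiff(required, names(directory))
}

directory <- data.frame(
  page = "71", surname = "ABOT", forename = "Wm.",
  occupation = "Wine and spirit mercht",
  addresses = "1S20 Londn rd; ho. 13<J Queun sq",
  stringsAsFactors = FALSE
)

# An empty character vector means every required column is present.
missing_directory_columns(directory)
```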

progress

Whether progress should be shown (TRUE) or not (FALSE).

verbose

Whether the function should be executed silently (FALSE) or not (TRUE).

Value

A tibble; columns include at least forename, surname, occupation, address.trade.number, address.trade.body, address.house.number and address.house.body. Any "house" suffix in the occupation column is moved to addresses; occupation information found in addresses is moved back to the occupation column; addresses is split into trade and house address columns; and an additional record is created for each extra trade address identified. Entries are further cleaned of optical character recognition (OCR) errors and subjected to a number of standardisation operations.
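To make the trade/house split described above more concrete, here is a minimal base R sketch. It is not the package's implementation: the helper split_addresses, the output column names trade and house, and the assumption that a house address is introduced by "; ho." or "; res" (as in the example data on this page) are all illustrative only. It does not separate numbers from address bodies, correct OCR errors, or create extra records.

```r
# Hypothetical helper (not part of podcleaner): split each addresses
# entry into a trade part and a house part.
# Assumption: the house address follows a "; ho." or "; res" marker.
split_addresses <- function(addresses) {
  pattern <- ";\\s*(ho|res)\\.?\\s+"
  # With invert = TRUE, regmatches() returns the text around the first match.
  parts <- regmatches(addresses, regexpr(pattern, addresses), invert = TRUE)
  data.frame(
    trade = vapply(parts, function(p) trimws(p[1]), character(1)),
    house = vapply(
      parts,
      function(p) if (length(p) > 1) trimws(p[2]) else "",
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

split_addresses(c(
  "1S20 Londn rd; ho. 13<J Queun sq",
  "Bkr; I2 Dixon Street, & 29 Auderstn Qu.; res 2G5 Argul st."
))
```

In this sketch the leading "Bkr" of the second entry stays in the trade part; the documented function additionally moves such occupation information back to the occupation column.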

Examples

pages <- rep("71", 2L)
surnames <- c("ABOT", "ABRCROMBIE")
forenames <- c("Wm.", "Alex")
occupations <- c("Wine and spirit mercht - See Advertisement in Appendix.", "")
addresses <- c(
  "1S20 Londn rd; ho. 13<J Queun sq",
  "Bkr; I2 Dixon Street, & 29 Auderstn Qu.; res 2G5 Argul st."
)
directory <- tibble::tibble(
  page = pages, surname = surnames, forename = forenames,
  occupation = occupations, addresses = addresses
)
general_clean_directory(directory, progress = TRUE, verbose = FALSE)


[Package podcleaner version 0.1.2 Index]