record_format {naaccr} | R Documentation |
Define custom fields for NAACCR records
Description
Create a record_format object, which is used to read NAACCR records.
Usage
record_format(
  name,
  item,
  start_col = NA_integer_,
  end_col = NA_integer_,
  type = "character",
  alignment = "left",
  padding = " ",
  parent = "Tumor",
  cleaner = list(NULL),
  unknown_finder = list(NULL),
  name_literal = NA_character_,
  width = NA_integer_
)
as.record_format(x, ...)
Arguments
name |
Item name appropriate for a |
item |
NAACCR item number. |
start_col |
First column of the field in a fixed-width record. |
end_col |
*Deprecated: Use the width argument instead.* Last column of the field in a fixed-width record. |
type |
Name of the column class. |
alignment |
Alignment of the field in fixed-width files. Either
|
padding |
Single-character strings to use for padding in fixed-width files. |
parent |
Name of the parent node to include this field under when
writing to an XML file.
Values can be "NaaccrData", "Patient", or "Tumor". |
cleaner |
(Optional) List of functions to handle special cases of
cleaning field data (e.g., convert all values to uppercase).
Values of NULL will use the standard cleaner functions for the type. |
unknown_finder |
(Optional) List of functions to detect when codes mean
the actual values are unknown or not applicable.
Values of |
name_literal |
(Optional) Item name in plain language. |
width |
(Optional) Item width in characters. |
x |
Object to be coerced to a record_format object. |
... |
Other arguments passed to |
Details
To define registry-specific fields in addition to the standard fields, create
a record_format object for the registry-specific fields and combine it
with one of the formats provided with the package using rbind.
Value
An object of class "record_format" which has the following columns:
name - (character) XML field name.
item - (integer) Field item number.
start_col - (integer) First column of the field in a fixed-width text file. If NA, the field will not be read from or written to fixed-width files. It will still be included in XML files.
end_col - (integer) (*Deprecated: Use width instead.*) Last column of the field in a fixed-width text file. If NA, the field will not be read from or written to fixed-width files. This is the norm for fields only found in XML formats.
type - (factor) R class for the column vector.
alignment - (factor) Alignment of the field's values in a fixed-width text file.
padding - (character) String used for padding field values in a fixed-width text file.
parent - (factor) Parent XML node for the field. One of "NaaccrData", "Patient", or "Tumor".
cleaner - (list of function objects) Function to prepare the field's values for analysis. Values of NULL will use the standard cleaner functions for the type (see below).
unknown_finder - (list of function objects) Function to detect codes meaning the actual values are missing or unknown for the field.
name_literal - (character) Field name in plain language.
width - (integer) Character width of the field values. Mostly meant for reading and writing flat files.
Format Types
The levels type can take, along with the functions used to process
them when reading a file:
address - (clean_address_number_and_street) Street number and street name parts of an address.
age - (clean_age) Age in years.
boolean01 - (naaccr_boolean, with false_value = "0") True/false, where "0" means false and "1" means true.
boolean12 - (naaccr_boolean, with false_value = "1") True/false, where "1" means false and "2" means true.
census_block - (clean_census_block) Census Block ID number.
census_tract - (clean_census_tract) Census Tract ID number.
character - (clean_text) Miscellaneous text.
city - (clean_address_city) City name.
count - (clean_count) Integer count.
county - (clean_county_fips) County FIPS code.
Date - (as.Date, with format = "%Y%m%d") NAACCR-formatted date (YYYYMMDD).
datetime - (as.POSIXct, with format = "%Y%m%d%H%M%S") NAACCR-formatted datetime (YYYYMMDDHHMMSS).
facility - (clean_facility_id) Facility ID number.
icd_9 - (clean_icd_9_cm) ICD-9-CM code.
icd_code - (clean_icd_code) ICD-9 or ICD-10 code.
integer - (as.integer) Miscellaneous whole number.
numeric - (as.numeric) Miscellaneous decimal number.
override - (naaccr_override) Field describing why another field's value was over-ridden.
physician - (clean_physician_id) Physician ID number.
postal - (clean_postal) Postal code for an address (a.k.a. ZIP code in the United States).
ssn - (clean_ssn) Social Security Number.
telephone - (clean_telephone) 10-digit telephone number.
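The Date and datetime conversions listed above use base R parsers, so their effect can be checked without the package. A minimal sketch, assuming made-up input values and a UTC time zone (the time zone the package itself applies is not stated here):

```r
# Hypothetical raw field values as they might appear in a NAACCR record
raw_date <- c("20200131", "19991231")
raw_datetime <- "20200131235959"

# "Date" type: YYYYMMDD parsed with as.Date
parsed_date <- as.Date(raw_date, format = "%Y%m%d")

# "datetime" type: YYYYMMDDHHMMSS parsed with as.POSIXct
# tz = "UTC" is an assumption made for this example only
parsed_datetime <- as.POSIXct(raw_datetime, format = "%Y%m%d%H%M%S", tz = "UTC")

format(parsed_date)                            # "2020-01-31" "1999-12-31"
format(parsed_datetime, "%Y-%m-%d %H:%M:%S")   # "2020-01-31 23:59:59"
```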
Examples
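# Define three registry-specific fields.
# "baz" has start_col = NA, so it is not read from or written to
# fixed-width files.
my_fields <- record_format(
  name      = c("foo", "bar", "baz"),
  item      = c(2163, 1180, 1181),
  start_col = c(975, 1381, NA),
  width     = c(1, 55, 4),
  type      = c("numeric", "facility", "character"),
  parent    = c("Patient", "Tumor", "Tumor"),
  cleaner   = list(NULL, NULL, trimws)
)
# Combine the custom fields with a format provided by the package.
my_format <- rbind(naaccr_format_16, my_fields)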
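
A custom cleaner is an ordinary R function. The following is a sketch of the uppercase case mentioned under the cleaner argument. It assumes a cleaner receives a character vector of raw values and returns a vector of the same length; that calling convention is an assumption, and clean_upper is a hypothetical name.

```r
# Hypothetical cleaner: trim padding, convert to uppercase,
# and turn blank values into NA
clean_upper <- function(x) {
  x <- toupper(trimws(x))
  x[!nzchar(x)] <- NA_character_
  x
}

clean_upper(c(" abc ", "Def", "   "))   # "ABC" "DEF" NA
```

Such a function could then be supplied for a single field as cleaner = list(clean_upper), in the same way trimws is passed in the example above.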